Chapter 4: Introduction to Grid and its Evolution. Prepared by: Nitin Pandya, Assistant Professor, SVBIT.



Overview: background (what is the Grid?, related technologies); Grid applications; communities; Grid tools; case studies.

What is a Grid? Many definitions exist in the literature. An early definition (Foster and Kesselman, 1998): "A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities." Earlier still, Kleinrock (1969): "We will probably see the spread of computer utilities, which, like present electric and telephone utilities, will service individual homes and offices across the country."

Grid computing (1): "Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations" (I. Foster).

Grid computing (2). Information grid: wide access to distributed data (the Web). Data grid: management and processing of very large distributed data sets. Computing grid: a meta-computer.

Parallelism vs. grids: some reminders. Grids date back only to 1996; parallelism is older (first classification in 1972). Motivations: the need for more computing power (weather forecasting, atomic simulation, genomics) and for more storage capacity (petabytes and more); in a word, better performance. Three ways to get it: work harder (use faster hardware); work smarter (optimize algorithms); get help (use more computers).

The performance? Ideally it grows linearly. Speed-up: if $T_S$ is the best time to process a problem sequentially, then the parallel processing time with $P$ processors should be $T_P = T_S / P$, where speedup $= T_S / T_P$. The speedup is limited by Amdahl's law: any parallel program has a purely sequential part and a parallelizable part, $T_S = F + T_{\parallel}$, so the speedup is bounded: $S = (F + T_{\parallel}) / (F + T_{\parallel}/P) < P$. Scale-up: if $T_{PS}$ is the time to solve a problem of size $S$ with $P$ processors, then $T_{PS}$ should also be the time to process a problem of size $n \cdot S$ with $n \cdot P$ processors.
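The Amdahl bound above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name and the sample timings (10 time units sequential, 90 parallelizable) are assumptions chosen for illustration.

```python
def amdahl_speedup(sequential_time: float, parallel_time: float, processors: int) -> float:
    """Return S = (F + T_par) / (F + T_par / P) for P processors.

    sequential_time is F (the purely sequential part) and parallel_time is
    the parallelizable part of the single-processor run time.
    """
    if processors < 1:
        raise ValueError("processors must be >= 1")
    total = sequential_time + parallel_time
    return total / (sequential_time + parallel_time / processors)


if __name__ == "__main__":
    # Illustrative numbers only: F = 10, T_par = 90.
    for p in (1, 2, 8, 64, 1024):
        print(p, round(amdahl_speedup(10.0, 90.0, p), 3))
```

With these sample values the speedup can never reach (F + T_par) / F = 10, however many processors are added, which is the practical meaning of the bound.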

Why do we need Grids? Many large-scale problems cannot be solved by a single computer; data and resources are globally distributed.

Background: related technologies. Cluster computing; peer-to-peer computing; Internet computing.

Cluster computing. Idea: put some PCs together and get them to communicate. Cheaper to build than a mainframe supercomputer. Clusters come in different sizes. Scalable: a cluster can grow by adding more PCs.

[Figure: cluster architecture]

Peer-to-peer computing. Connect to other computers; files can be accessed from any computer on the network. Allows data sharing without going through a central server. The decentralized approach is also useful for the Grid.

[Figure: peer-to-peer architecture]

Internet computing. Idea: there are many idle PCs on the Internet, which can perform other computations while not being used. Cycle scavenging: rely on getting free time on other people's computers. Example: SETI@home. What are the advantages and disadvantages of cycle scavenging?

Some Grid applications: distributed supercomputing; high-throughput computing; on-demand computing; data-intensive computing; collaborative computing.

Grid users. Many levels of users: Grid developers; tool developers; application developers; end users; system administrators.

Some Grid challenges: data movement; data replication; resource management; job submission.

Computational grid: "Hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities" (I. Foster). Performance criteria: security; reliability; computing power; latency; throughput; scalability; services.

Grid characteristics: large scale; heterogeneity; multiple administrative domains; autonomy and coordination; dynamicity; flexibility; extensibility; security.

Levels of cooperation in a computing grid. End system (computer, disk, sensor, ...): multithreading, local I/O. Cluster: synchronous communications, DSM, parallel I/O, parallel processing. Intranet/organization: heterogeneity, distributed administration, distributed file systems and databases, load balancing, access control. Internet/Grid: global supervision, brokers, negotiation, cooperation.

Basic services: authentication/authorization/traceability; activity control (monitoring); resource discovery; resource brokering; scheduling; job submission, data access/migration and execution; accounting.

Layered Grid architecture (by analogy to Internet architecture). Grid layers, top to bottom: Application; Collective ("coordinating multiple resources": ubiquitous infrastructure services, app-specific distributed services); Resource ("sharing single resources": negotiating access, controlling use); Connectivity ("talking to things": communication (Internet protocols) and security); Fabric ("controlling things locally": access to, and control of, resources). Internet protocol architecture, for comparison: Application; Transport; Internet; Link. (From I. Foster.)

Resources: description; advertising; cataloging; matching; claiming; reserving; checkpointing.

Resource management (1). Services and protocols depend on the infrastructure. Some parameters: stability of the infrastructure (same set of resources or not); freshness of the resource availability information; reservation facilities; multiple-resource or single-resource brokering. Example of a request: "I need from 10 to 100 CEs, each with at least 512 MB of RAM and a computing power of 150 Mflops."
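The example request can be read as a matching problem between a requirement and a list of advertised resources. The sketch below is only an illustration under stated assumptions: the class names, field names and the `match` function are hypothetical, and "a computing power of 150 Mflops" is interpreted as "at least 150 Mflops". It does not reproduce any particular broker's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ComputeElement:
    """One advertised compute element (hypothetical record layout)."""
    name: str
    ram_mb: int
    mflops: float


@dataclass(frozen=True)
class Request:
    """A request for between min_count and max_count compute elements."""
    min_count: int
    max_count: int
    min_ram_mb: int
    min_mflops: float


def match(request: Request, available: List[ComputeElement]) -> Optional[List[ComputeElement]]:
    """Return up to max_count eligible elements, or None if fewer than min_count qualify."""
    eligible = [
        ce for ce in available
        if ce.ram_mb >= request.min_ram_mb and ce.mflops >= request.min_mflops
    ]
    if len(eligible) < request.min_count:
        return None  # the request cannot be satisfied right now
    return eligible[:request.max_count]


if __name__ == "__main__":
    # The request from the slide: 10 to 100 CEs, >= 512 MB RAM, >= 150 Mflops (assumed).
    req = Request(min_count=10, max_count=100, min_ram_mb=512, min_mflops=150.0)
    pool = [ComputeElement(f"ce{i}", 1024, 200.0) for i in range(12)]
    granted = match(req, pool)
    print(len(granted) if granted else "request not satisfiable")
```

A real broker would also have to cope with stale availability information and with reservations, which is exactly why the parameters listed above matter.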

Resource management and scheduling (1). Levels of scheduling: job scheduling (global level; performance metric: throughput); resource scheduling (metrics: fairness, utilization); application scheduling (metrics: response time, speedup, produced data, ...). Mapping/scheduling process: resource discovery and selection; assignment of tasks to computing resources; data distribution; task scheduling on the computing resources; (communication scheduling).

Resource management and scheduling (2). Individual performance goals are not necessarily consistent with the global (system) performance. Grid problems: predictions are not definitive because the environment is dynamic; platforms are heterogeneous; checkpointing and migration.

A resource management system example (Globus). The application expresses its request in RSL (Resource Specification Language). Brokers perform RSL specialization, using queries to the information service, and produce ground RSL. A co-allocator splits this into simple ground RSL requests, each handed to a GRAM, which passes it to a local resource manager (LSF, Condor, NQE). LSF: Load Sharing Facility (task scheduling and load balancing; developed by Platform Computing). NQE: Network Queuing Environment (batch management; developed by Cray Research).

Resource information (1). What is to be stored? Virtual organizations, people, computing resources, software packages, communication resources, event producers, devices; and what about data? A key issue in such dynamic environments. A first approach: a (distributed) directory (LDAP). It is easy to use, tree-structured and distributed, but static (mostly read, inefficient to update), hierarchical, and has a poor procedural language.

Resource information (2). Goals: dynamicity; complex relationships; frequent updates; complex queries. A second approach: a (relational) database.

Programming on the grid: potential programming models. Message passing (PVM, MPI); distributed shared memory; data parallelism (HPF, HPC++); task parallelism (Condor); client/server (RPC); agents; integration systems (CORBA, DCOM, RMI).

Program execution: issues. Parallelize the program with the right job structure, communication patterns/procedures and algorithms; discover the available resources; select the suitable resources; allocate or reserve these resources; migrate the data; initiate computations; monitor the executions (checkpoints?); react to changes; collect results.

Data management. It was long forgotten, though it is a key issue. Issues: indexing; retrieval; replication; caching; traceability (auditing); and security.

Some Grid-related projects: Globus; Condor; Nimrod-G.

Globus Grid Toolkit. Open-source toolkit for building Grid systems and applications; an enabling technology for the Grid. Share computing power, databases, and other tools securely online. Facilities for: resource monitoring; resource discovery; resource management; security; file management.

Data management in the Globus Toolkit. Data movement: GridFTP; Reliable File Transfer (RFT). Data replication: Replica Location Service (RLS); Data Replication Service (DRS).

GridFTP. A high-performance, secure, reliable data transfer protocol, optimized for wide-area networks; a superset of the Internet FTP protocol. Features: multiple data channels for parallel transfers; partial file transfers; third-party transfers; reusable data channels; command pipelining.

More GridFTP features. Auto-tuning of parameters. Striping: transfer data in parallel among multiple senders and receivers instead of just one. Extended block mode: send data in blocks; the block size and offset are known, so data can arrive out of order, which allows multiple streams.
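The idea behind extended block mode, that each block carries its own offset so blocks may arrive in any order over several streams, can be sketched as follows. This is an illustration of the principle only, not the GridFTP wire format; the function names and the (offset, data) tuple layout are assumptions.

```python
from typing import Iterable, List, Tuple

Block = Tuple[int, bytes]  # (offset within the file, payload); hypothetical layout


def split_into_blocks(payload: bytes, block_size: int) -> List[Block]:
    """Cut payload into blocks, each tagged with its byte offset."""
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    return [(off, payload[off:off + block_size]) for off in range(0, len(payload), block_size)]


def reassemble(blocks: Iterable[Block], total_size: int) -> bytes:
    """Rebuild the file from blocks received in arbitrary order."""
    buffer = bytearray(total_size)
    received = 0
    for offset, data in blocks:
        if offset < 0 or offset + len(data) > total_size:
            raise ValueError(f"block at offset {offset} falls outside the file")
        # The offset says where the block belongs, so arrival order is irrelevant.
        buffer[offset:offset + len(data)] = data
        received += len(data)
    # Simple byte count check; it does not detect overlapping blocks.
    if received != total_size:
        raise ValueError("missing or duplicated blocks")
    return bytes(buffer)


if __name__ == "__main__":
    original = b"grid data moved over several parallel streams"
    blocks = split_into_blocks(original, 8)
    out_of_order = list(reversed(blocks))  # simulate out-of-order arrival
    print(reassemble(out_of_order, len(original)) == original)
```

Because no block depends on its predecessor, a sender is free to spread blocks over several streams, which is what makes the parallel and striped transfers described above possible.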

Striping architecture: use striped servers.

Limitations of GridFTP. Not a web service protocol (does not employ SOAP, WSDL, etc.). Requires the client to maintain an open socket connection throughout the transfer, which is inconvenient for long transfers. Cannot recover from client failures.

[Figure: GridFTP]

Reliable File Transfer (RFT). A web service with job-scheduler functionality for data movement. The user provides source and destination URLs; the service writes the job description to a database and moves the files. Service methods are provided for querying transfer status.

[Figure: RFT]

Replica Location Service (RLS). A registry that keeps track of where replicas exist on physical storage systems. Users or services register files in RLS when the files are created. Distributed registry: it may consist of multiple servers at different sites, which increases scale and provides fault tolerance.

Replica Location Service (RLS), continued. Logical file name: a unique identifier for the contents of a file. Physical file name: the location of a copy of the file on a storage system. A user can provide a logical name and ask for its replicas, or query to find the logical name associated with a physical file location.
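The two lookups described above (logical name to replicas, and physical location back to logical name) amount to a two-way mapping. The sketch below is a single-process, in-memory stand-in written for illustration; the class name, method names and the example URLs are hypothetical and are not the RLS API.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class ReplicaCatalog:
    """Toy registry mapping logical file names to physical replica locations."""

    def __init__(self) -> None:
        self._by_logical: Dict[str, Set[str]] = defaultdict(set)
        self._by_physical: Dict[str, str] = {}

    def register(self, logical_name: str, physical_name: str) -> None:
        """Record that physical_name holds a copy of logical_name."""
        self._by_logical[logical_name].add(physical_name)
        self._by_physical[physical_name] = logical_name

    def replicas(self, logical_name: str) -> List[str]:
        """Return all known physical locations for a logical name (sorted)."""
        return sorted(self._by_logical.get(logical_name, ()))

    def logical_name(self, physical_name: str) -> Optional[str]:
        """Return the logical name stored at a physical location, if registered."""
        return self._by_physical.get(physical_name)


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    # Hypothetical names and hosts, for illustration only.
    catalog.register("run42/frame-0001", "gsiftp://site-a.example.org/data/frame-0001")
    catalog.register("run42/frame-0001", "gsiftp://site-b.example.org/cache/frame-0001")
    print(catalog.replicas("run42/frame-0001"))
    print(catalog.logical_name("gsiftp://site-b.example.org/cache/frame-0001"))
```

A distributed registry would split or replicate these mappings across several servers; the sketch keeps everything in one dictionary to show only the lookup semantics.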

Data Replication Service (DRS). A pull-based replication capability, implemented as a web service. A higher-level data management service built on top of RFT and RLS. Goal: ensure that a specified set of files exists on a storage site. First, query RLS to locate the desired files; next, create a transfer request using RFT; finally, register the new replicas with RLS.
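The three steps of the workflow (locate, transfer, register) can be sketched as a single function. This is a hedged illustration, not the DRS implementation: the catalog is a plain dictionary standing in for RLS, the `transfer` callable stands in for an RFT request, and all names and URLs are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set


def replicate(
    wanted: Iterable[str],
    destination_site: str,
    catalog: Dict[str, Set[str]],
    transfer: Callable[[str, str], None],
) -> List[str]:
    """Ensure every logical file in `wanted` has a replica under destination_site.

    catalog maps logical names to sets of physical URLs (stand-in for RLS);
    transfer(source_url, target_url) copies one file (stand-in for RFT).
    Returns the URLs of the replicas that were newly created.
    """
    prefix = destination_site.rstrip("/") + "/"
    created: List[str] = []
    for logical in wanted:
        # Step 1: query the catalog to locate existing replicas.
        locations = catalog.get(logical, set())
        if any(url.startswith(prefix) for url in locations):
            continue  # already present at the destination, nothing to pull
        if not locations:
            raise LookupError(f"no known replica of {logical!r}")
        # Step 2: request a transfer from one existing replica.
        source = sorted(locations)[0]
        target = prefix + logical
        transfer(source, target)
        # Step 3: register the new replica in the catalog.
        catalog.setdefault(logical, set()).add(target)
        created.append(target)
    return created


if __name__ == "__main__":
    catalog = {"frame-0001": {"gsiftp://site-a.example.org/data/frame-0001"}}
    log = []
    new = replicate(["frame-0001"], "gsiftp://site-b.example.org/store",
                    catalog, lambda src, dst: log.append((src, dst)))
    print(new, log)
```

Running the function a second time with the same arguments creates nothing, since the destination replica is now registered; that idempotence is what "ensure that a specified set of files exists" implies.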

Condor. Original goal: high-throughput computing. Harvests wasted CPU power from other machines; can also be used on a dedicated cluster. Condor-G: a Condor interface to Globus resources.

Earth System Grid (ESG). Provides climate scientists with access to large datasets. The data are generated by computational models, which requires massive computational power. Most scientists work with subsets of the data and require access to local copies.

ESG infrastructure. Archival storage systems and disk storage systems at several sites; storage resource managers and GridFTP servers to provide access to the storage systems; metadata catalog services; replica location services; a web portal user interface.

[Figure: Earth System Grid]

[Figure: Earth System Grid interface]

Laser Interferometer Gravitational Wave Observatory (LIGO). Instruments at two sites detect gravitational waves. Each experiment run produces millions of files, and scientists at other sites want these datasets on local storage. LIGO deploys RLS servers at each site to register local mappings and to collect information about mappings at other sites.

Large-scale data replication for LIGO. Goal: detection of gravitational waves. Three interferometers at two sites generate 1 TB of data daily. This data must be replicated across 9 sites to make it available to scientists, who need to learn where data items are and how to access them.

[Figure: LIGO]

LIGO solution: the Lightweight Data Replicator (LDR). Uses parallel data streams, tunable TCP windows, and tunable write/read buffers. Tracks where copies of specific files can be found. Stores descriptive information (metadata) in a database, so files can be selected by description rather than by filename.

TeraGrid. An NSF high-performance computing facility. Nine distributed sites, each with different capabilities, e.g., computation power, archiving facilities, visualization software. Applications may require more than one site. Data sizes are on the order of gigabytes or terabytes.

[Figure: TeraGrid]

TeraGrid solution: use GridFTP and RFT with a front-end command-line tool (tgcp). Benefits of the system: simple user interface; high-performance data transfer capability; ability to recover from both client and server software failures; extensible configuration.

TGCP details. Idea: hide low-level GridFTP commands from users. To copy the file smallfile.dat from a working directory to another system:

    tgcp smallfile.dat tg-login.sdsc.teragrid.org:/users/ux454332

The corresponding GridFTP command:

    globus-url-copy -p 8 -tcp-bs 1198372 \
        gsiftp://tg-gridftprr.uc.teragrid.org:2811/home/navarro/smallfile.dat \
        gsiftp://tg-login.sdsc.teragrid.org:2811/users/ux454332/smallfile.dat

The reality (1). We have spent a lot of time talking about "The Grid". There is the Web and the Internet. Is there a single Grid?

The reality (2). Many types of Grids exist: private vs. public; regional vs. global; all-purpose vs. dedicated to a particular scientific problem.