A Grid Research Toolbox


1 A Grid (and Cloud) Research Toolbox
The Failure Trace Archive, DGSim
A. Iosup, O. Sonmez, N. Yigitbasi, H. Mohamed, S. Anoep, D.H.J. Epema (PDS Group, ST/EWI, TU Delft)
I. Raicu, C. Dumitrescu, I. Foster (U. Chicago)
M. Jan (LRI/INRIA Futurs Paris, INRIA)
H. Li, L. Wolters (LIACS, U. Leiden)
Paris, France

2 A Layered View of the Grid World
- Layer 1: Hardware + OS
  - Automated
  - Non-grid (XtreemOS?)
- Layers 2-4: Grid Middleware Stack
  - Low Level: file transfers, local resource allocation, etc.
  - High Level: grid scheduling
  - Very High Level: application environments (e.g., distributed objects)
  - Automated/user control
  - Simple to complex
- Layer 5: Grid Applications
  - User control
  - Simple to complex
[Figure: layer stack, top to bottom: Grid Applications; Grid MW Stack (Grid Very High Level MW, Grid High Level MW, Grid Low Level MW); HW + OS]

3 Grid Work: Science or Engineering?
Work on Grid Middleware and Applications
- When is work in grid computing science?
  - Studying systems to uncover their hidden laws
  - Designing innovative systems
  - Proposing novel algorithms
  - Methodological aspects: repeatable experiments to verify and extend hypotheses
- When is work in grid computing engineering?
  - Showing that the system works in a common case, or in a special case of great importance (e.g., weather prediction)
  - When our students can do it (H. Casanova's argument)

4 Grid Research Problem: We Are Missing Both Data and Tools
- Lack of data
  - Infrastructure: number and type of resources, resource availability and failures
  - Workloads: arrival process, resource consumption
- Lack of tools
  - Simulators: SimGrid, GridSim, MicroGrid, GangSim, OptorGrid, MONARC, ...
  - Testing tools that operate in real environments: DiPerF, QUAKE/FAIL-FCI

5 Anecdote: Grids are far from being reliable job execution environments
[Figure: % reliable for Server, Small Cluster, Production Cluster, DAS-2, TeraGrid; plotted values not captured]
- 5x decrease in failure rate after first year [Schroeder and Gibson, DSN 06]
- >10% jobs fail [Iosup et al., CCGrid 06]
- 20-45% failures [Khalili et al., Grid 06]
- Grid3: 27% failures, 5-10 retries [Dumitrescu et al., GCC 05]
Source: dboard-gr.cern.ch, May 07.

6 The Anecdote at Scale
- NMI Build-and-Test Environment at U.Wisc.-Madison: 112 hosts, >40 platforms (e.g., X86-32/Solaris/5, X86-64/RH/9)
- Serves >50 grid middleware packages: Condor, Globus, VDT, gLite, GridFTP, RLS, NWS, INCA(-2), APST, NINF-G, BOINC
A. Iosup, D.H.J. Epema, P. Couvares, A. Karp, M. Livny, Build-and-Test Workloads for Grid Middleware: Problem, Analysis, and Applications, CCGrid,

7 A Grid Research Toolbox
Hypothesis: (a) is better than (b). For scenario 1,
[Figure: toolbox diagram with elements labeled 1, 2, 3 and DGSim]

8 Research Questions

9 Outline
1. Introduction and Motivation
2. Q1: Exchange Data
   1. The Grid Workloads Archive
   2. The Failure Trace Archive
   3. The Cloud Workloads Archive (?)
3. Q2: System Characteristics
   1. Grid Workloads
   2. Grid Infrastructure
4. Q3: System Testing and Evaluation

10 Traces in Distributed Systems Research
- "My system/method/algorithm is better than yours (on my carefully crafted workload)"
  - Unrealistic (trivial): prove that "prioritize jobs from users whose name starts with A" is a good scheduling policy
  - Realistic? "85% jobs are short"; "10% Writes"; ...
  - Major problem in Computer Systems research
- Workload trace = recording of real activity from a (real) system, often as a sequence of jobs/requests submitted by users for execution
  - Main use: compare and cross-validate new job and resource management techniques and algorithms
  - Major problem: obtaining real workload traces from several sources
August 26,

11 2.1. The Grid Workloads Archive [1/3] Content
- 6 traces online
[Figure labels: 1.5 yrs, >750K, >250]
A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 2008.

12 2.1. The Grid Workloads Archive [2/3] Approach: Standard Data Format (GWF)
- Goals
  - Provide a unitary format for grid workloads
  - Same format in plain text and relational DB (SQLite/SQL92)
  - To ease adoption, base it on the Standard Workload Format (SWF) of the Parallel Workloads Archive
- Existing
  - Identification data: Job/User/Group/Application ID
  - Time and status: Submit/Start/Finish Time, Job Status and Exit code
  - Request vs. consumption: CPU/Wallclock/Mem
- Added
  - Job submission site
  - Job structure: bag-of-tasks, workflows
  - Extensions: co-allocation, reservations, others possible
A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 2008.
May 10, 2011

13 2.1. The Grid Workloads Archive [3/3] Approach: GWF Example
[Table: sample GWF records; surviving column headers: Submit, Wait[s], Run, #CPUs, Used Mem [KB], Req #CPUs; rows not captured]
A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 2008.
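
The plain-text form of a GWF-like trace can be read with a few lines of code. The sketch below is only an illustration: the seven columns are a hypothetical subset loosely matching the headers that survive on the slide, the `#` comment prefix is an assumption, and `-1` as the "unknown" marker follows the SWF convention that GWF is said to build on. The names `JobRecord` and `parse_trace` are invented for this example and are not part of any GWA tool.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class JobRecord(NamedTuple):
    # Hypothetical subset of columns; the real GWF defines more fields.
    job_id: int
    submit_time: int                # seconds since trace start
    wait_time: Optional[float]      # seconds
    run_time: Optional[float]       # seconds
    used_cpus: Optional[float]
    used_mem_kb: Optional[float]
    req_cpus: Optional[float]


def _optional(token: str) -> Optional[float]:
    """Map negative values (assumed 'unknown' marker) to None."""
    value = float(token)
    return None if value < 0 else value


def parse_trace(lines: Iterable[str]) -> Iterator[JobRecord]:
    """Yield one JobRecord per non-comment, non-empty line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        yield JobRecord(int(f[0]), int(f[1]), *(_optional(t) for t in f[2:7]))


sample = [
    "# job submit wait run cpus mem_kb req_cpus",
    "1 100 5 60 4 2048 4",
    "2 130 -1 30 1 -1 1",
]
records = list(parse_trace(sample))
```

Keeping unknown values as `None` rather than `-1` avoids accidentally averaging the marker into statistics such as mean wait time.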

14 2.2. The Failure Trace Archive Presentation
- Types of systems: (Desktop) Grids, DNS servers, HPC Clusters, P2P systems
- Stats: 25 traces, 100,000 nodes, decades of operation

15 2.3. The Cloud Workloads Archive [1/2] One Format Fits Them All
- Flat format: CWJ, CWJD, CWT, CWTD
  - Job and Tasks
  - Summary (20 unique data fields) and Detail (60 fields)
- Categories of information
  - Shared with GWA, PWA: Time, Disk, Memory, Net
  - Jobs/Tasks that change resource consumption profile
  - MapReduce-specific (two-thirds of the data fields)
A. Iosup, R. Griffith, A. Konwinski, M. Zaharia, A. Ghodsi, I. Stoica, Data Format for the Cloud Workloads Archive, v.3, 13/07/

16 2.3. The Cloud Workloads Archive [2/2] Looking for invariants
- Wr [%] is ~40% of Total IO, but absolute values vary
  Trace ID | Total IO [MB] | Rd. [MB] | Wr [%] | HDFS Wr [MB]
  CWA-01   | 10,934        | 6,805    | 38%    | 1,538
  CWA-02   | 75,546        | 47,539   | 37%    | 8,563
- # Tasks/Job and the ratio M:(M+R) Tasks vary
- Understanding workload evolution
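
The Wr [%] values on this slide appear to be the share of total IO that is not reads. That reading is an assumption, but it reproduces both rows of the table, as this short check shows (`write_share` is a name made up for the example):

```python
def write_share(total_mb: float, read_mb: float) -> float:
    """Percentage of total IO that is writes, assuming writes = total - reads."""
    return 100.0 * (total_mb - read_mb) / total_mb


# (Total IO [MB], Rd. [MB]) as given on the slide
traces = {"CWA-01": (10934, 6805), "CWA-02": (75546, 47539)}

shares = {name: write_share(total, read) for name, (total, read) in traces.items()}
for name, share in shares.items():
    print(f"{name}: Wr = {share:.1f}%")
```

The two traces differ by roughly 7x in total IO volume, while the write share stays within about one percentage point, which is the kind of invariant the slide is looking for.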

17 Outline
1. Introduction and Motivation
2. Q1: Exchange Data
   1. The Grid Workloads Archive
   2. The Failure Trace Archive
   3. The Cloud Workloads Archive (?)
3. Q2: System Characteristics
   1. Grid Workloads
   2. Grid Infrastructure
4. Q3: System Testing and Evaluation

18 3.1. Grid Workloads [1/7] Analysis Summary: Grid workloads are different, e.g., from parallel production environments (HPC)
- Traces: LCG, Grid3, TeraGrid, and DAS
  - Long traces (6+ months), active environments (500+K jobs per trace, 100s of users), >4 million jobs
- Analysis
  - System-wide, VO, group, user characteristics
  - Environment, user evolution
  - System performance
- Selected findings
  - Almost no parallel jobs
  - Top 2-5 groups/users dominate the workloads
  - Performance problems: high job wait time, high failure rates
A. Iosup, C. Dumitrescu, D.H.J. Epema, H. Li, L. Wolters, How are Real Grids Used? The Analysis of Four Grid Traces and Its Implications, Grid

19 3.1. Grid Workloads [2/7] Analysis Summary: Grids vs. Parallel Production Systems
- Similar CPUTime/Year, 5x larger arrival bursts
- LCG cluster daily peak: 22.5k jobs
[Figure: Parallel Production Environments (large clusters, supercomputers) vs. Grids]
A. Iosup, D.H.J. Epema, C. Franke, A. Papaspyrou, L. Schley, B. Song, R. Yahyapour, On Grid Performance Evaluation using Synthetic Workloads, JSSPP

20 3.1. Grid Workloads [3/7] More Analysis: Special Workload Components
- Bag-of-Tasks (BoT) = set of jobs that start at most Δ s after the first job
- Parameter Sweep Application = BoT in which all jobs run the same binary
- Workflow (WF) = set of jobs with precedence constraints (think Directed Acyclic Graph)
[Figure: Bags-of-Tasks (BoTs) and Workflows (WFs) over Time [units]]
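
The BoT definition on this slide translates directly into a grouping pass over job start (or submission) times. The sketch below is a minimal illustration under stated assumptions: it takes the slide's wording literally (each job is compared with the first job of the current bag, not with its predecessor), it ignores who submitted the job, and `group_into_bots` is a name invented here. A real trace analysis would presumably apply it separately per user.

```python
def group_into_bots(times, delta):
    """Group timestamps into bags: a job joins the current bag if it falls
    at most `delta` seconds after that bag's first job."""
    bots = []
    for t in sorted(times):
        if bots and t - bots[-1][0] <= delta:
            bots[-1].append(t)
        else:
            bots.append([t])   # this job opens a new bag
    return bots


bags = group_into_bots([0, 50, 120, 121, 400, 410], delta=120)
print(bags)  # [[0, 50, 120], [121], [400, 410]]
```

Bags of size one are ordinary single jobs under this definition; whether they count as BoTs in the reported statistics is not stated on the slide.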

21 3.1. Grid Workloads [4/7] BoTs are predominant in grids
Selected Findings
- Batches predominant in grid workloads; up to 96% CPUTime
              | Grid 5000   | NorduGrid     | GLOW (Condor)
  Submissions | 26k         | 50k           | 13k
  Jobs        | 808k (951k) | 738k (781k)   | 205k (216k)
  CPU time    | 193y (651y) | 2192y (2443y) | 53y (55y)
- Average batch size (Δ 120s) is [...] (500 max)
- 75% of the batches are sized 20 jobs or less
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, The Characteristics and Performance of Groups of Jobs in Grids, Euro-Par, LNCS, vol.4641, pp ,

22 3.1. Grid Workloads [5/7] Workflows exist, but they seem small
[Table: Traces; content not captured]
Selected Findings
- Loose coupling
- Graph with 3-4 levels
- Average WF size is 30/44 jobs
- 75%+ WFs are sized 40 jobs or less, 95% are sized 200 jobs or less
S. Ostermann, A. Iosup, R. Prodan, D.H.J. Epema, and T. Fahringer. On the Characteristics of Grid Workflows, CoreGRID Integrated Research in Grid Computing (CGIW),

23 3.1. Grid Workloads [6/7] Modeling Grid Workloads: Feitelson adapted
- Adapted to grids: percentage of parallel jobs, other values
- Validated with 4 grid and 7 parallel production environment traces
A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, and M. Livny. Inter-Operating Grids Through Delegated MatchMaking, ACM/IEEE Conference on High Performance Networking and Computing (SC), pp ,

24 3.1. Grid Workloads [7/7] Modeling Grid Workloads: adding users, BoTs
- Single arrival process for both BoTs and parallel jobs
- Reduce over-fitting and complexity of "Feitelson adapted" by removing the RunTime-Parallelism correlated model
- Validated with 7 grid workloads
A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema. The Performance of Bags-of-Tasks in Large-Scale Distributed Systems, HPDC, pp ,

25 3.2. Grid Infrastructure [1/5] Existing resource models and data
- Compute resources
  - Commodity clusters [Kee et al., SC 04]
  - Desktop grids resource availability [Kondo et al., FGCS 07]
- Network resources
  - Structural generators: GT-ITM [Zegura et al., 1997]
  - Degree-based generators: BRITE [Medina et al., 2001]
- Storage resources, other resources?
Source: H. Casanova

26 3.2. Grid Infrastructure [2/5] Resource dynamics in cluster-based grids
- Environment: Grid 5000 traces
  - Jobs: 05/ /2006 (30 mo., 950K jobs)
  - Resource availability traces: 05/ /2006 (18 mo., 600K events)
- Resource availability model for multi-cluster grids
- Grid-level availability: 70%
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep

27 3.2. Grid Infrastructure [3/5] Correlated Failures
- Correlated failure: maximal set of failures (ordered by increasing event time), of time parameter Δ, in which for any two successive failures E and F, ts(F) ≤ ts(E) + Δ, where ts(·) returns the timestamp of the event; Δ = [...] s
- Size of correlated failures (grid-level view): Range: [...], Average: 11
- Cluster span: Range: 1-3, Average: 1.06; failures stay within cluster
[Figure: CDF of the size of correlated failures, grid-level view, with the average marked]
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep
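
The definition of a correlated failure can be sketched as a single pass over time-ordered failure events: a new group starts whenever the gap between two successive failures exceeds the time parameter. In the sketch below the parameter is called `delta`, events are hypothetical `(timestamp, cluster)` pairs, and the function names are invented for illustration; the value of the time parameter used in the original study is not recoverable from the slide.

```python
def correlated_failures(events, delta):
    """Split (timestamp, cluster) failure events into maximal groups in which
    successive events are at most `delta` seconds apart."""
    groups = []
    for event in sorted(events):
        if groups and event[0] - groups[-1][-1][0] <= delta:
            groups[-1].append(event)   # close enough to the previous failure
        else:
            groups.append([event])     # gap too large: start a new group
    return groups


def cluster_span(group):
    """Number of distinct clusters touched by one correlated failure."""
    return len({cluster for _, cluster in group})


events = [(0, "a"), (10, "a"), (25, "b"), (100, "c"), (105, "c")]
groups = correlated_failures(events, delta=20)
sizes = [len(g) for g in groups]          # size of each correlated failure
spans = [cluster_span(g) for g in groups]
```

Unlike the bag-of-tasks grouping, the comparison here is against the previous event, so a long chain of closely spaced failures forms one group even if its total duration exceeds `delta`.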

28 3.2. Grid Infrastructure [4/5] Dynamics Model
- Model components: MTBF, MTTR, Correl.
- Assume no correlation of failure occurrence between clusters
- Which site/cluster? f_s, the fraction of failures at cluster s
- Weibull distribution for IAT (inter-arrival time)
  - Shape parameter > 1: increasing hazard rate; the longer a node is online, the higher the chances that it will fail
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep
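
Two pieces of this model lend themselves to a short sketch: drawing failure inter-arrival times from a Weibull distribution and picking the affected cluster according to its fraction of failures. The scale, shape, and fractions below are made-up values for illustration, not the fitted parameters from the paper, and the function names are hypothetical. The hazard function h(t) = (k/λ)(t/λ)^(k-1) is the standard Weibull hazard; it increases with t exactly when the shape k exceeds 1.

```python
import random


def weibull_hazard(t, scale, shape):
    """Standard Weibull hazard rate h(t) = (k/lambda) * (t/lambda)**(k-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def generate_failures(n, scale, shape, cluster_fractions, seed=0):
    """Return n (time, cluster) failure events.

    Inter-arrival times are Weibull(scale, shape); the failing cluster is
    chosen independently with probability proportional to its fraction.
    """
    rng = random.Random(seed)
    clusters = list(cluster_fractions)
    weights = [cluster_fractions[c] for c in clusters]
    now, events = 0.0, []
    for _ in range(n):
        now += rng.weibullvariate(scale, shape)   # alpha = scale, beta = shape
        events.append((now, rng.choices(clusters, weights=weights, k=1)[0]))
    return events


# Illustrative parameters only.
failures = generate_failures(100, scale=3600.0, shape=1.5,
                             cluster_fractions={"siteA": 0.7, "siteB": 0.3})
```

Choosing the cluster independently of the arrival process mirrors the slide's assumption of no correlation of failure occurrence between clusters.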

29 3.2. Grid Infrastructure [5/5] Evolution Model A. Iosup, O. Sonmez, and D. Epema, DGSim: Comparing Grid Resource Management Architectures through Trace-Based Simulation, Euro-Par

30
- Grid workloads are very different from those of other systems, e.g., parallel production environments (large clusters, supercomputers)
  - Batches of jobs are predominant [Euro-Par 07, HPDC 08]
  - Almost no parallel jobs [Grid 06]
  - Workload model [SC 07, HPDC 08]
  - Clouds? (upcoming)
- Grid resources are not static
  - Resource dynamics model [Grid 07]
  - Resource evolution model [EuroPar 08]
  - Clouds? [CCGrid 11]
- Archives: easy to share traces and associated research

31 Outline
1. Introduction and Motivation
2. Q1: Exchange Data
   1. The Grid Workloads Archive
   2. The Failure Trace Archive
   3. The Cloud Workloads Archive (?)
3. Q2: System Characteristics
   1. Grid Workloads
   2. Grid Infrastructure
4. Q3: System Testing and Evaluation

32 4.1. GrenchMark: Testing in LSDCSs Analyzing, Testing, and Comparing Systems
- Use cases for automatically analyzing, testing, and comparing systems (or middleware)
  - Functionality testing and system tuning
  - Performance testing/analysis of applications
  - Reliability testing of middleware
- For grids and clouds, this problem is difficult!
  - Testing in real environments is difficult/costly/both
  - Grids/clouds change rapidly
  - Validity and reproducibility of tests

33 4.1. GrenchMark: Testing LSDCSs Architecture Overview
GrenchMark = Grid Benchmark

34 4.1. GrenchMark: Testing LSDCSs Rather Complex
- Workload structure
  - User-defined and statistical models
  - Dynamic jobs arrival
  - Burstiness and self-similarity
  - Feedback, background load
  - Machine usage assumptions
  - Users, VOs
- Metrics
  - A(W) Run/Wait/Resp. Time
  - Efficiency, MakeSpan
  - Failure rate [!]
- Notions
  - Co-allocation, interactive jobs, malleable, moldable, ...
- Measurement methods
  - Long workloads
  - Saturated / non-saturated system
  - Start-up, production, and cool-down scenarios
  - Scaling workload to system
- Applications
  - Synthetic
  - Real
- Workload definition language
  - Base language layer
  - Extended language layer
- Other
  - Can use the same workload for both simulations and real environments
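
Several of the metrics named on this slide can be computed from per-job timestamps. The sketch below uses the conventional definitions (wait = start - submit, run = finish - start, response = finish - submit, makespan = last finish - first submit, failure rate = failed / total). These are assumptions for illustration and not necessarily the exact definitions used inside GrenchMark; the record layout and the name `workload_metrics` are also invented here.

```python
def workload_metrics(jobs):
    """Compute average wait/run/response time, makespan, and failure rate.

    `jobs` is a list of dicts with keys submit, start, finish (seconds)
    and failed (bool). Time averages cover successful jobs only and the
    function expects at least one successful job.
    """
    done = [j for j in jobs if not j["failed"]]
    n = len(done)
    return {
        "avg_wait": sum(j["start"] - j["submit"] for j in done) / n,
        "avg_run": sum(j["finish"] - j["start"] for j in done) / n,
        "avg_response": sum(j["finish"] - j["submit"] for j in done) / n,
        "makespan": max(j["finish"] for j in done) - min(j["submit"] for j in jobs),
        "failure_rate": (len(jobs) - n) / len(jobs),
    }


jobs = [
    {"submit": 0, "start": 10, "finish": 40, "failed": False},
    {"submit": 5, "start": 15, "finish": 35, "failed": False},
    {"submit": 8, "start": 20, "finish": 25, "failed": True},
]
metrics = workload_metrics(jobs)
```

Reporting the failure rate next to the time-based metrics matters because averages over successful jobs alone can look good on a system that drops a large share of its workload.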

35 4.1. GrenchMark: Testing LSDCSs Testing a Large-Scale Environment (1/2)
- Testing a 1500-processor Condor environment
  - Workloads of 1000 jobs, grouped by 2, 10, 20, 50, 100, 200
  - Test finishes 1h after the last submission
- Results
  - >150,000 jobs submitted
  - >100,000 jobs successfully run, >2 yr CPU time in 1 week
  - 5% jobs failed (much less than other grids' average)
  - 25% jobs did not start in time and were cancelled

36 4.1. GrenchMark: Testing LSDCSs Testing a Large-Scale Environment (2/2)
Performance metrics: system-, job-, operational-, application-, and service-level

37 4.1. GrenchMark: Testing in LSDCSs ServMark: Scalable GrenchMark
- ServMark: blending DiPerF and GrenchMark
- Tackles two orthogonal issues:
  - Multi-sourced testing (multi-user scenarios, scalability)
  - Generate and run dynamic test workloads with complex structure (real-world scenarios, flexibility)
- Adds
  - Coordination and automation layers
  - Fault tolerance module

38 Performance Evaluation of Clouds [1/3] C-Meter: Cloud-Oriented GrenchMark Yigitbasi et al.: C-Meter: A Framework for Performance Analysis of Computing Clouds. Proc. of CCGRID

39 Performance Evaluation of Clouds [2/3] Low Performance for Sci.Comp.
- Evaluated the performance of resources from four production, commercial IaaS clouds: Amazon Elastic Compute Cloud (EC2), Mosso, Elastic Hosts, and GoGrid
  - GrenchMark for evaluating the performance of cloud resources
  - C-Meter for complex workloads
- Finding: cloud performance is low for scientific computing
S. Ostermann et al., A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing, Cloudcomp 2009, LNICST 34, pp ,
A. Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing, IEEE TPDS, vol. 22(6),

40 Performance Evaluation of Clouds [3/3] Cloud Performance Variability
- Long-term performance variability of production cloud services
  - IaaS: Amazon Web Services
  - PaaS: Google App Engine
- Year-long performance information for nine services
- Finding: about half of the cloud services investigated in this work exhibit yearly and daily patterns; the impact of performance variability depends on the application
[Figure: Amazon S3: GET US HI operations]
A. Iosup, N. Yigitbasi, and D. Epema, On the Performance Variability of Production Cloud Services, CCGrid

41 4.2. DGSim: Simulating Multi-Cluster Grids Goal and Challenges
- Goal: simulate various grid resource management architectures
  - Multi-cluster grids
  - Grids of grids (THE grid)
- Challenges
  - Many types of architectures
  - Generating and replaying grid workloads
  - Management of simulations
    - Many repetitions of a simulation for statistical relevance
    - Simulations with many parameters
    - Managing results (e.g., analysis tools)
    - Enabling collaborative experiments
[Figure label: Two GRM architectures]

42 4.2. DGSim: Simulating Multi-Cluster Grids Overview
Discrete-Event Simulator
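
The slide labels DGSim a discrete-event simulator without showing its internals. As a generic illustration of the technique only (this is not DGSim's design or API), the sketch below replays a tiny workload on one hypothetical cluster of identical processors with first-come-first-served scheduling, where every job needs exactly one processor. All names are invented for the example.

```python
import heapq
from collections import deque

FINISH, ARRIVAL = 0, 1   # at equal timestamps, finishes are processed first


def simulate_fcfs(jobs, processors):
    """Replay (submit_time, run_time) jobs on `processors` identical CPUs.

    Returns the wait time of each job, in input order.
    """
    events = [(submit, ARRIVAL, i) for i, (submit, _) in enumerate(jobs)]
    heapq.heapify(events)            # future events, ordered by time
    queue = deque()                  # jobs waiting for a processor
    free = processors
    waits = [None] * len(jobs)
    while events:
        now, kind, i = heapq.heappop(events)   # jump to the next event
        if kind == FINISH:
            free += 1
        else:
            queue.append(i)
        while free > 0 and queue:              # start as many jobs as fit
            j = queue.popleft()
            free -= 1
            waits[j] = now - jobs[j][0]
            heapq.heappush(events, (now + jobs[j][1], FINISH, j))
    return waits


waits = simulate_fcfs([(0, 10), (1, 10), (2, 5)], processors=2)
print(waits)  # [0, 0, 8]
```

The simulated clock only ever advances to the timestamp of the next event, which is what makes replaying months-long traces feasible; repeating such runs with different seeds or parameters is the "management of simulations" concern listed among the challenges.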

43 4.2. DGSim: Simulating Multi-Cluster Grids Simulated Architectures (Sep 2007)
- Independent
- Centralized
- Hierarchical
- Decentralized
- Hybrid hierarchical/decentralized
A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, SC,

44
- GrenchMark+C-Meter: testing large-scale distributed systems
  - Framework
  - Testing in real environments: performance, reliability, functionality
  - Uniform process: metrics, workloads
  - Real tool available
- DGSim: simulating multi-cluster grids
  - Many types of architectures
  - Generating and replaying grid workloads
  - Management of the simulations

45
- Understanding how real systems work
  - Modeling workloads and infrastructure
  - Compare grids and clouds with other platforms (parallel production env., ...)
- The Archives: easy to share system traces and associated research
  - Grid Workloads Archive
  - Failure Trace Archive
  - Cloud Workloads Archive (upcoming)
- Testing/Evaluating Grids/Clouds
  - GrenchMark
  - ServMark: Scalable GrenchMark
  - C-Meter: Cloud-oriented GrenchMark
  - DGSim: Simulating Grids (and Clouds?)
- Publications
  - 2006: Grid, CCGrid, JSSPP
  - 2007: SC, Grid, CCGrid, ...
  - 2008: HPDC, SC, Grid, ...
  - 2009: HPDC, CCGrid, ...
  - 2010: HPDC, CCGrid (Best Paper Award), EuroPar, ...
  - 2011: IEEE TPDS, IEEE Internet Computing, CCGrid, ...

46 Thank you for your attention! Questions? Suggestions? Observations?
More Info: Alexandru Iosup
Do not hesitate to contact me: A.Iosup@tudelft.nl (or google "iosup")
Parallel and Distributed Systems Group, Delft University of Technology


More information

Reliability of Computational Experiments on Virtualised Hardware

Reliability of Computational Experiments on Virtualised Hardware Reliability of Computational Experiments on Virtualised Hardware Ian P. Gent and Lars Kotthoff {ipg,larsko}@cs.st-andrews.ac.uk University of St Andrews arxiv:1110.6288v1 [cs.dc] 28 Oct 2011 Abstract.

More information

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP

PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP ISSN: 0976-2876 (Print) ISSN: 2250-0138 (Online) PROFILING BASED REDUCE MEMORY PROVISIONING FOR IMPROVING THE PERFORMANCE IN HADOOP T. S. NISHA a1 AND K. SATYANARAYAN REDDY b a Department of CSE, Cambridge

More information

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data

More information

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network

More information
