A Grid Research Toolbox


A Grid (and Cloud) Research Toolbox
[Logos: The Failure Trace Archive, DGSim]
A. Iosup, O. Sonmez, N. Yigitbasi, H. Mohamed, S. Anoep, D.H.J. Epema (PDS Group, ST/EWI, TU Delft)
I. Raicu, C. Dumitrescu, I. Foster (U. Chicago)
M. Jan (LRI/INRIA Futurs Paris, INRIA)
H. Li, L. Wolters (LIACS, U. Leiden)
Paris, France

A Layered View of the Grid World
Layer 1: Hardware + OS. Automated; non-grid (XtreemOS?).
Layers 2-4: Grid Middleware Stack. Automated/user control; simple to complex.
- Low level: file transfers, local resource allocation, etc.
- High level: grid scheduling
- Very high level: application environments (e.g., distributed objects)
Layer 5: Grid Applications. User control; simple to complex.
[Figure: the stack from top to bottom: Grid Applications; Grid MW Stack (Grid Very High Level MW, Grid High Level MW, Grid Low Level MW); HW + OS]

Grid Work: Science or Engineering?
Work on grid middleware and applications.
When is work in grid computing science?
- Studying systems to uncover their hidden laws
- Designing innovative systems
- Proposing novel algorithms
- Methodological aspects: repeatable experiments to verify and extend hypotheses
When is work in grid computing engineering?
- Showing that the system works in a common case, or in a special case of great importance (e.g., weather prediction)
- When our students can do it (H. Casanova's argument)

Grid Research Problem: We Are Missing Both Data and Tools
Lack of data
- Infrastructure: number and type of resources, resource availability and failures
- Workloads: arrival process, resource consumption
Lack of tools
- Simulators: SimGrid, GridSim, MicroGrid, GangSim, OptorGrid, MONARC, ...
- Testing tools that operate in real environments: DiPerF, QUAKE/FAIL-FCI

Anecdote: Grids are far from being reliable job execution environments
[Figure: reliability of Server, Small Cluster, Production Cluster, DAS-2, and TeraGrid; the first value reads 99.99999% reliable, the others were lost in extraction]
- 5x decrease in failure rate after the first year [Schroeder and Gibson, DSN 06]
- >10% of jobs fail [Iosup et al., CCGrid 06]
- 20-45% failures [Khalili et al., Grid 06]
- Grid3: 27% failures, 5-10 retries [Dumitrescu et al., GCC 05]
Source: dboard-gr.cern.ch, May 07.

The Anecdote at Scale
NMI Build-and-Test Environment at U. Wisc.-Madison: 112 hosts, >40 platforms (e.g., X86-32/Solaris/5, X86-64/RH/9)
Serves >50 grid middleware packages: Condor, Globus, VDT, gLite, GridFTP, RLS, NWS, INCA(-2), APST, NINF-G, BOINC
A. Iosup, D.H.J. Epema, P. Couvares, A. Karp, M. Livny, Build-and-test Workloads for Grid Middleware: Problem, Analysis, and Applications, CCGrid,

A Grid Research Toolbox
Hypothesis: (a) is better than (b). For scenario 1,
[Figure: toolbox diagram with numbered steps 1-3, including DGSim]

Research Questions

Outline
1. Introduction and Motivation
2. Q1: Exchange Data
   2.1. The Grid Workloads Archive
   2.2. The Failure Trace Archive
   2.3. The Cloud Workloads Archive (?)
3. Q2: System Characteristics
   3.1. Grid Workloads
   3.2. Grid Infrastructure
4. Q3: System Testing and Evaluation

Traces in Distributed Systems Research
"My system/method/algorithm is better than yours (on my carefully crafted workload)"
- Unrealistic (trivial): prove that "prioritize jobs from users whose name starts with A" is a good scheduling policy
- Realistic? "85% of jobs are short"; "10% writes"; ...
This is a major problem in computer systems research.
Workload trace = recording of real activity from a (real) system, often as a sequence of jobs/requests submitted by users for execution.
Main use: compare and cross-validate new job and resource management techniques and algorithms.
Major problem: obtaining real workload traces from several sources.
August 26,

2.1. The Grid Workloads Archive [1/3]
Content: 6 traces online; 1.5 yrs; >750K; >250
A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 2008.

2.1. The Grid Workloads Archive [2/3]
Approach: Standard Data Format (GWF)
Goals
- Provide a unitary format for grid workloads
- Same format in plain text and relational DB (SQLite/SQL92)
- To ease adoption, base it on the Standard Workload Format (SWF) of the Parallel Workloads Archive
Existing in SWF
- Identification data: Job/User/Group/Application ID
- Time and status: Submit/Start/Finish Time, Job Status and Exit Code
- Request vs. consumption: CPU/Wallclock/Mem
Added
- Job submission site
- Job structure: bag-of-tasks, workflows
- Extensions: co-allocation, reservations, others possible
May 10, 2011
A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 2008.

2.1. The Grid Workloads Archive [3/3]
Approach: GWF Example
[Figure: sample GWF records with annotated columns: Submit, Wait [s], Run, #CPUs, Used Mem [KB], Req #CPUs]
A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 2008.
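A minimal Python sketch of reading GWF-style records. It assumes whitespace-separated text lines whose first eight columns follow the SWF order suggested by the example above (job ID, submit time, wait time, run time, allocated CPUs, average CPU time, used memory in KB, requested CPUs), with -1 marking unknown values and lines starting with ';' or '#' treated as comments. The class and function names are hypothetical, and the actual GWF specification defines more columns than are read here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MISSING = -1  # SWF-style marker for an unknown value (assumed to carry over to GWF)


@dataclass
class GwfJob:
    job_id: int
    submit_time: float     # seconds since the start of the trace
    wait_time: float       # seconds
    run_time: float        # seconds
    n_procs: int           # allocated CPUs
    avg_cpu_time: float    # seconds
    used_memory_kb: float
    req_n_procs: int

    @property
    def finish_time(self) -> float:
        # Derived: start = submit + wait, finish = start + run
        return self.submit_time + self.wait_time + self.run_time


def parse_gwf_line(line: str) -> Optional[GwfJob]:
    """Parse one record; return None for blank and comment lines."""
    line = line.strip()
    if not line or line[0] in ";#":
        return None
    f = line.split()
    if len(f) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(f)}")
    return GwfJob(int(f[0]), float(f[1]), float(f[2]), float(f[3]),
                  int(f[4]), float(f[5]), float(f[6]), int(f[7]))


def parse_gwf(text: str) -> list[GwfJob]:
    return [job for job in map(parse_gwf_line, text.splitlines()) if job is not None]


# Hypothetical two-job sample; the values are made up for illustration.
sample = """; comment line
1 0 10 3600 1 3500 1024 1
2 5 0 60 2 -1 -1 2
"""
jobs = parse_gwf(sample)
print(len(jobs), jobs[0].finish_time)  # 2 3610.0
```

Keeping unknown values as -1 rather than converting them to None mirrors the SWF convention, so records can be written back unchanged.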

2.2. The Failure Trace Archive
Presentation: The Failure Trace Archive
Types of systems: (desktop) grids, DNS servers, HPC clusters, P2P systems
Stats: 25 traces, 100,000 nodes, decades of operation


2.3. The Cloud Workloads Archive [1/2]
One Format Fits Them All
- Flat format: CWJ, CWJD, CWT, CWTD; job and task summary (20 unique data fields) and detail (60 fields)
Categories of information
- Shared with GWA, PWA: time, disk, memory, net
- Jobs/tasks that change resource consumption profile
- MapReduce-specific (two-thirds of the data fields)
A. Iosup, R. Griffith, A. Konwinski, M. Zaharia, A. Ghodsi, I. Stoica, Data Format for the Cloud Workloads Archive, v.3, 13/07/

2.3. The Cloud Workloads Archive [2/2]
Looking for invariants
- Wr [%] is ~40% of total IO, but absolute values vary

Trace ID | Total IO [MB] | Rd. [MB] | Wr [%] | HDFS Wr [MB]
CWA-01   | 10,934        | 6,805    | 38%    | 1,538
CWA-02   | 75,546        | 47,539   | 37%    | 8,563

- # tasks/job and the ratio M:(M+R) of map tasks to all tasks vary
Understanding workload evolution
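The Wr [%] column appears consistent with computing writes as Total IO minus reads. The short Python check below, using only the CWA-01 and CWA-02 values from this slide, reproduces the 38% and 37% figures under that assumption; `write_share` is a hypothetical helper name.

```python
# Values from the CWA-01 and CWA-02 rows of this slide, in MB.
traces = {
    "CWA-01": {"total_io_mb": 10_934, "read_mb": 6_805},
    "CWA-02": {"total_io_mb": 75_546, "read_mb": 47_539},
}


def write_share(total_io_mb: float, read_mb: float) -> float:
    """Fraction of total I/O that is writes, assuming Total IO = reads + writes."""
    if total_io_mb <= 0:
        raise ValueError("total I/O must be positive")
    return (total_io_mb - read_mb) / total_io_mb


for trace_id, t in traces.items():
    print(f"{trace_id}: Wr = {write_share(**t):.0%}")
# CWA-01: Wr = 38%
# CWA-02: Wr = 37%
```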


3.1. Grid Workloads [1/7]
Analysis Summary: grid workloads differ, e.g., from those of parallel production environments (HPC)
Traces: LCG, Grid3, TeraGrid, and DAS
- Long traces (6+ months), active environments (500K+ jobs per trace, 100s of users), >4 million jobs in total
Analysis
- System-wide, VO, group, and user characteristics
- Environment and user evolution
- System performance
Selected findings
- Almost no parallel jobs
- The top 2-5 groups/users dominate the workloads
- Performance problems: high job wait times, high failure rates
A. Iosup, C. Dumitrescu, D.H.J. Epema, H. Li, L. Wolters, How are Real Grids Used? The Analysis of Four Grid Traces and Its Implications, Grid

3.1. Grid Workloads [2/7]
Analysis Summary: Grids vs. Parallel Production Systems
- Similar CPUTime/year, 5x larger arrival bursts
[Figure: arrivals in parallel production environments (large clusters, supercomputers) vs. grids; LCG cluster daily peak: 22.5k jobs]
A. Iosup, D.H.J. Epema, C. Franke, A. Papaspyrou, L. Schley, B. Song, R. Yahyapour, On Grid Performance Evaluation using Synthetic Workloads, JSSPP


3.1. Grid Workloads [3/7]
More Analysis: Special Workload Components
- Bags-of-Tasks (BoTs): BoT = set of jobs that start at most Δ s after the first job
- Workflows (WFs): WF = set of jobs with precedence constraints (think Directed Acyclic Graph)
- Parameter Sweep Application = BoT with the same binary
[Figure: BoTs and WFs along a time axis, Time [units]]

3.1. Grid Workloads [4/7]
BoTs are predominant in grids
Selected findings
- Batches are predominant in grid workloads: up to 96% of CPU time

Trace       | Grid 5000     | NorduGrid       | GLOW (Condor)
Submissions | 26k           | 50k             | 13k
Jobs        | 808k (951k)   | 738k (781k)     | 205k (216k)
CPU time    | 193y (651y)   | 2192y (2443y)   | 53y (55y)

- Average batch size (Δ = 120 s) is ... (500 max)
- 75% of the batches are sized 20 jobs or less
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, The Characteristics and Performance of Groups of Jobs in Grids, Euro-Par, LNCS, vol. 4641,
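A sketch of the batch (BoT) definition with Δ = 120 s. Assumptions not stated on the slides: jobs are grouped per user, timestamps are in seconds, and a job joins the current batch when it arrives at most Δ after that batch's first job. The function names are hypothetical.

```python
from collections import defaultdict


def group_into_batches(jobs, delta=120.0):
    """Group (user, timestamp) pairs into batches (bags of tasks).

    Per user, jobs are scanned in time order; a job joins the current batch if
    its timestamp is at most `delta` seconds after the batch's first job,
    otherwise it starts a new batch. Returns a list of (user, [timestamps]).
    """
    by_user = defaultdict(list)
    for user, t in jobs:
        by_user[user].append(t)
    batches = []
    for user in sorted(by_user):
        current = []
        for t in sorted(by_user[user]):
            if current and t - current[0] > delta:
                batches.append((user, current))
                current = []
            current.append(t)
        batches.append((user, current))  # every user has at least one job
    return batches


def share_of_batches_at_most(batches, max_size):
    """Fraction of batches with at most `max_size` jobs (cf. '75% are sized 20 or less')."""
    return sum(len(b) <= max_size for _, b in batches) / len(batches)


jobs = [("alice", 0), ("alice", 30), ("alice", 100), ("alice", 200), ("bob", 10)]
batches = group_into_batches(jobs)
print([len(b) for _, b in batches])  # [3, 1, 1]
```

Anchoring Δ to the first job of the batch, rather than to the previous job, keeps a steady trickle of submissions from merging into one unbounded batch.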


3.1. Grid Workloads [5/7]
Workflows exist, but they seem small
[Figure: traces]
Selected findings
- Loose coupling
- Graph with 3-4 levels
- Average WF size is 30/44 jobs
- 75%+ of WFs are sized 40 jobs or less; 95% are sized 200 jobs or less
S. Ostermann, A. Iosup, R. Prodan, D.H.J. Epema, and T. Fahringer, On the Characteristics of Grid Workflows, CoreGRID Integrated Research in Grid Computing (CGIW),
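Counting workflow levels such as the "3-4 levels" above needs a notion of level in the precedence DAG. A common choice, assumed here, sets a job's level to one more than the deepest of its predecessors. The Python sketch below computes it with Kahn's topological sort, which also detects cycles; `workflow_levels` is a hypothetical name.

```python
from collections import deque


def workflow_levels(n_jobs, edges):
    """Return the level of each job in a workflow DAG.

    Jobs are 0..n_jobs-1; an edge (u, v) means u must finish before v starts.
    Level 0 = jobs without predecessors; otherwise 1 + max level of predecessors.
    """
    preds_left = [0] * n_jobs
    succs = [[] for _ in range(n_jobs)]
    for u, v in edges:
        succs[u].append(v)
        preds_left[v] += 1
    level = [0] * n_jobs
    queue = deque(i for i in range(n_jobs) if preds_left[i] == 0)
    processed = 0
    while queue:
        u = queue.popleft()  # u's level is final once all its predecessors are done
        processed += 1
        for v in succs[u]:
            level[v] = max(level[v], level[u] + 1)
            preds_left[v] -= 1
            if preds_left[v] == 0:
                queue.append(v)
    if processed != n_jobs:
        raise ValueError("precedence graph has a cycle; not a workflow DAG")
    return level


# Fork-join example: job 0 fans out to jobs 1-3, which all feed job 4.
levels = workflow_levels(5, [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (3, 4)])
print(levels, "->", max(levels) + 1, "levels")  # [0, 1, 1, 1, 2] -> 3 levels
```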

3.1. Grid Workloads [6/7]
Modeling Grid Workloads: Feitelson adapted
- Feitelson's model adapted to grids: percentage of parallel jobs, other values
- Validated with 4 grid and 7 parallel production environment traces
A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, and M. Livny, Inter-Operating Grids Through Delegated MatchMaking, ACM/IEEE Conference on High Performance Networking and Computing (SC),

3.1. Grid Workloads [7/7]
Modeling Grid Workloads: adding users, BoTs
- Single arrival process for both BoTs and parallel jobs
- Reduce the over-fitting and complexity of "Feitelson adapted" by removing the correlated RunTime-Parallelism model
- Validated with 7 grid workloads
A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Systems, HPDC,


3.2. Grid Infrastructure [1/5]
Existing resource models and data
- Compute resources
  - Commodity clusters [Kee et al., SC 04]
  - Desktop grids resource availability [Kondo et al., FGCS 07]
- Network resources (source: H. Casanova)
  - Structural generators: GT-ITM [Zegura et al., 1997]
  - Degree-based generators: BRITE [Medina et al., 2001]
- Storage resources, other resources?

3.2. Grid Infrastructure [2/5]
Resource dynamics in cluster-based grids
Environment: Grid 5000 traces
- Jobs: 05/2004-11/2006 (30 mo., 950K jobs)
- Resource availability traces: 05/2005-11/2006 (18 mo., 600K events)
Resource availability model for multi-cluster grids
Grid-level availability: 70%
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep
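The slide does not spell out how grid-level availability is computed. One plausible definition, assumed here and not necessarily the one behind the 70% figure, is the fraction of total node-time during which nodes were up. The hypothetical Python helper below computes it from a list of outage intervals.

```python
def grid_availability(n_nodes, period, outages):
    """Fraction of node-time during which nodes were available.

    `outages` holds (node_id, down_start, down_end) intervals in seconds; they
    are assumed non-overlapping per node and are clipped to [0, period].
    """
    down = 0.0
    for _node, start, end in outages:
        s, e = max(0.0, start), min(period, end)
        if e > s:
            down += e - s
    return 1.0 - down / (n_nodes * period)


# 10 nodes over 100 s: two nodes down the whole time, one down for the last 50 s.
print(grid_availability(10, 100, [(0, 0, 100), (1, 0, 100), (2, 50, 150)]))  # 0.75
```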


3.2. Grid Infrastructure [3/5]
Correlated Failures
Correlated failure: a maximal set of failures (ordered by increasing event time), with time parameter $\Delta$, in which for any two successive failures $E$ and $F$
$$T(F) - T(E) \le \Delta,$$
where $T(\cdot)$ returns the timestamp of an event; $\Delta$ = ... s.
- Grid-level view, size of correlated failures: range ..., average 11
- Cluster span: range 1-3, average 1.06; failures stay within a cluster
[Figure: CDF of the size of correlated failures (grid-level view), with the average marked]
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep
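The correlated-failure definition translates into a single pass over the failures sorted by timestamp. In this Python sketch, Δ is a parameter because its value did not survive on the slide; the 60 s used in the example is arbitrary, and the (timestamp, cluster) event layout and function names are assumptions. The helper also reports the average size and the average cluster span (distinct clusters per correlated failure), the two statistics listed on the slide.

```python
def correlated_failures(events, delta):
    """Split (timestamp, cluster_id) failure events into correlated failures.

    After sorting by timestamp, a failure F joins the current group when
    T(F) - T(E) <= delta for the previous failure E; otherwise it starts a new
    group. Each group is therefore maximal.
    """
    groups = []
    for ev in sorted(events, key=lambda e: e[0]):
        if groups and ev[0] - groups[-1][-1][0] <= delta:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups


def summarize(groups):
    """Return (average size, average cluster span) over all correlated failures."""
    sizes = [len(g) for g in groups]
    spans = [len({cluster for _, cluster in g}) for g in groups]
    return sum(sizes) / len(sizes), sum(spans) / len(spans)


# Hypothetical events: (seconds, cluster); delta = 60 s chosen only for illustration.
events = [(0, "A"), (30, "A"), (50, "B"), (500, "A"), (520, "A")]
groups = correlated_failures(events, delta=60)
print([len(g) for g in groups], summarize(groups))  # [3, 2] (2.5, 1.5)
```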

3.2. Grid Infrastructure [4/5]
Dynamics Model
- Model components: MTBF, MTTR, correlation
- Assume no correlation of failure occurrence between clusters
- Which site/cluster? $f_s$, the fraction of failures at cluster $s$
- Weibull distribution for failure inter-arrival times (IAT)
  - Shape parameter > 1: increasing hazard rate; the longer a node is online, the higher the chance that it will fail
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep
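For a Weibull distribution with scale λ and shape k, the hazard rate is h(t) = (k/λ)(t/λ)^(k-1), which grows with t exactly when k > 1; this is the "longer online, more likely to fail" behavior above. The Python sketch below samples failures under simplifying assumptions that are ours, not the paper's: Weibull-distributed inter-arrival times applied at grid level, and each failure assigned independently to cluster s with probability f_s. Function names and parameter values are hypothetical.

```python
import random


def weibull_hazard(t, scale, shape):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def generate_failures(horizon, scale, shape, cluster_fractions, seed=None):
    """Return (time, cluster) failures in [0, horizon].

    Inter-arrival times are Weibull(scale, shape); each failure picks cluster s
    with probability f_s, independently of other clusters (no cross-cluster
    correlation, as the model assumes).
    """
    rng = random.Random(seed)
    clusters = list(cluster_fractions)
    weights = [cluster_fractions[c] for c in clusters]
    t, failures = 0.0, []
    while True:
        t += rng.weibullvariate(scale, shape)  # alpha = scale, beta = shape
        if t > horizon:
            return failures
        failures.append((t, rng.choices(clusters, weights=weights)[0]))


failures = generate_failures(1000.0, scale=10.0, shape=1.5,
                             cluster_fractions={"c1": 0.7, "c2": 0.3}, seed=1)
print(len(failures), failures[:2])
```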

3.2. Grid Infrastructure [5/5]
Evolution Model
A. Iosup, O. Sonmez, and D. Epema, DGSim: Comparing Grid Resource Management Architectures through Trace-Based Simulation, Euro-Par

Grid workloads are very different from those of other systems, e.g., parallel production environments (large clusters, supercomputers)
- Batches of jobs are predominant [Euro-Par 07, HPDC 08]
- Almost no parallel jobs [Grid 06]
- Workload model [SC 07, HPDC 08]
- Clouds? (upcoming)
Grid resources are not static
- Resource dynamics model [Grid 07]
- Resource evolution model [Euro-Par 08]
- Clouds? [CCGrid 11]
Archives: easy to share traces and associated research


4.1. GrenchMark: Testing in LSDCSs (large-scale distributed computing systems)
Analyzing, Testing, and Comparing Systems
Use cases for automatically analyzing, testing, and comparing systems (or middleware):
- Functionality testing and system tuning
- Performance testing/analysis of applications
- Reliability testing of middleware
For grids and clouds, this problem is difficult:
- Testing in real environments is difficult, costly, or both
- Grids/clouds change rapidly
- Validity and reproducibility of tests

4.1. GrenchMark: Testing LSDCSs
Architecture Overview
GrenchMark = Grid Benchmark
[Figure: GrenchMark architecture]

4.1. GrenchMark: Testing LSDCSs
Rather Complex
Workload structure
- User-defined and statistical models
- Dynamic job arrival
- Burstiness and self-similarity
- Feedback, background load
- Machine usage assumptions
- Users, VOs
Metrics
- A(W): Run/Wait/Response Time
- Efficiency, MakeSpan
- Failure rate [!]
Notions
- Co-allocation, interactive jobs, malleable, moldable, ...
Measurement methods
- Long workloads
- Saturated / non-saturated system
- Start-up, production, and cool-down scenarios
- Scaling workload to system
Applications
- Synthetic
- Real
Workload definition language
- Base language layer
- Extended language layer
Other
- Can use the same workload for both simulations and real environments
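As a sketch of the per-job metrics listed above, under definitions assumed here (wait = start - submit, run = end - start, response = end - submit, makespan = last finish - first submission, failure rate = failed jobs / all jobs), the Python helper below aggregates A(W) as an average over the successful jobs of workload W. `JobResult` and `workload_metrics` are hypothetical names, not GrenchMark's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobResult:
    submit: float
    start: Optional[float]  # None if the job never started
    end: Optional[float]    # None if the job never finished
    failed: bool


def mean(xs):
    return sum(xs) / len(xs) if xs else float("nan")


def workload_metrics(jobs):
    """Averages over successful jobs, plus workload-wide makespan and failure rate."""
    ok = [j for j in jobs if not j.failed and j.start is not None and j.end is not None]
    ends = [j.end for j in jobs if j.end is not None]
    return {
        "avg_wait": mean([j.start - j.submit for j in ok]),
        "avg_run": mean([j.end - j.start for j in ok]),
        "avg_response": mean([j.end - j.submit for j in ok]),
        "makespan": (max(ends) - min(j.submit for j in jobs)) if ends else float("nan"),
        "failure_rate": sum(j.failed for j in jobs) / len(jobs),
    }


jobs = [JobResult(0, 10, 110, False), JobResult(5, 20, 50, False),
        JobResult(10, None, None, True)]
print(workload_metrics(jobs))
```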

4.1. GrenchMark: Testing LSDCSs
Testing a Large-Scale Environment (1/2)
- Testing a 1500-processor Condor environment
- Workloads of 1000 jobs, grouped by 2, 10, 20, 50, 100, 200
- Test finishes 1 h after the last submission
Results
- >150,000 jobs submitted
- >100,000 jobs successfully run, >2 yr CPU time in 1 week
- 5% of jobs failed (much less than the average of other grids)
- 25% of jobs did not start in time and were cancelled
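The test setup (1000-job workloads grouped in batches of 2 to 200 jobs, ending 1 h after the last submission) can be mimicked with a small generator. This Python sketch is hypothetical: it assumes all jobs of a batch are submitted at once and that gaps between batches are exponentially distributed (Poisson batch arrivals), neither of which the slide states.

```python
import random


def batched_workload(n_jobs=1000, batch_size=10, mean_gap=60.0, seed=None):
    """Return (submit_time, batch_index) pairs for n_jobs jobs in batches of batch_size.

    Jobs in a batch share a submit time; inter-batch gaps are exponential with
    mean `mean_gap` seconds.
    """
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for b, first in enumerate(range(0, n_jobs, batch_size)):
        size = min(batch_size, n_jobs - first)
        jobs.extend((t, b) for _ in range(size))
        t += rng.expovariate(1.0 / mean_gap)
    return jobs


for g in (2, 10, 20, 50, 100, 200):
    workload = batched_workload(1000, g, seed=0)
    test_end = workload[-1][0] + 3600  # test finishes 1 h after the last submission
    print(g, len({b for _, b in workload}), round(test_end))
```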

4.1. GrenchMark: Testing LSDCSs
Testing a Large-Scale Environment (2/2)
Performance metrics at system, job, operational, application, and service level

4.1. GrenchMark: Testing in LSDCSs
ServMark: Scalable GrenchMark
[Figure: DiPerF + GrenchMark = ServMark]
Blending DiPerF and GrenchMark. Tackles two orthogonal issues:
- Multi-sourced testing (multi-user scenarios, scalability)
- Generating and running dynamic test workloads with complex structure (real-world scenarios, flexibility)
Adds:
- Coordination and automation layers
- Fault tolerance module

Performance Evaluation of Clouds [1/3]
C-Meter: Cloud-Oriented GrenchMark
Yigitbasi et al., C-Meter: A Framework for Performance Analysis of Computing Clouds, Proc. of CCGRID

Performance Evaluation of Clouds [2/3]
Low Performance for Scientific Computing
- Evaluated the performance of resources from four production, commercial IaaS clouds: Amazon Elastic Compute Cloud (EC2), Mosso, Elastic Hosts, and GoGrid
- GrenchMark for evaluating the performance of cloud resources
- C-Meter for complex workloads
Finding: cloud performance is low for scientific computing.
S. Ostermann et al., A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing, Cloudcomp 2009, LNICST 34,
A. Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing, IEEE TPDS, vol. 22(6),

Performance Evaluation of Clouds [3/3]
Cloud Performance Variability
- Long-term performance variability of production cloud services
  - IaaS: Amazon Web Services
  - PaaS: Google App Engine
- Year-long performance information for nine services
[Figure: Amazon S3: GET US HI operations]
Finding: about half of the cloud services investigated in this work exhibit yearly and daily patterns; the impact of performance variability depends on the application.
A. Iosup, N. Yigitbasi, and D. Epema, On the Performance Variability of Production Cloud Services, CCGrid

4.2. DGSim: Simulating Multi-Cluster Grids
Goal and Challenges
Goal: simulate various grid resource management (GRM) architectures
- Multi-cluster grids
- Grids of grids (THE grid)
Challenges
- Many types of architectures
- Generating and replaying grid workloads
- Management of simulations
  - Many repetitions of a simulation for statistical relevance
  - Simulations with many parameters
  - Managing results (e.g., analysis tools)
  - Enabling collaborative experiments
[Figure: two GRM architectures]

4.2. DGSim: Simulating Multi-Cluster Grids
Overview
Discrete-Event Simulator
[Figure: DGSim overview]
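DGSim's internals are not described on this slide. As a generic illustration of the discrete-event technique it names, here is a minimal Python event loop (a clock plus a priority queue of timestamped events) driving a toy single-cluster FCFS scheduler. `Simulator` and `simulate_fcfs` are hypothetical names, not DGSim's API.

```python
import heapq
import itertools
from collections import deque


class Simulator:
    """Minimal discrete-event core: events are processed in timestamp order."""

    def __init__(self):
        self.now = 0.0
        self._events = []
        self._seq = itertools.count()  # tie-breaker: equal times run in scheduling order

    def schedule(self, delay, callback):
        heapq.heappush(self._events, (self.now + delay, next(self._seq), callback))

    def run(self):
        while self._events:
            self.now, _, callback = heapq.heappop(self._events)
            callback()


def simulate_fcfs(jobs, n_procs):
    """Toy strict-FCFS cluster. jobs: (submit, run_time, procs). Returns wait per job index."""
    if any(p > n_procs for _, _, p in jobs):
        raise ValueError("a job requests more processors than the cluster has")
    sim = Simulator()
    free = n_procs
    queue = deque()
    waits = {}

    def start_jobs():
        nonlocal free
        while queue and queue[0][2] <= free:  # only the head of the queue may start
            job_id, run, procs, submit = queue.popleft()
            free -= procs
            waits[job_id] = sim.now - submit
            sim.schedule(run, lambda p=procs: finish(p))

    def finish(procs):
        nonlocal free
        free += procs
        start_jobs()

    def arrive(job_id, submit, run, procs):
        queue.append((job_id, run, procs, submit))
        start_jobs()

    for job_id, (submit, run, procs) in enumerate(jobs):
        sim.schedule(submit, lambda j=job_id, s=submit, r=run, p=procs: arrive(j, s, r, p))
    sim.run()
    return waits


print(simulate_fcfs([(0, 10, 2), (1, 5, 1), (2, 5, 1)], n_procs=2))  # {0: 0.0, 1: 9.0, 2: 8.0}
```

The sequence counter keeps the heap from ever comparing callbacks and makes runs with simultaneous events deterministic, which matters when repeating simulations for statistical relevance.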

4.2. DGSim: Simulating Multi-Cluster Grids
Simulated Architectures (Sep 2007)
[Figure: independent, centralized, hierarchical, decentralized, and hybrid hierarchical/decentralized architectures]
A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, SC,

GrenchMark + C-Meter: testing large-scale distributed systems
- Framework
- Testing in real environments: performance, reliability, functionality
- Uniform process: metrics, workloads
- Real tool available
DGSim: simulating multi-cluster grids
- Many types of architectures
- Generating and replaying grid workloads
- Management of the simulations

Understanding how real systems work
- Modeling workloads and infrastructure
- Comparing grids and clouds with other platforms (parallel production environments, ...)
The Archives: easy to share system traces and associated research
- Grid Workloads Archive
- Failure Trace Archive
- Cloud Workloads Archive (upcoming)
Testing/Evaluating Grids/Clouds
- GrenchMark
- ServMark: Scalable GrenchMark
- C-Meter: Cloud-oriented GrenchMark
- DGSim: Simulating Grids (and Clouds?)
Publications
- 2006: Grid, CCGrid, JSSPP
- 2007: SC, Grid, CCGrid, ...
- 2008: HPDC, SC, Grid, ...
- 2009: HPDC, CCGrid, ...
- 2010: HPDC, CCGrid (Best Paper Award), EuroPar, ...
- 2011: IEEE TPDS, IEEE Internet Computing, CCGrid, ...

Thank you for your attention! Questions? Suggestions? Observations?
More info:
- http://www.st.ewi.tudelft.nl/~iosup/research.html
- http://www.st.ewi.tudelft.nl/~iosup/research_gaming.
Alexandru Iosup. Do not hesitate to contact me: A.Iosup@tudelft.nl (or google "iosup")
Parallel and Distributed Systems Group, Delft University of Technology
