NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC

Size: px
Start display at page:

Download "NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC"

Transcription

1

2 Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data centers at Google, Yahoo, and others Programming paradigm: MapReduce Others from academia: Sector, MosaStore, Chirp 2

3 Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data centers at Google, Yahoo, and others Programming paradigm: MapReduce Others from academia: Sector, MosaStore, Chirp 3

4 Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data centers at Google, Yahoo, and others Programming paradigm: MapReduce Others from academia: Sector, MosaStore, Chirp 4

5 Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data centers at Google, Yahoo, and others Programming paradigm: MapReduce Others from academia: Sector, MosaStore, Chirp 5

6 Local Disk: 22-24: ANL/UC TG Site (7GB SCSI) Today: PADS (RAID-, 6 drives 75GB SATA) Cluster: 22-24: ANL/UC TG Site (GPFS, 8 servers, 1Gb/s each) Today: PADS (GPFS, SAN) Supercomputer: MB/s per Processor Core : IBM Blue Gene/L (GPFS).1 Today: IBM Blue Gene/P (GPFS) 1 Local Disk Cluster Supercomputer -2.2X -99X Today -15X -438X 6

7 What if we could combine the scientific community s existing programming paradigms, but yet still exploit the data locality that naturally occurs in scientific workloads? 7

8 8

9 Input Hi Data Hi Data Size Size Med Med MapReduce/MTC MapReduce/MTC (Data (Data Analysis, Analysis, Mining) Mining) MTC MTC (Big (Big Data Data and and Many Many Tasks) Tasks) Low Low [MTAGS8] Many-Task Computing for Grids and Supercomputers HPC HPC (Heroic (Heroic HTC/MTC MPI HTC/MTC MPI (Many (Many Loosely Tasks) Loosely Tasks) Coupled Coupled Tasks) Tasks) 1 1K 1K 1M 1M Number of of Tasks 9

10 Significant performance improvements can be obtained in the analysis of large dataset by leveraging information about data analysis workloads rather than individual data analysis tasks. Important concepts related to the hypothesis Workload: a complex query (or set of queries) decomposable into simpler tasks to answer broader analysis questions Data locality is crucial to the efficient use of large scale distributed systems for scientific and data-intensive applications Allocate computational and caching storage resources, co-scheduled to optimize workload performance 1

11 Resource acquired in response to demand Data diffuse from archival storage to newly acquired transient resources Resource caching allows faster responses to subsequent requests Task Dispatcher Data-Aware Task Dispatcher Scheduler Data-Aware Scheduler Resources are released when demand drops Optimizes performance by coscheduling data and computations Decrease dependency of a shared/parallel file systems Critical to support data intensive MTC Idle Resources Idle Resources Persistent Storage Shared Persistent File System Storage Shared File System text text Provisioned Resources Provisioned Resources [DADC8] Accelerating Large-scale Data Exploration through Data Diffusion 11

12 What would data diffusion look like in practice? Extend the Falkon framework [SC7] Falkon: a Fast and Light-weight task execution framework 12

13 FA: first-available simple load balancing MCH: max-cache-hit maximize cache hits MCU: max-compute-util maximize processor utilization GCC: good-cache-compute maximize both cache hit and processor utilization at the same time [DADC8] Accelerating Large-scale Data Exploration through Data Diffusion 13

14 3GHz dual CPUs ANL/UC TG with 128 processors Scheduling window 25 tasks Dataset 1K files 1 byte each Tasks Read 1 file Write 1 file CPU Time per Task (ms) CPU Time per Task (ms) Task Submit Notification Task Submit for Task Availability Task Notification Dispatch for (data-aware Task Availability scheduler) Task Task Results Dispatch (data-aware (data-aware scheduler) scheduler) Notification Task Results for (data-aware Task Resultsscheduler) WS Notification Communication for Task Results Throughput WS Communication (tasks/sec) Throughput (tasks/sec) firstavailablavailablcompute-utihicache- first- max- max-cache- good- firstavailablavailablcompute-utihicache- first- max- max-cache- good- without I/O with I/O compute without I/O with I/O compute Throughput (tasks/sec) Throughput (tasks/sec) [DIDC9] Towards Data Intensive Many-Task Computing, under review 14

15 Monotonically Increasing Workload Emphasizes increasing loads Sine-Wave Workload Emphasizes varying loads All-Pairs Workload Compare to best case model of active storage Image Stacking Workload (Astronomy) Evaluate data diffusion on a real large-scale dataintensive application from astronomy domain [DADC8] Accelerating Large-scale Data Exploration through Data Diffusion 15

16 25K tasks 1MB reads 1ms compute Vary arrival rate: Min: 1 task/sec Increment function: CEILING(*1.3) Max: 1 tasks/sec 128 processors Ideal case: 1415 sec 8Gb/s peak throughput Arrival Rate (per second) Arrival Rate Tasks completed Time (sec) Tasks Completed 16

17 GPFS vs. ideal: 511 sec vs sec Nodes Allocated Throughput (Gb/s) Queue Length (x1k) Throughput (Gb/s) Wait Queue Length Time (sec) Demand (Gb/s) Number of Nodes 17

18 Nodes Allocated Throughput (Gb/s) Queue Length (x1k) Max-compute-util Cache Hit/Miss % Nodes Allocated Throughput (Gb/s) Queue Length (x1k) Max-cache-hit Cache Hit/Miss % CPU Utilization % Time (sec) Cache Miss % Cache Hit Global % Cache Hit Local % Throughput (Gb/s) Demand (Gb/s) Wait Queue Length Number of Nodes Time (sec) Cache Miss % Cache Hit Global % Cache Hit Local % Throughput (Gb/s) Demand (Gb/s) Wait Queue Length Number of Nodes CPU Utilization 18

19 Nodes Allocated Throughput (Gb/s) Queue Length (x1k) % 9% 8% 7% 6% 5% 4% 3% 2% 1% % Cache Hit/Miss % 1GB 1.5GB Nodes Allocated Throughput (Gb/s) Queue Length (x1k) % 9% 8% 7% 6% 5% 4% 3% Cache Hit/Miss % 2% 1% % Nodes Allocated Throughput (Gb/s) Queue Length (x1k) Time (sec) Cache Miss % Cache Hit Global % Cache Hit Local % Demand (Gb/s) Throughput (Gb/s) Wait Queue Length Number of Nodes 1 1% 9 9% 8 8% 7 7% 6 6% 5 5% 4 4% 3 3% 2 2% 1 1% % Time (sec) Cache Miss % Cache Hit Global % Cache Hit Local % Throughput (Gb/s) Demand (Gb/s) Wait Queue Length Number of Nodes Cache Hit/Miss % 2GB 4GB Nodes Allocated Throughput (Gb/s) Queue Length (x1k) Time (sec) Cache Miss % Cache Hit Global % Cache Hit Local % Throughput (Gb/s) Demand (Gb/s) Wait Queue Length Number of Nodes Time (sec) Cache Miss % Cache Hit Global % Cache Hit Local % Throughput (Gb/s) Demand (Gb/s) Wait Queue Length Number of Nodes Cache Hit/Miss %

20 Data Diffusion vs. ideal: 1436 sec vs 1415 sec Nodes Allocated Throughput (Gb/s) Queue Length (x1k) Cache Hit/Miss % Time (sec) Cache Miss % Cache Hit Global % Cache Hit Local % Throughput (Gb/s) Demand (Gb/s) Wait Queue Length Number of Nodes 2

21 Throughput (Gb/s) Ideal FA GCC 1GB 73 GCC 1.5GB GCC 2GB Response Time Local Worker Caches (Gb/s) Remote Worker Caches (Gb/s) GPFS Throughput (Gb/s) GCC 4GB 3 sec vs 1569 sec 56X 21 MCH 4GB 46 MCU 4GB Average Response Time (sec) Throughput: Average: 14Gb/s vs 4Gb/s Peak: 81Gb/s vs. 6Gb/s 1569 FA 184 GCC 1GB 114 GCC 1.5GB GCC 2GB GCC 4GB 23 MCH 4GB 287 MCU 4GB 21

22 Performance Index: 34X higher Speedup 3.5X faster than GPFS Performance Index Speedup (comp. to LAN GPFS) FA GCC 1GB GCC 1.5GB GCC 2GB GCC 4GB GCC 4GB SRP MCH 4GB MCU 4GB Performance Index Speedup (compared to first-available) 22

23 2M tasks 1MB reads 1ms compute Vary arrival rate: Min: 1 task/sec Arrival rate function: Max: 1 tasks/sec 2 processors Ideal case: 655 sec 8Gb/s peak throughput A Arrival Rate (per sec) Arrival Rate (per sec) (sin( sqrt( time Arrival Rate Arrival Rate Number of Tasks Number of Tasks.11)* ) Time (sec) Time (sec) 1)*( time.11)* Number of Tasks Completed Number of Tasks Completed 23

24 GPFS 5.7 hrs, ~8Gb/s, 1138 CPU hrs Nodes Allocated Throughput (Gb/s) Queue Length (x1k) Time (sec) Throughput (Gb/s) Wait Queue Length Demand (Gb/s) Number of Nodes 24

25 GPFS 5.7 hrs, ~8Gb/s, 1138 CPU hrs GCC+SRP 1.8 hrs, ~25Gb/s, 361 CPU hrs Nodes Allocated Throughput (Gb/s) Queue Length (x1k) j 1% 9% 8% 7% 6% 5% 4% 3% 2% 1% % Cache Hit/Miss Time (sec) Cache Hit Local % Cache Hit Global % Cache Miss % Demand (Gb/s) Throughput (Gb/s) Wait Queue Length Number of Nodes 25

26 GPFS 5.7 hrs, ~8Gb/s, 1138 CPU hrs GCC+SRP 1.8 hrs, ~25Gb/s, 361 CPU hrs GCC+DRP 1.86 hrs, ~24Gb/s, 253 CPU hrs Nodes Allocated Throughput (Gb/s) Queue Length (x1k) % 9% 8% 7% 6% 5% 4% 3% 2% 1% % Cache Hit/Miss % Time (sec) Cache Miss % Cache Hit Global % Cache Hit Local % Throughput (Gb/s) Demand (Gb/s) Wait Queue Length Number of Nodes 26

27 5x5 25K tasks 24MB reads 1ms compute 2 CPUs 1x1 1M tasks 24MB reads 4sec compute 496 CPUs Ideal case: 655 sec 8Gb/s peak throughput All-Pairs( set A, set B, function F ) returns matrix M: Compare all elements of set A to all elements of set B via function F, yielding matrix M, such that M[i,j] = F(A[i],B[j]) 1 foreach $i in A 2 foreach $j in B 3 submit_job F $i $j 4 end 5 end [DIDC9] Towards Data Intensive Many-Task Computing, under review 27

28 Throughput (Gb/s) Efficiency: 75% 1% 9% 8% 7% 6% 5% 4% 3% 2% 1% % Cache Hit/Miss Time (sec) Cache Hit Local % Cache Hit Global % Cache Miss % Max Throughput (GPFS) Throughput (Data Diffusion) Max Throughput (Local Disk) 28

29 Throughput (Gb/s) Efficiency: 86% 1% 9% 8% 7% 6% 5% 4% 3% 2% 1% % Cache Hit/Miss Time (sec) Cache Hit Local % Cache Hit Global % Cache Miss % Max Throughput (GPFS) Throughput (Data Diffusion) Max Throughput (Local Memory) [DIDC9] Towards Data Intensive Many-Task Computing, under review 29

30 Pull vs. Push Data Diffusion Pulls task working set Incremental spanning forest Active Storage: Pushes workload working set to all nodes Static spanning tree Christopher Moretti, Douglas Thain, University of Notre Dame [DIDC9] Towards Data Intensive Many-Task Computing, under review Efficiency 1% 9% 8% 7% 6% 5% 4% 3% 2% 1% % Experiment 5x5 2 CPUs 1 sec 5x5 2 CPUs.1 sec 1x1 496 CPUs 4 sec 1x CPUs 4 sec 5x5 2 CPUs 1 sec Approach Best Case (active storage) Falkon (data diffusion) Best Case (active storage) Falkon (data diffusion) Best Case (active storage) Falkon (data diffusion) Best Case (active storage) Falkon (data diffusion) 5x5 2 CPUs.1 sec Experiment Local Disk/Memory (GB) Best Case (active storage) Falkon (data diffusion) Best Case (parallel file system) 1x1 496 CPUs 4 sec 1x CPUs 4 sec Network (node-to-node) (GB) Shared File System (GB)

31 Best to use active storage if Slow data source Workload working set fits on local node storage Best to use data diffusion if Medium to fast data source Task working set << workload working set Task working set fits on local node storage If task working set does not fit on local node storage Use parallel file system (i.e. GPFS, Lustre, PVFS, etc) 31

32 Purpose On-demand stacks of random locations within ~1TB dataset Challenge Processing Costs: O(1ms) per object Data Intensive: 4MB:1sec Rapid access to 1-1K random files Time-varying load [DADC8] Accelerating Large-scale Data Exploration through Data Diffusion [TG6] AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis AP = Sloan Data Locality Number of Objects Number of Files

33 open radec2xy readhdu+gettile+curl+convertarray calibration+interpolation+dostacking writestacking Time (ms) GPFS GZ LOCAL GZ GPFS FIT LOCAL FIT Filesystem and Image Format [DADC8] Accelerating Large-scale Data Exploration through Data Diffusion 33

34 Low data locality Similar (but better) performance to GPFS Time (ms) per stack per CPU Time (ms) per stack per CPU Data Diffusion (GZ) Data Data Diffusion Diffusion (FIT) (GZ) GPFS Data (GZ) Diffusion (FIT) GPFS GPFS (FIT) (GZ) GPFS (FIT) Number of CPUs Number of CPUs Time (ms) per stack per CPU Time (ms) per stack per CPU Data Diffusion (GZ) Data Data Diffusion Diffusion (FIT) (GZ) GPFS Data (GZ) Diffusion (FIT) GPFS GPFS (FIT) (GZ) GPFS (FIT) Number of CPUs Number of CPUs High data locality Near perfect scalability [DADC8] Accelerating Large-scale Data Exploration through Data Diffusion 34

35 Aggregate throughput: 39Gb/s 1X higher than GPFS Reduced load on GPFS.49Gb/s 1/1 of the original load Aggregate Throughput (Gb/s) Aggregate Throughput (Gb/s) Data Diffusion (GZ) 2 2 Data Data Diffusion (FIT) (GZ) 18 GPFS Data (GZ) Diffusion (FIT) 18 GPFS GPFS (FIT) (GZ) 16 GPFS (FIT) Ideal Ideal Locality Locality [DADC8] Accelerating Large-scale Data Exploration through Data Diffusion Time (ms) per stack per CPU Time (ms) per stack per CPU 5 Data Diffusion Throughput Local 5 Data Data Diffusion Diffusion Throughput Cache-to-Cache Local Data Diffusion Cache-to-Cache 45 Data Diffusion Throughput GPFS 45 GPFS Data Throughput Diffusion Throughput (FIT) GPFS 4 GPFS GPFS Throughput (GZ) (FIT) 4 GPFS Throughput (GZ) Locality Locality Big performance gains as locality increases 35

36 Data access patterns: write once, read many Task definition must include input/output files metadata Per task working set must fit in local storage Needs IP connectivity between hosts Needs local storage (disk, memory, etc) Needs Java

37 [Ghemawat3,Dean4]: MapReduce+GFS [Bialecki5]: Hadoop+HDFS [Gu6]: Sphere+Sector [Tatebe4]: Gfarm [Chervenak4]: RLS, DRS [Kosar6]: Stork Conclusions None focused on the co-location of storage and generic black box computations with data-aware scheduling while operating in a dynamic elastic environment Swift + Falkon + Data Diffusion is arguably a more generic and powerful solution than MapReduce 37

38 Identified that data locality is crucial to the efficient use of large scale distributed systems for data-intensive applications Data Diffusion Integrated streamlined task dispatching with data aware scheduling policies Heuristics to maximize real world performance Suitable for varying, data-intensive workloads Proof of O(NM) Competitive Caching 38

39 Falkon is a real system Late 25: Initial prototype, AstroPortal January 27: Falkon v November 27: Globus incubator project v.1 February 29: Globus incubator project v.9 Implemented in Java (~2K lines of code) and C (~1K lines of code) Open source: svn co Source code contributors (beside myself) Yong Zhao, Zhao Zhang, Ben Clifford, Mihael Hategan [Globus7] Falkon: A Proposal for Project Globus Incubation 39

40 Workload 16K CPUs 1M tasks 6 sec per task 2 CPU years in 453 sec Throughput: 2312 tasks/sec 85% efficiency [TPDS9] Middleware Support for Many-Task Computing, under preparation 4

41 [TPDS9] Middleware Support for Many-Task Computing, under preparation 41

42 ACM MTAGS9 SC9 Due Date: August 1 st, 29 42

43 IEEE TPDS Journal Special Issue on MTC Due Date: December 1 st, 29 43

44 More information: Other publications: Falkon: Swift: Funding: NASA: Ames Research Center, GSRP DOE: Office of Advanced Scientific Computing Research, Office of Science, U.S. Dept. of Energy NSF: TeraGrid Relevant activities: ACM MTAGS9 Workshop at Supercomputing 29 Special Issue on MTC in IEEE TPDS Journal 44

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data

More information

Synonymous with supercomputing Tightly-coupled applications Implemented using Message Passing Interface (MPI) Large of amounts of computing for short

Synonymous with supercomputing Tightly-coupled applications Implemented using Message Passing Interface (MPI) Large of amounts of computing for short Synonymous with supercomputing Tightly-coupled applications Implemented using Message Passing Interface (MPI) Large of amounts of computing for short periods of time Usually requires low latency interconnects

More information

Typically applied in clusters and grids Loosely-coupled applications with sequential jobs Large amounts of computing for long periods of times

Typically applied in clusters and grids Loosely-coupled applications with sequential jobs Large amounts of computing for long periods of times Typically applied in clusters and grids Loosely-coupled applications with sequential jobs Large amounts of computing for long periods of times Measured in operations per month or years 2 Bridge the gap

More information

Ioan Raicu. Distributed Systems Laboratory Computer Science Department University of Chicago

Ioan Raicu. Distributed Systems Laboratory Computer Science Department University of Chicago The Quest for Scalable Support of Data Intensive Applications in Distributed Systems Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago In Collaboration with: Ian

More information

Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago

Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Running 1 Million Jobs in 10 Minutes via the Falkon Fast and Light-weight Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago In Collaboration with: Ian Foster,

More information

Overview Past Work Future Work. Motivation Proposal. Work-in-Progress

Overview Past Work Future Work. Motivation Proposal. Work-in-Progress Overview Past Work Future Work Motivation Proposal Work-in-Progress 2 HPC: High-Performance Computing Synonymous with supercomputing Tightly-coupled applications Implemented using Message Passing Interface

More information

Managing and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers

Managing and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers Managing and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Collaborators:

More information

A Data Diffusion Approach to Large Scale Scientific Exploration

A Data Diffusion Approach to Large Scale Scientific Exploration A Data Diffusion Approach to Large Scale Scientific Exploration Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Joint work with: Yong Zhao: Microsoft Ian Foster:

More information

Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago

Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Falkon, a Fast and Light-weight task execution framework for Clusters, Grids, and Supercomputers Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago In Collaboration

More information

The Quest for Scalable Support of Data-Intensive Workloads in Distributed Systems

The Quest for Scalable Support of Data-Intensive Workloads in Distributed Systems The Quest for Scalable Support of Data-Intensive Workloads in Distributed Systems Ioan Raicu, 1 Ian T. Foster, 1,2,3 Yong Zhao 4 Philip Little, 5 Christopher M. Moretti, 5 Amitabh Chaudhary, 5 Douglas

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

Ian Foster, An Overview of Distributed Systems

Ian Foster, An Overview of Distributed Systems The advent of computation can be compared, in terms of the breadth and depth of its impact on research and scholarship, to the invention of writing and the development of modern mathematics. Ian Foster,

More information

Arguably one of the most fundamental discipline that touches all other disciplines and people

Arguably one of the most fundamental discipline that touches all other disciplines and people The scientific and mathematical approach in information technology and computing Started in the 1960s from Mathematics or Electrical Engineering Today: Arguably one of the most fundamental discipline that

More information

Ioan Raicu. Everyone else. More information at: Background? What do you want to get out of this course?

Ioan Raicu. Everyone else. More information at: Background? What do you want to get out of this course? Ioan Raicu More information at: http://www.cs.iit.edu/~iraicu/ Everyone else Background? What do you want to get out of this course? 2 Data Intensive Computing is critical to advancing modern science Applies

More information

Storage and Compute Resource Management via DYRE, 3DcacheGrid, and CompuStore

Storage and Compute Resource Management via DYRE, 3DcacheGrid, and CompuStore Storage and Compute Resource Management via DYRE, 3DcacheGrid, and CompuStore Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago DSL Seminar November st, 006 Analysis

More information

Data Diffusion: Dynamic Resource Provision and Data-Aware Scheduling for Data-Intensive Applications

Data Diffusion: Dynamic Resource Provision and Data-Aware Scheduling for Data-Intensive Applications Data Diffusion: Dynamic Resource Provision and Data-Aware Scheduling for Data-Intensive Applications Ioan Raicu, Yong Zhao 2, Ian Foster,3,4, Alex Szalay 5 iraicu@cs.uchicago.edu, yozha@microsoft.com,

More information

Keywords: many-task computing; MTC; high-throughput computing; resource management; Falkon; Swift

Keywords: many-task computing; MTC; high-throughput computing; resource management; Falkon; Swift Editorial Manager(tm) for Cluster Computing Manuscript Draft Manuscript Number: Title: Middleware Support for Many-Task Computing Article Type: HPDC Special Issue Section/Category: Keywords: many-task

More information

Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work

Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Today (2014):

More information

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

MATE-EC2: A Middleware for Processing Data with Amazon Web Services MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering

More information

Crossing the Chasm: Sneaking a parallel file system into Hadoop

Crossing the Chasm: Sneaking a parallel file system into Hadoop Crossing the Chasm: Sneaking a parallel file system into Hadoop Wittawat Tantisiriroj Swapnil Patil, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University In this work Compare and contrast large

More information

Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets

Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets Page 1 of 5 1 Year 1 Proposal Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets Year 1 Progress Report & Year 2 Proposal In order to setup the context for this progress

More information

Accelerating Large Scale Scientific Exploration through Data Diffusion

Accelerating Large Scale Scientific Exploration through Data Diffusion Accelerating Large Scale Scientific Exploration through Data Diffusion Ioan Raicu *, Yong Zhao *, Ian Foster #*+, Alex Szalay - {iraicu,yongzh }@cs.uchicago.edu, foster@mcs.anl.gov, szalay@jhu.edu * Department

More information

Towards Data Intensive Many-Task Computing

Towards Data Intensive Many-Task Computing Towards Data Intensive Many-Task Computing Ioan Raicu 1, Ian Foster 1,2,3, Yong Zhao 4, Alex Szalay 5 Philip Little 6, Christopher M. Moretti 6, Amitabh Chaudhary 6, Douglas Thain 6 iraicu@cs.uchicago.edu,

More information

Crossing the Chasm: Sneaking a parallel file system into Hadoop

Crossing the Chasm: Sneaking a parallel file system into Hadoop Crossing the Chasm: Sneaking a parallel file system into Hadoop Wittawat Tantisiriroj Swapnil Patil, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University In this work Compare and contrast large

More information

The Fusion Distributed File System

The Fusion Distributed File System Slide 1 / 44 The Fusion Distributed File System Dongfang Zhao February 2015 Slide 2 / 44 Outline Introduction FusionFS System Architecture Metadata Management Data Movement Implementation Details Unique

More information

Extreme-scale scripting: Opportunities for large taskparallel applications on petascale computers

Extreme-scale scripting: Opportunities for large taskparallel applications on petascale computers Extreme-scale scripting: Opportunities for large taskparallel applications on petascale computers Michael Wilde, Ioan Raicu, Allan Espinosa, Zhao Zhang, Ben Clifford, Mihael Hategan, Kamil Iskra, Pete

More information

Assistant Professor at Illinois Institute of Technology (CS) Director of the Data-Intensive Distributed Systems Laboratory (DataSys)

Assistant Professor at Illinois Institute of Technology (CS) Director of the Data-Intensive Distributed Systems Laboratory (DataSys) Current position: Assistant Professor at Illinois Institute of Technology (CS) Director of the Data-Intensive Distributed Systems Laboratory (DataSys) Guest Research Faculty, Argonne National Laboratory

More information

Data Management in Parallel Scripting

Data Management in Parallel Scripting Data Management in Parallel Scripting Zhao Zhang 11/11/2012 Problem Statement Definition: MTC applications are those applications in which existing sequential or parallel programs are linked by files output

More information

Assistant Professor at Illinois Institute of Technology (CS) Director of the Data-Intensive Distributed Systems Laboratory (DataSys)

Assistant Professor at Illinois Institute of Technology (CS) Director of the Data-Intensive Distributed Systems Laboratory (DataSys) Current position: Assistant Professor at Illinois Institute of Technology (CS) Director of the Data-Intensive Distributed Systems Laboratory (DataSys) Guest Research Faculty, Argonne National Laboratory

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

Introduction to Grid Computing

Introduction to Grid Computing Milestone 2 Include the names of the papers You only have a page be selective about what you include Be specific; summarize the authors contributions, not just what the paper is about. You might be able

More information

Case Studies in Storage Access by Loosely Coupled Petascale Applications

Case Studies in Storage Access by Loosely Coupled Petascale Applications Case Studies in Storage Access by Loosely Coupled Petascale Applications Justin M Wozniak and Michael Wilde Petascale Data Storage Workshop at SC 09 Portland, Oregon November 15, 2009 Outline Scripted

More information

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability

More information

Disk Cache-Aware Task Scheduling

Disk Cache-Aware Task Scheduling Disk ache-aware Task Scheduling For Data-Intensive and Many-Task Workflow Masahiro Tanaka and Osamu Tatebe University of Tsukuba, JST/REST Japan Science and Technology Agency IEEE luster 2014 2014-09-24

More information

System Software for Big Data and Post Petascale Computing

System Software for Big Data and Post Petascale Computing The Japanese Extreme Big Data Workshop February 26, 2014 System Software for Big Data and Post Petascale Computing Osamu Tatebe University of Tsukuba I/O performance requirement for exascale applications

More information

Storage and Compute Resource Management via DYRE, 3DcacheGrid, and CompuStore Ioan Raicu, Ian Foster

Storage and Compute Resource Management via DYRE, 3DcacheGrid, and CompuStore Ioan Raicu, Ian Foster Storage and Compute Resource Management via DYRE, 3DcacheGrid, and CompuStore Ioan Raicu, Ian Foster. Overview Both the industry and academia have an increase demand for good policies and mechanisms to

More information

IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning

IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning September 22 nd 2015 Tommaso Cecchi 2 What is IME? This breakthrough, software defined storage application

More information

Parallel File Systems for HPC

Parallel File Systems for HPC Introduction to Scuola Internazionale Superiore di Studi Avanzati Trieste November 2008 Advanced School in High Performance and Grid Computing Outline 1 The Need for 2 The File System 3 Cluster & A typical

More information

Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work

Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Today (2014):

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

Grid Scheduling Architectures with Globus

Grid Scheduling Architectures with Globus Grid Scheduling Architectures with Workshop on Scheduling WS 07 Cetraro, Italy July 28, 2007 Ignacio Martin Llorente Distributed Systems Architecture Group Universidad Complutense de Madrid 1/38 Contents

More information

Tuning I/O Performance for Data Intensive Computing. Nicholas J. Wright. lbl.gov

Tuning I/O Performance for Data Intensive Computing. Nicholas J. Wright. lbl.gov Tuning I/O Performance for Data Intensive Computing. Nicholas J. Wright njwright @ lbl.gov NERSC- National Energy Research Scientific Computing Center Mission: Accelerate the pace of scientific discovery

More information

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Tiankai Tu, Charles A. Rendleman, Patrick J. Miller, Federico Sacerdoti, Ron O. Dror, and David E. Shaw D. E. Shaw Research Motivation

More information

Today (2010): Multicore Computing 80. Near future (~2018): Manycore Computing Number of Cores Processing

Today (2010): Multicore Computing 80. Near future (~2018): Manycore Computing Number of Cores Processing Number of Cores Manufacturing Process 300 250 200 150 100 50 0 2004 2006 2008 2010 2012 2014 2016 2018 100 Today (2010): Multicore Computing 80 1~12 cores commodity architectures 70 60 80 cores proprietary

More information

Forming an ad-hoc nearby storage, based on IKAROS and social networking services

Forming an ad-hoc nearby storage, based on IKAROS and social networking services Forming an ad-hoc nearby storage, based on IKAROS and social networking services Christos Filippidis1, Yiannis Cotronis2 and Christos Markou1 1 Institute of Nuclear & Particle Physics, NCSR Demokritos,

More information

Extreme scripting and other adventures in data-intensive computing

Extreme scripting and other adventures in data-intensive computing Extreme scripting and other adventures in data-intensive computing Ian Foster Allan Espinosa, Ioan Raicu, Mike Wilde, Zhao Zhang Computation Institute Argonne National Lab & University of Chicago How data

More information

A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing

A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing Z. Sebepou, K. Magoutis, M. Marazakis, A. Bilas Institute of Computer Science (ICS) Foundation for Research and

More information

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype? Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/

More information

Data Movement & Storage Using the Data Capacitor Filesystem

Data Movement & Storage Using the Data Capacitor Filesystem Data Movement & Storage Using the Data Capacitor Filesystem Justin Miller jupmille@indiana.edu http://pti.iu.edu/dc Big Data for Science Workshop July 2010 Challenges for DISC Keynote by Alex Szalay identified

More information

Scientific Workflows and Cloud Computing. Gideon Juve USC Information Sciences Institute

Scientific Workflows and Cloud Computing. Gideon Juve USC Information Sciences Institute Scientific Workflows and Cloud Computing Gideon Juve USC Information Sciences Institute gideon@isi.edu Scientific Workflows Loosely-coupled parallel applications Expressed as directed acyclic graphs (DAGs)

More information

ZHT A Fast, Reliable and Scalable Zero- hop Distributed Hash Table

ZHT A Fast, Reliable and Scalable Zero- hop Distributed Hash Table ZHT A Fast, Reliable and Scalable Zero- hop Distributed Hash Table 1 What is KVS? Why to use? Why not to use? Who s using it? Design issues A storage system A distributed hash table Spread simple structured

More information

Advances of parallel computing. Kirill Bogachev May 2016

Advances of parallel computing. Kirill Bogachev May 2016 Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being

More information

System that permanently stores data Usually layered on top of a lower-level physical storage medium Divided into logical units called files

System that permanently stores data Usually layered on top of a lower-level physical storage medium Divided into logical units called files System that permanently stores data Usually layered on top of a lower-level physical storage medium Divided into logical units called files Addressable by a filename ( foo.txt ) Usually supports hierarchical

More information

The Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler

The Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler The Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler MSST 10 Hadoop in Perspective Hadoop scales computation capacity, storage capacity, and I/O bandwidth by

More information

Magellan Project. Jeff Broughton NERSC Systems Department Head October 7, 2009

Magellan Project. Jeff Broughton NERSC Systems Department Head October 7, 2009 Magellan Project Jeff Broughton NERSC Systems Department Head October 7, 2009 1 Magellan Background National Energy Research Scientific Computing Center (NERSC) Argonne Leadership Computing Facility (ALCF)

More information

Data Management. Parallel Filesystems. Dr David Henty HPC Training and Support

Data Management. Parallel Filesystems. Dr David Henty HPC Training and Support Data Management Dr David Henty HPC Training and Support d.henty@epcc.ed.ac.uk +44 131 650 5960 Overview Lecture will cover Why is IO difficult Why is parallel IO even worse Lustre GPFS Performance on ARCHER

More information

The RAMDISK Storage Accelerator

The RAMDISK Storage Accelerator The RAMDISK Storage Accelerator A Method of Accelerating I/O Performance on HPC Systems Using RAMDISKs Tim Wickberg, Christopher D. Carothers wickbt@rpi.edu, chrisc@cs.rpi.edu Rensselaer Polytechnic Institute

More information

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Torben Kling-Petersen, PhD Presenter s Name Principle Field Title andengineer Division HPC &Cloud LoB SunComputing Microsystems

More information

Sami Saarinen Peter Towers. 11th ECMWF Workshop on the Use of HPC in Meteorology Slide 1

Sami Saarinen Peter Towers. 11th ECMWF Workshop on the Use of HPC in Meteorology Slide 1 Acknowledgements: Petra Kogel Sami Saarinen Peter Towers 11th ECMWF Workshop on the Use of HPC in Meteorology Slide 1 Motivation Opteron and P690+ clusters MPI communications IFS Forecast Model IFS 4D-Var

More information

Data Analytics and Storage System (DASS) Mixing POSIX and Hadoop Architectures. 13 November 2016

Data Analytics and Storage System (DASS) Mixing POSIX and Hadoop Architectures. 13 November 2016 National Aeronautics and Space Administration Data Analytics and Storage System (DASS) Mixing POSIX and Hadoop Architectures 13 November 2016 Carrie Spear (carrie.e.spear@nasa.gov) HPC Architect/Contractor

More information

IBM Spectrum Scale IO performance

IBM Spectrum Scale IO performance IBM Spectrum Scale 5.0.0 IO performance Silverton Consulting, Inc. StorInt Briefing 2 Introduction High-performance computing (HPC) and scientific computing are in a constant state of transition. Artificial

More information

Improved Solutions for I/O Provisioning and Application Acceleration

Improved Solutions for I/O Provisioning and Application Acceleration 1 Improved Solutions for I/O Provisioning and Application Acceleration August 11, 2015 Jeff Sisilli Sr. Director Product Marketing jsisilli@ddn.com 2 Why Burst Buffer? The Supercomputing Tug-of-War A supercomputer

More information

SAS Enterprise Miner Performance on IBM System p 570. Jan, Hsian-Fen Tsao Brian Porter Harry Seifert. IBM Corporation

SAS Enterprise Miner Performance on IBM System p 570. Jan, Hsian-Fen Tsao Brian Porter Harry Seifert. IBM Corporation SAS Enterprise Miner Performance on IBM System p 570 Jan, 2008 Hsian-Fen Tsao Brian Porter Harry Seifert IBM Corporation Copyright IBM Corporation, 2008. All Rights Reserved. TABLE OF CONTENTS ABSTRACT...3

More information

Falkon: a Fast and Light-weight task execution framework

Falkon: a Fast and Light-weight task execution framework : a Fast and Light-weight task execution framework Ioan Raicu *, Yong Zhao *, Catalin Dumitrescu *, Ian Foster #*+, Mike Wilde #+ * Department of Computer Science, University of Chicago, IL, USA + Computation

More information

Beyond Petascale. Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center

Beyond Petascale. Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center Beyond Petascale Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center GPFS Research and Development! GPFS product originated at IBM Almaden Research Laboratory! Research continues to

More information

Lecture 11 Hadoop & Spark

Lecture 11 Hadoop & Spark Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem

More information

Characterizing Storage Resources Performance in Accessing the SDSS Dataset Ioan Raicu Date:

Characterizing Storage Resources Performance in Accessing the SDSS Dataset Ioan Raicu Date: Characterizing Storage Resources Performance in Accessing the SDSS Dataset Ioan Raicu Date: 8-17-5 Table of Contents Table of Contents...1 Table of Figures...1 1 Overview...4 2 Experiment Description...4

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete

Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete 1 DDN Who We Are 2 We Design, Deploy and Optimize Storage Systems Which Solve HPC, Big Data and Cloud Business

More information

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT. Chapter 4:- Introduction to Grid and its Evolution Prepared By:- Assistant Professor SVBIT. Overview Background: What is the Grid? Related technologies Grid applications Communities Grid Tools Case Studies

More information

Fast Forward I/O & Storage

Fast Forward I/O & Storage Fast Forward I/O & Storage Eric Barton Lead Architect 1 Department of Energy - Fast Forward Challenge FastForward RFP provided US Government funding for exascale research and development Sponsored by 7

More information

Distributed NoSQL Storage for Extreme-scale System Services

Distributed NoSQL Storage for Extreme-scale System Services Distributed NoSQL Storage for Extreme-scale System Services Short Bio 6 th year PhD student from DataSys Lab, CS Department, Illinois Institute oftechnology,chicago Academic advisor: Dr. Ioan Raicu Research

More information

Analytics in the cloud

Analytics in the cloud Analytics in the cloud Dow we really need to reinvent the storage stack? R. Ananthanarayanan, Karan Gupta, Prashant Pandey, Himabindu Pucha, Prasenjit Sarkar, Mansi Shah, Renu Tewari Image courtesy NASA

More information

Emerging Technologies for HPC Storage

Emerging Technologies for HPC Storage Emerging Technologies for HPC Storage Dr. Wolfgang Mertz CTO EMEA Unstructured Data Solutions June 2018 The very definition of HPC is expanding Blazing Fast Speed Accessibility and flexibility 2 Traditional

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

Parallel File Systems. John White Lawrence Berkeley National Lab

Parallel File Systems. John White Lawrence Berkeley National Lab Parallel File Systems John White Lawrence Berkeley National Lab Topics Defining a File System Our Specific Case for File Systems Parallel File Systems A Survey of Current Parallel File Systems Implementation

More information

Reduction of Workflow Resource Consumption Using a Density-based Clustering Model

Reduction of Workflow Resource Consumption Using a Density-based Clustering Model Reduction of Workflow Resource Consumption Using a Density-based Clustering Model Qimin Zhang, Nathaniel Kremer-Herman, Benjamin Tovar, and Douglas Thain WORKS Workshop at Supercomputing 2018 The Cooperative

More information

DELL EMC ISILON F800 AND H600 I/O PERFORMANCE

DELL EMC ISILON F800 AND H600 I/O PERFORMANCE DELL EMC ISILON F800 AND H600 I/O PERFORMANCE ABSTRACT This white paper provides F800 and H600 performance data. It is intended for performance-minded administrators of large compute clusters that access

More information

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE BRETT WENINGER, MANAGING DIRECTOR 10/21/2014 ADURANT APPROACH TO BIG DATA Align to Un/Semi-structured Data Instead of Big Scale out will become Big Greatest

More information

Job sample: SCOPE (VLDBJ, 2012)

Job sample: SCOPE (VLDBJ, 2012) Apollo High level SQL-Like language The job query plan is represented as a DAG Tasks are the basic unit of computation Tasks are grouped in Stages Execution is driven by a scheduler Job sample: SCOPE (VLDBJ,

More information

Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments

Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments LCI HPC Revolution 2005 26 April 2005 Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments Matthew Woitaszek matthew.woitaszek@colorado.edu Collaborators Organizations National

More information

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017 Hadoop File System 1 S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y Moving Computation is Cheaper than Moving Data Motivation: Big Data! What is BigData? - Google

More information

PERFORMANCE ANALYSIS AND OPTIMIZATION OF MULTI-CLOUD COMPUITNG FOR LOOSLY COUPLED MTC APPLICATIONS

PERFORMANCE ANALYSIS AND OPTIMIZATION OF MULTI-CLOUD COMPUITNG FOR LOOSLY COUPLED MTC APPLICATIONS PERFORMANCE ANALYSIS AND OPTIMIZATION OF MULTI-CLOUD COMPUITNG FOR LOOSLY COUPLED MTC APPLICATIONS V. Prasathkumar, P. Jeevitha Assiatant Professor, Department of Information Technology Sri Shakthi Institute

More information

The Red Storm System: Architecture, System Update and Performance Analysis

The Red Storm System: Architecture, System Update and Performance Analysis The Red Storm System: Architecture, System Update and Performance Analysis Douglas Doerfler, Jim Tomkins Sandia National Laboratories Center for Computation, Computers, Information and Mathematics LACSI

More information

New User Seminar: Part 2 (best practices)

New User Seminar: Part 2 (best practices) New User Seminar: Part 2 (best practices) General Interest Seminar January 2015 Hugh Merz merz@sharcnet.ca Session Outline Submitting Jobs Minimizing queue waits Investigating jobs Checkpointing Efficiency

More information

Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads

Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads Liran Zvibel CEO, Co-founder WekaIO @liranzvibel 1 WekaIO Matrix: Full-featured and Flexible Public or Private S3 Compatible

More information

Optimizing Load Balancing and Data-Locality with Data-aware Scheduling

Optimizing Load Balancing and Data-Locality with Data-aware Scheduling Optimizing Load Balancing and Data-Locality with Data-aware Scheduling Ke Wang *, Xiaobing Zhou, Tonglin Li *, Dongfang Zhao *, Michael Lang, Ioan Raicu * * Illinois Institute of Technology, Hortonworks

More information

PARDA: Proportional Allocation of Resources for Distributed Storage Access

PARDA: Proportional Allocation of Resources for Distributed Storage Access PARDA: Proportional Allocation of Resources for Distributed Storage Access Ajay Gulati, Irfan Ahmad, Carl Waldspurger Resource Management Team VMware Inc. USENIX FAST 09 Conference February 26, 2009 The

More information

EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD

EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD S.THIRUNAVUKKARASU 1, DR.K.P.KALIYAMURTHIE 2 Assistant Professor, Dept of IT, Bharath University, Chennai-73 1 Professor& Head, Dept of IT, Bharath

More information

Map Reduce. Yerevan.

Map Reduce. Yerevan. Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate

More information

MapReduce and Friends

MapReduce and Friends MapReduce and Friends Craig C. Douglas University of Wyoming with thanks to Mookwon Seo Why was it invented? MapReduce is a mergesort for large distributed memory computers. It was the basis for a web

More information

April Copyright 2013 Cloudera Inc. All rights reserved.

April Copyright 2013 Cloudera Inc. All rights reserved. Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and the Virtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here April 2014 Analytic Workloads on

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,

More information

Lustre A Platform for Intelligent Scale-Out Storage

Lustre A Platform for Intelligent Scale-Out Storage Lustre A Platform for Intelligent Scale-Out Storage Rumi Zahir, rumi. May 2003 rumi.zahir@intel.com Agenda Problem Statement Trends & Current Data Center Storage Architectures The Lustre File System Project

More information

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee @kleegeek davidklee.net gplus.to/kleegeek linked.com/a/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture

More information

Introduction to FREE National Resources for Scientific Computing. Dana Brunson. Jeff Pummill

Introduction to FREE National Resources for Scientific Computing. Dana Brunson. Jeff Pummill Introduction to FREE National Resources for Scientific Computing Dana Brunson Oklahoma State University High Performance Computing Center Jeff Pummill University of Arkansas High Peformance Computing Center

More information

Warehouse- Scale Computing and the BDAS Stack

Warehouse- Scale Computing and the BDAS Stack Warehouse- Scale Computing and the BDAS Stack Ion Stoica UC Berkeley UC BERKELEY Overview Workloads Hardware trends and implications in modern datacenters BDAS stack What is Big Data used For? Reports,

More information

Analyzing the High Performance Parallel I/O on LRZ HPC systems. Sandra Méndez. HPC Group, LRZ. June 23, 2016

Analyzing the High Performance Parallel I/O on LRZ HPC systems. Sandra Méndez. HPC Group, LRZ. June 23, 2016 Analyzing the High Performance Parallel I/O on LRZ HPC systems Sandra Méndez. HPC Group, LRZ. June 23, 2016 Outline SuperMUC supercomputer User Projects Monitoring Tool I/O Software Stack I/O Analysis

More information

Clouds: An Opportunity for Scientific Applications?

Clouds: An Opportunity for Scientific Applications? Clouds: An Opportunity for Scientific Applications? Ewa Deelman USC Information Sciences Institute Acknowledgements Yang-Suk Ki (former PostDoc, USC) Gurmeet Singh (former Ph.D. student, USC) Gideon Juve

More information