NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC

Ioan Raicu, Distributed Systems Laboratory, Computer Science Department, University of Chicago

Transcription:

Segregated storage and compute: NFS, GPFS, PVFS, Lustre. Batch-scheduled systems: clusters, grids, and supercomputers. Programming paradigms: HPC, MTC, and HTC. (Diagram: compute resources connected over network links and a network fabric to network-attached storage.)

Co-located storage and compute: HDFS, GFS. Data centers at Google, Yahoo, and others. Programming paradigm: MapReduce. Others from academia: Sector, MosaStore, Chirp. (Diagram: combined compute and storage resources attached directly to the network fabric.)

MB/s per processor core. Local disk: 2002-2004: ANL/UC TG site (70GB SCSI); today: PADS (RAID-0, 6 drives, 750GB SATA). Cluster: 2002-2004: ANL/UC TG site (GPFS, 8 servers, 1Gb/s each); today: PADS (GPFS, SAN). Supercomputer: 2002-2004: IBM Blue Gene/L (GPFS); today: IBM Blue Gene/P (GPFS). (Chart: per-core bandwidth on a log scale for local disk, cluster, and supercomputer storage, 2002-2004 vs. today; annotations show the shared-storage gap relative to local disk growing from roughly 2.2X-99X to 15X-438X.)

What if we could combine the scientific community's existing programming paradigms, yet still exploit the data locality that naturally occurs in scientific workloads?

(Diagram: proposed architecture in which compute and storage resources are co-located on the network fabric, alongside network-attached storage reachable over the network links.)

(Chart: problem space defined by input data size vs. number of tasks.) HPC (heroic MPI tasks): low input data size, small numbers of tasks. HTC/MTC (many loosely coupled tasks): low input data size, 1K-1M tasks. MapReduce/MTC (data analysis, mining): medium-to-high input data size. MTC (big data and many tasks): high input data size and many tasks. [MTAGS08] Many-Task Computing for Grids and Supercomputers

Significant performance improvements can be obtained in the analysis of large datasets by leveraging information about data analysis workloads rather than individual data analysis tasks. Important concepts related to the hypothesis: Workload: a complex query (or set of queries) decomposable into simpler tasks to answer broader analysis questions. Data locality is crucial to the efficient use of large-scale distributed systems for scientific and data-intensive applications. Allocate computational and caching storage resources, co-scheduled to optimize workload performance.

Data diffusion: resources are acquired in response to demand; data diffuses from archival storage to newly acquired transient resources; resource caching allows faster responses to subsequent requests; resources are released when demand drops. It optimizes performance by co-scheduling data and computation, decreases dependency on shared/parallel file systems, and is critical to supporting data-intensive MTC. (Diagram: task dispatcher with data-aware scheduler drawing on idle and provisioned resources backed by persistent storage / a shared file system.) [DADC08] Accelerating Large-scale Data Exploration through Data Diffusion
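A minimal sketch of the demand-driven provisioning behavior described above, assuming a hypothetical allocator interface with acquire/release calls (Falkon's actual provisioner goes through GRAM4):

import time

class Provisioner:
    # Sketch of demand-driven resource provisioning: grow the pool with the
    # queue, release nodes that have been idle for too long.
    def __init__(self, allocator, min_nodes=0, max_nodes=128, idle_timeout=60):
        self.allocator = allocator
        self.min_nodes = min_nodes
        self.max_nodes = max_nodes
        self.idle_timeout = idle_timeout          # seconds before an idle node is released
        self.nodes = {}                           # node -> time it was last used

    def adjust(self, queue_length):
        # Acquire up to one node per queued task, capped at max_nodes.
        wanted = min(self.max_nodes, max(self.min_nodes, queue_length))
        while len(self.nodes) < wanted:
            node = self.allocator.acquire()
            self.nodes[node] = time.time()
        # Release nodes that have sat idle longer than idle_timeout.
        now = time.time()
        for node, last_used in list(self.nodes.items()):
            if len(self.nodes) > self.min_nodes and now - last_used > self.idle_timeout:
                self.allocator.release(node)
                del self.nodes[node]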

What would data diffusion look like in practice? Extend the Falkon framework. (Diagram: the user submits tasks to the task dispatcher's wait queue; the data-aware scheduler assigns tasks to executors 1..n on provisioned resources; dynamic resource provisioning acquires available resources via GRAM4; executors are backed by persistent storage.) [SC07] Falkon: a Fast and Light-weight task execution framework
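A highly condensed, hypothetical sketch of this dispatch path; the names below are illustrative and not Falkon's actual interfaces:

from collections import deque

class Dispatcher:
    def __init__(self, scheduler, executors):
        self.wait_queue = deque()    # tasks submitted by the user
        self.scheduler = scheduler   # data-aware scheduler (policies on the next slide)
        self.executors = executors   # executors running on provisioned resources

    def submit(self, task):
        self.wait_queue.append(task)

    def dispatch_once(self):
        # Pop one task, let the data-aware scheduler pick an executor, run it.
        if not self.wait_queue:
            return None
        task = self.wait_queue.popleft()
        executor = self.scheduler.choose(task, self.executors)
        return executor.run(task)    # executor reads inputs from its cache or persistent storage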

Data-aware scheduling policies: FA (first-available): simple load balancing. MCH (max-cache-hit): maximize cache hits. MCU (max-compute-util): maximize processor utilization. GCC (good-cache-compute): maximize both cache hit rate and processor utilization at the same time. [DADC08] Accelerating Large-scale Data Exploration through Data Diffusion
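A rough approximation of what these policies optimize for, assuming hypothetical executor objects with a cached_files set and an is_idle() method; the real Falkon scheduler is considerably more involved:

def cache_hits(task, executor):
    # How many of the task's input files are already cached on this executor.
    return len(set(task.input_files) & executor.cached_files)

def choose_executor(task, executors, policy="GCC"):
    idle = [e for e in executors if e.is_idle()]
    if policy == "FA":    # first-available: plain load balancing, ignore data location
        return idle[0] if idle else None
    if policy == "MCH":   # max-cache-hit: best data locality, even if the node is busy
        return max(executors, key=lambda e: cache_hits(task, e), default=None)
    if policy == "MCU":   # max-compute-util: never leave a processor idle;
        return max(idle, key=lambda e: cache_hits(task, e), default=None)  # locality breaks ties
    if policy == "GCC":   # good-cache-compute: prefer an idle node holding some input data,
        cached = [e for e in idle if cache_hits(task, e) > 0]              # else any idle node
        return (cached or idle or [None])[0]
    raise ValueError("unknown policy: " + policy)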

Dispatcher and scheduler performance: 3GHz dual CPUs on the ANL/UC TG site with 128 processors; scheduling window of 25 tasks; dataset of 1K files of 1 byte each; each task reads 1 file and writes 1 file. (Charts: CPU time per task broken down into task submit, notification for task availability, task dispatch and task results through the data-aware scheduler, notification for task results, and WS communication; and throughput in tasks/sec for first-available without I/O, first-available with I/O, max-compute-util, max-cache-hit, and good-cache-compute.) [DIDC09] Towards Data Intensive Many-Task Computing, under review

Monotonically increasing workload: emphasizes increasing loads. Sine-wave workload: emphasizes varying loads. All-Pairs workload: compare to the best-case model of active storage. Image stacking workload (astronomy): evaluate data diffusion on a real large-scale data-intensive application from the astronomy domain. [DADC08] Accelerating Large-scale Data Exploration through Data Diffusion

Monotonically increasing workload: 250K tasks; 10MB reads; 10ms compute per task. Arrival rate varied from a minimum of 1 task/sec, increasing via the increment function CEILING(previous rate * 1.3), up to a maximum of 1,000 tasks/sec; 128 processors; ideal case: 1415 sec; 80Gb/s peak throughput. (Chart: arrival rate and tasks completed over time.)
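As an illustration of this arrival schedule, the sketch below generates (rate, duration) segments starting at 1 task/sec and multiplying by 1.3 (rounded up) until the cap; the 60-second segment length is an assumption, not stated on the slide:

import math

def increasing_arrival_schedule(total_tasks=250_000, min_rate=1, max_rate=1000,
                                factor=1.3, step_seconds=60):
    # Yield (tasks_per_sec, duration_sec) segments: start at min_rate and multiply
    # by `factor` (rounded up) each step until max_rate, stopping once total_tasks
    # have been issued.
    rate, issued = min_rate, 0
    while issued < total_tasks:
        duration = min(step_seconds, math.ceil((total_tasks - issued) / rate))
        yield rate, duration
        issued += rate * duration
        rate = min(max_rate, math.ceil(rate * factor))

schedule = list(increasing_arrival_schedule())   # e.g. [(1, 60), (2, 60), (3, 60), ...]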

First-available against GPFS vs. the ideal case: 5011 sec vs. 1415 sec. (Chart: throughput (Gb/s), demand (Gb/s), wait queue length, and number of nodes allocated over time.)

Max-compute-util and max-cache-hit policies. (Charts: nodes allocated, throughput (Gb/s), demand (Gb/s), wait queue length, cache hit/miss percentages (local and global), and CPU utilization over time for each policy.)

Good-cache-compute policy with per-node cache sizes of 1GB, 1.5GB, 2GB, and 4GB. (Charts: nodes allocated, throughput (Gb/s), demand (Gb/s), wait queue length, and cache hit/miss percentages (local and global) over time for each cache size.)

Data diffusion vs. the ideal case: 1436 sec vs. 1415 sec. (Chart: nodes allocated, throughput (Gb/s), demand (Gb/s), wait queue length, and cache hit/miss percentages over time.)

Summary for the monotonically increasing workload. Throughput: average 14Gb/s vs. 4Gb/s for GPFS; peak 81Gb/s vs. 6Gb/s. Average response time: 3 sec vs. 1569 sec, a roughly 500X improvement. (Charts: throughput broken into local worker caches, remote worker caches, and GPFS, and average response time, for Ideal, FA, GCC 1GB, GCC 1.5GB, GCC 2GB, GCC 4GB, MCH 4GB, and MCU 4GB.)

Performance index: 34X higher; speedup: 3.5X faster than GPFS. (Chart: performance index and speedup compared to first-available for FA, GCC 1GB, GCC 1.5GB, GCC 2GB, GCC 4GB, GCC 4GB SRP, MCH 4GB, and MCU 4GB.)

Sine-wave workload: 2M tasks; 10MB reads; 10ms compute per task. Arrival rate varied from a minimum of 1 task/sec to a maximum of 1,000 tasks/sec according to the arrival-rate function A = (sin(sqrt(time+0.11)*2.859678)+1)*sqrt(time+0.11)*5.705; 200 processors; ideal case: 6505 sec; 80Gb/s peak throughput. (Chart: arrival rate and number of tasks completed over time.)
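A quick way to sanity-check the reconstructed arrival-rate function is to evaluate it directly; the constants below (0.11, 2.859678, 5.705) are as recovered from the garbled slide and should be treated as approximate:

import math

def sine_wave_arrival_rate(t):
    # A(t) in tasks/sec; constants are as recovered from the slide and may be slightly off.
    return (math.sin(math.sqrt(t + 0.11) * 2.859678) + 1) * math.sqrt(t + 0.11) * 5.705

# The rate oscillates with growing amplitude, approaching ~1,000 tasks/sec near
# the end of the roughly 6,500-second run.
peak_rate = max(sine_wave_arrival_rate(t) for t in range(6505))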

GPFS: 5.7 hrs, ~8Gb/s, 1138 CPU hrs. (Chart: throughput (Gb/s), demand (Gb/s), wait queue length, and number of nodes allocated over time.)

GPFS: 5.7 hrs, ~8Gb/s, 1138 CPU hrs. GCC+SRP (static resource provisioning): 1.8 hrs, ~25Gb/s, 361 CPU hrs. (Chart: throughput, demand, wait queue length, number of nodes, and local/global cache hit and miss percentages over time.)

GPFS: 5.7 hrs, ~8Gb/s, 1138 CPU hrs. GCC+SRP: 1.8 hrs, ~25Gb/s, 361 CPU hrs. GCC+DRP (dynamic resource provisioning): 1.86 hrs, ~24Gb/s, 253 CPU hrs. (Chart: throughput, demand, wait queue length, number of nodes, and cache hit/miss percentages over time.)

All-Pairs workloads: 500x500: 250K tasks, 24MB reads, 100ms compute per task, 200 CPUs. 1000x1000: 1M tasks, 24MB reads, 4 sec compute per task, 4096 CPUs. Ideal case: 655 sec; 80Gb/s peak throughput. All-Pairs(set A, set B, function F) returns matrix M: compare all elements of set A to all elements of set B via function F, yielding matrix M such that M[i,j] = F(A[i], B[j]):
foreach $i in A
    foreach $j in B
        submit_job F $i $j
    end
end
[DIDC09] Towards Data Intensive Many-Task Computing, under review
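The All-Pairs pseudocode above maps directly onto a nested submission loop; a minimal Python rendering, where submit_job stands in for whatever dispatch call the execution framework provides:

def all_pairs(A, B, F, submit_job):
    # Compare every element of A with every element of B via F, yielding matrix M
    # with M[i][j] = F(A[i], B[j]); each comparison is submitted as an independent task.
    M = [[None] * len(B) for _ in A]
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            M[i][j] = submit_job(F, a, b)
    return M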

All-Pairs workload, efficiency: 75%. (Chart: data diffusion throughput over time vs. the maximum throughput of GPFS and of local disk, with local/global cache hit and miss percentages.)

All-Pairs workload, efficiency: 86%. (Chart: data diffusion throughput over time vs. the maximum throughput of GPFS and of local memory, with local/global cache hit and miss percentages.) [DIDC09] Towards Data Intensive Many-Task Computing, under review

Pull vs. push. Data diffusion pulls the task working set; incremental spanning forest. Active storage pushes the workload working set to all nodes; static spanning tree. (Christopher Moretti, Douglas Thain, University of Notre Dame.) Experiments: 500x500 on 200 CPUs with 1 sec compute, 500x500 on 200 CPUs with 0.1 sec compute, 1000x1000 on 4096 CPUs with 4 sec compute, and 1000x1000 on 5832 CPUs with 4 sec compute, each run as best case (active storage) and Falkon (data diffusion). (Charts and table: efficiency per experiment, and data moved (GB) through local disk/memory, the node-to-node network, and the shared file system for best case (active storage), Falkon (data diffusion), and best case (parallel file system).) [DIDC09] Towards Data Intensive Many-Task Computing, under review

Best to use active storage if: the data source is slow; the workload working set fits on local node storage. Best to use data diffusion if: the data source is medium to fast; the task working set << the workload working set; the task working set fits on local node storage. If the task working set does not fit on local node storage, use a parallel file system (i.e., GPFS, Lustre, PVFS, etc.).
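These rules of thumb can be written down as a small decision function; the parameter names and the string labels here are illustrative only:

def choose_storage_strategy(data_source_speed, task_working_set_gb,
                            workload_working_set_gb, local_storage_gb):
    # data_source_speed is "slow", "medium", or "fast" relative to the network.
    if task_working_set_gb > local_storage_gb:
        return "parallel file system (GPFS, Lustre, PVFS, ...)"
    if data_source_speed == "slow" and workload_working_set_gb <= local_storage_gb:
        return "active storage: push the workload working set to every node"
    if task_working_set_gb < workload_working_set_gb:
        return "data diffusion: pull task working sets on demand and cache them"
    return "parallel file system (GPFS, Lustre, PVFS, ...)"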

Image stacking (astronomy). Purpose: on-demand stacks of random locations within a ~1TB dataset. Challenge: processing costs of O(1ms) per object; data intensive (4MB of data per 1 sec of compute); rapid access to 1-1K random files; time-varying load. (Table: number of objects and number of files in the AP Sloan workload for data locality values of 1, 1.38, 2, 3, 4, 5, 10, 20, and 30; illustration: multiple image cutouts summed into a single stacked image.) [DADC08] Accelerating Large-scale Data Exploration through Data Diffusion. [TG06] AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis

(Chart: time (ms) per stack broken down into open, radec2xy, readhdu+gettile+curl+convertarray, calibration+interpolation+dostacking, and writestacking, for each combination of file system and image format: GPFS GZ, LOCAL GZ, GPFS FIT, and LOCAL FIT.) [DADC08] Accelerating Large-scale Data Exploration through Data Diffusion

Low data locality: similar (but slightly better) performance compared to GPFS. High data locality: near-perfect scalability. (Charts: time (ms) per stack per CPU vs. number of CPUs (2-128) for data diffusion (GZ), data diffusion (FIT), GPFS (GZ), and GPFS (FIT).) [DADC08] Accelerating Large-scale Data Exploration through Data Diffusion

Aggregate throughput: 39Gb/s, 10X higher than GPFS. Reduced load on GPFS: 0.49Gb/s, about 1/10 of the original load. Big performance gains as locality increases. (Charts: time (ms) per stack per CPU and aggregate throughput (Gb/s) vs. data locality (1, 1.38, 2, 3, 4, 5, 10, 20, 30, and ideal), with throughput broken into data diffusion local, data diffusion cache-to-cache, and GPFS, for the GZ and FIT image formats.) [DADC08] Accelerating Large-scale Data Exploration through Data Diffusion

Data diffusion assumptions: data access patterns are write-once, read-many; the task definition must include input/output file metadata; the per-task working set must fit in local storage; requires IP connectivity between hosts; requires local storage (disk, memory, etc.); requires Java 1.4+.
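Because each task must declare its input and output files for the data-aware scheduler, a task description could look roughly like the sketch below; the field names and example values are illustrative, not Falkon's actual task format:

from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskDescription:
    # The executable to run plus the file metadata the data-aware scheduler needs.
    executable: str
    arguments: List[str] = field(default_factory=list)
    input_files: List[str] = field(default_factory=list)    # write once, read many
    output_files: List[str] = field(default_factory=list)
    working_set_bytes: int = 0    # must fit in the executor's local disk or memory

stack_task = TaskDescription(
    executable="stack_image",
    arguments=["--ra", "180.0", "--dec", "1.5"],
    input_files=["sdss/field_0001.fit"],
    output_files=["stacks/stack_0001.fit"],
    working_set_bytes=6 * 1024 * 1024,
)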

Related work: [Ghemawat03, Dean04]: MapReduce + GFS; [Bialecki05]: Hadoop + HDFS; [Gu06]: Sphere + Sector; [Tatebe04]: Gfarm; [Chervenak04]: RLS, DRS; [Kosar06]: Stork. Conclusions: none focused on the co-location of storage and generic black-box computations with data-aware scheduling while operating in a dynamic, elastic environment. Swift + Falkon + data diffusion is arguably a more generic and powerful solution than MapReduce.

Contributions: identified that data locality is crucial to the efficient use of large-scale distributed systems for data-intensive applications. Data diffusion: integrated streamlined task dispatching with data-aware scheduling policies; heuristics to maximize real-world performance; suitable for varying, data-intensive workloads; proof of O(NM) competitive caching.

More information: Falkon: http://dev.globus.org/wiki/incubator/falkon Swift: http://www.ci.uchicago.edu/swift/index.php