A Data Diffusion Approach to Large Scale Scientific Exploration
1 A Data Diffusion Approach to Large Scale Scientific Exploration
Ioan Raicu, Distributed Systems Laboratory, Computer Science Department, University of Chicago
Joint work with:
- Yong Zhao: Microsoft
- Ian Foster: Univ. of Chicago, CS & CI; Argonne National Laboratory, MCS
- Alex Szalay: The Johns Hopkins Univ., Dept. of Physics and Astronomy
Funded by: NSF (TeraGrid), DOE (Advanced Scientific Computing Research), NASA (Ames Research Center)
Microsoft eScience Workshop at RENCI, October 21st, 2007
2 Data Diffusion Example: AstroPortal Stacking Service
Purpose: on-demand stacks of random locations within a ~1 TB dataset
Challenge: rapid access to 1-1K random files under time-varying load
Solution: dynamic acquisition of compute and storage resources
[Diagram: clients invoke the stacking service via a web page or web service over the Sloan (SDSS) data]
10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration
3 Falkon: a Fast and Light-weight Task Execution Framework
Goal: enable the rapid and efficient execution of many independent jobs on large compute clusters
Combines three techniques:
- multi-level scheduling to enable separate treatment of resource provisioning and task dispatch
- a streamlined task dispatcher that achieves order-of-magnitude higher task dispatch rates than conventional schedulers
- data diffusion with a data-aware scheduler, to leverage co-located computational and storage resources
4 Execution Model
Dispatch policy: first-available, max-compute-util, max-cache-hit*
Caching policy: random, FIFO, LRU, LFU
Replay policy
Resource acquisition policy: one-at-a-time, additive, exponential, all-at-once, optimal*
Resource release policy: distributed, centralized*
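The caching policies above can be sketched as a small eviction simulator. This is an illustrative model (the class and method names are invented here), not Falkon's actual implementation:

```python
import random
from collections import OrderedDict, Counter

class FileCache:
    """Fixed-capacity cache simulator for the eviction policies above:
    RANDOM, FIFO, LRU, and LFU."""

    def __init__(self, capacity, policy="LRU", seed=0):
        self.capacity = capacity
        self.policy = policy
        self.entries = OrderedDict()    # file -> present (order = age/recency)
        self.counts = Counter()         # file -> access count (for LFU)
        self.rng = random.Random(seed)  # for the RANDOM policy

    def access(self, f):
        """Record an access to file f; return True on a cache hit."""
        hit = f in self.entries
        if hit and self.policy == "LRU":
            self.entries.move_to_end(f)             # refresh recency
        elif not hit:
            if len(self.entries) >= self.capacity:
                self._evict()
            self.entries[f] = True
        self.counts[f] += 1
        return hit

    def _evict(self):
        if self.policy == "RANDOM":
            victim = self.rng.choice(list(self.entries))
        elif self.policy == "LFU":
            victim = min(self.entries, key=self.counts.__getitem__)
        else:                                       # FIFO and LRU
            victim = next(iter(self.entries))       # oldest / least recent
        del self.entries[victim]
```

Under LRU, for example, re-accessing a file protects it from eviction, while under FIFO it does not.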
5 Falkon 2-Tier Architecture
Tier 1: Dispatcher: a GT4 web service accepting task submissions from clients and sending them to available executors
Tier 2: Executor: runs tasks on local resources
Provisioner: static and dynamic resource provisioning
[Diagram: WS clients submit to the dispatcher; the provisioner allocates compute resources 1..m, each hosting executors 1..n]
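A minimal sketch of the dispatcher/executor interaction: queued tasks are matched to registered executors as either side becomes available. The real tier 1 is a GT4 web service with asynchronous notifications, so the synchronous flow and names here are simplifications:

```python
from collections import deque

class Executor:
    """Tier 2: runs tasks on local resources (a stand-in here)."""
    def __init__(self, name):
        self.name = name
    def run(self, task):
        return f"{task} done on {self.name}"

class Dispatcher:
    """Tier 1 sketch: accepts task submissions and hands them to
    registered executors as soon as one of each is available."""
    def __init__(self):
        self.queue = deque()     # submitted but undispatched tasks
        self.idle = deque()      # executors waiting for work
        self.results = {}

    def submit(self, task):
        self.queue.append(task)
        self._match()

    def register(self, executor):
        self.idle.append(executor)
        self._match()

    def _match(self):
        # Pair queued tasks with idle executors until one side is empty.
        while self.queue and self.idle:
            task = self.queue.popleft()
            ex = self.idle.popleft()
            self.results[task] = ex.run(task)
            self.idle.append(ex)     # executor becomes idle again
```

Tasks submitted before any executor registers simply wait in the queue, which mirrors the decoupling between submission and provisioning.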
6 Dispatch Throughput
[Chart: throughput (tasks/sec) vs. number of executors, comparing plain WS calls (no security), Falkon (no security), and Falkon (GSISecureConversation)]
7 Execution Efficiency
[Chart: efficiency vs. number of executors for synthetic sleep tasks of varying durations (64, 32, 16, 8, 4, 2 seconds and shorter)]
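The efficiency trend in this experiment follows a simple model: each task pays a roughly fixed dispatch cost, so longer sleep tasks amortize it better. A sketch, where the per-task overhead figure is an assumption for illustration, not a measured value:

```python
def efficiency(task_seconds, overhead_seconds=0.005):
    """Fraction of wall-clock time spent on useful work when each task
    carries a fixed dispatch overhead (assumed 5 ms here)."""
    return task_seconds / (task_seconds + overhead_seconds)

# Longer sleep tasks approach 100% efficiency; very short tasks
# are dominated by the dispatch cost.
```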
8 Resource Provisioning
0. provisioner registration
1. task(s) submit
2. resource allocation to GRAM
3. resource allocation to LRM
4. executor registration
5. notification for work
6. pick up task(s)
7. deliver task(s) results
8. notification for task result
9. pick up task(s) results
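How many executors the provisioner requests in each allocation round depends on the acquisition policy (one-at-a-time, additive, exponential, all-at-once, from the execution model above). A sketch of those growth patterns; the function name and structure are illustrative, not Falkon's actual code:

```python
def allocation_plan(needed, policy="exponential"):
    """Return the number of executors requested at each provisioning
    round until `needed` executors have been acquired."""
    plan, have, step = [], 0, 1
    while have < needed:
        if policy == "one-at-a-time":
            ask = 1
        elif policy == "additive":
            ask = step
            step += 1                  # 1, 2, 3, ...
        elif policy == "exponential":
            ask = step
            step *= 2                  # 1, 2, 4, 8, ...
        else:                          # all-at-once
            ask = needed - have
        ask = min(ask, needed - have)  # never over-allocate
        plan.append(ask)
        have += ask
    return plan
```

All-at-once minimizes rounds at the cost of possibly idle resources; one-at-a-time is the opposite extreme, and additive/exponential sit in between.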
9 Dynamic Resource Provisioning
End-to-end execution time: 126 sec in the ideal case; 494-1276 sec measured
Average task queue time: 42.2 sec in the ideal case; 43.5-611 sec measured
Trade-off: resource utilization vs. execution efficiency
[Table: queue time, execution time, time to complete, resource utilization, execution efficiency, and number of resource allocations for GRAM+PBS, the ideal 32-node case, and several Falkon provisioning configurations; execution efficiency ranges from 26% (GRAM+PBS) to 99-100% (Falkon)]
10 Data Diffusion
Data caching at executor local disk (RANDOM, FIFO, LRU, LFU)
Avoids the need for a shared file system
Theoretically linear throughput scalability with compute resources
[Chart: throughput (Mb/s) vs. file size (MB) for local read, local read+write, GPFS read, and GPFS read+write]
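The scalability claim can be stated as a back-of-the-envelope model: per-node local-disk bandwidth aggregates linearly with the number of nodes, while a shared file system is capped by its fixed aggregate bandwidth no matter how many clients read from it. The bandwidth figures below are placeholders, not measurements from this experiment:

```python
def ideal_read_throughput(nodes, local_disk_mbps=500.0, shared_fs_mbps=3000.0):
    """Idealized aggregate read throughput (Mb/s) for n nodes:
    local caches scale linearly; the shared file system saturates
    at its assumed aggregate bandwidth."""
    return {
        "local disk": nodes * local_disk_mbps,
        "shared file system": min(shared_fs_mbps, nodes * local_disk_mbps),
    }
```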
11 Shared File System Performance
[Charts: read and read+write throughput (Mb/s) vs. file size (MB) on GPFS, for 1 to 64 nodes]
12 Local Disk Theoretical Performance
[Charts: read and read+write throughput (Mb/s) vs. file size (MB) from local disk, for 1 to 64 nodes]
13 Data Diffusion: Micro-Benchmarks
1. Model (local disk): local disk performance model
2. Model (shared file system): shared file system (GPFS) performance model
3. Falkon (next-available policy): load balancing across available executors, operating on the shared file system
4. Falkon (next-available policy) + wrapper: same as (3), but each task executes through a wrapper that creates a temporary scratch directory on the shared file system, makes symbolic links, executes the task, and then removes the scratch directory and links
5. Falkon (first-available policy, 0% locality): load balancing across available executors with data caching; 0% data locality, with data read from the shared file system
6. Falkon (first-available policy, 100% locality): same as (5), but with the workload from (5) repeated four times; the caches are populated beforehand (not as part of the timed experiment)
7. Falkon (max-compute-util policy, 0% locality): same as (5), but uses the data-aware scheduler to send tasks to the executors that have the most data cached; the workload has 0% data locality
8. Falkon (max-compute-util policy, 100% locality): same as (7), but with the workload repeated four times
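The difference between the first-available and the data-aware policies in these configurations can be sketched as a scheduling function; the data structures and names here are invented for illustration, not Falkon's actual scheduler:

```python
def choose_executor(task_files, executors, policy="max-cache-hit"):
    """Pick an executor for a task. `executors` maps a name to
    {"cache": set of cached files, "idle": bool}."""
    def cached(name):
        # how many of the task's input files this executor has cached
        return len(task_files & executors[name]["cache"])

    idle = [n for n, e in executors.items() if e["idle"]]
    if policy == "first-available":
        return idle[0] if idle else None           # ignore data locality
    if policy == "max-compute-util":
        # keep CPUs busy: best locality among currently idle executors
        return max(idle, key=cached) if idle else None
    # max-cache-hit: best locality overall, even if that node is busy
    return max(executors, key=cached)
```

max-cache-hit maximizes cache reuse at the risk of queueing behind a busy node; max-compute-util trades some locality for utilization.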
14 Micro-Benchmark Results: 1-64 Nodes
[Charts: read and read+write throughput (Mb/s) vs. number of nodes for configurations (1)-(8) above]
15 Micro-Benchmark Results: 1 Node
[Charts: read and read+write throughput (Mb/s) vs. file size (1 B to 1 GB) for configurations (1)-(8) above]
16 Micro-Benchmark Results: 8 Nodes
[Charts: read and read+write throughput (Mb/s) vs. file size (1 B to 1 GB) for configurations (1)-(8) above]
17 Micro-Benchmark Results: 64 Nodes
[Charts: read and read+write throughput (Mb/s) vs. file size (1 B to 1 GB) for configurations (1)-(8) above]
18 Task Dispatch Throughput: 64 Nodes, 1-Byte Files on GPFS
[Chart: throughput (tasks/sec) for read and read+write, comparing no data access against configurations (3)-(8) above]
19 Dispatch Policies
[Chart: cache hit rate (local disk) vs. number of nodes for the first-available, max-compute-util, and max-cache-hit policies; the data-aware policies sustain near-100% hit rates while first-available degrades sharply as node count grows]
20 Applications via Swift
Application | #Tasks/workflow | #Stages
ATLAS: High Energy Physics Event Simulation | 5K | 1
fMRI DBIC: AIRSN Image Processing | 1s | 12
FOAM: Ocean/Atmosphere Model | 2 | 3
GADU: Genomics | 4K | 4
HNL: fMRI Aphasia Study | 5 | 4
NVO/NASA: Photorealistic Montage/Morphology | 1s | 16
QuarkNet/I2U2: Physics Science Education | 1s | 3 ~ 6
RadCAD: Radiology Classifier Training | 1s | 5
SIDGrid: EEG Wavelet Processing, Gaze Analysis | 1s | 2
SDSS: Coadd, Cluster Search | 4K, 5K | 2, 8
SDSS: Stacking, AstroPortal | 1Ks ~ 1Ks | 2 ~ 4
MolDyn: Molecular Dynamics | 1Ks ~ 2Ks | 8
[Diagram: Swift architecture. SwiftScript specifications are compiled into abstract computations (with a Virtual Data Catalog), scheduled by the Karajan-based execution engine with Swift runtime callouts, status reporting, and provenance collection, and executed on virtual nodes provisioned by the Falkon resource provisioner (e.g. on Amazon EC2)]
21 Applications
Medicine: fMRI (Falkon vs. GRAM)
Astronomy: Montage (Falkon vs. GRAM vs. MPI)
Astronomy: Stacking (Falkon with data diffusion vs. Falkon using the shared file system)
22 Applications: fMRI
GRAM vs. Falkon: 85%-90% lower run time
GRAM/Clustering vs. Falkon: 40%-74% lower run time
[Chart: run time (s) vs. input data size (volumes) for GRAM, GRAM/Clustering, and Falkon]
23 Applications: Montage
GRAM/Clustering vs. Falkon: 57% lower application run time
MPI* vs. Falkon: 4% higher application run time (*MPI should be a lower bound)
[Chart: run time (s) per component (mProject, mDiff/mFit, mBackground, mAdd(sub), mAdd, total) for GRAM/Clustering, MPI, and Falkon]
24 Applications: Stacking
SDSS DR5 dataset: 32M+ objects, 1.5M+ files, 3.3 TB compressed / 9 TB uncompressed
[Chart: object count per file across the dataset]
25 Stacking Workloads
Uniform sample from 32M objects; Zipf distribution of object popularity (alpha = 0.1, 1.0, 2.0)
[Table: stacking size (# of objects), working set size (# of files and # of objects), and locality (accesses per file) for the uniform and Zipf (alpha = 0.1, 1.0, 2.0) object distributions]
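Workloads like these can be generated by weighting file ranks as 1/k^alpha, so that alpha = 0 reduces to the uniform case and larger alpha concentrates accesses on a smaller working set. A sketch with example parameters:

```python
import random

def zipf_workload(n_files, n_accesses, alpha, seed=0):
    """Sample a sequence of file accesses with Zipf-distributed
    popularity: the file of rank k gets weight 1/k**alpha."""
    rng = random.Random(seed)
    weights = [1.0 / (k ** alpha) for k in range(1, n_files + 1)]
    return rng.choices(range(1, n_files + 1), weights=weights, k=n_accesses)

# Higher alpha means each popular file is accessed more often,
# i.e. higher locality (accesses per file), as in the table above.
```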
26 Stacking Performance Evaluation
[Chart: time (sec) for the Zipf (alpha = 0.1, 1.0, 2.0) and uniform workloads, with the original dataset located on a web server, a shared file system (WAN), a shared file system (LAN), and under data diffusion]
27 Object Size Effect
[Chart: time (sec) for the shared file system (LAN) vs. data diffusion across object (cutout) sizes]
28 Data Diffusion Effect
[Chart: time (sec) for each workload (uniform and Zipf alpha = 0.1, 1.0, 2.0) on the shared file system (LAN) vs. data diffusion, with first-run and second-run times shown for data diffusion]
29 1K*2 Stacking Workload
[Chart: instantaneous and average throughput (cutouts/sec) over time (sec) for the 1K-object uniform workload, run a first and a second time]
30 1K Stacking Workload
[Chart: instantaneous and average throughput (cutouts/sec) over time (sec) for the 1K-object uniform workload]
31 Future Work
- Explore data pre-fetching policies
- Update Swift to use data diffusion
- 3-tier architecture (essential on BG/P)
- Extend the provisioner to support the Virtual Workspace Service (opens the door to EC2)
- Globus Incubator Project
- Alternate languages and technologies
32 More Information
Web:
Related Projects: Swift: AstroPortal:
Data Diffusion Collaborators:
- Ioan Raicu, Computer Science Dept., The University of Chicago
- Yong Zhao, Microsoft
- Ian Foster, Math and Computer Science Div., Argonne National Laboratory & Computer Science Dept., The University of Chicago
- Alex Szalay, Dept. of Physics and Astronomy, The Johns Hopkins University
Other Falkon Collaborators:
- Catalin Dumitrescu, Computer Science Dept., The University of Chicago
- Mike Wilde, Computation Institute, University of Chicago & Argonne National Laboratory
Funding:
- NASA: Ames Research Center, Graduate Student Research Program (GSRP)
- DOE: Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Dept. of Energy
- NSF: TeraGrid
33 More Information: Related Documents
- Ioan Raicu, Yong Zhao, Catalin Dumitrescu, Ian Foster, Mike Wilde. Falkon: a Fast and Light-weight Task Execution Framework, to appear at IEEE/ACM SuperComputing 2007.
- Ioan Raicu, Catalin Dumitrescu, Ian Foster. Dynamic Resource Provisioning in Grid Environments, to appear at TeraGrid Conference 2007.
- Yong Zhao, Mihael Hategan, Ben Clifford, Ian Foster, Gregor von Laszewski, Ioan Raicu, Tiberiu Stef-Praun, Mike Wilde. Swift: Fast, Reliable, Loosely Coupled Parallel Computation, to appear at IEEE Workshop on Scientific Workflows 2007.
- Yong Zhao, Mihael Hategan, Ioan Raicu, Mike Wilde, Ian Foster. Swift: a Parallel Programming Tool for Large Scale Scientific Computations, under review at Scientific Programming Journal, Special Issue on Dynamic Computational Workflows: Discovery, Optimization, and Scheduling.
- Ioan Raicu, Ian Foster, Alex Szalay. Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets, poster presentation, IEEE/ACM SuperComputing 2006.
- Ioan Raicu, Ian Foster, Alex Szalay, Gabriela Turcu. AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis, TeraGrid Conference 2006, June 2006.
- Alex Szalay, Julian Bunn, Jim Gray, Ian Foster, Ioan Raicu. The Importance of Data Locality in Distributed Computing Applications, NSF Workflow Workshop 2006.
Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Today (2014):
More informationToward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure
Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure Arnab K. Paul, Ryan Chard, Kyle Chard, Steven Tuecke, Ali R. Butt, Ian Foster Virginia Tech, Argonne National
More informationGrid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007
Grid Programming: Concepts and Challenges Michael Rokitka SUNY@Buffalo CSE510B 10/2007 Issues Due to Heterogeneous Hardware level Environment Different architectures, chipsets, execution speeds Software
More informationChapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.
Chapter 4:- Introduction to Grid and its Evolution Prepared By:- Assistant Professor SVBIT. Overview Background: What is the Grid? Related technologies Grid applications Communities Grid Tools Case Studies
More informationMATE-EC2: A Middleware for Processing Data with Amazon Web Services
MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering
More informationCorral: A Glide-in Based Service for Resource Provisioning
: A Glide-in Based Service for Resource Provisioning Gideon Juve USC Information Sciences Institute juve@usc.edu Outline Throughput Applications Grid Computing Multi-level scheduling and Glideins Example:
More informationScientific data processing at global scale The LHC Computing Grid. fabio hernandez
Scientific data processing at global scale The LHC Computing Grid Chengdu (China), July 5th 2011 Who I am 2 Computing science background Working in the field of computing for high-energy physics since
More informationTechnology Insight Series
IBM ProtecTIER Deduplication for z/os John Webster March 04, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved. Announcement Summary The many data
More informationIntroduction to GT3. Introduction to GT3. What is a Grid? A Story of Evolution. The Globus Project
Introduction to GT3 The Globus Project Argonne National Laboratory USC Information Sciences Institute Copyright (C) 2003 University of Chicago and The University of Southern California. All Rights Reserved.
More informationA GridFTP Transport Driver for Globus XIO
A GridFTP Transport Driver for Globus XIO Rajkumar Kettimuthu 1,2, Liu Wantao 3,4, Joseph Link 5, and John Bresnahan 1,2,3 1 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne,
More informationDesign and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming
Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming Zhao Zhang +, Allan Espinosa *, Kamil Iskra #, Ioan Raicu *, Ian Foster #*+, Michael Wilde #+ + Computation Institute,
More informationParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing
ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing Prof. Wu FENG Department of Computer Science Virginia Tech Work smarter not harder Overview Grand Challenge A large-scale biological
More informationALICE Grid Activities in US
ALICE Grid Activities in US 1 ALICE-USA Computing Project ALICE-USA Collaboration formed to focus on the ALICE EMCal project Construction, installation, testing and integration participating institutions
More informationIBM Spectrum Scale IO performance
IBM Spectrum Scale 5.0.0 IO performance Silverton Consulting, Inc. StorInt Briefing 2 Introduction High-performance computing (HPC) and scientific computing are in a constant state of transition. Artificial
More informationTERAGRID 2007 CONFERENCE, MADISON, WI 1. GridFTP Pipelining
TERAGRID 2007 CONFERENCE, MADISON, WI 1 GridFTP Pipelining John Bresnahan, 1,2,3 Michael Link, 1,2 Rajkumar Kettimuthu, 1,2 Dan Fraser, 1,2 Ian Foster 1,2,3 1 Mathematics and Computer Science Division
More informationGlobus Toolkit 4 Execution Management. Alexandra Jimborean International School of Informatics Hagenberg, 2009
Globus Toolkit 4 Execution Management Alexandra Jimborean International School of Informatics Hagenberg, 2009 2 Agenda of the day Introduction to Globus Toolkit and GRAM Zoom In WS GRAM Usage Guide Architecture
More informationXSEDE s Campus Bridging Project Jim Ferguson National Institute for Computational Sciences
January 3, 2016 XSEDE s Campus Bridging Project Jim Ferguson National Institute for Computational Sciences jwf@utk.edu What is XSEDE? extreme Science and Engineering Discovery Environment $121M project
More informationStork: State of the Art
Stork: State of the Art Tevfik Kosar Computer Sciences Department University of Wisconsin-Madison kosart@cs.wisc.edu http://www.cs.wisc.edu/condor/stork 1 The Imminent Data deluge Exponential growth of
More informationProfessor: Ioan Raicu. TA: Wei Tang. Everyone else
Professor: Ioan Raicu http://www.cs.iit.edu/~iraicu/ http://datasys.cs.iit.edu/ TA: Wei Tang http://mypages.iit.edu/~wtang6/ Everyone else Background? What do you want to get out of this course? 2 General
More informationThe Swi( parallel scrip/ng language for Science Clouds and other parallel resources
The Swi( parallel scrip/ng language for Science Clouds and other parallel resources Michael Wilde Computa1on Ins1tute, University of Chicago and Argonne Na1onal Laboratory wilde@mcs.anl.gov Revised 2012.0229
More informationAdvances of parallel computing. Kirill Bogachev May 2016
Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being
More informationData publication and discovery with Globus
Data publication and discovery with Globus Questions and comments to outreach@globus.org The Globus data publication and discovery services make it easy for institutions and projects to establish collections,
More informationToday (2010): Multicore Computing 80. Near future (~2018): Manycore Computing Number of Cores Processing
Number of Cores Manufacturing Process 300 250 200 150 100 50 0 2004 2006 2008 2010 2012 2014 2016 2018 100 Today (2010): Multicore Computing 80 1~12 cores commodity architectures 70 60 80 cores proprietary
More informationIsilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team
Isilon: Raising The Bar On Performance & Archive Use Cases John Har Solutions Product Manager Unstructured Data Storage Team What we ll cover in this session Isilon Overview Streaming workflows High ops/s
More informationTuning I/O Performance for Data Intensive Computing. Nicholas J. Wright. lbl.gov
Tuning I/O Performance for Data Intensive Computing. Nicholas J. Wright njwright @ lbl.gov NERSC- National Energy Research Scientific Computing Center Mission: Accelerate the pace of scientific discovery
More informationTask-based distributed processing for radio-interferometric imaging with CASA
H2020-Astronomy ESFRI and Research Infrastructure Cluster (Grant Agreement number: 653477). Task-based distributed processing for radio-interferometric imaging with CASA BOJAN NIKOLIC 2 nd ASTERICS-OBELICS
More informationUsage of LDAP in Globus
Usage of LDAP in Globus Gregor von Laszewski and Ian Foster Mathematics and Computer Science Division Argonne National Laboratory, Argonne, IL 60439 gregor@mcs.anl.gov Abstract: This short note describes
More informationA Survey Paper on Grid Information Systems
B 534 DISTRIBUTED SYSTEMS A Survey Paper on Grid Information Systems Anand Hegde 800 North Smith Road Bloomington Indiana 47408 aghegde@indiana.edu ABSTRACT Grid computing combines computers from various
More informationData Movement & Storage Using the Data Capacitor Filesystem
Data Movement & Storage Using the Data Capacitor Filesystem Justin Miller jupmille@indiana.edu http://pti.iu.edu/dc Big Data for Science Workshop July 2010 Challenges for DISC Keynote by Alex Szalay identified
More informationGPFS for Life Sciences at NERSC
GPFS for Life Sciences at NERSC A NERSC & JGI collaborative effort Jason Hick, Rei Lee, Ravi Cheema, and Kjiersten Fagnan GPFS User Group meeting May 20, 2015-1 - Overview of Bioinformatics - 2 - A High-level
More informationDatabases in the Cloud
Databases in the Cloud Ani Thakar Alex Szalay Nolan Li Center for Astrophysical Sciences and Institute for Data Intensive Engineering and Science (IDIES) The Johns Hopkins University Cloudy with a chance
More informationManaging large-scale workflows with Pegasus
Funded by the National Science Foundation under the OCI SDCI program, grant #0722019 Managing large-scale workflows with Pegasus Karan Vahi ( vahi@isi.edu) Collaborative Computing Group USC Information
More informationpre-ws MDS and WS-MDS Performance Study Ioan Raicu 01/05/06 Page 1 of 17 pre-ws MDS and WS-MDS Performance Study Ioan Raicu 01/05/06
Ioan Raicu 0/05/06 Page of 7 pre-ws MDS and WS-MDS Performance Study Ioan Raicu 0/05/06 Table of Contents Table of Contents... Table of Figures... Abstract... 3 Contributing People... 3 2 Testbeds and
More informationPegasus Workflow Management System. Gideon Juve. USC Informa3on Sciences Ins3tute
Pegasus Workflow Management System Gideon Juve USC Informa3on Sciences Ins3tute Scientific Workflows Orchestrate complex, multi-stage scientific computations Often expressed as directed acyclic graphs
More informationExecuting dynamic heterogeneous workloads on Blue Waters with RADICAL-Pilot
Executing dynamic heterogeneous workloads on Blue Waters with RADICAL-Pilot Research in Advanced DIstributed Cyberinfrastructure & Applications Laboratory (RADICAL) Rutgers University http://radical.rutgers.edu
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationProblems for Resource Brokering in Large and Dynamic Grid Environments*
Problems for Resource Brokering in Large and Dynamic Grid Environments* Catalin L. Dumitrescu CoreGRID Institute on Resource Management and Scheduling Electrical Eng., Math. and Computer Science, Delft
More informationA Performance Evaluation of WS-MDS in the Globus Toolkit
A Performance Evaluation of WS-MDS in the Globus Toolkit Ioan Raicu * Catalin Dumitrescu * Ian Foster +* * Computer Science Department The University of Chicago {iraicu,cldumitr}@cs.uchicago.edu Abstract
More informationOutline. March 5, 2012 CIRMMT - McGill University 2
Outline CLUMEQ, Calcul Quebec and Compute Canada Research Support Objectives and Focal Points CLUMEQ Site at McGill ETS Key Specifications and Status CLUMEQ HPC Support Staff at McGill Getting Started
More informationTransparent Throughput Elas0city for IaaS Cloud Storage Using Guest- Side Block- Level Caching
Transparent Throughput Elas0city for IaaS Cloud Storage Using Guest- Side Block- Level Caching Bogdan Nicolae (IBM Research, Ireland) Pierre Riteau (University of Chicago, USA) Kate Keahey (Argonne National
More informationOptimizing Load Balancing and Data-Locality with Data-aware Scheduling
Optimizing Load Balancing and Data-Locality with Data-aware Scheduling Ke Wang *, Xiaobing Zhou, Tonglin Li *, Dongfang Zhao *, Michael Lang, Ioan Raicu * * Illinois Institute of Technology, Hortonworks
More information