Workflow System for Data-intensive Many-task Computing
|
|
- Lynn Hicks
- 6 years ago
- Views:
Transcription
1 Workflow System for Data-intensive Many-task Computing Masahiro Tanaka and Osamu Tatebe University of Tsukuba, JST/CREST Japan Science and Technology Agency ISP2S2 Dec 4,
2 Outline Background Pwrake Workflow System IO-aware Task Scheduling Locality-aware Task Scheduling using Multi-Constraint Graph Partitioning (MCGP) Disk cache-aware Tasks Scheduling Conclusion ISP2S2 Dec 4,
3 Background: Data-intensive Science Many Scientific Fields Astronomy, Bioinformatics, Earth Science, Particle Physics, Data I/O > Computation Interaction through File System Handles huge amount of data HSC for Subaru Telescope generates ~300GB/night. Requires Parallel processing on Distributed Computer Systems ISP2S2 Dec 4,
4 Example of scientific workflow: Montage (Astronomy image processing) Workflow DAG Input images mprojectpp mdiff+mfitplane mbgmodel mbackground madd mshrink madd mjpeg Output image Process Task output file input ISP2S2 Dec 4,
5 Pwrake Workflow System ISP2S2 Dec 4,
6 Pwrake Workflow System Parallel Workflow extension to Rake Target: data-intensive and many-task scientific workflow Pwrake is based on: Rake : Ruby version of UNIX Make Workflow definition language for Many-Task Scientific Workflows Gfarm : Distributed File System Scalable I/O performance Use local storage of compute nodes ISP2S2 Dec 4,
7 Workflow Definition Language Task definition format e.g. DAX Need script to define many tasks. Design a new language e.g. Swift (Wilde et al. 2011) Learning cost, Niche community. Use an existing language e.g. GXP Make (Taura et al. 2013) Extension rule is not enough for scientific workflows. ISP2S2 Dec 4,
8 Our solution: Rake Ruby Make Build tool written in Ruby Widely-used tool in the Ruby community Rake is an internal DSL Ruby is an host language reduces learning cost able to use Ruby language features ISP2S2 Dec 4,
9 Useful features of Rake For-Loop BASENAMES = Array of basenames for i in BASENAMES file "out/#{i}.fits" => "src/#{i}.fits" do t sh "mprojectpp #{t.prerequisites[0]} #{t.name} region.hdr" end end Complex Rule using script FILEMAP = Mapping input files to output files rule /^d /.*.fits$/ => proc{ x FILEMAP[x]} do t p1,p2 = t.prerequisites sh "mdiff #{p1} #{p2} #{t.name} region.hdr" end ISP2S2 Dec 4,
10 Design of Pwrake Inherit Rake Workflow (task) definition language File-based task dependency Resume/Restart Implementation, e.g., Task class, Application module Implement Pwrake extension remote process execution parallel task execution task queue (includes scheduling) find file location using Gfarm API ISP2S2 Dec 4,
11 Pwrake Archetecture Master node Worker nodes Pwrake process Task Graph worker thread Task process process files files Queue worker thread SSH Gfarm worker thread enq deq worker thread process process files files ISP2S2 Dec 4,
12 Design of Task Queue Task NodeQueue TaskQueue enq Node 1 Node 2 deq worker thread Node 3 Other nodes ISP2S2 Dec 4,
13 MB/s I/O-aware Task Scheduling Issues: Read performance of File Locality Disk cache Gfarm file (HDD, GbE) (buffer/page cache) disk process cache Local file cache file cache Remote 0 Remote Local file disk Gfarm file disk ISP2S2 Dec 4,
14 Locality-aware Scheduling based on MCGP (Multi-Constraint Graph Partitioning) (CCGrid 2012) ISP2S2 Dec 4,
15 Locality-aware Scheduling Methods 1. Naïve locality scheduling Assign a task to a node where its input file is stored. 2. Method using MCGP (Multi-Constraint Graph Partitioning) Our proposal (CCGrid 2012) (Idle workers steal tasks) ISP2S2 Dec 4,
16 Graph Partitioning Task Scheduling Is Graph Partitioning also applicable to Workflow DAG? Vertex Computation Edge Communication Minimize: Edge-cut Data movement ISP2S2 Dec 4,
17 Graph Partitioning on DAG Standard Graph Partitioning Ideal Partitioning for Scheduling Node-A Node-B Node-C Node-D Former Tasks Latter Tasks Standard GP is not aware of parallelizable tasks ISP2S2 Dec 4,
18 Multi-Constraint Graph Partitioning (MCGP) Vertex Weight Vectors w 1 VA w 4 (1,3,1) VB w i w 5 i V A w i j 1 w1 j V (1,2,4) (2,2,2) i j 2 nd dim: w w 6 i i i w1, w2, w3 1 st dim: i V V A A B 2 2 j V B V B 6 w 2 w 3 (2,3,1) (3,1,1) w 6 (3,1,3) 3 rd dim: i V A w i j 3 w3 j V B 6 Balance the sum of Vertex Weights at each dimension ISP2S2 Dec 4,
19 Proposed method: MCGP (Multi-Constraint Graph Partitioning) w 1 w 1 w 1 w 1 w 2 w 2 w 2 w 3 w 4 w 4 w 4 w 4 w 5 w 5 w 6 w 6 w 7 w 1 = (1,0,0,0,0) w 2 = (0,1,0,0,0) w 3 = (0,0,0,0,0) w 4 = (0,0,1,0,0) w 5 = (0,0,0,1,0) w 6 = (0,0,0,0,1) w 7 = (0,0,0,0,0) N task N group : set 1 at i th dim set 0 at others N task < N group : set 0 to all dims ISP2S2 Dec 4,
20 Result of MCGP Node-A Node-B w 1 = (1,0,0,0,0) w 2 = (0,1,0,0,0) w 3 = (0,0,0,0,0) w 4 = (0,0,1,0,0) Uniform Distribution in each phase Minimize Edge-cut applied to Entire Graph w 5 = (0,0,0,1,0) w 6 = (0,0,0,0,1) w 7 = (0,0,0,0,0) ISP2S2 Dec 4,
21 Image Position and Task Nodes (Initially, input files are stored in one node) Naïve locality MCGP ISP2S2 Dec 4,
22 Data Movement between nodes Data Size Ratio (%) A(Unconcern) B(Naïve locality) C(MCGP) ISP2S2 Dec 4,
23 Workflow Execution Time Elapsed Time (sec) A(Unconcern) B(Naïve locality) C(MCGP) 31% down 22% down Includes time to solve MCGP (30 ms) ISP2S2 Dec 4,
24 Disk cache-aware Task Scheduling (Cluster 2014) ISP2S2 Dec 4,
25 Disk Cache-aware Task Scheduling LIFO queue: Intermediate files are read soon. High probability that the file is cached. Trailing task problem (Armstrong et al. MTAGS 2010) Workflow DAG LIFO queue Execution order Last-in First-out A1 A2 A3 A4 A5 A6 A6 A6 B6 B6 A5 B1 B2 B3 B4 B5 B6 A5 A4 B5 A3 C A2 time First-in A1 ISP2S2 Dec 4,
26 Proposed Scheduling methods Workflow DAG A1 A2.. An B1 B2.. Bn C A1 A3 An-2 An B2 Bn-3 Bn-1 A2 A4 An-1 B1 B3 Bn-2 Bn Trailing tasks A1 B1 A3 B3 An-2 Bn-2 An Bn : A2 B2 A4 B4 An-1 Bn-1 idle core A1 B1 A3 B3 An-2 Bn-2 Bn-1 : A2 B2 A4 B4 An-1 An Bn LIFO HRF A1 B1 B2 B3 An-1 Bn-1 Bn-2 HRF: Highest Rank First : A2 A3 A4 A5 An-2 An Bn Bn-3 FIFO (HRF) LIFO LIFO+HRF Rank Equalization+HRF Disk Cache Trailing Task Task Overlap Rank Overlap HRF ISP2S2 Dec 4,
27 Core Utilization FIFO mprojectpp mdiff LIFO Trailing Task Sequential tasks Rank Eq+HRF (different setting) mdiff overlaps mprojectpp LIFO+HRF No Trailing Task ISP2S2 Dec 4,
28 Measurement of Strong Scaling (1-12 nodes, Logarithmic) 1node 8cores x1.9 speedup from FIFO 1/n cores 4nodes 32cores Next Slide ISP2S2 Dec 4,
29 Measurement of Strong Scaling (4-12 nodes, Linear) 4nodes 32cores 12nodes 96cores 12 % speedup from LIFO (trailing task) 1/n cores ISP2S2 Dec 4,
30 Conclusion We developed Pwrake workflow system for data-intensive, many-task workflows. I/O-aware workflow scheduling: Locality-aware scheduling using MCGP remote file access: 88% 14% workflow execution: 31% speedup Disk cache-aware scheduling LIFO: 1.9x speedup HRF: ~12% speedup ISP2S2 Dec 4,
Disk Cache-Aware Task Scheduling
Disk ache-aware Task Scheduling For Data-Intensive and Many-Task Workflow Masahiro Tanaka and Osamu Tatebe University of Tsukuba, JST/REST Japan Science and Technology Agency IEEE luster 2014 2014-09-24
More informationSystem Software for Big Data and Post Petascale Computing
The Japanese Extreme Big Data Workshop February 26, 2014 System Software for Big Data and Post Petascale Computing Osamu Tatebe University of Tsukuba I/O performance requirement for exascale applications
More informationData Management in Parallel Scripting
Data Management in Parallel Scripting Zhao Zhang 11/11/2012 Problem Statement Definition: MTC applications are those applications in which existing sequential or parallel programs are linked by files output
More informationOptimizing Local File Accesses for FUSE-Based Distributed Storage
Optimizing Local File Accesses for FUSE-Based Distributed Storage Shun Ishiguro 1, Jun Murakami 1, Yoshihiro Oyama 1,3, Osamu Tatebe 2,3 1. The University of Electro-Communications, Japan 2. University
More informationNFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC
Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data
More informationHSC Data Processing Environment ~Upgrading plan of open-use computing facility for HSC data
HSC Data Processing Environment ~Upgrading plan of open-use computing facility for HSC data Hisanori Furusawa (NAOJ) 2018.1.17 Subaru Users Meeting 1 Missions of HSC Data Processing in NAOJ Support General
More informationA high-speed data processing technique for time-sequential satellite observation data
A high-speed data processing technique for time-sequential satellite observation data Ken T. Murata 1a), Hidenobu Watanabe 1, Kazunori Yamamoto 1, Eizen Kimura 2, Masahiro Tanaka 3,OsamuTatebe 3, Kentaro
More informationPractice of Software Development: Dynamic scheduler for scientific simulations
Practice of Software Development: Dynamic scheduler for scientific simulations @ SimLab EA Teilchen STEINBUCH CENTRE FOR COMPUTING - SCC KIT Universität des Landes Baden-Württemberg und nationales Forschungszentrum
More informationData Throttling for Data-Intensive Workflows
Data Throttling for Data-Intensive Workflows Sang-Min Park and Marty Humphrey Dept. of Computer Science, University of Virginia This work was supported by the National Science Foundation under Grant No.
More informationA Data Diffusion Approach to Large Scale Scientific Exploration
A Data Diffusion Approach to Large Scale Scientific Exploration Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Joint work with: Yong Zhao: Microsoft Ian Foster:
More informationMOHA: Many-Task Computing Framework on Hadoop
Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction
More informationOvercoming Data Locality: an In-Memory Runtime File System with Symmetrical Data Distribution
Overcoming Data Locality: an In-Memory Runtime File System with Symmetrical Data Distribution Alexandru Uta a,, Andreea Sandu a, Thilo Kielmann a a Dept. of Computer Science, VU University Amsterdam, The
More informationMoore s Law. Computer architect goal Software developer assumption
Moore s Law The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months. Self-fulfilling prophecy Computer architect goal Software developer
More informationPiccolo. Fast, Distributed Programs with Partitioned Tables. Presenter: Wu, Weiyi Yale University. Saturday, October 15,
Piccolo Fast, Distributed Programs with Partitioned Tables 1 Presenter: Wu, Weiyi Yale University Outline Background Intuition Design Evaluation Future Work 2 Outline Background Intuition Design Evaluation
More informationLecture 4, 04/08/2015. Scribed by Eric Lax, Andreas Santucci, Charles Zheng.
CME 323: Distributed Algorithms and Optimization, Spring 2015 http://stanford.edu/~rezab/dao. Instructor: Reza Zadeh, Databricks and Stanford. Lecture 4, 04/08/2015. Scribed by Eric Lax, Andreas Santucci,
More informationIntroduction to Parallel Computing
Introduction to Parallel Computing Bootcamp for SahasraT 7th September 2018 Aditya Krishna Swamy adityaks@iisc.ac.in SERC, IISc Acknowledgments Akhila, SERC S. Ethier, PPPL P. Messina, ECP LLNL HPC tutorials
More informationChapter 5. The MapReduce Programming Model and Implementation
Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing
More informationScientific Workflows and Cloud Computing. Gideon Juve USC Information Sciences Institute
Scientific Workflows and Cloud Computing Gideon Juve USC Information Sciences Institute gideon@isi.edu Scientific Workflows Loosely-coupled parallel applications Expressed as directed acyclic graphs (DAGs)
More informationChisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique
Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique Prateek Dhawalia Sriram Kailasam D. Janakiram Distributed and Object Systems Lab Dept. of Comp.
More informationLEEN: Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud
LEEN: Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud Shadi Ibrahim, Hai Jin, Lu Lu, Song Wu, Bingsheng He*, Qi Li # Huazhong University of Science and Technology *Nanyang Technological
More informationNFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC
Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data
More informationIntroduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work
Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Today (2014):
More informationPegasus. Automate, recover, and debug scientific computations. Rafael Ferreira da Silva.
Pegasus Automate, recover, and debug scientific computations. Rafael Ferreira da Silva http://pegasus.isi.edu Experiment Timeline Scientific Problem Earth Science, Astronomy, Neuroinformatics, Bioinformatics,
More informationOutline. ASP 2012 Grid School
Distributed Storage Rob Quick Indiana University Slides courtesy of Derek Weitzel University of Nebraska Lincoln Outline Storage Patterns in Grid Applications Storage
More informationDynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle
Dynamic Fine Grain Scheduling of Pipeline Parallelism Presented by: Ram Manohar Oruganti and Michael TeWinkle Overview Introduction Motivation Scheduling Approaches GRAMPS scheduling method Evaluation
More informationMATE-EC2: A Middleware for Processing Data with Amazon Web Services
MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering
More informationSynonymous with supercomputing Tightly-coupled applications Implemented using Message Passing Interface (MPI) Large of amounts of computing for short
Synonymous with supercomputing Tightly-coupled applications Implemented using Message Passing Interface (MPI) Large of amounts of computing for short periods of time Usually requires low latency interconnects
More informationMost real programs operate somewhere between task and data parallelism. Our solution also lies in this set.
for Windows Azure and HPC Cluster 1. Introduction In parallel computing systems computations are executed simultaneously, wholly or in part. This approach is based on the partitioning of a big task into
More informationScientific Workflow Scheduling with Provenance Data in a Multisite Cloud
Scientific Workflow Scheduling with Provenance Data in a Multisite Cloud Ji Liu, Esther Pacitti, Patrick Valduriez, Marta Mattoso To cite this version: Ji Liu, Esther Pacitti, Patrick Valduriez, Marta
More informationParallel Processing. Majid AlMeshari John W. Conklin. Science Advisory Committee Meeting September 3, 2010 Stanford University
Parallel Processing Majid AlMeshari John W. Conklin 1 Outline Challenge Requirements Resources Approach Status Tools for Processing 2 Challenge A computationally intensive algorithm is applied on a huge
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Piccolo: Building Fast, Distributed Programs
More informationCloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe
Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability
More informationEnlightening the I/O Path: A Holistic Approach for Application Performance
Enlightening the I/O Path: A Holistic Approach for Application Performance Sangwook Kim 13, Hwanju Kim 2, Joonwon Lee 3, and Jinkyu Jeong 3 Apposha 1 Dell EMC 2 Sungkyunkwan University 3 Data-Intensive
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationTypically applied in clusters and grids Loosely-coupled applications with sequential jobs Large amounts of computing for long periods of times
Typically applied in clusters and grids Loosely-coupled applications with sequential jobs Large amounts of computing for long periods of times Measured in operations per month or years 2 Bridge the gap
More informationI/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)
I/O Systems Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) I/O Systems 1393/9/15 1 / 57 Motivation Amir H. Payberah (Tehran
More informationJyotheswar Kuricheti
Jyotheswar Kuricheti 1 Agenda: 1. Performance Tuning Overview 2. Identify Bottlenecks 3. Optimizing at different levels : Target Source Mapping Session System 2 3 Performance Tuning Overview: 4 What is
More informationLecture 13 Input/Output (I/O) Systems (chapter 13)
Bilkent University Department of Computer Engineering CS342 Operating Systems Lecture 13 Input/Output (I/O) Systems (chapter 13) Dr. İbrahim Körpeoğlu http://www.cs.bilkent.edu.tr/~korpe 1 References The
More informationAlgorithm Engineering with PRAM Algorithms
Algorithm Engineering with PRAM Algorithms Bernard M.E. Moret moret@cs.unm.edu Department of Computer Science University of New Mexico Albuquerque, NM 87131 Rome School on Alg. Eng. p.1/29 Measuring and
More informationImplementing Scheduling Algorithms. Real-Time and Embedded Systems (M) Lecture 9
Implementing Scheduling Algorithms Real-Time and Embedded Systems (M) Lecture 9 Lecture Outline Implementing real time systems Key concepts and constraints System architectures: Cyclic executive Microkernel
More informationApache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM
Apache Giraph: Facebook-scale graph processing infrastructure 3/31/2014 Avery Ching, Facebook GDM Motivation Apache Giraph Inspired by Google s Pregel but runs on Hadoop Think like a vertex Maximum value
More informationEnergy Prediction for I/O Intensive Workflow Applications
Energy Prediction for I/O Intensive Workflow Applications ABSTRACT As workflow-based data-intensive applications have become increasingly popular, the lack of support tools to aid resource provisioning
More informationCourse Syllabus. Operating Systems
Course Syllabus. Introduction - History; Views; Concepts; Structure 2. Process Management - Processes; State + Resources; Threads; Unix implementation of Processes 3. Scheduling Paradigms; Unix; Modeling
More informationOracle Database 12c: JMS Sharded Queues
Oracle Database 12c: JMS Sharded Queues For high performance, scalable Advanced Queuing ORACLE WHITE PAPER MARCH 2015 Table of Contents Introduction 2 Architecture 3 PERFORMANCE OF AQ-JMS QUEUES 4 PERFORMANCE
More informationData Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions
Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing
More informationModule 12: I/O Systems
Module 12: I/O Systems I/O hardwared Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Performance 12.1 I/O Hardware Incredible variety of I/O devices Common
More informationmodern database systems lecture 10 : large-scale graph processing
modern database systems lecture 1 : large-scale graph processing Aristides Gionis spring 18 timeline today : homework is due march 6 : homework out april 5, 9-1 : final exam april : homework due graphs
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationI/O Systems. 04/16/2007 CSCI 315 Operating Systems Design 1
I/O Systems Notice: The slides for this lecture have been largely based on those accompanying the textbook Operating Systems Concepts with Java, by Silberschatz, Galvin, and Gagne (2007). Many, if not
More informationMigrating Scientific Workflows to the Cloud
Migrating Scientific Workflows to the Cloud Through Graph-partitioning, Scheduling and Peer-to-Peer Data Sharing Satish Narayana Srirama, Jaagup Viil Mobile & Cloud Lab, Institute of Computer Science University
More informationEnosis: Bridging the Semantic Gap between
Enosis: Bridging the Semantic Gap between File-based and Object-based Data Models Anthony Kougkas - akougkas@hawk.iit.edu, Hariharan Devarajan, Xian-He Sun Outline Introduction Background Approach Evaluation
More informationScalable Algorithmic Techniques Decompositions & Mapping. Alexandre David
Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05 adavid@cs.aau.dk Introduction Focus on data parallelism, scale with size. Task parallelism limited. Notion of scalability
More informationParallel Programming Patterns. Overview and Concepts
Parallel Programming Patterns Overview and Concepts Outline Practical Why parallel programming? Decomposition Geometric decomposition Task farm Pipeline Loop parallelism Performance metrics and scaling
More informationChapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance
Chapter 7 Multicores, Multiprocessors, and Clusters Introduction Goal: connecting multiple computers to get higher performance Multiprocessors Scalability, availability, power efficiency Job-level (process-level)
More informationHigh Performance Computing and Data Mining
High Performance Computing and Data Mining Performance Issues in Data Mining Peter Christen Peter.Christen@anu.edu.au Data Mining Group Department of Computer Science, FEIT Australian National University,
More informationShared Memory and Distributed Multiprocessing. Bhanu Kapoor, Ph.D. The Saylor Foundation
Shared Memory and Distributed Multiprocessing Bhanu Kapoor, Ph.D. The Saylor Foundation 1 Issue with Parallelism Parallel software is the problem Need to get significant performance improvement Otherwise,
More informationA Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004
A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into
More informationCS MapReduce. Vitaly Shmatikov
CS 5450 MapReduce Vitaly Shmatikov BackRub (Google), 1997 slide 2 NEC Earth Simulator, 2002 slide 3 Conventional HPC Machine ucompute nodes High-end processors Lots of RAM unetwork Specialized Very high
More informationTHE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano
THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL Jun Sun, Yasushi Shinjo and Kozo Itano Institute of Information Sciences and Electronics University of Tsukuba Tsukuba,
More informationBackground: I/O Concurrency
Background: I/O Concurrency Brad Karp UCL Computer Science CS GZ03 / M030 5 th October 2011 Outline Worse Is Better and Distributed Systems Problem: Naïve single-process server leaves system resources
More informationDistributed Computations MapReduce. adapted from Jeff Dean s slides
Distributed Computations MapReduce adapted from Jeff Dean s slides What we ve learnt so far Basic distributed systems concepts Consistency (sequential, eventual) Fault tolerance (recoverability, availability)
More informationHigh Performance Computing. Introduction to Parallel Computing
High Performance Computing Introduction to Parallel Computing Acknowledgements Content of the following presentation is borrowed from The Lawrence Livermore National Laboratory https://hpc.llnl.gov/training/tutorials
More informationTowards Jungle Computing with Ibis/Constellation
Towards Jungle Computing with Ibis/Constellation Jason Maassen, Niels Drost Henri Bal, Frank Seinstra Department of Computer Science VU University, Amsterdam, The Netherlands Introduction HPC is entering
More informationChapter 13: I/O Systems
Chapter 13: I/O Systems DM510-14 Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS Performance 13.2 Objectives
More informationChapter 13 Strong Scaling
Chapter 13 Strong Scaling Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction Variables
More informationMapReduce: A Programming Model for Large-Scale Distributed Computation
CSC 258/458 MapReduce: A Programming Model for Large-Scale Distributed Computation University of Rochester Department of Computer Science Shantonu Hossain April 18, 2011 Outline Motivation MapReduce Overview
More informationHPX. High Performance ParalleX CCT Tech Talk Series. Hartmut Kaiser
HPX High Performance CCT Tech Talk Hartmut Kaiser (hkaiser@cct.lsu.edu) 2 What s HPX? Exemplar runtime system implementation Targeting conventional architectures (Linux based SMPs and clusters) Currently,
More informationA Method for Order/Degree Problem Based on Graph Symmetry and Simulated Annealing
A Method for Order/Degree Problem Based on Graph Symmetry and Simulated Annealing Masahiro Nakao (RIKEN Center for Computational Science) CANDAR'18, Nov. 2018@Takayama, Gifu, Japan What is Order/Degree
More informationShared-memory Parallel Programming with Cilk Plus
Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 30 August 2018 Outline for Today Threaded programming
More informationEnterprise. Breadth-First Graph Traversal on GPUs. November 19th, 2015
Enterprise Breadth-First Graph Traversal on GPUs Hang Liu H. Howie Huang November 9th, 5 Graph is Ubiquitous Breadth-First Search (BFS) is Important Wide Range of Applications Single Source Shortest Path
More informationTITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP
TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop
More informationHigh Performance Computing Systems
High Performance Computing Systems Shared Memory Doug Shook Shared Memory Bottlenecks Trips to memory Cache coherence 2 Why Multicore? Shared memory systems used to be purely the domain of HPC... What
More informationA Cloud Framework for Big Data Analytics Workflows on Azure
A Cloud Framework for Big Data Analytics Workflows on Azure Fabrizio MAROZZO a, Domenico TALIA a,b and Paolo TRUNFIO a a DIMES, University of Calabria, Rende (CS), Italy b ICAR-CNR, Rende (CS), Italy Abstract.
More informationRunning Data-Intensive Scientific Workflows in the Cloud
2014 15th International Conference on Parallel and Distributed Computing, Applications and Technologies Running Data-Intensive Scientific Workflows in the Cloud Chiaki Sato University of Sydney, Australia
More informationManaging and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers
Managing and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Collaborators:
More informationIoan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago
Falkon, a Fast and Light-weight task execution framework for Clusters, Grids, and Supercomputers Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago In Collaboration
More informationPegasus Workflow Management System. Gideon Juve. USC Informa3on Sciences Ins3tute
Pegasus Workflow Management System Gideon Juve USC Informa3on Sciences Ins3tute Scientific Workflows Orchestrate complex, multi-stage scientific computations Often expressed as directed acyclic graphs
More informationRequest Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve
Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve Yinan Li 1, Jack Dongarra 2, Keith Seymour 3, Asim YarKhan 4 1 yili@eecs.utk.edu 2 dongarra@eecs.utk.edu 3 seymour@eecs.utk.edu
More informationScalable Multi Agent Simulation on the GPU. Avi Bleiweiss NVIDIA Corporation San Jose, 2009
Scalable Multi Agent Simulation on the GPU Avi Bleiweiss NVIDIA Corporation San Jose, 2009 Reasoning Explicit State machine, serial Implicit Compute intensive Fits SIMT well Collision avoidance Motivation
More informationParallel Computing with MATLAB
Parallel Computing with MATLAB CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University
More informationChapter 12: I/O Systems
Chapter 12: I/O Systems Chapter 12: I/O Systems I/O Hardware! Application I/O Interface! Kernel I/O Subsystem! Transforming I/O Requests to Hardware Operations! STREAMS! Performance! Silberschatz, Galvin
More informationChapter 13: I/O Systems
Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS Performance Silberschatz, Galvin and
More informationChapter 12: I/O Systems. Operating System Concepts Essentials 8 th Edition
Chapter 12: I/O Systems Silberschatz, Galvin and Gagne 2011 Chapter 12: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS
More informationA Non-Relational Storage Analysis
A Non-Relational Storage Analysis Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Cloud Computing - 2nd semester 2012/2013 Universitat Politècnica de Catalunya Microblogging - big data?
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationThe MOSIX Scalable Cluster Computing for Linux. mosix.org
The MOSIX Scalable Cluster Computing for Linux Prof. Amnon Barak Computer Science Hebrew University http://www. mosix.org 1 Presentation overview Part I : Why computing clusters (slide 3-7) Part II : What
More informationOnline Course Evaluation. What we will do in the last week?
Online Course Evaluation Please fill in the online form The link will expire on April 30 (next Monday) So far 10 students have filled in the online form Thank you if you completed it. 1 What we will do
More informationParallel Mesh Partitioning in Alya
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Parallel Mesh Partitioning in Alya A. Artigues a *** and G. Houzeaux a* a Barcelona Supercomputing Center ***antoni.artigues@bsc.es
More informationVirtual SQL Servers. Actual Performance. 2016
@kleegeek davidklee.net heraflux.com linkedin.com/in/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture Health
More informationInterconnect Technology and Computational Speed
Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented
More informationAnalytical Modeling of Parallel Programs
Analytical Modeling of Parallel Programs Alexandre David Introduction to Parallel Computing 1 Topic overview Sources of overhead in parallel programs. Performance metrics for parallel systems. Effect of
More informationIoan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago
Running 1 Million Jobs in 10 Minutes via the Falkon Fast and Light-weight Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago In Collaboration with: Ian Foster,
More informationClouds: An Opportunity for Scientific Applications?
Clouds: An Opportunity for Scientific Applications? Ewa Deelman USC Information Sciences Institute Acknowledgements Yang-Suk Ki (former PostDoc, USC) Gurmeet Singh (former Ph.D. student, USC) Gideon Juve
More informationChapter 13: I/O Systems
Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance I/O Hardware Incredible variety of I/O devices Common
More informationKAAPI : Adaptive Runtime System for Parallel Computing
KAAPI : Adaptive Runtime System for Parallel Computing Thierry Gautier, thierry.gautier@inrialpes.fr Bruno Raffin, bruno.raffin@inrialpes.fr, INRIA Grenoble Rhône-Alpes Moais Project http://moais.imag.fr
More informationChapter 13: I/O Systems. Operating System Concepts 9 th Edition
Chapter 13: I/O Systems Silberschatz, Galvin and Gagne 2013 Chapter 13: I/O Systems Overview I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations
More informationScalable GPU Graph Traversal!
Scalable GPU Graph Traversal Duane Merrill, Michael Garland, and Andrew Grimshaw PPoPP '12 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming Benwen Zhang
More informationChapter 13: I/O Systems
Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance Objectives Explore the structure of an operating
More informationSuperGlue and DuctTEiP: Using data versioning for dependency-aware task-based parallelization
SuperGlue and DuctTEiP: Using data versioning for dependency-aware task-based parallelization Elisabeth Larsson Martin Tillenius Afshin Zafari UPMARC Workshop on Task-Based Parallel Programming September
More informationDevice-Functionality Progression
Chapter 12: I/O Systems I/O Hardware I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Incredible variety of I/O devices Common concepts Port
More information