Workflow System for Data-intensive Many-task Computing

Size: px
Start display at page:

Download "Workflow System for Data-intensive Many-task Computing"

Transcription

1 Workflow System for Data-intensive Many-task Computing Masahiro Tanaka and Osamu Tatebe University of Tsukuba, JST/CREST Japan Science and Technology Agency ISP2S2 Dec 4,

2 Outline Background Pwrake Workflow System IO-aware Task Scheduling Locality-aware Task Scheduling using Multi-Constraint Graph Partitioning (MCGP) Disk cache-aware Tasks Scheduling Conclusion ISP2S2 Dec 4,

3 Background: Data-intensive Science Many Scientific Fields Astronomy, Bioinformatics, Earth Science, Particle Physics, Data I/O > Computation Interaction through File System Handles huge amount of data HSC for Subaru Telescope generates ~300GB/night. Requires Parallel processing on Distributed Computer Systems ISP2S2 Dec 4,

4 Example of scientific workflow: Montage (Astronomy image processing) Workflow DAG Input images mprojectpp mdiff+mfitplane mbgmodel mbackground madd mshrink madd mjpeg Output image Process Task output file input ISP2S2 Dec 4,

5 Pwrake Workflow System ISP2S2 Dec 4,

6 Pwrake Workflow System Parallel Workflow extension to Rake Target: data-intensive and many-task scientific workflow Pwrake is based on: Rake : Ruby version of UNIX Make Workflow definition language for Many-Task Scientific Workflows Gfarm : Distributed File System Scalable I/O performance Use local storage of compute nodes ISP2S2 Dec 4,

7 Workflow Definition Language Task definition format e.g. DAX Need script to define many tasks. Design a new language e.g. Swift (Wilde et al. 2011) Learning cost, Niche community. Use an existing language e.g. GXP Make (Taura et al. 2013) Extension rule is not enough for scientific workflows. ISP2S2 Dec 4,

8 Our solution: Rake Ruby Make Build tool written in Ruby Widely-used tool in the Ruby community Rake is an internal DSL Ruby is an host language reduces learning cost able to use Ruby language features ISP2S2 Dec 4,

9 Useful features of Rake For-Loop BASENAMES = Array of basenames for i in BASENAMES file "out/#{i}.fits" => "src/#{i}.fits" do t sh "mprojectpp #{t.prerequisites[0]} #{t.name} region.hdr" end end Complex Rule using script FILEMAP = Mapping input files to output files rule /^d /.*.fits$/ => proc{ x FILEMAP[x]} do t p1,p2 = t.prerequisites sh "mdiff #{p1} #{p2} #{t.name} region.hdr" end ISP2S2 Dec 4,

10 Design of Pwrake Inherit Rake Workflow (task) definition language File-based task dependency Resume/Restart Implementation, e.g., Task class, Application module Implement Pwrake extension remote process execution parallel task execution task queue (includes scheduling) find file location using Gfarm API ISP2S2 Dec 4,

11 Pwrake Archetecture Master node Worker nodes Pwrake process Task Graph worker thread Task process process files files Queue worker thread SSH Gfarm worker thread enq deq worker thread process process files files ISP2S2 Dec 4,

12 Design of Task Queue Task NodeQueue TaskQueue enq Node 1 Node 2 deq worker thread Node 3 Other nodes ISP2S2 Dec 4,

13 MB/s I/O-aware Task Scheduling Issues: Read performance of File Locality Disk cache Gfarm file (HDD, GbE) (buffer/page cache) disk process cache Local file cache file cache Remote 0 Remote Local file disk Gfarm file disk ISP2S2 Dec 4,

14 Locality-aware Scheduling based on MCGP (Multi-Constraint Graph Partitioning) (CCGrid 2012) ISP2S2 Dec 4,

15 Locality-aware Scheduling Methods 1. Naïve locality scheduling Assign a task to a node where its input file is stored. 2. Method using MCGP (Multi-Constraint Graph Partitioning) Our proposal (CCGrid 2012) (Idle workers steal tasks) ISP2S2 Dec 4,

16 Graph Partitioning Task Scheduling Is Graph Partitioning also applicable to Workflow DAG? Vertex Computation Edge Communication Minimize: Edge-cut Data movement ISP2S2 Dec 4,

17 Graph Partitioning on DAG Standard Graph Partitioning Ideal Partitioning for Scheduling Node-A Node-B Node-C Node-D Former Tasks Latter Tasks Standard GP is not aware of parallelizable tasks ISP2S2 Dec 4,

18 Multi-Constraint Graph Partitioning (MCGP) Vertex Weight Vectors w 1 VA w 4 (1,3,1) VB w i w 5 i V A w i j 1 w1 j V (1,2,4) (2,2,2) i j 2 nd dim: w w 6 i i i w1, w2, w3 1 st dim: i V V A A B 2 2 j V B V B 6 w 2 w 3 (2,3,1) (3,1,1) w 6 (3,1,3) 3 rd dim: i V A w i j 3 w3 j V B 6 Balance the sum of Vertex Weights at each dimension ISP2S2 Dec 4,

19 Proposed method: MCGP (Multi-Constraint Graph Partitioning) w 1 w 1 w 1 w 1 w 2 w 2 w 2 w 3 w 4 w 4 w 4 w 4 w 5 w 5 w 6 w 6 w 7 w 1 = (1,0,0,0,0) w 2 = (0,1,0,0,0) w 3 = (0,0,0,0,0) w 4 = (0,0,1,0,0) w 5 = (0,0,0,1,0) w 6 = (0,0,0,0,1) w 7 = (0,0,0,0,0) N task N group : set 1 at i th dim set 0 at others N task < N group : set 0 to all dims ISP2S2 Dec 4,

20 Result of MCGP Node-A Node-B w 1 = (1,0,0,0,0) w 2 = (0,1,0,0,0) w 3 = (0,0,0,0,0) w 4 = (0,0,1,0,0) Uniform Distribution in each phase Minimize Edge-cut applied to Entire Graph w 5 = (0,0,0,1,0) w 6 = (0,0,0,0,1) w 7 = (0,0,0,0,0) ISP2S2 Dec 4,

21 Image Position and Task Nodes (Initially, input files are stored in one node) Naïve locality MCGP ISP2S2 Dec 4,

22 Data Movement between nodes Data Size Ratio (%) A(Unconcern) B(Naïve locality) C(MCGP) ISP2S2 Dec 4,

23 Workflow Execution Time Elapsed Time (sec) A(Unconcern) B(Naïve locality) C(MCGP) 31% down 22% down Includes time to solve MCGP (30 ms) ISP2S2 Dec 4,

24 Disk cache-aware Task Scheduling (Cluster 2014) ISP2S2 Dec 4,

25 Disk Cache-aware Task Scheduling LIFO queue: Intermediate files are read soon. High probability that the file is cached. Trailing task problem (Armstrong et al. MTAGS 2010) Workflow DAG LIFO queue Execution order Last-in First-out A1 A2 A3 A4 A5 A6 A6 A6 B6 B6 A5 B1 B2 B3 B4 B5 B6 A5 A4 B5 A3 C A2 time First-in A1 ISP2S2 Dec 4,

26 Proposed Scheduling methods Workflow DAG A1 A2.. An B1 B2.. Bn C A1 A3 An-2 An B2 Bn-3 Bn-1 A2 A4 An-1 B1 B3 Bn-2 Bn Trailing tasks A1 B1 A3 B3 An-2 Bn-2 An Bn : A2 B2 A4 B4 An-1 Bn-1 idle core A1 B1 A3 B3 An-2 Bn-2 Bn-1 : A2 B2 A4 B4 An-1 An Bn LIFO HRF A1 B1 B2 B3 An-1 Bn-1 Bn-2 HRF: Highest Rank First : A2 A3 A4 A5 An-2 An Bn Bn-3 FIFO (HRF) LIFO LIFO+HRF Rank Equalization+HRF Disk Cache Trailing Task Task Overlap Rank Overlap HRF ISP2S2 Dec 4,

27 Core Utilization FIFO mprojectpp mdiff LIFO Trailing Task Sequential tasks Rank Eq+HRF (different setting) mdiff overlaps mprojectpp LIFO+HRF No Trailing Task ISP2S2 Dec 4,

28 Measurement of Strong Scaling (1-12 nodes, Logarithmic) 1node 8cores x1.9 speedup from FIFO 1/n cores 4nodes 32cores Next Slide ISP2S2 Dec 4,

29 Measurement of Strong Scaling (4-12 nodes, Linear) 4nodes 32cores 12nodes 96cores 12 % speedup from LIFO (trailing task) 1/n cores ISP2S2 Dec 4,

30 Conclusion We developed Pwrake workflow system for data-intensive, many-task workflows. I/O-aware workflow scheduling: Locality-aware scheduling using MCGP remote file access: 88% 14% workflow execution: 31% speedup Disk cache-aware scheduling LIFO: 1.9x speedup HRF: ~12% speedup ISP2S2 Dec 4,

Disk Cache-Aware Task Scheduling

Disk Cache-Aware Task Scheduling Disk ache-aware Task Scheduling For Data-Intensive and Many-Task Workflow Masahiro Tanaka and Osamu Tatebe University of Tsukuba, JST/REST Japan Science and Technology Agency IEEE luster 2014 2014-09-24

More information

System Software for Big Data and Post Petascale Computing

System Software for Big Data and Post Petascale Computing The Japanese Extreme Big Data Workshop February 26, 2014 System Software for Big Data and Post Petascale Computing Osamu Tatebe University of Tsukuba I/O performance requirement for exascale applications

More information

Data Management in Parallel Scripting

Data Management in Parallel Scripting Data Management in Parallel Scripting Zhao Zhang 11/11/2012 Problem Statement Definition: MTC applications are those applications in which existing sequential or parallel programs are linked by files output

More information

Optimizing Local File Accesses for FUSE-Based Distributed Storage

Optimizing Local File Accesses for FUSE-Based Distributed Storage Optimizing Local File Accesses for FUSE-Based Distributed Storage Shun Ishiguro 1, Jun Murakami 1, Yoshihiro Oyama 1,3, Osamu Tatebe 2,3 1. The University of Electro-Communications, Japan 2. University

More information

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data

More information

HSC Data Processing Environment ~Upgrading plan of open-use computing facility for HSC data

HSC Data Processing Environment ~Upgrading plan of open-use computing facility for HSC data HSC Data Processing Environment ~Upgrading plan of open-use computing facility for HSC data Hisanori Furusawa (NAOJ) 2018.1.17 Subaru Users Meeting 1 Missions of HSC Data Processing in NAOJ Support General

More information

A high-speed data processing technique for time-sequential satellite observation data

A high-speed data processing technique for time-sequential satellite observation data A high-speed data processing technique for time-sequential satellite observation data Ken T. Murata 1a), Hidenobu Watanabe 1, Kazunori Yamamoto 1, Eizen Kimura 2, Masahiro Tanaka 3,OsamuTatebe 3, Kentaro

More information

Practice of Software Development: Dynamic scheduler for scientific simulations

Practice of Software Development: Dynamic scheduler for scientific simulations Practice of Software Development: Dynamic scheduler for scientific simulations @ SimLab EA Teilchen STEINBUCH CENTRE FOR COMPUTING - SCC KIT Universität des Landes Baden-Württemberg und nationales Forschungszentrum

More information

Data Throttling for Data-Intensive Workflows

Data Throttling for Data-Intensive Workflows Data Throttling for Data-Intensive Workflows Sang-Min Park and Marty Humphrey Dept. of Computer Science, University of Virginia This work was supported by the National Science Foundation under Grant No.

More information

A Data Diffusion Approach to Large Scale Scientific Exploration

A Data Diffusion Approach to Large Scale Scientific Exploration A Data Diffusion Approach to Large Scale Scientific Exploration Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Joint work with: Yong Zhao: Microsoft Ian Foster:

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

Overcoming Data Locality: an In-Memory Runtime File System with Symmetrical Data Distribution

Overcoming Data Locality: an In-Memory Runtime File System with Symmetrical Data Distribution Overcoming Data Locality: an In-Memory Runtime File System with Symmetrical Data Distribution Alexandru Uta a,, Andreea Sandu a, Thilo Kielmann a a Dept. of Computer Science, VU University Amsterdam, The

More information

Moore s Law. Computer architect goal Software developer assumption

Moore s Law. Computer architect goal Software developer assumption Moore s Law The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months. Self-fulfilling prophecy Computer architect goal Software developer

More information

Piccolo. Fast, Distributed Programs with Partitioned Tables. Presenter: Wu, Weiyi Yale University. Saturday, October 15,

Piccolo. Fast, Distributed Programs with Partitioned Tables. Presenter: Wu, Weiyi Yale University. Saturday, October 15, Piccolo Fast, Distributed Programs with Partitioned Tables 1 Presenter: Wu, Weiyi Yale University Outline Background Intuition Design Evaluation Future Work 2 Outline Background Intuition Design Evaluation

More information

Lecture 4, 04/08/2015. Scribed by Eric Lax, Andreas Santucci, Charles Zheng.

Lecture 4, 04/08/2015. Scribed by Eric Lax, Andreas Santucci, Charles Zheng. CME 323: Distributed Algorithms and Optimization, Spring 2015 http://stanford.edu/~rezab/dao. Instructor: Reza Zadeh, Databricks and Stanford. Lecture 4, 04/08/2015. Scribed by Eric Lax, Andreas Santucci,

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing Bootcamp for SahasraT 7th September 2018 Aditya Krishna Swamy adityaks@iisc.ac.in SERC, IISc Acknowledgments Akhila, SERC S. Ethier, PPPL P. Messina, ECP LLNL HPC tutorials

More information

Chapter 5. The MapReduce Programming Model and Implementation

Chapter 5. The MapReduce Programming Model and Implementation Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing

More information

Scientific Workflows and Cloud Computing. Gideon Juve USC Information Sciences Institute

Scientific Workflows and Cloud Computing. Gideon Juve USC Information Sciences Institute Scientific Workflows and Cloud Computing Gideon Juve USC Information Sciences Institute gideon@isi.edu Scientific Workflows Loosely-coupled parallel applications Expressed as directed acyclic graphs (DAGs)

More information

Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique

Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique Prateek Dhawalia Sriram Kailasam D. Janakiram Distributed and Object Systems Lab Dept. of Comp.

More information

LEEN: Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud

LEEN: Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud LEEN: Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud Shadi Ibrahim, Hai Jin, Lu Lu, Song Wu, Bingsheng He*, Qi Li # Huazhong University of Science and Technology *Nanyang Technological

More information

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC

NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Segregated storage and compute NFS, GPFS, PVFS, Lustre Batch-scheduled systems: Clusters, Grids, and Supercomputers Programming paradigm: HPC, MTC, and HTC Co-located storage and compute HDFS, GFS Data

More information

Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work

Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Introduction & Motivation Problem Statement Proposed Work Evaluation Conclusions Future Work Today (2014):

More information

Pegasus. Automate, recover, and debug scientific computations. Rafael Ferreira da Silva.

Pegasus. Automate, recover, and debug scientific computations. Rafael Ferreira da Silva. Pegasus Automate, recover, and debug scientific computations. Rafael Ferreira da Silva http://pegasus.isi.edu Experiment Timeline Scientific Problem Earth Science, Astronomy, Neuroinformatics, Bioinformatics,

More information

Outline. ASP 2012 Grid School

Outline. ASP 2012 Grid School Distributed Storage Rob Quick Indiana University Slides courtesy of Derek Weitzel University of Nebraska Lincoln Outline Storage Patterns in Grid Applications Storage

More information

Dynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle

Dynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle Dynamic Fine Grain Scheduling of Pipeline Parallelism Presented by: Ram Manohar Oruganti and Michael TeWinkle Overview Introduction Motivation Scheduling Approaches GRAMPS scheduling method Evaluation

More information

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

MATE-EC2: A Middleware for Processing Data with Amazon Web Services MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering

More information

Synonymous with supercomputing Tightly-coupled applications Implemented using Message Passing Interface (MPI) Large of amounts of computing for short

Synonymous with supercomputing Tightly-coupled applications Implemented using Message Passing Interface (MPI) Large of amounts of computing for short Synonymous with supercomputing Tightly-coupled applications Implemented using Message Passing Interface (MPI) Large of amounts of computing for short periods of time Usually requires low latency interconnects

More information

Most real programs operate somewhere between task and data parallelism. Our solution also lies in this set.

Most real programs operate somewhere between task and data parallelism. Our solution also lies in this set. for Windows Azure and HPC Cluster 1. Introduction In parallel computing systems computations are executed simultaneously, wholly or in part. This approach is based on the partitioning of a big task into

More information

Scientific Workflow Scheduling with Provenance Data in a Multisite Cloud

Scientific Workflow Scheduling with Provenance Data in a Multisite Cloud Scientific Workflow Scheduling with Provenance Data in a Multisite Cloud Ji Liu, Esther Pacitti, Patrick Valduriez, Marta Mattoso To cite this version: Ji Liu, Esther Pacitti, Patrick Valduriez, Marta

More information

Parallel Processing. Majid AlMeshari John W. Conklin. Science Advisory Committee Meeting September 3, 2010 Stanford University

Parallel Processing. Majid AlMeshari John W. Conklin. Science Advisory Committee Meeting September 3, 2010 Stanford University Parallel Processing Majid AlMeshari John W. Conklin 1 Outline Challenge Requirements Resources Approach Status Tools for Processing 2 Challenge A computationally intensive algorithm is applied on a huge

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Piccolo: Building Fast, Distributed Programs

More information

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability

More information

Enlightening the I/O Path: A Holistic Approach for Application Performance

Enlightening the I/O Path: A Holistic Approach for Application Performance Enlightening the I/O Path: A Holistic Approach for Application Performance Sangwook Kim 13, Hwanju Kim 2, Joonwon Lee 3, and Jinkyu Jeong 3 Apposha 1 Dell EMC 2 Sungkyunkwan University 3 Data-Intensive

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

Typically applied in clusters and grids Loosely-coupled applications with sequential jobs Large amounts of computing for long periods of times

Typically applied in clusters and grids Loosely-coupled applications with sequential jobs Large amounts of computing for long periods of times Typically applied in clusters and grids Loosely-coupled applications with sequential jobs Large amounts of computing for long periods of times Measured in operations per month or years 2 Bridge the gap

More information

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic) I/O Systems Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) I/O Systems 1393/9/15 1 / 57 Motivation Amir H. Payberah (Tehran

More information

Jyotheswar Kuricheti

Jyotheswar Kuricheti Jyotheswar Kuricheti 1 Agenda: 1. Performance Tuning Overview 2. Identify Bottlenecks 3. Optimizing at different levels : Target Source Mapping Session System 2 3 Performance Tuning Overview: 4 What is

More information

Lecture 13 Input/Output (I/O) Systems (chapter 13)

Lecture 13 Input/Output (I/O) Systems (chapter 13) Bilkent University Department of Computer Engineering CS342 Operating Systems Lecture 13 Input/Output (I/O) Systems (chapter 13) Dr. İbrahim Körpeoğlu http://www.cs.bilkent.edu.tr/~korpe 1 References The

More information

Algorithm Engineering with PRAM Algorithms

Algorithm Engineering with PRAM Algorithms Algorithm Engineering with PRAM Algorithms Bernard M.E. Moret moret@cs.unm.edu Department of Computer Science University of New Mexico Albuquerque, NM 87131 Rome School on Alg. Eng. p.1/29 Measuring and

More information

Implementing Scheduling Algorithms. Real-Time and Embedded Systems (M) Lecture 9

Implementing Scheduling Algorithms. Real-Time and Embedded Systems (M) Lecture 9 Implementing Scheduling Algorithms Real-Time and Embedded Systems (M) Lecture 9 Lecture Outline Implementing real time systems Key concepts and constraints System architectures: Cyclic executive Microkernel

More information

Apache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM

Apache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM Apache Giraph: Facebook-scale graph processing infrastructure 3/31/2014 Avery Ching, Facebook GDM Motivation Apache Giraph Inspired by Google s Pregel but runs on Hadoop Think like a vertex Maximum value

More information

Energy Prediction for I/O Intensive Workflow Applications

Energy Prediction for I/O Intensive Workflow Applications Energy Prediction for I/O Intensive Workflow Applications ABSTRACT As workflow-based data-intensive applications have become increasingly popular, the lack of support tools to aid resource provisioning

More information

Course Syllabus. Operating Systems

Course Syllabus. Operating Systems Course Syllabus. Introduction - History; Views; Concepts; Structure 2. Process Management - Processes; State + Resources; Threads; Unix implementation of Processes 3. Scheduling Paradigms; Unix; Modeling

More information

Oracle Database 12c: JMS Sharded Queues

Oracle Database 12c: JMS Sharded Queues Oracle Database 12c: JMS Sharded Queues For high performance, scalable Advanced Queuing ORACLE WHITE PAPER MARCH 2015 Table of Contents Introduction 2 Architecture 3 PERFORMANCE OF AQ-JMS QUEUES 4 PERFORMANCE

More information

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing

More information

Module 12: I/O Systems

Module 12: I/O Systems Module 12: I/O Systems I/O hardwared Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Performance 12.1 I/O Hardware Incredible variety of I/O devices Common

More information

modern database systems lecture 10 : large-scale graph processing

modern database systems lecture 10 : large-scale graph processing modern database systems lecture 1 : large-scale graph processing Aristides Gionis spring 18 timeline today : homework is due march 6 : homework out april 5, 9-1 : final exam april : homework due graphs

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

I/O Systems. 04/16/2007 CSCI 315 Operating Systems Design 1

I/O Systems. 04/16/2007 CSCI 315 Operating Systems Design 1 I/O Systems Notice: The slides for this lecture have been largely based on those accompanying the textbook Operating Systems Concepts with Java, by Silberschatz, Galvin, and Gagne (2007). Many, if not

More information

Migrating Scientific Workflows to the Cloud

Migrating Scientific Workflows to the Cloud Migrating Scientific Workflows to the Cloud Through Graph-partitioning, Scheduling and Peer-to-Peer Data Sharing Satish Narayana Srirama, Jaagup Viil Mobile & Cloud Lab, Institute of Computer Science University

More information

Enosis: Bridging the Semantic Gap between

Enosis: Bridging the Semantic Gap between Enosis: Bridging the Semantic Gap between File-based and Object-based Data Models Anthony Kougkas - akougkas@hawk.iit.edu, Hariharan Devarajan, Xian-He Sun Outline Introduction Background Approach Evaluation

More information

Scalable Algorithmic Techniques Decompositions & Mapping. Alexandre David

Scalable Algorithmic Techniques Decompositions & Mapping. Alexandre David Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05 adavid@cs.aau.dk Introduction Focus on data parallelism, scale with size. Task parallelism limited. Notion of scalability

More information

Parallel Programming Patterns. Overview and Concepts

Parallel Programming Patterns. Overview and Concepts Parallel Programming Patterns Overview and Concepts Outline Practical Why parallel programming? Decomposition Geometric decomposition Task farm Pipeline Loop parallelism Performance metrics and scaling

More information

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance Chapter 7 Multicores, Multiprocessors, and Clusters Introduction Goal: connecting multiple computers to get higher performance Multiprocessors Scalability, availability, power efficiency Job-level (process-level)

More information

High Performance Computing and Data Mining

High Performance Computing and Data Mining High Performance Computing and Data Mining Performance Issues in Data Mining Peter Christen Peter.Christen@anu.edu.au Data Mining Group Department of Computer Science, FEIT Australian National University,

More information

Shared Memory and Distributed Multiprocessing. Bhanu Kapoor, Ph.D. The Saylor Foundation

Shared Memory and Distributed Multiprocessing. Bhanu Kapoor, Ph.D. The Saylor Foundation Shared Memory and Distributed Multiprocessing Bhanu Kapoor, Ph.D. The Saylor Foundation 1 Issue with Parallelism Parallel software is the problem Need to get significant performance improvement Otherwise,

More information

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004 A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into

More information

CS MapReduce. Vitaly Shmatikov

CS MapReduce. Vitaly Shmatikov CS 5450 MapReduce Vitaly Shmatikov BackRub (Google), 1997 slide 2 NEC Earth Simulator, 2002 slide 3 Conventional HPC Machine ucompute nodes High-end processors Lots of RAM unetwork Specialized Very high

More information

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL Jun Sun, Yasushi Shinjo and Kozo Itano Institute of Information Sciences and Electronics University of Tsukuba Tsukuba,

More information

Background: I/O Concurrency

Background: I/O Concurrency Background: I/O Concurrency Brad Karp UCL Computer Science CS GZ03 / M030 5 th October 2011 Outline Worse Is Better and Distributed Systems Problem: Naïve single-process server leaves system resources

More information

Distributed Computations MapReduce. adapted from Jeff Dean s slides

Distributed Computations MapReduce. adapted from Jeff Dean s slides Distributed Computations MapReduce adapted from Jeff Dean s slides What we ve learnt so far Basic distributed systems concepts Consistency (sequential, eventual) Fault tolerance (recoverability, availability)

More information

High Performance Computing. Introduction to Parallel Computing

High Performance Computing. Introduction to Parallel Computing High Performance Computing Introduction to Parallel Computing Acknowledgements Content of the following presentation is borrowed from The Lawrence Livermore National Laboratory https://hpc.llnl.gov/training/tutorials

More information

Towards Jungle Computing with Ibis/Constellation

Towards Jungle Computing with Ibis/Constellation Towards Jungle Computing with Ibis/Constellation Jason Maassen, Niels Drost Henri Bal, Frank Seinstra Department of Computer Science VU University, Amsterdam, The Netherlands Introduction HPC is entering

More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems DM510-14 Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS Performance 13.2 Objectives

More information

Chapter 13 Strong Scaling

Chapter 13 Strong Scaling Chapter 13 Strong Scaling Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction Variables

More information

MapReduce: A Programming Model for Large-Scale Distributed Computation

MapReduce: A Programming Model for Large-Scale Distributed Computation CSC 258/458 MapReduce: A Programming Model for Large-Scale Distributed Computation University of Rochester Department of Computer Science Shantonu Hossain April 18, 2011 Outline Motivation MapReduce Overview

More information

HPX. High Performance ParalleX CCT Tech Talk Series. Hartmut Kaiser

HPX. High Performance ParalleX CCT Tech Talk Series. Hartmut Kaiser HPX High Performance CCT Tech Talk Hartmut Kaiser (hkaiser@cct.lsu.edu) 2 What s HPX? Exemplar runtime system implementation Targeting conventional architectures (Linux based SMPs and clusters) Currently,

More information

A Method for Order/Degree Problem Based on Graph Symmetry and Simulated Annealing

A Method for Order/Degree Problem Based on Graph Symmetry and Simulated Annealing A Method for Order/Degree Problem Based on Graph Symmetry and Simulated Annealing Masahiro Nakao (RIKEN Center for Computational Science) CANDAR'18, Nov. 2018@Takayama, Gifu, Japan What is Order/Degree

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 30 August 2018 Outline for Today Threaded programming

More information

Enterprise. Breadth-First Graph Traversal on GPUs. November 19th, 2015

Enterprise. Breadth-First Graph Traversal on GPUs. November 19th, 2015 Enterprise Breadth-First Graph Traversal on GPUs Hang Liu H. Howie Huang November 9th, 5 Graph is Ubiquitous Breadth-First Search (BFS) is Important Wide Range of Applications Single Source Shortest Path

More information

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop

More information

High Performance Computing Systems

High Performance Computing Systems High Performance Computing Systems Shared Memory Doug Shook Shared Memory Bottlenecks Trips to memory Cache coherence 2 Why Multicore? Shared memory systems used to be purely the domain of HPC... What

More information

A Cloud Framework for Big Data Analytics Workflows on Azure

A Cloud Framework for Big Data Analytics Workflows on Azure A Cloud Framework for Big Data Analytics Workflows on Azure Fabrizio MAROZZO a, Domenico TALIA a,b and Paolo TRUNFIO a a DIMES, University of Calabria, Rende (CS), Italy b ICAR-CNR, Rende (CS), Italy Abstract.

More information

Running Data-Intensive Scientific Workflows in the Cloud

Running Data-Intensive Scientific Workflows in the Cloud 2014 15th International Conference on Parallel and Distributed Computing, Applications and Technologies Running Data-Intensive Scientific Workflows in the Cloud Chiaki Sato University of Sydney, Australia

More information

Managing and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers

Managing and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers Managing and Executing Loosely-Coupled Large-Scale Applications on Clusters, Grids, and Supercomputers Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Collaborators:

More information

Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago

Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Falkon, a Fast and Light-weight task execution framework for Clusters, Grids, and Supercomputers Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago In Collaboration

More information

Pegasus Workflow Management System. Gideon Juve. USC Informa3on Sciences Ins3tute

Pegasus Workflow Management System. Gideon Juve. USC Informa3on Sciences Ins3tute Pegasus Workflow Management System Gideon Juve USC Informa3on Sciences Ins3tute Scientific Workflows Orchestrate complex, multi-stage scientific computations Often expressed as directed acyclic graphs

More information

Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve

Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve Yinan Li 1, Jack Dongarra 2, Keith Seymour 3, Asim YarKhan 4 1 yili@eecs.utk.edu 2 dongarra@eecs.utk.edu 3 seymour@eecs.utk.edu

More information

Scalable Multi Agent Simulation on the GPU. Avi Bleiweiss NVIDIA Corporation San Jose, 2009

Scalable Multi Agent Simulation on the GPU. Avi Bleiweiss NVIDIA Corporation San Jose, 2009 Scalable Multi Agent Simulation on the GPU Avi Bleiweiss NVIDIA Corporation San Jose, 2009 Reasoning Explicit State machine, serial Implicit Compute intensive Fits SIMT well Collision avoidance Motivation

More information

Parallel Computing with MATLAB

Parallel Computing with MATLAB Parallel Computing with MATLAB CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University

More information

Chapter 12: I/O Systems

Chapter 12: I/O Systems Chapter 12: I/O Systems Chapter 12: I/O Systems I/O Hardware! Application I/O Interface! Kernel I/O Subsystem! Transforming I/O Requests to Hardware Operations! STREAMS! Performance! Silberschatz, Galvin

More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS Performance Silberschatz, Galvin and

More information

Chapter 12: I/O Systems. Operating System Concepts Essentials 8 th Edition

Chapter 12: I/O Systems. Operating System Concepts Essentials 8 th Edition Chapter 12: I/O Systems Silberschatz, Galvin and Gagne 2011 Chapter 12: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS

More information

A Non-Relational Storage Analysis

A Non-Relational Storage Analysis A Non-Relational Storage Analysis Cassandra & Couchbase Alexandre Fonseca, Anh Thu Vu, Peter Grman Cloud Computing - 2nd semester 2012/2013 Universitat Politècnica de Catalunya Microblogging - big data?

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

The MOSIX Scalable Cluster Computing for Linux. mosix.org

The MOSIX Scalable Cluster Computing for Linux.  mosix.org The MOSIX Scalable Cluster Computing for Linux Prof. Amnon Barak Computer Science Hebrew University http://www. mosix.org 1 Presentation overview Part I : Why computing clusters (slide 3-7) Part II : What

More information

Online Course Evaluation. What we will do in the last week?

Online Course Evaluation. What we will do in the last week? Online Course Evaluation Please fill in the online form The link will expire on April 30 (next Monday) So far 10 students have filled in the online form Thank you if you completed it. 1 What we will do

More information

Parallel Mesh Partitioning in Alya

Parallel Mesh Partitioning in Alya Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Parallel Mesh Partitioning in Alya A. Artigues a *** and G. Houzeaux a* a Barcelona Supercomputing Center ***antoni.artigues@bsc.es

More information

Virtual SQL Servers. Actual Performance. 2016

Virtual SQL Servers. Actual Performance. 2016 @kleegeek davidklee.net heraflux.com linkedin.com/in/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture Health

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

Analytical Modeling of Parallel Programs

Analytical Modeling of Parallel Programs Analytical Modeling of Parallel Programs Alexandre David Introduction to Parallel Computing 1 Topic overview Sources of overhead in parallel programs. Performance metrics for parallel systems. Effect of

More information

Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago

Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Running 1 Million Jobs in 10 Minutes via the Falkon Fast and Light-weight Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago In Collaboration with: Ian Foster,

More information

Clouds: An Opportunity for Scientific Applications?

Clouds: An Opportunity for Scientific Applications? Clouds: An Opportunity for Scientific Applications? Ewa Deelman USC Information Sciences Institute Acknowledgements Yang-Suk Ki (former PostDoc, USC) Gurmeet Singh (former Ph.D. student, USC) Gideon Juve

More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance I/O Hardware Incredible variety of I/O devices Common

More information

KAAPI : Adaptive Runtime System for Parallel Computing

KAAPI : Adaptive Runtime System for Parallel Computing KAAPI : Adaptive Runtime System for Parallel Computing Thierry Gautier, thierry.gautier@inrialpes.fr Bruno Raffin, bruno.raffin@inrialpes.fr, INRIA Grenoble Rhône-Alpes Moais Project http://moais.imag.fr

More information

Chapter 13: I/O Systems. Operating System Concepts 9 th Edition

Chapter 13: I/O Systems. Operating System Concepts 9 th Edition Chapter 13: I/O Systems Silberschatz, Galvin and Gagne 2013 Chapter 13: I/O Systems Overview I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations

More information

Scalable GPU Graph Traversal!

Scalable GPU Graph Traversal! Scalable GPU Graph Traversal Duane Merrill, Michael Garland, and Andrew Grimshaw PPoPP '12 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming Benwen Zhang

More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance Objectives Explore the structure of an operating

More information

SuperGlue and DuctTEiP: Using data versioning for dependency-aware task-based parallelization

SuperGlue and DuctTEiP: Using data versioning for dependency-aware task-based parallelization SuperGlue and DuctTEiP: Using data versioning for dependency-aware task-based parallelization Elisabeth Larsson Martin Tillenius Afshin Zafari UPMARC Workshop on Task-Based Parallel Programming September

More information

Device-Functionality Progression

Device-Functionality Progression Chapter 12: I/O Systems I/O Hardware I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Incredible variety of I/O devices Common concepts Port

More information