Ohua: Implicit Dataflow Programming for Concurrent Systems

Size: px
Start display at page:

Download "Ohua: Implicit Dataflow Programming for Concurrent Systems"

Transcription

1 Ohua: Implicit Dataflow Programming for Concurrent Systems Sebastian Ertel Compiler Construction Group TU Dresden, Germany Christof Fetzer Systems Engineering Group TU Dresden, Germany Pascal Felber Institut d informatique Université de Neuchâtel, Switzerland PPPJ

2 The Multi-Core Future Jons-Tobias Wamhoff Exploiting Speculative and Asymmetric Execution on Multicore Architectures (Dissertation) 2

3 Parallelism Lesson learned (for general purpose programming): Work granularity must compensate scheduling overheads: Coarse-grained parallelism Concurrent programming Tim Harris and Satnam Singh. Feedback directed implicit parallelism. ICFP '07 3

4 Concurrent Programming (compiler knowledge) runtime optimizations manual automatic threads locks explicit (control structures, abstractions) tasks fork/join actors messagepassing dataflow operators dataflowgraph concurrent programming model stateful functions functional program implicit (functions, variables) All models depart from sequential programming style. Scalability in programming effort suffers. Limited compiler support for optimizations. 4

5 Concurrent Programming (compiler knowledge) runtime optimizations manual automatic IO Programming Models Sync vs. Async threads locks explicit (control structures, abstractions) tasks fork/join actors messagepassing dataflow operators dataflowgraph concurrent programming model stateful functions functional program implicit (functions, variables) All models depart from sequential programming style. Scalability in programming effort suffers. Limited compiler support for optimizations. 5

6 Concurrent Programming (compiler knowledge) runtime optimizations manual automatic threads locks explicit (control structures, abstractions) tasks fork/join actors messagepassing dataflow operators dataflowgraph concurrent programming model stateful functions functional program implicit (functions, variables) Key observation: Algorithms are composed out of functionality. 6

7 From dataflow programming to stateful functional programming (SFP) Example: simple web server a r p l c r Jesper H. Spring, Jean Privat, Rachid Guerraoui, and Jan Vitek Streamflex: high-throughput stream programming in java. OOPSLA '07 7

8 From dataflow programming to stateful functional programming (SFP) Example: simple web server a r p l c r port cnn cnn req resource length header a r p l c r content Jesper H. Spring, Jean Privat, Rachid Guerraoui, and Jan Vitek Streamflex: high-throughput stream programming in java. OOPSLA '07 8

9 From dataflow programming to stateful functional programming (SFP) Example: simple web server Jesper H. Spring, Jean Privat, Rachid Guerraoui, and Jan Vitek Streamflex: high-throughput stream programming in java. OOPSLA '07 9

10 From dataflow programming to stateful functional programming (SFP) Example: simple web server Functions instead of channels! Stateful functions state encapsulation Implemented in imperative language/style. Jesper H. Spring, Jean Privat, Rachid Guerraoui, and Jan Vitek Streamflex: high-throughput stream programming in java. OOPSLA '07 10

11 From dataflow programming to stateful functional programming (SFP) Example: simple web server Functions instead of channels! Stateful functions state encapsulation Implemented in imperative language/style. Jesper H. Spring, Jean Privat, Rachid Guerraoui, and Jan Vitek Streamflex: high-throughput stream programming in java. OOPSLA '07 11

12 Ohua (Java/Clojure) Function (Java/Clojure) Algorithm (Clojure) Function Library Linker Dataflow Graph Extraction Dataflow Compilation Runtime Code Dataflow Graph Compiletime Data Section-Mapping Algorithm Execution Engine Hierarchical Scheduler Runtime Algorithm Code Transformation Compile-time 12

13 Ohua (Java/Clojure) Function (Java/Clojure) Algorithm (Clojure) Function Library Linker Dataflow Graph Extraction Dataflow Compilation Runtime Code Dataflow Graph Compiletime Data Section-Mapping Algorithm Execution Engine Hierarchical Scheduler Runtime Algorithm Code Transformation Compile-time 13

14 From Control Flow to Dataflow Construction of a control flow dependence graph. read load composeget-resp merge reply parseheader cond composepost-resp parsepost store Most special forms of Clojure supported. Micah Beck, Richard Johnson, and Keshav Pingali From control flow to dataflow. J. Parallel Distrib. Comput. 14

15 Closures and Destructuring Closures in combination with destructuring create opportunities for enhanced parallelism. macro construction for parallel request handling: 15

16 Closures and Destructuring Closures in combination with destructuring create opportunities for enhanced parallelism. macro construction for parallel request handling: 16

17 Closures and Destructuring Closures in combination with destructuring create opportunities for enhanced parallelism. macro construction for parallel request handling: 17

18 Closures and Destructuring Closures in combination with destructuring create opportunities for enhanced parallelism. macro construction for parallel request handling: balancedata read read read parse parse parse load load load compose compose compose reply reply reply 18

19 Closures and Destructuring Closures in combination with destructuring create opportunities for enhanced parallelism. macro construction for parallel request handling: balancedata read read read parse parse parse load load load compose compose compose reply reply reply Beware: Each operator/function invocation has its own state! 19

20 Closures and Destructuring Closures in combination with destructuring create opportunities for enhanced parallelism. macro construction for parallel request handling: balancedata read read read parse parse parse load load load compose compose compose reply reply reply 20

21 Ohua (Java/Clojure) Function (Java/Clojure) Algorithm (Clojure) Function Library Linker Dataflow Graph Extraction Dataflow Compilation Runtime Code Dataflow Graph Compiletime Data Section-Mapping Algorithm Execution Engine Hierarchical Scheduler Runtime Algorithm Code Transformation Compile-time 21

22 Sections Dataflow execution model: pipeline parallelism. Section: set of one or more operators. section request handling section section network I/O read section disk I/O section network I/O write section read parse load compose reply read parse load compose reply Construct section mapping from execution restrictions. Hierarchical scheduling: Sections executed across thread pool parallel. Operators scheduled cooperatively sequential. 22

23 Synchronization Sections provide lock-free synchronization! Consider parallel request handling by type: GET cond load parseheader composeget-resp reply read composepost-resp parsepost store reply PUT Shared external resource (file) inconsistency. 23

24 Synchronization Sections provide lock-free synchronization! Consider parallel request handling by type: GET cond load parseheader composeget-resp reply read composepost-resp parsepost store reply PUT Shared external resource (file) inconsistency. sync section cond load parseheader composeget-resp reply read composepost-resp parsepost store reply 24

25 Experimental Setup server (8-core) request delay: 20 ms clients spread across 19 - (8-core) machines 25

26 Section Strategies Goal: web server fine-tuning without code changes. Best deployment strategy: call/sgl-rd-rp/non-blocking/normal read parse load compose reply 26

27 Section Strategies Goal: web server fine-tuning without code changes. Best deployment strategy: call 27

28 Section Strategies Goal: web server fine-tuning without code changes. Best deployment strategy: call/sgl-rd-rp read parse load compose reply 28

29 Section Strategies Goal: web server fine-tuning without code changes. Best deployment strategy: call/sgl-rd-rp/non-blocking read parse load compose reply 29

30 Section Strategies Goal: web server fine-tuning without code changes. Best deployment strategy: call/sgl-rd-rp/non-blocking/normal read parse load compose reply 30

31 Jetty vs. Ohua 31

32 Jetty vs. Ohua 32

33 Jetty vs. Ohua Ohua is comparable to Jetty (NIO) in terms of latency, outperforms Jetty (NIO) in terms of throughput, but without pre-selecting the GC and without mixing programming models. 33

34 Stateful Functional Programming (SFP) SFP programming model: program = declarative algorithms + stateful functions Benefits: Scalable system construction via algorithm composition. Implicit parallelism for algorithms. Future work: (tip of the ice berg) No deadlocks or data races. Leverage dataflow experience for runtime optimizations. 34

35 Thanks for your attention! Questions?

36 Loops Example: non-blocking read nb-readconcat nbread merge cond parse load compose reply Beware of state inside loops! 36

37 Sections section request handling section read parse load compose reply section network I/O read section disk I/O section network I/O write section read parse load compose reply Manual section configuration via regular expressions: Restrictions: read parse load compose reply Ultimate goal: automatic section configuration. 37

38 Type sensitive request handling Scenario: small feeds in RAM vs. articles on disk Goal: unblock feeds from articles 38

39 Type sensitive request handling Scenario: small feeds in RAM vs. articles on disk Goal: unblock feeds from articles Latency (ms) Disk RAM # Clients Throughput (kreq/s) # Clients RAM Disk 39

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based? Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based

More information

All routines were built with VS2010 compiler, OpenMP 2.0 and TBB 3.0 libraries were used to implement parallel versions of programs.

All routines were built with VS2010 compiler, OpenMP 2.0 and TBB 3.0 libraries were used to implement parallel versions of programs. technologies for multi-core numeric computation In order to compare ConcRT, OpenMP and TBB technologies, we implemented a few algorithms from different areas of numeric computation and compared their performance

More information

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 18-447: Computer Architecture Lecture 30B: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 Readings: Multiprocessing Required Amdahl, Validity of the single processor

More information

Thinking parallel. Decomposition. Thinking parallel. COMP528 Ways of exploiting parallelism, or thinking parallel

Thinking parallel. Decomposition. Thinking parallel. COMP528 Ways of exploiting parallelism, or thinking parallel COMP528 Ways of exploiting parallelism, or thinking parallel www.csc.liv.ac.uk/~alexei/comp528 Alexei Lisitsa Dept of computer science University of Liverpool a.lisitsa@.liverpool.ac.uk Thinking parallel

More information

Concurrency: what, why, how

Concurrency: what, why, how Concurrency: what, why, how Oleg Batrashev February 10, 2014 what this course is about? additional experience with try out some language concepts and techniques Grading java.util.concurrent.future? agents,

More information

QoS support for Intelligent Storage Devices

QoS support for Intelligent Storage Devices QoS support for Intelligent Storage Devices Joel Wu Scott Brandt Department of Computer Science University of California Santa Cruz ISW 04 UC Santa Cruz Mixed-Workload Requirement General purpose systems

More information

Concurrency: what, why, how

Concurrency: what, why, how Concurrency: what, why, how May 28, 2009 1 / 33 Lecture about everything and nothing Explain basic idea (pseudo) vs. Give reasons for using Present briefly different classifications approaches models and

More information

Architectural Styles. Software Architecture Lecture 5. Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved.

Architectural Styles. Software Architecture Lecture 5. Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved. Architectural Styles Software Architecture Lecture 5 Copyright Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy. All rights reserved. Object-Oriented Style Components are objects Data and associated

More information

Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads.

Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads. Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads. Poor co-ordination that exists in threads on JVM is bottleneck

More information

Software Architecture

Software Architecture Software Architecture Lecture 5 Call-Return Systems Rob Pettit George Mason University last class data flow data flow styles batch sequential pipe & filter process control! process control! looping structure

More information

Computer Architecture: Parallel Processing Basics. Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Parallel Processing Basics. Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Parallel Processing Basics Prof. Onur Mutlu Carnegie Mellon University Readings Required Hill, Jouppi, Sohi, Multiprocessors and Multicomputers, pp. 551-560 in Readings in Computer

More information

Parallel Functional Programming Lecture 1. John Hughes

Parallel Functional Programming Lecture 1. John Hughes Parallel Functional Programming Lecture 1 John Hughes Moore s Law (1965) The number of transistors per chip increases by a factor of two every year two years (1975) Number of transistors What shall we

More information

Scala Actors. Scalable Multithreading on the JVM. Philipp Haller. Ph.D. candidate Programming Methods Lab EPFL, Lausanne, Switzerland

Scala Actors. Scalable Multithreading on the JVM. Philipp Haller. Ph.D. candidate Programming Methods Lab EPFL, Lausanne, Switzerland Scala Actors Scalable Multithreading on the JVM Philipp Haller Ph.D. candidate Programming Methods Lab EPFL, Lausanne, Switzerland The free lunch is over! Software is concurrent Interactive applications

More information

DATA-DRIVEN TASKS THEIR IMPLEMENTATION AND SAĞNAK TAŞIRLAR, VIVEK SARKAR DEPARTMENT OF COMPUTER SCIENCE. RICE UNIVERSITY

DATA-DRIVEN TASKS THEIR IMPLEMENTATION AND SAĞNAK TAŞIRLAR, VIVEK SARKAR DEPARTMENT OF COMPUTER SCIENCE. RICE UNIVERSITY 1 DATA-DRIVEN TASKS AND THEIR IMPLEMENTATION SAĞNAK TAŞIRLAR, VIVEK SARKAR DEPARTMENT OF COMPUTER SCIENCE. RICE UNIVERSITY Fork/Join graphs constraint -ism 2 Fork/Join models restrict task graphs to be

More information

Cloud Programming James Larus Microsoft Research. July 13, 2010

Cloud Programming James Larus Microsoft Research. July 13, 2010 Cloud Programming James Larus Microsoft Research July 13, 2010 New Programming Model, New Problems (and some old, unsolved ones) Concurrency Parallelism Message passing Distribution High availability Performance

More information

Compiling for Concise Code and Efficient I/O

Compiling for Concise Code and Efficient I/O Compiling for Concise Code and Efficient I/O Sebastian Ertel, Andrés Goens, Justus Adam and Jeronimo Castrillon Chair for Compiler Construction TU Dresden 27th International Conference on Compiler Construction

More information

Revisiting the Past 25 Years: Lessons for the Future. Guri Sohi University of Wisconsin-Madison

Revisiting the Past 25 Years: Lessons for the Future. Guri Sohi University of Wisconsin-Madison Revisiting the Past 25 Years: Lessons for the Future Guri Sohi University of Wisconsin-Madison Outline VLIW OOO Superscalar Enhancing Superscalar And the future 2 Beyond pipelining to ILP Late 1980s to

More information

Threads and Too Much Milk! CS439: Principles of Computer Systems February 6, 2019

Threads and Too Much Milk! CS439: Principles of Computer Systems February 6, 2019 Threads and Too Much Milk! CS439: Principles of Computer Systems February 6, 2019 Bringing It Together OS has three hats: What are they? Processes help with one? two? three? of those hats OS protects itself

More information

Introduction II. Overview

Introduction II. Overview Introduction II Overview Today we will introduce multicore hardware (we will introduce many-core hardware prior to learning OpenCL) We will also consider the relationship between computer hardware and

More information

Shared state model. April 3, / 29

Shared state model. April 3, / 29 Shared state April 3, 2012 1 / 29 the s s limitations of explicit state: cells equivalence of the two s programming in limiting interleavings locks, monitors, transactions comparing the 3 s 2 / 29 Message

More information

Higher Level Programming Abstractions for FPGAs using OpenCL

Higher Level Programming Abstractions for FPGAs using OpenCL Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*

More information

Parallelism and runtimes

Parallelism and runtimes Parallelism and runtimes Advanced Course on Compilers Spring 2015 (III-V): Lecture 7 Vesa Hirvisalo ESG/CSE/Aalto Today Parallel platforms Concurrency Consistency Examples of parallelism Regularity of

More information

Architectural Styles. Reid Holmes

Architectural Styles. Reid Holmes Material and some slide content from: - Emerson Murphy-Hill - Software Architecture: Foundations, Theory, and Practice - Essential Software Architecture Architectural Styles Reid Holmes Lecture 5 - Tuesday,

More information

A Process Model suitable for defining and programming MpSoCs

A Process Model suitable for defining and programming MpSoCs A Process Model suitable for defining and programming MpSoCs MpSoC-Workshop at Rheinfels, 29-30.6.2010 F. Mayer-Lindenberg, TU Hamburg-Harburg 1. Motivation 2. The Process Model 3. Mapping to MpSoC 4.

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

The Rise and Rise of Dataflow in the JavaVerse

The Rise and Rise of Dataflow in the JavaVerse The Rise and Rise of Dataflow in the JavaVerse @russel_winder russel@winder.org.uk https://www.russel.org.uk 1 The Plan for the Session Stuff. More stuff. Even more stuff possibly. Summary and conclusions

More information

Software Architecture in Practice

Software Architecture in Practice Software Architecture in Practice Chapter 5: Architectural Styles - From Qualities to Architecture Pittsburgh, PA 15213-3890 Sponsored by the U.S. Department of Defense Chapter 5 - page 1 Lecture Objectives

More information

Speculative Concurrent Processing with Transactional Memory in the Actor Model

Speculative Concurrent Processing with Transactional Memory in the Actor Model Speculative oncurrent Processing with Transactional Memory in the Actor Model Yaroslav Hayduk, Anita Sobe, Derin Harmanci, Patrick Marlier, and Pascal Felber first.last@unine.ch University of Neuchatel,

More information

6/20/2018 CS5386 SOFTWARE DESIGN & ARCHITECTURE LECTURE 5: ARCHITECTURAL VIEWS C&C STYLES. Outline for Today. Architecture views C&C Views

6/20/2018 CS5386 SOFTWARE DESIGN & ARCHITECTURE LECTURE 5: ARCHITECTURAL VIEWS C&C STYLES. Outline for Today. Architecture views C&C Views 1 CS5386 SOFTWARE DESIGN & ARCHITECTURE LECTURE 5: ARCHITECTURAL VIEWS C&C STYLES Outline for Today 2 Architecture views C&C Views 1 Components and Connectors (C&C) Styles 3 Elements Relations Properties

More information

High Performance Computing. University questions with solution

High Performance Computing. University questions with solution High Performance Computing University questions with solution Q1) Explain the basic working principle of VLIW processor. (6 marks) The following points are basic working principle of VLIW processor. The

More information

Computer Architecture Lecture 27: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 4/6/2015

Computer Architecture Lecture 27: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 4/6/2015 18-447 Computer Architecture Lecture 27: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 4/6/2015 Assignments Lab 7 out Due April 17 HW 6 Due Friday (April 10) Midterm II April

More information

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model

More information

September 15th, Finagle + Java. A love story (

September 15th, Finagle + Java. A love story ( September 15th, 2016 Finagle + Java A love story ( ) @mnnakamura hi, I m Moses Nakamura Twitter lives on the JVM When Twitter realized we couldn t stay on a Rails monolith and continue to scale at the

More information

Lecture 14. Synthesizing Parallel Programs IAP 2007 MIT

Lecture 14. Synthesizing Parallel Programs IAP 2007 MIT 6.189 IAP 2007 Lecture 14 Synthesizing Parallel Programs Prof. Arvind, MIT. 6.189 IAP 2007 MIT Synthesizing parallel programs (or borrowing some ideas from hardware design) Arvind Computer Science & Artificial

More information

Improving the Practicality of Transactional Memory

Improving the Practicality of Transactional Memory Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter

More information

New Programming Abstractions for Concurrency. Torvald Riegel Red Hat 12/04/05

New Programming Abstractions for Concurrency. Torvald Riegel Red Hat 12/04/05 New Programming Abstractions for Concurrency Red Hat 12/04/05 1 Concurrency and atomicity C++11 atomic types Transactional Memory Provide atomicity for concurrent accesses by different threads Both based

More information

Message Passing Models and Multicomputer distributed system LECTURE 7

Message Passing Models and Multicomputer distributed system LECTURE 7 Message Passing Models and Multicomputer distributed system LECTURE 7 DR SAMMAN H AMEEN 1 Node Node Node Node Node Node Message-passing direct network interconnection Node Node Node Node Node Node PAGE

More information

Threads and Too Much Milk! CS439: Principles of Computer Systems January 31, 2018

Threads and Too Much Milk! CS439: Principles of Computer Systems January 31, 2018 Threads and Too Much Milk! CS439: Principles of Computer Systems January 31, 2018 Last Time CPU Scheduling discussed the possible policies the scheduler may use to choose the next process (or thread!)

More information

Threads SPL/2010 SPL/20 1

Threads SPL/2010 SPL/20 1 Threads 1 Today Processes and Scheduling Threads Abstract Object Models Computation Models Java Support for Threads 2 Process vs. Program processes as the basic unit of execution managed by OS OS as any

More information

Functional modeling style for efficient SW code generation of video codec applications

Functional modeling style for efficient SW code generation of video codec applications Functional modeling style for efficient SW code generation of video codec applications Sang-Il Han 1)2) Soo-Ik Chae 1) Ahmed. A. Jerraya 2) SD Group 1) SLS Group 2) Seoul National Univ., Korea TIMA laboratory,

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

Concurrency. Unleash your processor(s) Václav Pech

Concurrency. Unleash your processor(s) Václav Pech Concurrency Unleash your processor(s) Václav Pech About me Passionate programmer Concurrency enthusiast GPars @ Codehaus lead Groovy contributor Technology evangelist @ JetBrains http://www.jroller.com/vaclav

More information

Optimization Coaching for Fork/Join Applications on the Java Virtual Machine

Optimization Coaching for Fork/Join Applications on the Java Virtual Machine Optimization Coaching for Fork/Join Applications on the Java Virtual Machine Eduardo Rosales Advisor: Research area: PhD stage: Prof. Walter Binder Parallel applications, performance analysis Planner EuroDW

More information

Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling

Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling Throughput-optimizing Compilation of Dataflow Applications for Multi-Cores using Quasi-Static Scheduling Tobias Schwarzer 1, Joachim Falk 1, Michael Glaß 1, Jürgen Teich 1, Christian Zebelein 2, Christian

More information

Transactional Memory

Transactional Memory Transactional Memory Architectural Support for Practical Parallel Programming The TCC Research Group Computer Systems Lab Stanford University http://tcc.stanford.edu TCC Overview - January 2007 The Era

More information

POSIX Threads: a first step toward parallel programming. George Bosilca

POSIX Threads: a first step toward parallel programming. George Bosilca POSIX Threads: a first step toward parallel programming George Bosilca bosilca@icl.utk.edu Process vs. Thread A process is a collection of virtual memory space, code, data, and system resources. A thread

More information

Heckaton. SQL Server's Memory Optimized OLTP Engine

Heckaton. SQL Server's Memory Optimized OLTP Engine Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability

More information

Java SE 7 Programming

Java SE 7 Programming Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 4108 4709 Java SE 7 Programming Duration: 5 Days What you will learn This is the second of two courses that cover the Java Standard Edition

More information

HAFT Hardware-Assisted Fault Tolerance

HAFT Hardware-Assisted Fault Tolerance HAFT Hardware-Assisted Fault Tolerance Dmitrii Kuvaiskii Rasha Faqeh Pramod Bhatotia Christof Fetzer Technische Universität Dresden Pascal Felber Université de Neuchâtel Hardware Errors in the Wild Online

More information

CSci 4061 Introduction to Operating Systems. (Thread-Basics)

CSci 4061 Introduction to Operating Systems. (Thread-Basics) CSci 4061 Introduction to Operating Systems (Thread-Basics) Threads Abstraction: for an executing instruction stream Threads exist within a process and share its resources (i.e. memory) But, thread has

More information

An Object-Oriented Approach to Software Development for Parallel Processing Systems

An Object-Oriented Approach to Software Development for Parallel Processing Systems An Object-Oriented Approach to Software Development for Parallel Processing Systems Stephen S. Yau, Xiaoping Jia, Doo-Hwan Bae, Madhan Chidambaram, and Gilho Oh Computer and Information Sciences Department

More information

Designing Parallel Programs. This review was developed from Introduction to Parallel Computing

Designing Parallel Programs. This review was developed from Introduction to Parallel Computing Designing Parallel Programs This review was developed from Introduction to Parallel Computing Author: Blaise Barney, Lawrence Livermore National Laboratory references: https://computing.llnl.gov/tutorials/parallel_comp/#whatis

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

Fine-grained Transaction Scheduling in Replicated Databases via Symbolic Execution

Fine-grained Transaction Scheduling in Replicated Databases via Symbolic Execution Fine-grained Transaction Scheduling in Replicated Databases via Symbolic Execution Raminhas pedro.raminhas@tecnico.ulisboa.pt Stage: 2 nd Year PhD Student Research Area: Dependable and fault-tolerant systems

More information

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES Anish Athalye and Patrick Long Mentors: Austin Clements and Stephen Tu 3 rd annual MIT PRIMES Conference Sequential

More information

ECE 587 Hardware/Software Co-Design Lecture 07 Concurrency in Practice Shared Memory I

ECE 587 Hardware/Software Co-Design Lecture 07 Concurrency in Practice Shared Memory I ECE 587 Hardware/Software Co-Design Spring 2018 1/15 ECE 587 Hardware/Software Co-Design Lecture 07 Concurrency in Practice Shared Memory I Professor Jia Wang Department of Electrical and Computer Engineering

More information

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Using the SDACK Architecture to Build a Big Data Product Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Outline A Threat Analytic Big Data product The SDACK Architecture Akka Streams and data

More information

6.189 IAP Lecture 12. StreamIt Parallelizing Compiler. Prof. Saman Amarasinghe, MIT IAP 2007 MIT

6.189 IAP Lecture 12. StreamIt Parallelizing Compiler. Prof. Saman Amarasinghe, MIT IAP 2007 MIT 6.89 IAP 2007 Lecture 2 StreamIt Parallelizing Compiler 6.89 IAP 2007 MIT Common Machine Language Represent common properties of architectures Necessary for performance Abstract away differences in architectures

More information

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014 Outline Database Tuning Nikolaus Augsten University of Salzburg Department of Computer Science Database Group 1 Unit 8 WS 2013/2014 Adapted from Database Tuning by Dennis Shasha and Philippe Bonnet. Nikolaus

More information

Introduction to Parallel Programming Models

Introduction to Parallel Programming Models Introduction to Parallel Programming Models Tim Foley Stanford University Beyond Programmable Shading 1 Overview Introduce three kinds of parallelism Used in visual computing Targeting throughput architectures

More information

In-Memory Data Management for Enterprise Applications. BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP)

In-Memory Data Management for Enterprise Applications. BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP) In-Memory Data Management for Enterprise Applications BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP) What is an In-Memory Database? 2 Source: Hector Garcia-Molina

More information

Database Management and Tuning

Database Management and Tuning Database Management and Tuning Concurrency Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 8 May 10, 2012 Acknowledgements: The slides are provided by Nikolaus

More information

An Efficient Commit Protocol Exploiting Primary-Backup Placement in a Parallel Storage System. Haruo Yokota Tokyo Institute of Technology

An Efficient Commit Protocol Exploiting Primary-Backup Placement in a Parallel Storage System. Haruo Yokota Tokyo Institute of Technology An Efficient Commit Protocol Exploiting Primary-Backup Placement in a Parallel Storage System Haruo Yokota Tokyo Institute of Technology My Research Interests Data Engineering + Dependable Systems Dependable

More information

Message Passing. Advanced Operating Systems Tutorial 7

Message Passing. Advanced Operating Systems Tutorial 7 Message Passing Advanced Operating Systems Tutorial 7 Tutorial Outline Review of Lectured Material Discussion: Erlang and message passing 2 Review of Lectured Material Message passing systems Limitations

More information

Software-Controlled Multithreading Using Informing Memory Operations

Software-Controlled Multithreading Using Informing Memory Operations Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today s CMPs message

More information

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (II)

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (II) SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (II) Shan He School for Computational Science University of Birmingham Module 06-19321: SSC Outline Outline of Topics

More information

Parallel Programming Multicore systems

Parallel Programming Multicore systems FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have

More information

xtc Robert Grimm Making C Safely Extensible New York University

xtc Robert Grimm Making C Safely Extensible New York University xtc Making C Safely Extensible Robert Grimm New York University The Problem Complexity of modern systems is staggering Increasingly, a seamless, global computing environment System builders continue to

More information

JiST Java in Simulation Time An efficient, unifying approach to simulation using virtual machines

JiST Java in Simulation Time An efficient, unifying approach to simulation using virtual machines JiST Java in Simulation Time An efficient, unifying approach to simulation using virtual machines Rimon Barr, Zygmunt Haas, Robbert van Renesse rimon@acm.org haas@ece.cornell.edu rvr@cs.cornell.edu. Cornell

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Lock Inference in the Presence of Large Libraries

Lock Inference in the Presence of Large Libraries Lock Inference in the Presence of Large Libraries Khilan Gudka, Imperial College London* Tim Harris, Microso8 Research Cambridge Susan Eisenbach, Imperial College London ECOOP 2012 This work was generously

More information

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors. CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says

More information

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture II Benny Thörnberg Associate Professor in Electronics Parallelism Microscopic vs Macroscopic Microscopic parallelism hardware solutions inside system components providing

More information

Scalable Shared Memory Programing

Scalable Shared Memory Programing Scalable Shared Memory Programing Marc Snir www.parallel.illinois.edu What is (my definition of) Shared Memory Global name space (global references) Implicit data movement Caching: User gets good memory

More information

Concurrent ML. John Reppy January 21, University of Chicago

Concurrent ML. John Reppy January 21, University of Chicago Concurrent ML John Reppy jhr@cs.uchicago.edu University of Chicago January 21, 2016 Introduction Outline I Concurrent programming models I Concurrent ML I Multithreading via continuations (if there is

More information

Milind Kulkarni Research Statement

Milind Kulkarni Research Statement Milind Kulkarni Research Statement With the increasing ubiquity of multicore processors, interest in parallel programming is again on the upswing. Over the past three decades, languages and compilers researchers

More information

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples Multicore Jukka Julku 19.2.2009 1 2 3 4 5 6 Disclaimer There are several low-level, languages and directive based approaches But no silver bullets This presentation only covers some examples of them is

More information

G Programming Languages Spring 2010 Lecture 13. Robert Grimm, New York University

G Programming Languages Spring 2010 Lecture 13. Robert Grimm, New York University G22.2110-001 Programming Languages Spring 2010 Lecture 13 Robert Grimm, New York University 1 Review Last week Exceptions 2 Outline Concurrency Discussion of Final Sources for today s lecture: PLP, 12

More information

MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction

MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction : A Distributed Shared Memory System Based on Multiple Java Virtual Machines X. Chen and V.H. Allan Computer Science Department, Utah State University 1998 : Introduction Built on concurrency supported

More information

CSE 544: Principles of Database Systems

CSE 544: Principles of Database Systems CSE 544: Principles of Database Systems Anatomy of a DBMS, Parallel Databases 1 Announcements Lecture on Thursday, May 2nd: Moved to 9am-10:30am, CSE 403 Paper reviews: Anatomy paper was due yesterday;

More information

Asynchronous Elastic Dataflows by Leveraging Clocked EDA

Asynchronous Elastic Dataflows by Leveraging Clocked EDA Optimised Synthesis of Asynchronous Elastic Dataflows by Leveraging Clocked EDA Mahdi Jelodari Mamaghani, Jim Garside, Will Toms, & Doug Edwards Verona, Italy 29 th August 2014 Motivation: Automatic GALSification

More information

Java SE 7 Programming

Java SE 7 Programming Oracle University Contact Us: +40 21 3678820 Java SE 7 Programming Duration: 5 Days What you will learn This Java Programming training covers the core Application Programming Interfaces (API) you'll use

More information

Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University A Note on This Lecture These slides are partly from 18-742 Fall 2012, Parallel Computer Architecture, Lecture 9: Multithreading

More information

Contemporary Design. Traditional Hardware Design. Traditional Hardware Design. HDL Based Hardware Design User Inputs. Requirements.

Contemporary Design. Traditional Hardware Design. Traditional Hardware Design. HDL Based Hardware Design User Inputs. Requirements. Contemporary Design We have been talking about design process Let s now take next steps into examining in some detail Increasing complexities of contemporary systems Demand the use of increasingly powerful

More information

Execution architecture concepts

Execution architecture concepts by Gerrit Muller Buskerud University College e-mail: gaudisite@gmail.com www.gaudisite.nl Abstract The execution architecture determines largely the realtime and performance behavior of a system. Hard

More information

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems Robert Grimm University of Washington Extensions Added to running system Interact through low-latency interfaces Form

More information

SHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008

SHARCNET Workshop on Parallel Computing. Hugh Merz Laurentian University May 2008 SHARCNET Workshop on Parallel Computing Hugh Merz Laurentian University May 2008 What is Parallel Computing? A computational method that utilizes multiple processing elements to solve a problem in tandem

More information

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on

More information

Transactifying Apache s Cache Module

Transactifying Apache s Cache Module H. Eran O. Lutzky Z. Guz I. Keidar Department of Electrical Engineering Technion Israel Institute of Technology SYSTOR 2009 The Israeli Experimental Systems Conference Outline 1 Why legacy applications

More information

Java Concurrency in practice

Java Concurrency in practice Java Concurrency in practice Chapter: 12 Bjørn Christian Sebak (bse069@student.uib.no) Karianne Berg (karianne@ii.uib.no) INF329 Spring 2007 Testing Concurrent Programs Conc. programs have a degree of

More information

A developer s guide to load testing

A developer s guide to load testing Software architecture for developers What is software architecture? What is the role of a software architect? How do you define software architecture? How do you share software architecture? How do you

More information

Futures. Prof. Clarkson Fall Today s music: It's Gonna be Me by *NSYNC

Futures. Prof. Clarkson Fall Today s music: It's Gonna be Me by *NSYNC Futures Prof. Clarkson Fall 2017 Today s music: It's Gonna be Me by *NSYNC Review Previously in 3110: Functional programming Modular programming Interpreters Formal methods Final unit of course: Advanced

More information

Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing

Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing explicit sends & receives Which execution model control parallel

More information

HPX The C++ Standards Library for Concurrency and Parallelism. Hartmut Kaiser

HPX The C++ Standards Library for Concurrency and Parallelism. Hartmut Kaiser HPX The C++ Standards Library for Concurrency and Hartmut Kaiser (hkaiser@cct.lsu.edu) HPX A General Purpose Runtime System The C++ Standards Library for Concurrency and Exposes a coherent and uniform,

More information

Java SE7 Fundamentals

Java SE7 Fundamentals Java SE7 Fundamentals Introducing the Java Technology Relating Java with other languages Showing how to download, install, and configure the Java environment on a Windows system. Describing the various

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen

More information

A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms

A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms Shuoxin Lin, Yanzhou Liu, William Plishker, Shuvra Bhattacharyya Maryland DSPCAD Research Group Department of

More information