Petalisp. A Common Lisp Library for Data Parallel Programming. Marco Heisig Chair for System Simulation FAU Erlangen-Nürnberg

Size: px
Start display at page:

Download "Petalisp. A Common Lisp Library for Data Parallel Programming. Marco Heisig Chair for System Simulation FAU Erlangen-Nürnberg"

Transcription

1 Petalisp A Common Lisp Library for Data Parallel Programming Marco Heisig Chair for System Simulation FAU Erlangen-Nürnberg

2 Petalisp The library Petalisp 1 is a new approach to data parallel computing. The Goal: Elegant High Performance Computing Programs that are beautiful and fast A programming model that is safe and productive Drawbacks: Limited to operations on structured data Significant run-time overhead 1 1

3 Table of contents 1. Using Petalisp 2. Implementation 3. Performance 4. Conclusions 2

4 Using Petalisp

5 Strided Arrays A strided array in n dimensions is a function from elements of the cartesian product of n ranges to a set of Common Lisp objects. A range with the lower bound x L, the step size s and the upper bound x U, with x L, s, x U Z, is the set of integers { x Z x L x x U ( k Z) [x = x L + ks] }. stride 1 stride 3 stride 2 Objects of type cl:simple-array are a special case of strided arrays. 3

6 API 1/5 First Class Index Spaces To introduce parallelism, Petalisp always operates on index spaces, not on individual array elements. Notation: (σ (START [STEP] END)...) (σ (0 5) (0 3)) (σ (1 2 3) (1 2 3) (1 2 3)) Implementation detail: Petalisp can compute the union, difference and intersection of arbitrary index spaces. 4

7 API 2/5 Transformations A transformation is an affine-linear mapping from indices to indices. Notation: (τ (INDEX...) (EXPRESSION...)) (τ (i j) (0 j (- i 4))) (τ (0 j k) ((+ k 4) j)) Implementation detail: Petalisp can compute the inverse and composition of arbitrary transformations. 5

8 API 3/5 Moving Data The -> operator allows to select, transform or broadcast data. (-> 0 (σ (0 9) (0 9))) ; a array of zeros (-> #(2 3) (σ (0 0))) ; the first element only (-> A (τ (i j) (j i))) ; transposing A Admittedly, -> is a mediocre function name. Better suggestions are most welcome! 6

9 API 4/5 Combining Data The fuse and fuse* operator combine multiple arrays into one. For fuse, the arguments must be non-overlapping. For fuse*, the value of the rightmost array takes precedence on overlap. (defvar B (-> #(2) (τ (i) ((1+ i))))) (fuse #(1) B) ; equivalent to (-> #(1 2)) (fuse #(1 3) B) ; an error! (fuse* #(1 3) B) ; equivalent to (-> #(1 2)) 7

10 API 5/5 Parallel Operations The final piece: Application of Common Lisp functions to strided arrays. The function α is basically cl:map for strided arrays. The function β is basically cl:reduce applied to the last dimension of a strided array. (α # + 2 3) ; adding two numbers (α # + A B C) ; adding three arrays element-wise (β # + #(2 3)) ; adding two numbers (defvar B #2A((1 2 3) (4 5 6))) (β # + B) ; summing the rows of B Remark: No guarantees are made about when and how often the functions passed to α and β are invoked. 8

11 API Summary All core functions at a glance: Index spaces, e.g. (σ (a b)) Transformations, e.g. (τ (x y) (y (- x))) Data motion, e.g. (-> A (σ (2 5))) Data combination, e.g. (fuse* A B C) Parallel map, e.g. (α # * A B) Parallel reduce, e.g. (β # + A) This API is purely functional and declarative. But how do we obtain values? 9

12 API 6/5 Triggering Evaluation Petalisp provides two functions to trigger evaluation. The compute function converts strided arrays into regular Common Lisp arrays. (compute (-> 0.0 (σ (0 1)))) => #( ) (defvar A #(1 2 3)) (compute (β # + A)) => 6 (compute (-> A (τ (i) ((- i))))) => #(3 2 1) Remark: There is also a schedule function for asynchronous evaluation. 10

13 Example: Matrix Multiplication The mathematical definition C ij = n A ip B pj p=1 The corresponding Petalisp code (β # + (α # * (-> A (τ (m n) (m 1 n))) (-> B (τ (n k) (1 k n))))) n k C A B m R E D U C T I O N 11

14 Implementation

15 Lazy Arrays are Data Flow Graphs (jacobi u 1) => #<strided-array-fusion t (σ (0 9) (0 9))> Internal representation: (σ (0 9) (0 9)) (σ) #0A0.25 (τ (i j) ((- i 1) j)) (σ (1 8) (1 8)) (τ (i j) ((1+ i) j)) (σ (1 8) (1 8)) (τ (i j) (i (- j 1))) (σ (1 8) (1 8)) (τ (i j) (i (1+ j))) (σ (1 8) (1 8)) (σ (0 9) (0 9)) (τ (i j) ()) (σ (1 8) (1 8)) + (σ (1 8) (1 8)) (τ (i j) (i j)) (σ (0 9 9) (0 9)) (τ (i j) (i j)) (σ (1 8) (0 9 9)) * (σ (1 8) (1 8)) fusion (σ (0 1 9) (0 1 9)) 12

16 From Arrays to Executable Code How do we execute a graph? fusion compute 13

17 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes fusion compute 13

18 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes fusion compute 13

19 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees fusion compute 13

20 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees fusion compute 13

21 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s fusion compute 13

22 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s fusion compute 13

23 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions fusion compute 13

24 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions compute 13

25 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions 5. Construct kernels compute 13

26 From Arrays to Executable Code a0 How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions 5. Construct kernels [f0 [f1 [a0]] [f2 [a0]] a2 a1 [f0 [a2] [a2]] [f0 [a2] [a1]] a3 13

27 From Arrays to Executable Code a0 How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions 5. Construct kernels 6. Schedule & allocate [f0 [a2] [a2]] [f0 [f1 [a0]] [f2 [a0]] a2 a1 [f0 [a2] [a1]] a3 13

28 From Arrays to Executable Code a0 a1 How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees [f0 [f1 [a0]] [f2 [a0]] a2 3. Lift s 4. Eliminate fusions time [f0 [a2] [a1]] a3 5. Construct kernels 6. Schedule & allocate [f0 [a2] [a2]] 13

29 From Arrays to Executable Code a0 a1 How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees [f0 [f1 [a0]] [f2 [a0]] a2 3. Lift s 4. Eliminate fusions time [f0 [a2] [a1]] a3 5. Construct kernels 6. Schedule & allocate [f0 [a2] [a2]] 7. Compile & execute 13

30 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions 5. Construct kernels 6. Schedule & allocate 7. Compile & execute 8. Done! fusion compute 13

31 Performance

32 On Performance The long-term goal of Petalisp is to provide a programming model for Petascale (10 15 operations per second) systems. The constant, high overhead of analysis and JIT-compilation seems to be at odds with this goal. However: Do not underestimate the power of memoization, hash-consing and CLOS wizardry. Scheduling can often be done asynchronously. Petalisp s analysis is independent of the problem size. 14

33 Jacobi s method: Python vs. C++ vs. Petalisp Hardware: Intel Xeon E CPU 3.6GHz 15

34 Conclusions

35 Conclusions Main Result: Our compilation strategy is feasible, with just about microseconds overhead when calling compute. Benefits: Clean separation between notation and execution. Unprecedented potential for optimization. Already faster than NumPy.... all in just about 5000 lines of maintainable code. 16

36 Future Work My preliminary roadmap for the next years: More s (simulations, image processing, machine learning) API finalization Sophisticated Scheduling Better Shared-Memory Parallelization Auto-Vectorization Distributed Parallelization... Make this a PhD thesis 17

37 Thank you! Questions or remarks? 18

NumbaPro CUDA Python. Square matrix multiplication

NumbaPro CUDA Python. Square matrix multiplication NumbaPro Enables parallel programming in Python Support various entry points: Low-level (CUDA-C like) programming language High-level array oriented interface CUDA library bindings Also support multicore

More information

3.Constructors and Destructors. Develop cpp program to implement constructor and destructor.

3.Constructors and Destructors. Develop cpp program to implement constructor and destructor. 3.Constructors and Destructors Develop cpp program to implement constructor and destructor. Constructors A constructor is a special member function whose task is to initialize the objects of its class.

More information

Deadlock. Pritee Parwekar

Deadlock. Pritee Parwekar Deadlock Each request requires that the system consider the,, to decide whether the current request can be satisfied or must wait to avoid a future possible deadlock. a) resources currently available b)

More information

Chapter 2: Intro to Relational Model

Chapter 2: Intro to Relational Model Non è possibile visualizzare l'immagine. Chapter 2: Intro to Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Example of a Relation attributes (or columns)

More information

Program verification. Generalities about software Verification Model Checking. September 20, 2016

Program verification. Generalities about software Verification Model Checking. September 20, 2016 Program verification Generalities about software Verification Model Checking Laure Gonnord David Monniaux September 20, 2016 1 / 43 The teaching staff Laure Gonnord, associate professor, LIP laboratory,

More information

CLIP - A Crytographic Language with Irritating Parentheses

CLIP - A Crytographic Language with Irritating Parentheses CLIP - A Crytographic Language with Irritating Parentheses Author: Duan Wei wd2114@columbia.edu Yi-Hsiu Chen yc2796@columbia.edu Instructor: Prof. Stephen A. Edwards July 24, 2013 Contents 1 Introduction

More information

Study and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou

Study and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou Study and implementation of computational methods for Differential Equations in heterogeneous systems Asimina Vouronikoy - Eleni Zisiou Outline Introduction Review of related work Cyclic Reduction Algorithm

More information

Module 9: Virtual Memory

Module 9: Virtual Memory Module 9: Virtual Memory Background Demand Paging Performance of Demand Paging Page Replacement Page-Replacement Algorithms Allocation of Frames Thrashing Other Considerations Demand Segmentation Operating

More information

1. Represent each of these relations on {1, 2, 3} with a matrix (with the elements of this set listed in increasing order).

1. Represent each of these relations on {1, 2, 3} with a matrix (with the elements of this set listed in increasing order). Exercises Exercises 1. Represent each of these relations on {1, 2, 3} with a matrix (with the elements of this set listed in increasing order). a) {(1, 1), (1, 2), (1, 3)} b) {(1, 2), (2, 1), (2, 2), (3,

More information

Database Technology Introduction. Heiko Paulheim

Database Technology Introduction. Heiko Paulheim Database Technology Introduction Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query Processing Transaction Manager Introduction to the Relational Model

More information

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Michael Boyer, David Tarjan, Scott T. Acton, and Kevin Skadron University of Virginia IPDPS 2009 Outline Leukocyte

More information

Lecture 4: Transforms. Computer Graphics CMU /15-662, Fall 2016

Lecture 4: Transforms. Computer Graphics CMU /15-662, Fall 2016 Lecture 4: Transforms Computer Graphics CMU 15-462/15-662, Fall 2016 Brief recap from last class How to draw a triangle - Why focus on triangles, and not quads, pentagons, etc? - What was specific to triangles

More information

POLYHEDRAL GEOMETRY. Convex functions and sets. Mathematical Programming Niels Lauritzen Recall that a subset C R n is convex if

POLYHEDRAL GEOMETRY. Convex functions and sets. Mathematical Programming Niels Lauritzen Recall that a subset C R n is convex if POLYHEDRAL GEOMETRY Mathematical Programming Niels Lauritzen 7.9.2007 Convex functions and sets Recall that a subset C R n is convex if {λx + (1 λ)y 0 λ 1} C for every x, y C and 0 λ 1. A function f :

More information

More Data Locality for Static Control Programs on NUMA Architectures

More Data Locality for Static Control Programs on NUMA Architectures More Data Locality for Static Control Programs on NUMA Architectures Adilla Susungi 1, Albert Cohen 2, Claude Tadonki 1 1 MINES ParisTech, PSL Research University 2 Inria and DI, Ecole Normale Supérieure

More information

Affine Loop Optimization using Modulo Unrolling in CHAPEL

Affine Loop Optimization using Modulo Unrolling in CHAPEL Affine Loop Optimization using Modulo Unrolling in CHAPEL Aroon Sharma, Joshua Koehler, Rajeev Barua LTS POC: Michael Ferguson 2 Overall Goal Improve the runtime of certain types of parallel computers

More information

Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations

Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations D. Zheltkov, N. Zamarashkin INM RAS September 24, 2018 Scalability of Lanczos method Notations Matrix order

More information

Translations. Geometric Image Transformations. Two-Dimensional Geometric Transforms. Groups and Composition

Translations. Geometric Image Transformations. Two-Dimensional Geometric Transforms. Groups and Composition Geometric Image Transformations Algebraic Groups Euclidean Affine Projective Bovine Translations Translations are a simple family of two-dimensional transforms. Translations were at the heart of our Sprite

More information

New Mapping Interface

New Mapping Interface New Mapping Interface Mike Bauer NVIDIA Research 1 Why Have A Mapping Interface? Legion Program Legion Legion Mapper Scheduling is hard Lots of runtimes have heuristics What do you do when they are wrong?

More information

6.837 LECTURE 7. Lecture 7 Outline Fall '01. Lecture Fall '01

6.837 LECTURE 7. Lecture 7 Outline Fall '01. Lecture Fall '01 6.837 LECTURE 7 1. Geometric Image Transformations 2. Two-Dimensional Geometric Transforms 3. Translations 4. Groups and Composition 5. Rotations 6. Euclidean Transforms 7. Problems with this Form 8. Choose

More information

GPU programming made easier

GPU programming made easier GPU programming made easier Jacob Jepsen 6. June 2014 University of Copenhagen Department of Computer Science 6. June 2014 Introduction We created a tool that reduces the development time of GPU code.

More information

Types II. Hwansoo Han

Types II. Hwansoo Han Types II Hwansoo Han Arrays Most common and important composite data types Homogeneous elements, unlike records Fortran77 requires element type be scalar Elements can be any type (Fortran90, etc.) A mapping

More information

Size of a problem instance: Bigger instances take

Size of a problem instance: Bigger instances take 2.1 Integer Programming and Combinatorial Optimization Slide set 2: Computational Complexity Katta G. Murty Lecture slides Aim: To study efficiency of various algo. for solving problems, and to classify

More information

The role of semantic analysis in a compiler

The role of semantic analysis in a compiler Semantic Analysis Outline The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors

More information

Peter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved

Peter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in

More information

Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved

Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in

More information

Module 9: Virtual Memory

Module 9: Virtual Memory Module 9: Virtual Memory Background Demand Paging Performance of Demand Paging Page Replacement Page-Replacement Algorithms Allocation of Frames Thrashing Other Considerations Demand Segmenation 9.1 Background

More information

An array is a collection of data that holds fixed number of values of same type. It is also known as a set. An array is a data type.

An array is a collection of data that holds fixed number of values of same type. It is also known as a set. An array is a data type. Data Structures Introduction An array is a collection of data that holds fixed number of values of same type. It is also known as a set. An array is a data type. Representation of a large number of homogeneous

More information

Name :. Roll No. :... Invigilator s Signature :.. CS/B.TECH (NEW)/SEM-2/CS-201/ BASIC COMPUTATION & PRINCIPLES OF COMPUTER PROGRAMMING

Name :. Roll No. :... Invigilator s Signature :.. CS/B.TECH (NEW)/SEM-2/CS-201/ BASIC COMPUTATION & PRINCIPLES OF COMPUTER PROGRAMMING Name :. Roll No. :..... Invigilator s Signature :.. CS/B.TECH (NEW)/SEM-2/CS-201/2012 2012 BASIC COMPUTATION & PRINCIPLES OF COMPUTER PROGRAMMING Time Allotted : 3 Hours Full Marks : 70 The figures in

More information

CSCI-580 Advanced High Performance Computing

CSCI-580 Advanced High Performance Computing CSCI-580 Advanced High Performance Computing Performance Hacking: Matrix Multiplication Bo Wu Colorado School of Mines Most content of the slides is from: Saman Amarasinghe (MIT) Square-Matrix Multiplication!2

More information

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Lifan Xu Wei Wang Marco A. Alvarez John Cavazos Dongping Zhang Department of Computer and Information Science University of Delaware

More information

Web Science & Technologies University of Koblenz Landau, Germany. Relational Data Model

Web Science & Technologies University of Koblenz Landau, Germany. Relational Data Model Web Science & Technologies University of Koblenz Landau, Germany Relational Data Model Overview Relational data model; Tuples and relations; Schemas and instances; Named vs. unnamed perspective; Relational

More information

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End Outline Semantic Analysis The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors

More information

Understanding Sources of Inefficiency in General-Purpose Chips

Understanding Sources of Inefficiency in General-Purpose Chips Understanding Sources of Inefficiency in General-Purpose Chips Rehan Hameed Wajahat Qadeer Megan Wachs Omid Azizi Alex Solomatnikov Benjamin Lee Stephen Richardson Christos Kozyrakis Mark Horowitz GP Processors

More information

Pointers. A pointer is simply a reference to a variable/object. Compilers automatically generate code to store/retrieve variables from memory

Pointers. A pointer is simply a reference to a variable/object. Compilers automatically generate code to store/retrieve variables from memory Pointers A pointer is simply a reference to a variable/object Compilers automatically generate code to store/retrieve variables from memory It is automatically generating internal pointers We don t have

More information

NumPy. Daniël de Kok. May 4, 2017

NumPy. Daniël de Kok. May 4, 2017 NumPy Daniël de Kok May 4, 2017 Introduction Today Today s lecture is about the NumPy linear algebra library for Python. Today you will learn: How to create NumPy arrays, which store vectors, matrices,

More information

Robotics I. March 27, 2018

Robotics I. March 27, 2018 Robotics I March 27, 28 Exercise Consider the 5-dof spatial robot in Fig., having the third and fifth joints of the prismatic type while the others are revolute. z O x Figure : A 5-dof robot, with a RRPRP

More information

Algorithms & Data Structures

Algorithms & Data Structures GATE- 2016-17 Postal Correspondence 1 Algorithms & Data Structures Computer Science & Information Technology (CS) 20 Rank under AIR 100 Postal Correspondence Examination Oriented Theory, Practice Set Key

More information

Double-precision General Matrix Multiply (DGEMM)

Double-precision General Matrix Multiply (DGEMM) Double-precision General Matrix Multiply (DGEMM) Parallel Computation (CSE 0), Assignment Andrew Conegliano (A0) Matthias Springer (A00) GID G-- January, 0 0. Assumptions The following assumptions apply

More information

Heterogeneous Computing with a Fused CPU+GPU Device

Heterogeneous Computing with a Fused CPU+GPU Device with a Fused CPU+GPU Device MediaTek White Paper January 2015 2015 MediaTek Inc. 1 Introduction Heterogeneous computing technology, based on OpenCL (Open Computing Language), intelligently fuses GPU and

More information

AMath 483/583 Lecture 8

AMath 483/583 Lecture 8 AMath 483/583 Lecture 8 This lecture: Fortran subroutines and functions Arrays Dynamic memory Reading: class notes: Fortran Arrays class notes: Fortran Subroutines and Functions class notes: gfortran flags

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2018 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods

More information

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far Outline Semantic Analysis The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Statically vs. Dynamically typed languages

More information

Optimizing Closures in O(0) time

Optimizing Closures in O(0) time Optimizing Closures in O(0 time Andrew W. Keep Cisco Systems, Inc. Indiana Univeristy akeep@cisco.com Alex Hearn Indiana University adhearn@cs.indiana.edu R. Kent Dybvig Cisco Systems, Inc. Indiana University

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

Search Algorithms for Discrete Optimization Problems

Search Algorithms for Discrete Optimization Problems Search Algorithms for Discrete Optimization Problems Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. 1 Topic

More information

Jose Aliaga (Universitat Jaume I, Castellon, Spain), Ruyman Reyes, Mehdi Goli (Codeplay Software) 2017 Codeplay Software Ltd.

Jose Aliaga (Universitat Jaume I, Castellon, Spain), Ruyman Reyes, Mehdi Goli (Codeplay Software) 2017 Codeplay Software Ltd. SYCL-BLAS: LeveragingSYCL-BLAS Expression Trees for Linear Algebra Jose Aliaga (Universitat Jaume I, Castellon, Spain), Ruyman Reyes, Mehdi Goli (Codeplay Software) 1 About me... Phd in Compilers and Parallel

More information

Algorithm. Algorithm Analysis. Algorithm. Algorithm. Analyzing Sorting Algorithms (Insertion Sort) Analyzing Algorithms 8/31/2017

Algorithm. Algorithm Analysis. Algorithm. Algorithm. Analyzing Sorting Algorithms (Insertion Sort) Analyzing Algorithms 8/31/2017 8/3/07 Analysis Introduction to Analysis Model of Analysis Mathematical Preliminaries for Analysis Set Notation Asymptotic Analysis What is an algorithm? An algorithm is any well-defined computational

More information

Lecture Notes on Aggregate Data Structures

Lecture Notes on Aggregate Data Structures Lecture Notes on Aggregate Data Structures 15-312: Foundations of Programming Languages Frank Pfenning Lecture 8 September 23, 2004 In this lecture we discuss various language extensions which make MinML

More information

Pattern Matching and Abstract Data Types

Pattern Matching and Abstract Data Types Pattern Matching and Abstract Data Types Tom Murphy VII 3 Dec 2002 0-0 Outline Problem Setup Views ( Views: A Way For Pattern Matching To Cohabit With Data Abstraction, Wadler, 1986) Active Patterns (

More information

4. Simplicial Complexes and Simplicial Homology

4. Simplicial Complexes and Simplicial Homology MATH41071/MATH61071 Algebraic topology Autumn Semester 2017 2018 4. Simplicial Complexes and Simplicial Homology Geometric simplicial complexes 4.1 Definition. A finite subset { v 0, v 1,..., v r } R n

More information

Combinatorics I (Lecture 36)

Combinatorics I (Lecture 36) Combinatorics I (Lecture 36) February 18, 2015 Our goal in this lecture (and the next) is to carry out the combinatorial steps in the proof of the main theorem of Part III. We begin by recalling some notation

More information

Accelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic

Accelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic Accelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago

More information

MPI Casestudy: Parallel Image Processing

MPI Casestudy: Parallel Image Processing MPI Casestudy: Parallel Image Processing David Henty 1 Introduction The aim of this exercise is to write a complete MPI parallel program that does a very basic form of image processing. We will start by

More information

C 1 Modified Genetic Algorithm to Solve Time-varying Lot Sizes Economic Lot Scheduling Problem

C 1 Modified Genetic Algorithm to Solve Time-varying Lot Sizes Economic Lot Scheduling Problem C 1 Modified Genetic Algorithm to Solve Time-varying Lot Sizes Economic Lot Scheduling Problem Bethany Elvira 1, Yudi Satria 2, dan Rahmi Rusin 3 1 Student in Department of Mathematics, University of Indonesia,

More information

ON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS

ON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS ON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS NURNABI MEHERUL ALAM M.Sc. (Agricultural Statistics), Roll No. I.A.S.R.I, Library Avenue, New Delhi- Chairperson: Dr. P.K. Batra Abstract: Block designs

More information

Performance Engineering for Algorithmic Building Blocks in the GHOST Library

Performance Engineering for Algorithmic Building Blocks in the GHOST Library Performance Engineering for Algorithmic Building Blocks in the GHOST Library Georg Hager, Moritz Kreutzer, Faisal Shahzad, Gerhard Wellein, Martin Galgon, Lukas Krämer, Bruno Lang, Jonas Thies, Melven

More information

Concepts of Programming Languages

Concepts of Programming Languages Concepts of Programming Languages Lecture 15 - Functional Programming Patrick Donnelly Montana State University Spring 2014 Patrick Donnelly (Montana State University) Concepts of Programming Languages

More information

HPF commands specify which processor gets which part of the data. Concurrency is defined by HPF commands based on Fortran90

HPF commands specify which processor gets which part of the data. Concurrency is defined by HPF commands based on Fortran90 149 Fortran and HPF 6.2 Concept High Performance Fortran 6.2 Concept Fortran90 extension SPMD (Single Program Multiple Data) model each process operates with its own part of data HPF commands specify which

More information

Lecture 8: Summary of Haskell course + Type Level Programming

Lecture 8: Summary of Haskell course + Type Level Programming Lecture 8: Summary of Haskell course + Type Level Programming Søren Haagerup Department of Mathematics and Computer Science University of Southern Denmark, Odense October 31, 2017 Principles from Haskell

More information

LOGIC AND DISCRETE MATHEMATICS

LOGIC AND DISCRETE MATHEMATICS LOGIC AND DISCRETE MATHEMATICS A Computer Science Perspective WINFRIED KARL GRASSMANN Department of Computer Science University of Saskatchewan JEAN-PAUL TREMBLAY Department of Computer Science University

More information

OPERATING SYSTEMS. Systems with Multi-programming. CS 3502 Spring Chapter 4

OPERATING SYSTEMS. Systems with Multi-programming. CS 3502 Spring Chapter 4 OPERATING SYSTEMS CS 3502 Spring 2018 Systems with Multi-programming Chapter 4 Multiprogramming - Review An operating system can support several processes in memory. While one process receives service

More information

Chapter 8: Memory Management

Chapter 8: Memory Management Chapter 8: Memory Management Chapter 8: Memory Management Background Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging 8.2 Background Program must be brought into memory and placed

More information

Functional Programming. Big Picture. Design of Programming Languages

Functional Programming. Big Picture. Design of Programming Languages Functional Programming Big Picture What we ve learned so far: Imperative Programming Languages Variables, binding, scoping, reference environment, etc What s next: Functional Programming Languages Semantics

More information

RTOS 101. Understand your real-time applications. with the help of Percepio Tracealyzer

RTOS 101. Understand your real-time applications. with the help of Percepio Tracealyzer RTOS 101 Understand your real-time applications with the help of Percepio Tracealyzer RTOS 101 Tasks, Priorities and Analysis Figure 1: Tracealyzer showing RTOS task scheduling and calls to RTOS services.

More information

A RETROSPECTIVE OF CRYSTAL

A RETROSPECTIVE OF CRYSTAL Introduction Language Program Transformation Compiler References A RETROSPECTIVE OF CRYSTAL ANOTHER PARALLEL COMPILER" AMBITION FROM THE LATE 80 S Eva Burrows BLDL-Talks Department of Informatics, University

More information

Submission instructions (read carefully): SS17 / Assignment 4 Instructor: Markus Püschel. ETH Zurich

Submission instructions (read carefully): SS17 / Assignment 4 Instructor: Markus Püschel. ETH Zurich 263-2300-00: How To Write Fast Numerical Code Assignment 4: 120 points Due Date: Th, April 13th, 17:00 http://www.inf.ethz.ch/personal/markusp/teaching/263-2300-eth-spring17/course.html Questions: fastcode@lists.inf.ethz.ch

More information

Introduction to C++ Introduction. Structure of a C++ Program. Structure of a C++ Program. C++ widely-used general-purpose programming language

Introduction to C++ Introduction. Structure of a C++ Program. Structure of a C++ Program. C++ widely-used general-purpose programming language Introduction C++ widely-used general-purpose programming language procedural and object-oriented support strong support created by Bjarne Stroustrup starting in 1979 based on C Introduction to C++ also

More information

Concepts of Programming Languages

Concepts of Programming Languages Concepts of Programming Languages Lecture 1 - Introduction Patrick Donnelly Montana State University Spring 2014 Patrick Donnelly (Montana State University) Concepts of Programming Languages Spring 2014

More information

Chapter 3. Distributed Memory Programming with MPI

Chapter 3. Distributed Memory Programming with MPI An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap n Writing your first MPI program. n Using the common MPI functions. n The Trapezoidal Rule

More information

How to declare an array in C?

How to declare an array in C? Introduction An array is a collection of data that holds fixed number of values of same type. It is also known as a set. An array is a data type. Representation of a large number of homogeneous values.

More information

COP4020 Programming Languages. Functional Programming Prof. Robert van Engelen

COP4020 Programming Languages. Functional Programming Prof. Robert van Engelen COP4020 Programming Languages Functional Programming Prof. Robert van Engelen Overview What is functional programming? Historical origins of functional programming Functional programming today Concepts

More information

Weld: A Common Runtime for Data Analytics

Weld: A Common Runtime for Data Analytics Weld: A Common Runtime for Data Analytics Shoumik Palkar, James Thomas, Anil Shanbhag*, Deepak Narayanan, Malte Schwarzkopf*, Holger Pirk*, Saman Amarasinghe*, Matei Zaharia Stanford InfoLab, *MIT CSAIL

More information

Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures

Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures University of Virginia Dept. of Computer Science Technical Report #CS-2011-09 Jeremy W. Sheaffer and Kevin

More information

Accelerating Data Warehousing Applications Using General Purpose GPUs

Accelerating Data Warehousing Applications Using General Purpose GPUs Accelerating Data Warehousing Applications Using General Purpose s Sponsors: Na%onal Science Founda%on, LogicBlox Inc., IBM, and NVIDIA The General Purpose is a many core co-processor 10s to 100s of cores

More information

Math background. 2D Geometric Transformations. Implicit representations. Explicit representations. Read: CS 4620 Lecture 6

Math background. 2D Geometric Transformations. Implicit representations. Explicit representations. Read: CS 4620 Lecture 6 Math background 2D Geometric Transformations CS 4620 Lecture 6 Read: Chapter 2: Miscellaneous Math Chapter 5: Linear Algebra Notation for sets, functions, mappings Linear transformations Matrices Matrix-vector

More information

Introduction to C++ with content from

Introduction to C++ with content from Introduction to C++ with content from www.cplusplus.com 2 Introduction C++ widely-used general-purpose programming language procedural and object-oriented support strong support created by Bjarne Stroustrup

More information

The Mercury project. Zoltan Somogyi

The Mercury project. Zoltan Somogyi The Mercury project Zoltan Somogyi The University of Melbourne Linux Users Victoria 7 June 2011 Zoltan Somogyi (Linux Users Victoria) The Mercury project June 15, 2011 1 / 23 Introduction Mercury: objectives

More information

Some instance messages and methods

Some instance messages and methods Some instance messages and methods x ^x y ^y movedx: dx Dy: dy x

More information

GPU Linear algebra extensions for GNU/Octave

GPU Linear algebra extensions for GNU/Octave Journal of Physics: Conference Series GPU Linear algebra extensions for GNU/Octave To cite this article: L B Bosi et al 2012 J. Phys.: Conf. Ser. 368 012062 View the article online for updates and enhancements.

More information

Virtual Memory. Overview: Virtual Memory. Virtual address space of a process. Virtual Memory

Virtual Memory. Overview: Virtual Memory. Virtual address space of a process. Virtual Memory TDIU Operating systems Overview: Virtual Memory Virtual Memory Background Demand Paging Page Replacement Allocation of Frames Thrashing and Data Access Locality [SGG7/8/9] Chapter 9 Copyright Notice: The

More information

A Short History of Array Computing in Python. Wolf Vollprecht, PyParis 2018

A Short History of Array Computing in Python. Wolf Vollprecht, PyParis 2018 A Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - - Array computing in general History up to NumPy Libraries after NumPy - Pure Python libraries - JIT / AOT compilers - Deep

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4

More information

Operating System Concepts Ch. 5: Scheduling

Operating System Concepts Ch. 5: Scheduling Operating System Concepts Ch. 5: Scheduling Silberschatz, Galvin & Gagne Scheduling In a multi-programmed system, multiple processes may be loaded into memory at the same time. We need a procedure, or

More information

axiomatic semantics involving logical rules for deriving relations between preconditions and postconditions.

axiomatic semantics involving logical rules for deriving relations between preconditions and postconditions. CS 6110 S18 Lecture 18 Denotational Semantics 1 What is Denotational Semantics? So far we have looked at operational semantics involving rules for state transitions, definitional semantics involving translations

More information

Parallel Systems. Project topics

Parallel Systems. Project topics Parallel Systems Project topics 2016-2017 1. Scheduling Scheduling is a common problem which however is NP-complete, so that we are never sure about the optimality of the solution. Parallelisation is a

More information

Welcome to the Webinar. What s New in Gurobi 7.5

Welcome to the Webinar. What s New in Gurobi 7.5 Welcome to the Webinar What s New in Gurobi 7.5 Speaker Introduction Dr. Tobias Achterberg Director of R&D at Gurobi Optimization Formerly a developer at ILOG, where he worked on CPLEX 11.0 to 12.6 Obtained

More information

Sparse Matrix Formats

Sparse Matrix Formats Christopher Bross Friedrich-Alexander-Universität Erlangen-Nürnberg Motivation Sparse Matrices are everywhere Sparse Matrix Formats C. Bross BGCE Research Day, Erlangen, 09.06.2016 2/16 Motivation Sparse

More information

Parallel & Concurrent Programming: ZPL. Emery Berger CMPSCI 691W Spring 2006 AMHERST. Department of Computer Science UNIVERSITY OF MASSACHUSETTS

Parallel & Concurrent Programming: ZPL. Emery Berger CMPSCI 691W Spring 2006 AMHERST. Department of Computer Science UNIVERSITY OF MASSACHUSETTS Parallel & Concurrent Programming: ZPL Emery Berger CMPSCI 691W Spring 2006 Department of Computer Science Outline Previously: MPI point-to-point & collective Complicated, far from problem abstraction

More information

Lecture 17. Edited from slides for Operating System Concepts by Silberschatz, Galvin, Gagne

Lecture 17. Edited from slides for Operating System Concepts by Silberschatz, Galvin, Gagne Lecture 17 Edited from slides for Operating System Concepts by Silberschatz, Galvin, Gagne Page Replacement Algorithms Last Lecture: FIFO Optimal Page Replacement LRU LRU Approximation Additional-Reference-Bits

More information

The Simplex Algorithm

The Simplex Algorithm The Simplex Algorithm April 25, 2005 We seek x 1,..., x n 0 which mini- Problem. mizes C(x 1,..., x n ) = c 1 x 1 + + c n x n, subject to the constraint Ax b, where A is m n, b = m 1. Through the introduction

More information

CS 1200 Discrete Math Math Preliminaries. A.R. Hurson 323 CS Building, Missouri S&T

CS 1200 Discrete Math Math Preliminaries. A.R. Hurson 323 CS Building, Missouri S&T CS 1200 Discrete Math A.R. Hurson 323 CS Building, Missouri S&T hurson@mst.edu 1 Course Objective: Mathematical way of thinking in order to solve problems 2 Variable: holder. A variable is simply a place

More information

[Ch 6] Set Theory. 1. Basic Concepts and Definitions. 400 lecture note #4. 1) Basics

[Ch 6] Set Theory. 1. Basic Concepts and Definitions. 400 lecture note #4. 1) Basics 400 lecture note #4 [Ch 6] Set Theory 1. Basic Concepts and Definitions 1) Basics Element: ; A is a set consisting of elements x which is in a/another set S such that P(x) is true. Empty set: notated {

More information

Automatic Development of Linear Algebra Libraries for the Tesla Series

Automatic Development of Linear Algebra Libraries for the Tesla Series Automatic Development of Linear Algebra Libraries for the Tesla Series Enrique S. Quintana-Ortí quintana@icc.uji.es Universidad Jaime I de Castellón (Spain) Dense Linear Algebra Major problems: Source

More information

9/21/17. Outline. Expression Evaluation and Control Flow. Arithmetic Expressions. Operators. Operators. Notation & Placement

9/21/17. Outline. Expression Evaluation and Control Flow. Arithmetic Expressions. Operators. Operators. Notation & Placement Outline Expression Evaluation and Control Flow In Text: Chapter 6 Notation Operator evaluation order Operand evaluation order Overloaded operators Type conversions Short-circuit evaluation of conditions

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2013 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2013 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

Index. object lifetimes, and ownership, use after change by an alias errors, use after drop errors, BTreeMap, 309

Index. object lifetimes, and ownership, use after change by an alias errors, use after drop errors, BTreeMap, 309 A Arithmetic operation floating-point arithmetic, 11 12 integer numbers, 9 11 Arrays, 97 copying, 59 60 creation, 48 elements, 48 empty arrays and vectors, 57 58 executable program, 49 expressions, 48

More information

Optimizations of BLIS Library for AMD ZEN Core

Optimizations of BLIS Library for AMD ZEN Core Optimizations of BLIS Library for AMD ZEN Core 1 Introduction BLIS [1] is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries [2] The framework was

More information

New Strategies for Filtering the Number Field Sieve Matrix

New Strategies for Filtering the Number Field Sieve Matrix New Strategies for Filtering the Number Field Sieve Matrix Shailesh Patil Department of CSA Indian Institute of Science Bangalore 560 012 India Email: shailesh.patil@gmail.com Gagan Garg Department of

More information

LECTURE 2. Compilers and Interpreters

LECTURE 2. Compilers and Interpreters LECTURE 2 Compilers and Interpreters COMPILATION AND INTERPRETATION Programs written in high-level languages can be run in two ways. Compiled into an executable program written in machine language for

More information