Petalisp. A Common Lisp Library for Data Parallel Programming. Marco Heisig Chair for System Simulation FAU Erlangen-Nürnberg
|
|
- Patrick Flynn
- 5 years ago
- Views:
Transcription
1 Petalisp A Common Lisp Library for Data Parallel Programming Marco Heisig Chair for System Simulation FAU Erlangen-Nürnberg
2 Petalisp The library Petalisp 1 is a new approach to data parallel computing. The Goal: Elegant High Performance Computing Programs that are beautiful and fast A programming model that is safe and productive Drawbacks: Limited to operations on structured data Significant run-time overhead 1 1
3 Table of contents 1. Using Petalisp 2. Implementation 3. Performance 4. Conclusions 2
4 Using Petalisp
5 Strided Arrays A strided array in n dimensions is a function from elements of the cartesian product of n ranges to a set of Common Lisp objects. A range with the lower bound x L, the step size s and the upper bound x U, with x L, s, x U Z, is the set of integers { x Z x L x x U ( k Z) [x = x L + ks] }. stride 1 stride 3 stride 2 Objects of type cl:simple-array are a special case of strided arrays. 3
6 API 1/5 First Class Index Spaces To introduce parallelism, Petalisp always operates on index spaces, not on individual array elements. Notation: (σ (START [STEP] END)...) (σ (0 5) (0 3)) (σ (1 2 3) (1 2 3) (1 2 3)) Implementation detail: Petalisp can compute the union, difference and intersection of arbitrary index spaces. 4
7 API 2/5 Transformations A transformation is an affine-linear mapping from indices to indices. Notation: (τ (INDEX...) (EXPRESSION...)) (τ (i j) (0 j (- i 4))) (τ (0 j k) ((+ k 4) j)) Implementation detail: Petalisp can compute the inverse and composition of arbitrary transformations. 5
8 API 3/5 Moving Data The -> operator allows to select, transform or broadcast data. (-> 0 (σ (0 9) (0 9))) ; a array of zeros (-> #(2 3) (σ (0 0))) ; the first element only (-> A (τ (i j) (j i))) ; transposing A Admittedly, -> is a mediocre function name. Better suggestions are most welcome! 6
9 API 4/5 Combining Data The fuse and fuse* operator combine multiple arrays into one. For fuse, the arguments must be non-overlapping. For fuse*, the value of the rightmost array takes precedence on overlap. (defvar B (-> #(2) (τ (i) ((1+ i))))) (fuse #(1) B) ; equivalent to (-> #(1 2)) (fuse #(1 3) B) ; an error! (fuse* #(1 3) B) ; equivalent to (-> #(1 2)) 7
10 API 5/5 Parallel Operations The final piece: Application of Common Lisp functions to strided arrays. The function α is basically cl:map for strided arrays. The function β is basically cl:reduce applied to the last dimension of a strided array. (α # + 2 3) ; adding two numbers (α # + A B C) ; adding three arrays element-wise (β # + #(2 3)) ; adding two numbers (defvar B #2A((1 2 3) (4 5 6))) (β # + B) ; summing the rows of B Remark: No guarantees are made about when and how often the functions passed to α and β are invoked. 8
11 API Summary All core functions at a glance: Index spaces, e.g. (σ (a b)) Transformations, e.g. (τ (x y) (y (- x))) Data motion, e.g. (-> A (σ (2 5))) Data combination, e.g. (fuse* A B C) Parallel map, e.g. (α # * A B) Parallel reduce, e.g. (β # + A) This API is purely functional and declarative. But how do we obtain values? 9
12 API 6/5 Triggering Evaluation Petalisp provides two functions to trigger evaluation. The compute function converts strided arrays into regular Common Lisp arrays. (compute (-> 0.0 (σ (0 1)))) => #( ) (defvar A #(1 2 3)) (compute (β # + A)) => 6 (compute (-> A (τ (i) ((- i))))) => #(3 2 1) Remark: There is also a schedule function for asynchronous evaluation. 10
13 Example: Matrix Multiplication The mathematical definition C ij = n A ip B pj p=1 The corresponding Petalisp code (β # + (α # * (-> A (τ (m n) (m 1 n))) (-> B (τ (n k) (1 k n))))) n k C A B m R E D U C T I O N 11
14 Implementation
15 Lazy Arrays are Data Flow Graphs (jacobi u 1) => #<strided-array-fusion t (σ (0 9) (0 9))> Internal representation: (σ (0 9) (0 9)) (σ) #0A0.25 (τ (i j) ((- i 1) j)) (σ (1 8) (1 8)) (τ (i j) ((1+ i) j)) (σ (1 8) (1 8)) (τ (i j) (i (- j 1))) (σ (1 8) (1 8)) (τ (i j) (i (1+ j))) (σ (1 8) (1 8)) (σ (0 9) (0 9)) (τ (i j) ()) (σ (1 8) (1 8)) + (σ (1 8) (1 8)) (τ (i j) (i j)) (σ (0 9 9) (0 9)) (τ (i j) (i j)) (σ (1 8) (0 9 9)) * (σ (1 8) (1 8)) fusion (σ (0 1 9) (0 1 9)) 12
16 From Arrays to Executable Code How do we execute a graph? fusion compute 13
17 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes fusion compute 13
18 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes fusion compute 13
19 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees fusion compute 13
20 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees fusion compute 13
21 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s fusion compute 13
22 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s fusion compute 13
23 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions fusion compute 13
24 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions compute 13
25 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions 5. Construct kernels compute 13
26 From Arrays to Executable Code a0 How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions 5. Construct kernels [f0 [f1 [a0]] [f2 [a0]] a2 a1 [f0 [a2] [a2]] [f0 [a2] [a1]] a3 13
27 From Arrays to Executable Code a0 How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions 5. Construct kernels 6. Schedule & allocate [f0 [a2] [a2]] [f0 [f1 [a0]] [f2 [a0]] a2 a1 [f0 [a2] [a1]] a3 13
28 From Arrays to Executable Code a0 a1 How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees [f0 [f1 [a0]] [f2 [a0]] a2 3. Lift s 4. Eliminate fusions time [f0 [a2] [a1]] a3 5. Construct kernels 6. Schedule & allocate [f0 [a2] [a2]] 13
29 From Arrays to Executable Code a0 a1 How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees [f0 [f1 [a0]] [f2 [a0]] a2 3. Lift s 4. Eliminate fusions time [f0 [a2] [a1]] a3 5. Construct kernels 6. Schedule & allocate [f0 [a2] [a2]] 7. Compile & execute 13
30 From Arrays to Executable Code How do we execute a graph? 1. Determine critical nodes 2. Determine subtrees 3. Lift s 4. Eliminate fusions 5. Construct kernels 6. Schedule & allocate 7. Compile & execute 8. Done! fusion compute 13
31 Performance
32 On Performance The long-term goal of Petalisp is to provide a programming model for Petascale (10 15 operations per second) systems. The constant, high overhead of analysis and JIT-compilation seems to be at odds with this goal. However: Do not underestimate the power of memoization, hash-consing and CLOS wizardry. Scheduling can often be done asynchronously. Petalisp s analysis is independent of the problem size. 14
33 Jacobi s method: Python vs. C++ vs. Petalisp Hardware: Intel Xeon E CPU 3.6GHz 15
34 Conclusions
35 Conclusions Main Result: Our compilation strategy is feasible, with just about microseconds overhead when calling compute. Benefits: Clean separation between notation and execution. Unprecedented potential for optimization. Already faster than NumPy.... all in just about 5000 lines of maintainable code. 16
36 Future Work My preliminary roadmap for the next years: More s (simulations, image processing, machine learning) API finalization Sophisticated Scheduling Better Shared-Memory Parallelization Auto-Vectorization Distributed Parallelization... Make this a PhD thesis 17
37 Thank you! Questions or remarks? 18
NumbaPro CUDA Python. Square matrix multiplication
NumbaPro Enables parallel programming in Python Support various entry points: Low-level (CUDA-C like) programming language High-level array oriented interface CUDA library bindings Also support multicore
More information3.Constructors and Destructors. Develop cpp program to implement constructor and destructor.
3.Constructors and Destructors Develop cpp program to implement constructor and destructor. Constructors A constructor is a special member function whose task is to initialize the objects of its class.
More informationDeadlock. Pritee Parwekar
Deadlock Each request requires that the system consider the,, to decide whether the current request can be satisfied or must wait to avoid a future possible deadlock. a) resources currently available b)
More informationChapter 2: Intro to Relational Model
Non è possibile visualizzare l'immagine. Chapter 2: Intro to Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Example of a Relation attributes (or columns)
More informationProgram verification. Generalities about software Verification Model Checking. September 20, 2016
Program verification Generalities about software Verification Model Checking Laure Gonnord David Monniaux September 20, 2016 1 / 43 The teaching staff Laure Gonnord, associate professor, LIP laboratory,
More informationCLIP - A Crytographic Language with Irritating Parentheses
CLIP - A Crytographic Language with Irritating Parentheses Author: Duan Wei wd2114@columbia.edu Yi-Hsiu Chen yc2796@columbia.edu Instructor: Prof. Stephen A. Edwards July 24, 2013 Contents 1 Introduction
More informationStudy and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou
Study and implementation of computational methods for Differential Equations in heterogeneous systems Asimina Vouronikoy - Eleni Zisiou Outline Introduction Review of related work Cyclic Reduction Algorithm
More informationModule 9: Virtual Memory
Module 9: Virtual Memory Background Demand Paging Performance of Demand Paging Page Replacement Page-Replacement Algorithms Allocation of Frames Thrashing Other Considerations Demand Segmentation Operating
More information1. Represent each of these relations on {1, 2, 3} with a matrix (with the elements of this set listed in increasing order).
Exercises Exercises 1. Represent each of these relations on {1, 2, 3} with a matrix (with the elements of this set listed in increasing order). a) {(1, 1), (1, 2), (1, 3)} b) {(1, 2), (2, 1), (2, 2), (3,
More informationDatabase Technology Introduction. Heiko Paulheim
Database Technology Introduction Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query Processing Transaction Manager Introduction to the Relational Model
More informationAccelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors
Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Michael Boyer, David Tarjan, Scott T. Acton, and Kevin Skadron University of Virginia IPDPS 2009 Outline Leukocyte
More informationLecture 4: Transforms. Computer Graphics CMU /15-662, Fall 2016
Lecture 4: Transforms Computer Graphics CMU 15-462/15-662, Fall 2016 Brief recap from last class How to draw a triangle - Why focus on triangles, and not quads, pentagons, etc? - What was specific to triangles
More informationPOLYHEDRAL GEOMETRY. Convex functions and sets. Mathematical Programming Niels Lauritzen Recall that a subset C R n is convex if
POLYHEDRAL GEOMETRY Mathematical Programming Niels Lauritzen 7.9.2007 Convex functions and sets Recall that a subset C R n is convex if {λx + (1 λ)y 0 λ 1} C for every x, y C and 0 λ 1. A function f :
More informationMore Data Locality for Static Control Programs on NUMA Architectures
More Data Locality for Static Control Programs on NUMA Architectures Adilla Susungi 1, Albert Cohen 2, Claude Tadonki 1 1 MINES ParisTech, PSL Research University 2 Inria and DI, Ecole Normale Supérieure
More informationAffine Loop Optimization using Modulo Unrolling in CHAPEL
Affine Loop Optimization using Modulo Unrolling in CHAPEL Aroon Sharma, Joshua Koehler, Rajeev Barua LTS POC: Michael Ferguson 2 Overall Goal Improve the runtime of certain types of parallel computers
More informationBlock Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations
Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations D. Zheltkov, N. Zamarashkin INM RAS September 24, 2018 Scalability of Lanczos method Notations Matrix order
More informationTranslations. Geometric Image Transformations. Two-Dimensional Geometric Transforms. Groups and Composition
Geometric Image Transformations Algebraic Groups Euclidean Affine Projective Bovine Translations Translations are a simple family of two-dimensional transforms. Translations were at the heart of our Sprite
More informationNew Mapping Interface
New Mapping Interface Mike Bauer NVIDIA Research 1 Why Have A Mapping Interface? Legion Program Legion Legion Mapper Scheduling is hard Lots of runtimes have heuristics What do you do when they are wrong?
More information6.837 LECTURE 7. Lecture 7 Outline Fall '01. Lecture Fall '01
6.837 LECTURE 7 1. Geometric Image Transformations 2. Two-Dimensional Geometric Transforms 3. Translations 4. Groups and Composition 5. Rotations 6. Euclidean Transforms 7. Problems with this Form 8. Choose
More informationGPU programming made easier
GPU programming made easier Jacob Jepsen 6. June 2014 University of Copenhagen Department of Computer Science 6. June 2014 Introduction We created a tool that reduces the development time of GPU code.
More informationTypes II. Hwansoo Han
Types II Hwansoo Han Arrays Most common and important composite data types Homogeneous elements, unlike records Fortran77 requires element type be scalar Elements can be any type (Fortran90, etc.) A mapping
More informationSize of a problem instance: Bigger instances take
2.1 Integer Programming and Combinatorial Optimization Slide set 2: Computational Complexity Katta G. Murty Lecture slides Aim: To study efficiency of various algo. for solving problems, and to classify
More informationThe role of semantic analysis in a compiler
Semantic Analysis Outline The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors
More informationPeter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved
An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in
More informationDistributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved
An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in
More informationModule 9: Virtual Memory
Module 9: Virtual Memory Background Demand Paging Performance of Demand Paging Page Replacement Page-Replacement Algorithms Allocation of Frames Thrashing Other Considerations Demand Segmenation 9.1 Background
More informationAn array is a collection of data that holds fixed number of values of same type. It is also known as a set. An array is a data type.
Data Structures Introduction An array is a collection of data that holds fixed number of values of same type. It is also known as a set. An array is a data type. Representation of a large number of homogeneous
More informationName :. Roll No. :... Invigilator s Signature :.. CS/B.TECH (NEW)/SEM-2/CS-201/ BASIC COMPUTATION & PRINCIPLES OF COMPUTER PROGRAMMING
Name :. Roll No. :..... Invigilator s Signature :.. CS/B.TECH (NEW)/SEM-2/CS-201/2012 2012 BASIC COMPUTATION & PRINCIPLES OF COMPUTER PROGRAMMING Time Allotted : 3 Hours Full Marks : 70 The figures in
More informationCSCI-580 Advanced High Performance Computing
CSCI-580 Advanced High Performance Computing Performance Hacking: Matrix Multiplication Bo Wu Colorado School of Mines Most content of the slides is from: Saman Amarasinghe (MIT) Square-Matrix Multiplication!2
More informationParallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU
Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Lifan Xu Wei Wang Marco A. Alvarez John Cavazos Dongping Zhang Department of Computer and Information Science University of Delaware
More informationWeb Science & Technologies University of Koblenz Landau, Germany. Relational Data Model
Web Science & Technologies University of Koblenz Landau, Germany Relational Data Model Overview Relational data model; Tuples and relations; Schemas and instances; Named vs. unnamed perspective; Relational
More informationSemantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End
Outline Semantic Analysis The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors
More informationUnderstanding Sources of Inefficiency in General-Purpose Chips
Understanding Sources of Inefficiency in General-Purpose Chips Rehan Hameed Wajahat Qadeer Megan Wachs Omid Azizi Alex Solomatnikov Benjamin Lee Stephen Richardson Christos Kozyrakis Mark Horowitz GP Processors
More informationPointers. A pointer is simply a reference to a variable/object. Compilers automatically generate code to store/retrieve variables from memory
Pointers A pointer is simply a reference to a variable/object Compilers automatically generate code to store/retrieve variables from memory It is automatically generating internal pointers We don t have
More informationNumPy. Daniël de Kok. May 4, 2017
NumPy Daniël de Kok May 4, 2017 Introduction Today Today s lecture is about the NumPy linear algebra library for Python. Today you will learn: How to create NumPy arrays, which store vectors, matrices,
More informationRobotics I. March 27, 2018
Robotics I March 27, 28 Exercise Consider the 5-dof spatial robot in Fig., having the third and fifth joints of the prismatic type while the others are revolute. z O x Figure : A 5-dof robot, with a RRPRP
More informationAlgorithms & Data Structures
GATE- 2016-17 Postal Correspondence 1 Algorithms & Data Structures Computer Science & Information Technology (CS) 20 Rank under AIR 100 Postal Correspondence Examination Oriented Theory, Practice Set Key
More informationDouble-precision General Matrix Multiply (DGEMM)
Double-precision General Matrix Multiply (DGEMM) Parallel Computation (CSE 0), Assignment Andrew Conegliano (A0) Matthias Springer (A00) GID G-- January, 0 0. Assumptions The following assumptions apply
More informationHeterogeneous Computing with a Fused CPU+GPU Device
with a Fused CPU+GPU Device MediaTek White Paper January 2015 2015 MediaTek Inc. 1 Introduction Heterogeneous computing technology, based on OpenCL (Open Computing Language), intelligently fuses GPU and
More informationAMath 483/583 Lecture 8
AMath 483/583 Lecture 8 This lecture: Fortran subroutines and functions Arrays Dynamic memory Reading: class notes: Fortran Arrays class notes: Fortran Subroutines and Functions class notes: gfortran flags
More informationMatrix Multiplication
Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2018 1 / 32 Outline 1 Matrix operations Importance Dense and sparse
More informationDIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka
USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods
More informationSemantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far
Outline Semantic Analysis The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Statically vs. Dynamically typed languages
More informationOptimizing Closures in O(0) time
Optimizing Closures in O(0 time Andrew W. Keep Cisco Systems, Inc. Indiana Univeristy akeep@cisco.com Alex Hearn Indiana University adhearn@cs.indiana.edu R. Kent Dybvig Cisco Systems, Inc. Indiana University
More informationContents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11
Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed
More informationSearch Algorithms for Discrete Optimization Problems
Search Algorithms for Discrete Optimization Problems Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. 1 Topic
More informationJose Aliaga (Universitat Jaume I, Castellon, Spain), Ruyman Reyes, Mehdi Goli (Codeplay Software) 2017 Codeplay Software Ltd.
SYCL-BLAS: LeveragingSYCL-BLAS Expression Trees for Linear Algebra Jose Aliaga (Universitat Jaume I, Castellon, Spain), Ruyman Reyes, Mehdi Goli (Codeplay Software) 1 About me... Phd in Compilers and Parallel
More informationAlgorithm. Algorithm Analysis. Algorithm. Algorithm. Analyzing Sorting Algorithms (Insertion Sort) Analyzing Algorithms 8/31/2017
8/3/07 Analysis Introduction to Analysis Model of Analysis Mathematical Preliminaries for Analysis Set Notation Asymptotic Analysis What is an algorithm? An algorithm is any well-defined computational
More informationLecture Notes on Aggregate Data Structures
Lecture Notes on Aggregate Data Structures 15-312: Foundations of Programming Languages Frank Pfenning Lecture 8 September 23, 2004 In this lecture we discuss various language extensions which make MinML
More informationPattern Matching and Abstract Data Types
Pattern Matching and Abstract Data Types Tom Murphy VII 3 Dec 2002 0-0 Outline Problem Setup Views ( Views: A Way For Pattern Matching To Cohabit With Data Abstraction, Wadler, 1986) Active Patterns (
More information4. Simplicial Complexes and Simplicial Homology
MATH41071/MATH61071 Algebraic topology Autumn Semester 2017 2018 4. Simplicial Complexes and Simplicial Homology Geometric simplicial complexes 4.1 Definition. A finite subset { v 0, v 1,..., v r } R n
More informationCombinatorics I (Lecture 36)
Combinatorics I (Lecture 36) February 18, 2015 Our goal in this lecture (and the next) is to carry out the combinatorial steps in the proof of the main theorem of Part III. We begin by recalling some notation
More informationAccelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic
Accelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago
More informationMPI Casestudy: Parallel Image Processing
MPI Casestudy: Parallel Image Processing David Henty 1 Introduction The aim of this exercise is to write a complete MPI parallel program that does a very basic form of image processing. We will start by
More informationC 1 Modified Genetic Algorithm to Solve Time-varying Lot Sizes Economic Lot Scheduling Problem
C 1 Modified Genetic Algorithm to Solve Time-varying Lot Sizes Economic Lot Scheduling Problem Bethany Elvira 1, Yudi Satria 2, dan Rahmi Rusin 3 1 Student in Department of Mathematics, University of Indonesia,
More informationON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS
ON SOME METHODS OF CONSTRUCTION OF BLOCK DESIGNS NURNABI MEHERUL ALAM M.Sc. (Agricultural Statistics), Roll No. I.A.S.R.I, Library Avenue, New Delhi- Chairperson: Dr. P.K. Batra Abstract: Block designs
More informationPerformance Engineering for Algorithmic Building Blocks in the GHOST Library
Performance Engineering for Algorithmic Building Blocks in the GHOST Library Georg Hager, Moritz Kreutzer, Faisal Shahzad, Gerhard Wellein, Martin Galgon, Lukas Krämer, Bruno Lang, Jonas Thies, Melven
More informationConcepts of Programming Languages
Concepts of Programming Languages Lecture 15 - Functional Programming Patrick Donnelly Montana State University Spring 2014 Patrick Donnelly (Montana State University) Concepts of Programming Languages
More informationHPF commands specify which processor gets which part of the data. Concurrency is defined by HPF commands based on Fortran90
149 Fortran and HPF 6.2 Concept High Performance Fortran 6.2 Concept Fortran90 extension SPMD (Single Program Multiple Data) model each process operates with its own part of data HPF commands specify which
More informationLecture 8: Summary of Haskell course + Type Level Programming
Lecture 8: Summary of Haskell course + Type Level Programming Søren Haagerup Department of Mathematics and Computer Science University of Southern Denmark, Odense October 31, 2017 Principles from Haskell
More informationLOGIC AND DISCRETE MATHEMATICS
LOGIC AND DISCRETE MATHEMATICS A Computer Science Perspective WINFRIED KARL GRASSMANN Department of Computer Science University of Saskatchewan JEAN-PAUL TREMBLAY Department of Computer Science University
More informationOPERATING SYSTEMS. Systems with Multi-programming. CS 3502 Spring Chapter 4
OPERATING SYSTEMS CS 3502 Spring 2018 Systems with Multi-programming Chapter 4 Multiprogramming - Review An operating system can support several processes in memory. While one process receives service
More informationChapter 8: Memory Management
Chapter 8: Memory Management Chapter 8: Memory Management Background Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging 8.2 Background Program must be brought into memory and placed
More informationFunctional Programming. Big Picture. Design of Programming Languages
Functional Programming Big Picture What we ve learned so far: Imperative Programming Languages Variables, binding, scoping, reference environment, etc What s next: Functional Programming Languages Semantics
More informationRTOS 101. Understand your real-time applications. with the help of Percepio Tracealyzer
RTOS 101 Understand your real-time applications with the help of Percepio Tracealyzer RTOS 101 Tasks, Priorities and Analysis Figure 1: Tracealyzer showing RTOS task scheduling and calls to RTOS services.
More informationA RETROSPECTIVE OF CRYSTAL
Introduction Language Program Transformation Compiler References A RETROSPECTIVE OF CRYSTAL ANOTHER PARALLEL COMPILER" AMBITION FROM THE LATE 80 S Eva Burrows BLDL-Talks Department of Informatics, University
More informationSubmission instructions (read carefully): SS17 / Assignment 4 Instructor: Markus Püschel. ETH Zurich
263-2300-00: How To Write Fast Numerical Code Assignment 4: 120 points Due Date: Th, April 13th, 17:00 http://www.inf.ethz.ch/personal/markusp/teaching/263-2300-eth-spring17/course.html Questions: fastcode@lists.inf.ethz.ch
More informationIntroduction to C++ Introduction. Structure of a C++ Program. Structure of a C++ Program. C++ widely-used general-purpose programming language
Introduction C++ widely-used general-purpose programming language procedural and object-oriented support strong support created by Bjarne Stroustrup starting in 1979 based on C Introduction to C++ also
More informationConcepts of Programming Languages
Concepts of Programming Languages Lecture 1 - Introduction Patrick Donnelly Montana State University Spring 2014 Patrick Donnelly (Montana State University) Concepts of Programming Languages Spring 2014
More informationChapter 3. Distributed Memory Programming with MPI
An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap n Writing your first MPI program. n Using the common MPI functions. n The Trapezoidal Rule
More informationHow to declare an array in C?
Introduction An array is a collection of data that holds fixed number of values of same type. It is also known as a set. An array is a data type. Representation of a large number of homogeneous values.
More informationCOP4020 Programming Languages. Functional Programming Prof. Robert van Engelen
COP4020 Programming Languages Functional Programming Prof. Robert van Engelen Overview What is functional programming? Historical origins of functional programming Functional programming today Concepts
More informationWeld: A Common Runtime for Data Analytics
Weld: A Common Runtime for Data Analytics Shoumik Palkar, James Thomas, Anil Shanbhag*, Deepak Narayanan, Malte Schwarzkopf*, Holger Pirk*, Saman Amarasinghe*, Matei Zaharia Stanford InfoLab, *MIT CSAIL
More informationFractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures
Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures University of Virginia Dept. of Computer Science Technical Report #CS-2011-09 Jeremy W. Sheaffer and Kevin
More informationAccelerating Data Warehousing Applications Using General Purpose GPUs
Accelerating Data Warehousing Applications Using General Purpose s Sponsors: Na%onal Science Founda%on, LogicBlox Inc., IBM, and NVIDIA The General Purpose is a many core co-processor 10s to 100s of cores
More informationMath background. 2D Geometric Transformations. Implicit representations. Explicit representations. Read: CS 4620 Lecture 6
Math background 2D Geometric Transformations CS 4620 Lecture 6 Read: Chapter 2: Miscellaneous Math Chapter 5: Linear Algebra Notation for sets, functions, mappings Linear transformations Matrices Matrix-vector
More informationIntroduction to C++ with content from
Introduction to C++ with content from www.cplusplus.com 2 Introduction C++ widely-used general-purpose programming language procedural and object-oriented support strong support created by Bjarne Stroustrup
More informationThe Mercury project. Zoltan Somogyi
The Mercury project Zoltan Somogyi The University of Melbourne Linux Users Victoria 7 June 2011 Zoltan Somogyi (Linux Users Victoria) The Mercury project June 15, 2011 1 / 23 Introduction Mercury: objectives
More informationSome instance messages and methods
Some instance messages and methods x ^x y ^y movedx: dx Dy: dy x
More informationGPU Linear algebra extensions for GNU/Octave
Journal of Physics: Conference Series GPU Linear algebra extensions for GNU/Octave To cite this article: L B Bosi et al 2012 J. Phys.: Conf. Ser. 368 012062 View the article online for updates and enhancements.
More informationVirtual Memory. Overview: Virtual Memory. Virtual address space of a process. Virtual Memory
TDIU Operating systems Overview: Virtual Memory Virtual Memory Background Demand Paging Page Replacement Allocation of Frames Thrashing and Data Access Locality [SGG7/8/9] Chapter 9 Copyright Notice: The
More informationA Short History of Array Computing in Python. Wolf Vollprecht, PyParis 2018
A Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - - Array computing in general History up to NumPy Libraries after NumPy - Pure Python libraries - JIT / AOT compilers - Deep
More informationRelational Databases
Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4
More informationOperating System Concepts Ch. 5: Scheduling
Operating System Concepts Ch. 5: Scheduling Silberschatz, Galvin & Gagne Scheduling In a multi-programmed system, multiple processes may be loaded into memory at the same time. We need a procedure, or
More informationaxiomatic semantics involving logical rules for deriving relations between preconditions and postconditions.
CS 6110 S18 Lecture 18 Denotational Semantics 1 What is Denotational Semantics? So far we have looked at operational semantics involving rules for state transitions, definitional semantics involving translations
More informationParallel Systems. Project topics
Parallel Systems Project topics 2016-2017 1. Scheduling Scheduling is a common problem which however is NP-complete, so that we are never sure about the optimality of the solution. Parallelisation is a
More informationWelcome to the Webinar. What s New in Gurobi 7.5
Welcome to the Webinar What s New in Gurobi 7.5 Speaker Introduction Dr. Tobias Achterberg Director of R&D at Gurobi Optimization Formerly a developer at ILOG, where he worked on CPLEX 11.0 to 12.6 Obtained
More informationSparse Matrix Formats
Christopher Bross Friedrich-Alexander-Universität Erlangen-Nürnberg Motivation Sparse Matrices are everywhere Sparse Matrix Formats C. Bross BGCE Research Day, Erlangen, 09.06.2016 2/16 Motivation Sparse
More informationParallel & Concurrent Programming: ZPL. Emery Berger CMPSCI 691W Spring 2006 AMHERST. Department of Computer Science UNIVERSITY OF MASSACHUSETTS
Parallel & Concurrent Programming: ZPL Emery Berger CMPSCI 691W Spring 2006 Department of Computer Science Outline Previously: MPI point-to-point & collective Complicated, far from problem abstraction
More informationLecture 17. Edited from slides for Operating System Concepts by Silberschatz, Galvin, Gagne
Lecture 17 Edited from slides for Operating System Concepts by Silberschatz, Galvin, Gagne Page Replacement Algorithms Last Lecture: FIFO Optimal Page Replacement LRU LRU Approximation Additional-Reference-Bits
More informationThe Simplex Algorithm
The Simplex Algorithm April 25, 2005 We seek x 1,..., x n 0 which mini- Problem. mizes C(x 1,..., x n ) = c 1 x 1 + + c n x n, subject to the constraint Ax b, where A is m n, b = m 1. Through the introduction
More informationCS 1200 Discrete Math Math Preliminaries. A.R. Hurson 323 CS Building, Missouri S&T
CS 1200 Discrete Math A.R. Hurson 323 CS Building, Missouri S&T hurson@mst.edu 1 Course Objective: Mathematical way of thinking in order to solve problems 2 Variable: holder. A variable is simply a place
More information[Ch 6] Set Theory. 1. Basic Concepts and Definitions. 400 lecture note #4. 1) Basics
400 lecture note #4 [Ch 6] Set Theory 1. Basic Concepts and Definitions 1) Basics Element: ; A is a set consisting of elements x which is in a/another set S such that P(x) is true. Empty set: notated {
More informationAutomatic Development of Linear Algebra Libraries for the Tesla Series
Automatic Development of Linear Algebra Libraries for the Tesla Series Enrique S. Quintana-Ortí quintana@icc.uji.es Universidad Jaime I de Castellón (Spain) Dense Linear Algebra Major problems: Source
More information9/21/17. Outline. Expression Evaluation and Control Flow. Arithmetic Expressions. Operators. Operators. Notation & Placement
Outline Expression Evaluation and Control Flow In Text: Chapter 6 Notation Operator evaluation order Operand evaluation order Overloaded operators Type conversions Short-circuit evaluation of conditions
More informationMatrix Multiplication
Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2013 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2013 1 / 32 Outline 1 Matrix operations Importance Dense and sparse
More informationIndex. object lifetimes, and ownership, use after change by an alias errors, use after drop errors, BTreeMap, 309
A Arithmetic operation floating-point arithmetic, 11 12 integer numbers, 9 11 Arrays, 97 copying, 59 60 creation, 48 elements, 48 empty arrays and vectors, 57 58 executable program, 49 expressions, 48
More informationOptimizations of BLIS Library for AMD ZEN Core
Optimizations of BLIS Library for AMD ZEN Core 1 Introduction BLIS [1] is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries [2] The framework was
More informationNew Strategies for Filtering the Number Field Sieve Matrix
New Strategies for Filtering the Number Field Sieve Matrix Shailesh Patil Department of CSA Indian Institute of Science Bangalore 560 012 India Email: shailesh.patil@gmail.com Gagan Garg Department of
More informationLECTURE 2. Compilers and Interpreters
LECTURE 2 Compilers and Interpreters COMPILATION AND INTERPRETATION Programs written in high-level languages can be run in two ways. Compiled into an executable program written in machine language for
More information