SEJITS: High Performance for Productivity Applications
1 SEJITS: High Performance for Productivity Applications
Shoaib Kamil & many others
Parlab End of Project Celebration, May 30, 2013
2 Where Did We Start?
[Figure: Par Lab software stack diagram. Labels: Personal Health, Image Retrieval, Hearing/Music, Speech, Parallel Browser; Dwarfs; Composition & Coordination Language (C&CL), C&CL Compiler/Interpreter, Parallel Libraries, Parallel Frameworks; Efficiency Languages, Sketching, Autotuners, Legacy Code, Schedulers, Communication & Synch. Primitives, Efficiency Language Compilers; OS Libraries & Services, Legacy OS, Hypervisor; Intel Multicore/GPGPU, RAMP Manycore. Correctness column: Static Verification, Type Systems, Directed Testing, Dynamic Checking, Debugging with Replay.]
3 The Big Problems
[Figure: caller/callee grid. Rows: Serial Caller, Parallel Caller. Columns: Serial Callee, Parallel Callee. The Parallel Caller row is marked "?".]
- Knew how to coordinate a serial caller with a serial or parallel callee
- How can we do the other two, in a way understandable to non-expert programmers?
4 The Big Problems
Layering is good for engineering, bad for performance.
[Figure: three layers. Applications: App 1, App 2, App 3. Computations: Dense, Sparse, Graph Trav. Platforms: Multicore, GPU, Cloud.]
5 Inspiration: Auto-tuned DSL for Stencils
- Began with a large codebase created by productivity programmers: a new climate code
- Use code transformation to go from productive to efficient code
[Figure: auto-tuner pipeline. Reference Implementation (.f95) -> Parse -> Internal Abstract Syntax Tree Representation -> Transformation & Code Generation with high-level knowledge (Strategy Engines: Serial, Parallel; Code Generators: FORTRAN, C with pthreads, CUDA; output files .f95, .c, .cu, .h) -> Myriad of equivalent, optimized implementations (plus test harness) -> Search Engines, in context of specific problem -> Best performing implementation and configuration parameters. Results chart labels: Victoria Falls, GTX280; OpenMP Comparison, Expected Peak, serial reference.]
[IPDPS10]
6 Inspiration: ActiveRecord
    Product.find_by_price(95)
    Product.all.where("quantity > ?", 0).where(:color => "red")
- Ruby on Rails ORM
- Complex queries generated from simple chains of method invocations
- Productivity programmer doesn't need to know SQL
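The idea of generating a query from a chain of method calls can be sketched in a few lines of Python. This is only an illustration: the `Query` class, its `where` and `to_sql` methods, and the `products` table name are hypothetical and are not the ActiveRecord API.

```python
class Query:
    """Hypothetical query builder; an illustration, not the ActiveRecord API."""

    def __init__(self, table, conditions=(), params=()):
        self.table = table
        self.conditions = tuple(conditions)
        self.params = tuple(params)

    def where(self, condition, *params):
        # Each call returns a new Query, which is what makes chaining possible.
        return Query(self.table,
                     self.conditions + (condition,),
                     self.params + tuple(params))

    def to_sql(self):
        # The SQL text is only assembled when it is finally requested.
        sql = "SELECT * FROM " + self.table
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql, self.params


query = Query("products").where("quantity > ?", 0).where("color = ?", "red")
sql, params = query.to_sql()
```

The caller only chains method invocations; the SQL string and its bound parameters are produced by the library on the caller's behalf.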
7 SEJITS In A Nutshell
[Figure: a Program contains Non-DSL Code, Code in DSL A, Code in DSL B, and Data. Compile Phase: Code in DSL A -> DSL Codegen -> External Compiler -> Dynamic Link Library. Execute Phase: the Interpreter runs the Dynamic Link Library on the Data and returns the Result.]
8 Case Study: Sepya
Stencils Embedded in Python with Auto-tuning. Simple, declarative source for fast stencil computations on multicore/multisocket CPUs.

    class My1DKernel(StencilKernel):
        def __init__(self):
            super(My1DKernel, self).__init__()
            # set neighbor set 1 as the point
            # before and the point after
            self.set_neighbors(1, [[1], [-1]])

        # the actual kernel function
        def kernel(self, out_grid, in_grid):
            for x in out_grid.interior_points():
                for y in self.neighbors(in_grid, x, 1):
                    out_grid[x] += 0.5 * in_grid[y]
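For reference, the computation this kernel describes can be written in plain, unspecialized Python. The sketch below is not Sepya code: `apply_1d_stencil` is a hypothetical helper, grids are ordinary lists, and it assumes that `interior_points()` skips exactly one boundary cell at each end.

```python
def apply_1d_stencil(in_grid, out_grid):
    """Pure-Python reference for the 1D kernel on the slide (no specialization)."""
    offsets = [1, -1]  # neighbor set 1: the point after and the point before
    # Assumption: interior points exclude the two boundary cells,
    # which each lack one neighbor.
    for x in range(1, len(in_grid) - 1):
        for offset in offsets:
            out_grid[x] += 0.5 * in_grid[x + offset]
    return out_grid


in_grid = [0.0, 2.0, 4.0, 6.0, 8.0]
out_grid = apply_1d_stencil(in_grid, [0.0] * len(in_grid))
# out_grid is now [0.0, 2.0, 4.0, 6.0, 0.0]
```

Each interior output point receives the average of its two neighbors in the input grid; the boundary cells are left untouched.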
9 Case Study: Sepya
- 93% of Roofline Peak
- 2.2x faster than state-of-the-art
10 Asp is SEJITS for Python
Library for building auto-tuned DSELs:
- Introspection of Python
- Code generation via templates or tree transformations
- Auto-tuning support
- Data structure interoperability
- Automatic compiler detection & demand-driven compilation with caching
- Ongoing development & support
- Screencasts available
[SciPy11, PPoPP12]
11 Copperhead
Data parallel compiler for Python subset

    from copperhead import *
    import numpy as np

    @cu
    def axpy(a, x, y):
        return [a * xi + yi for xi, yi in zip(x, y)]

    x = np.arange(100, dtype=np.float64)
    y = np.arange(100, dtype=np.float64)

    with places.gpu0:
        gpu = axpy(2.0, x, y)

    with places.openmp:
        cpu = axpy(2.0, x, y)

Bryan Catanzaro et al (NVIDIA Research) [GTC12]
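Because the body of `axpy` is ordinary Python, the same function can be run by the standard interpreter to obtain a reference result. The sketch below is an assumption-laden illustration: it leaves out Copperhead and NumPy entirely and uses plain lists, and the name `reference` is hypothetical.

```python
def axpy(a, x, y):
    # Same body as on the slide, executed by the ordinary Python interpreter.
    return [a * xi + yi for xi, yi in zip(x, y)]


x = [float(i) for i in range(100)]
y = [float(i) for i in range(100)]
reference = axpy(2.0, x, y)  # element i equals 2.0 * i + i
```

A list like `reference` gives something to compare the `gpu` and `cpu` results against when checking a specialized run.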
12 Ongoing Work
- Knowledge Discovery Toolbox (John Gilbert, Adam Lugowski, and Aydin Buluc, UCSB & LBNL) [IPDPS13]
- PyCASP: Python-based Content Analysis Using Specialization (Katya Gonina) [ASRU11]
- Bag of Little Bootstraps (Peter Birsinger, Richard Xia, et al) [SciPy12]
- Communication-Avoiding Recursive Algorithms (David Eliahu, Omer Spillinger, Oded Schwartz, Ben Lipshitz, Jim Demmel) [IPDPS13]
- Communication-Avoiding Solvers (Jeffrey Morlan) [iwapt12]
- Three Fingered Jack (David Sheffield) [HotPar13]
- ASPIRE Project
13 SEJITS Participants
Bryan Catanzaro, Kurt Keutzer, Katya Gonina, Jeffrey Morlan, Armando Fox, Richard Xia, Henry Cook, Peter Birsinger, David Patterson, Katherine Yelick, James Demmel, Tim Mattson, Henry Gabb, David Sheffield, Michael Anderson, Juan Vargas, Jessica Miller, Scott Beamer, Omer Spillinger, David Eliahu, Aakash Prasad, Oded Schwartz, Ben Lipshitz, David Howard, Derrick Coetzee, Tayfun Elmas, Koushik Sen
14 Demo & Testimonial
- Demo: Derrick Coetzee showing a simple image processing kernel
- Testimonial: Tim Mattson, Intel