High Performance Computing for Engineers
1 High Performance Computing for Engineers David Thomas Room 903 HPCE / dt10/ 2014 / 0.1
4 High Performance Computing for Engineers
Research
- Testing communication protocols
- Evaluating signal-processing filters
- Simulating analogue and digital designs
Tools
- CAD tools: synthesis, place-and-route, verification
- Libraries/toolboxes: filter design, compressive sensing
Products
- Oil exploration and discovery
- Mobile-phone apps
- Financial computing
7 High Performance Computing for Engineers
Types of performance metrics
- Throughput
- Latency
- Power
- Design-time
- Capital and running costs
Required versus desired performance
- Subject to a throughput of X, minimise average power
- Subject to a budget of Y, maximise energy efficiency
- Subject to Z development days, maximise throughput
9 What is available to you
Types of compute device
- Multi-core CPUs
- GPUs (Graphics Processing Units)
- MPPAs (Massively Parallel Processor Arrays)
- FPGAs (Field Programmable Gate Arrays)
Types of compute system
- Embedded systems
- Mobile phones
- Tablets
- Laptops
- Grid computing
- Cloud computing
10 HTC Droid DNA
Snapdragon S4 Pro
- CPU: Quad-core Krait (ARM derivative)
- GPU: Adreno 320 (OpenCL compatible)
Images copyright HTC and Qualcomm
11 Lenovo ThinkPad Edge E525
AMD Fusion A8-3500M
- CPU: Quad-core 2.4GHz Phenom-II
- GPU: HD 6620G 400MHz (320 cores)
12 Imperial HPC Cluster
cx2 - SGI Altix ICE 8200 EX
- Racks and racks of high-performance PCs
- x64 cores running at 3GHz
- Available to researchers and undergrads (if they ask nicely)
Grid-management system
- Run program on 1000 PCs with one command
13 Performance and Efficiency Relative to CPU
[Figure: two panels, Performance and Power Efficiency, each comparing MPPA, FPGA and GPU against a CPU baseline for the Uniform, Gaussian and Exponential cases and their geometric mean.]
14-17 Design tradeoffs
[Figure, built up over four slides: Performance (1, 10, 100) against Design-time (1 hour, 1 day, 1 week, 1 month). Series added in turn: Sequential SW, Thread-based SW, Task-based SW.]
18 Design tradeoffs
Task-based parallelism vs threads
- Easy to program (less time coding)
- Easy to get right (less time testing)
- Many implementations and APIs
  - Intel Threading Building Blocks (TBB)
  - Microsoft .NET Task Parallel Library
  - OpenCL
[Figure: Performance (1, 10, 100, 1000) against Design-time (1 hour, 1 day, 1 week, 1 month) for Sequential SW, Task-based SW and Thread-based SW.]
19-26 Design tradeoffs
[Figure, built up over eight slides: Performance against Design-time (1 hour, 1 day, 1 week, 1 month) for Sequential SW, Task-based SW and Thread-based SW, with GPU added from slide 21 and FPGA added from slide 25.]
Src (slide 22): NVIDIA CUDA Compute Unified Device Architecture, Programmer's Guide
27 What you will learn
- Systems: what high-performance systems are available
- Methods: how these systems can be programmed
- Practice: concrete experience with multi-core and GPUs
- Analysis: knowing what to use and when
- Tools: making better use of your time
28 Developer productivity is also part of performance
30 Re: XKCD - My Professional Context
- Undergraduate degree and PhD from Computing
- If pushed, I self-identify as a programmer
- Research focuses on hardware acceleration
  - Both academic and industrial applications
My motivation for this course
- Supervising final year project students
- Working with PhD students
- Talking to industry people
31 Course Assessment
100% coursework
- Change from last year; it used to be 50% exam
40%: Four short courseworks to build skills
- Get familiar with environments and how to do common tasks
- Structured and quite linear; should not be taxing
- Force people to do work earlier in term
40%: Two larger tasks to apply skills to real problems
- Allow demonstration of knowledge and skills
- Unstructured; open-ended; competitive; hard
20%: Oral assessment; individual
- Test ability to communicate about your code and solutions
- (Check that you did the work)
32 Skills needed
Basic programming
- If you can't program in _any_ language then worry
Intel TBB uses C++ rather than C
- Some weird C++ stuff, but not scary: explained in lectures
- Setup and basics covered in third coursework
GPU programming uses OpenCL (C-like)
- Lets you use whatever graphics card you happen to have
- Working examples, explained in lectures
- Language and compiler setup covered in fourth coursework
Not expected to become a guru, just make it faster
33 Key Focus: Engineering
How does this apply to you?
Examples from Elec. Eng. problems
- Mathematical analysis
- Simulation of digital circuits
- VLSI circuit layout
- Communication channel evaluation
Tools and languages used in EE
- C / C++
- MATLAB
34 Simple example: Totient function
Euler's totient function: totient(n)
- Number of integers in range 1..n which are relatively prime to n
- Integers i and j are relatively prime if gcd(i,j)=1
- Totient not included in MATLAB
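To make the definition concrete, the short script below lists the integers that the totient function counts for one small input. The value n = 12 is an arbitrary choice for illustration, not one taken from the slides.

```matlab
% Worked instance of the definition for one small n (illustrative value).
n = 12;
candidates = 1:n;                                % every integer in 1..n
coprime = candidates(gcd(candidates, n) == 1);   % keep i with gcd(i,n)==1
disp(coprime);                                   % the integers that are counted
disp(numel(coprime));                            % how many there are: totient(n)
```

For n = 12 the surviving integers are 1, 5, 7 and 11, so the count is 4.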
35 Version 0: Simple loop
    function [res]=totient_v0(n)
    res=0;
    for i=1:n              % Loop over all numbers in 1..n
        if gcd(i,n)==1     % Check if relatively prime
            res=res+1;     % Count any that are
        end
    end
36 Version 1: Vectorising
- Convert loops into vector operations
- Standard MATLAB optimisation
- Actually a way of making parallelism explicit
    function [res]=totient_v1(n)
    numbers=1:n;                    % Generate all numbers in 1..n
    gcd_res=(gcd(numbers,n)==1);    % Perform GCD on all numbers
    res=sum(gcd_res==1);            % Count all relatively prime numbers
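Before timing the two versions it is worth confirming that they return the same answer. The script below is a minimal sketch of such a check: it inlines the loop and vectorised counts rather than calling totient_v0 and totient_v1, and the test values of n are arbitrary choices.

```matlab
% Check that the loop form and the vectorised form agree for a few n.
for n = [1 12 97 1000]
    % Loop form: test each candidate one at a time.
    loop_count = 0;
    for i = 1:n
        if gcd(i, n) == 1
            loop_count = loop_count + 1;
        end
    end
    % Vectorised form: a single gcd call over the whole range.
    vec_count = sum(gcd(1:n, n) == 1);
    assert(loop_count == vec_count);
    fprintf('totient(%d) = %d\n', n, vec_count);
end
```

The same pattern applies to any later variant: keep the simplest version as a reference and compare every optimised version against it.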
38 Version 2: Parallel for loop
- MATLAB supports a parfor command
- Each loop iteration is/may be executed in parallel
- Can operate on multiple cores, and even multiple machines
    function [res]=totient_v2(n)
    res=0;
    parfor i=1:n           % Loop over all numbers in 1..n
        if gcd(i,n)==1     % Check if relatively prime
            res=res+1;     % Count any that are
        end
    end
39 Version 3: Agglomeration
- Too much overhead with current parallel loop
- Each parallel iteration has a cost due to scheduling
- Process space in chunks, using smaller vectors
    function [res]=totient_v3(n, step)
    if nargin<2            % How large each chunk should be
        step=1000;
    end
    res=0;
    % Loop over each chunk; ceil so that a final partial chunk is included
    parfor i=1:ceil(n/step)
        % Then process each chunk as a vector
        numbers=(i-1)*step+1:min(i*step,n);
        rel_prime=(gcd(numbers,n)==1);
        res=res+sum(rel_prime);
    end
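The chunk boundaries can be checked in isolation, without any parallel loop. The sketch below assumes the chunk count is rounded up, since rounding down would leave the integers after the last full chunk uncounted. The values of n and step are illustrative only.

```matlab
% Cut the range 1..n into chunks of at most `step` integers.
n = 2500;                         % illustrative problem size (assumption)
step = 1000;                      % illustrative chunk size (assumption)
num_chunks = ceil(n / step);      % round up so the tail is not dropped
covered = 0;
for c = 1:num_chunks
    first = (c - 1) * step + 1;
    last = min(c * step, n);      % clamp the final chunk to n
    fprintf('chunk %d: %d..%d\n', c, first, last);
    covered = covered + (last - first + 1);
end
assert(covered == n);             % chunk sizes add up to exactly n
```

With these values there are three chunks, and the last one is shorter than the others. The chunk size trades scheduling overhead against load balance, so it is worth treating as a tunable parameter.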
40-41 Results from my 4-core desktop
[Figure: comparison of v0: For Loop, v1: Vectorised, v2: ParFor Loop, v3: ParFor Chunked and v4: Algorithm; one axis is scaled by 10^4.]
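The results name a fifth variant, "v4: Algorithm", but the slides do not show its code. As an illustration of what changing the algorithm can look like, and not necessarily the author's v4, the sketch below uses Euler's product formula over the distinct prime factors of n. The function name totient_product is hypothetical; factor and unique are standard MATLAB functions.

```matlab
function res = totient_product(n)
    % Hypothetical helper, not from the slides.
    % Euler's product formula: phi(n) = n * prod(1 - 1/p),
    % taken over the distinct primes p that divide n.
    res = n;
    if n > 1                           % factor(1) returns 1, so skip that case
        for p = unique(factor(n))      % distinct prime factors of n
            res = res / p * (p - 1);   % division is exact: p divides res here
        end
    end
end
```

Saved as totient_product.m, it can be compared against the brute-force count, for example with assert(totient_product(12) == sum(gcd(1:12, 12) == 1)). It replaces n gcd evaluations with one factorisation, so the parallel versions are no longer competing on the same amount of work.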
More informationCMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN
CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN Graphics Processing Unit Accelerate the creation of images in a frame buffer intended for the output
More informationAnalyses, Hardware/Software Compilation, Code Optimization for Complex Dataflow HPC Applications
Analyses, Hardware/Software Compilation, Code Optimization for Complex Dataflow HPC Applications CASH team proposal (Compilation and Analyses for Software and Hardware) Matthieu Moy and Christophe Alias
More informationEE 434 ASIC & Digital Systems
EE 434 ASIC & Digital Systems Dae Hyun Kim EECS Washington State University Spring 2018 Course Website http://eecs.wsu.edu/~ee434 Themes Study how to design, analyze, and test a complex applicationspecific
More informationResources Current and Future Systems. Timothy H. Kaiser, Ph.D.
Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic
More informationGPU ARCHITECTURE Chris Schultz, June 2017
GPU ARCHITECTURE Chris Schultz, June 2017 MISC All of the opinions expressed in this presentation are my own and do not reflect any held by NVIDIA 2 OUTLINE CPU versus GPU Why are they different? CUDA
More informationOverview of the ECE Computer Software Curriculum. David O Hallaron Associate Professor of ECE and CS Carnegie Mellon University
Overview of the ECE Computer Software Curriculum David O Hallaron Associate Professor of ECE and CS Carnegie Mellon University The Fundamental Idea of Abstraction Human beings Applications Software systems
More informationLeveraging Mobile GPUs for Flexible High-speed Wireless Communication
0 Leveraging Mobile GPUs for Flexible High-speed Wireless Communication Qi Zheng, Cao Gao, Trevor Mudge, Ronald Dreslinski *, Ann Arbor The 3 rd International Workshop on Parallelism in Mobile Platforms
More informationGPUs have enormous power that is enormously difficult to use
524 GPUs GPUs have enormous power that is enormously difficult to use Nvidia GP100-5.3TFlops of double precision This is equivalent to the fastest super computer in the world in 2001; put a single rack
More informationOn Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators
On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC
More informationLarge Data in MATLAB: A Seismic Data Processing Case Study U. M. Sundar Senior Application Engineer
Large Data in MATLAB: A Seismic Data Processing Case Study U. M. Sundar Senior Application Engineer 2013 MathWorks, Inc. 1 Problem Statement: Scaling Up Seismic Analysis Challenge: Developing a seismic
More informationManycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT
Manycore and GPU Channelisers Seth Hall High Performance Computing Lab, AUT GPU Accelerated Computing GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate
More informationFinite Element Integration and Assembly on Modern Multi and Many-core Processors
Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,
More informationPreliminary Discussion
Preliminary Discussion Multi-Core Architectures and Programming Oliver Reiche, Christian Schmitt, Michael Witterauf, Frank Hannig Hardware/Software Co-Design, Friedrich-Alexander University Erlangen-Nürnberg
More informationGPU programming. Dr. Bernhard Kainz
GPU programming Dr. Bernhard Kainz Overview About myself Motivation GPU hardware and system architecture GPU programming languages GPU programming paradigms Pitfalls and best practice Reduction and tiling
More information