COMPILER-ASSISTED TEST ACCELERATION ON GPUS FOR EMBEDDED SOFTWARE
Transcription
1 COMPILER-ASSISTED TEST ACCELERATION ON GPUS FOR EMBEDDED SOFTWARE VANYA YANEVA Ajitha Rajan, Christophe Dubach ISSTA July 2017 Santa Barbara, CA
2 EMBEDDED SOFTWARE IS EVERYWHERE. ITS SAFETY AND CORRECTNESS ARE CRUCIAL. FUNCTIONAL TESTING IS CRITICAL.
3 FUNCTIONAL TESTING CAN BE EXTREMELY TIME CONSUMING
4 FUNCTIONAL TESTING CAN BE EXTREMELY TIME CONSUMING Test suite (test case 1, test case 2, test case 3, ..., test case n) -> Application -> expected result 1, expected result 2, expected result 3, ..., expected result n
5 FUNCTIONAL TESTING CAN BE EXTREMELY TIME CONSUMING Test suite (test case 1, test case 2, test case 3, ..., test case n) -> Application -> expected result 1, expected result 2, expected result 3, ..., expected result n. TESTING IS AN IDEAL CANDIDATE FOR PARALLELISATION
6 CPU SERVERS: expensive; do not scale easily as test suites grow; can be extremely underutilised
7 CPU SERVERS: expensive; do not scale easily as test suites grow; can be extremely underutilised. GPUS: cheap and widely available; large-scale parallelism, thousands of threads; SIMD architecture suited to functional testing
8 EXECUTE TESTS IN PARALLEL ON THE GPU THREADS
Test suite: test case 1, test case 2, test case 3, ..., test case n, each with its expected result (expected result 1 ... expected result n)
Read test cases: INPUT[] = {test case 1 ... test case n}
Transfer INPUT[] to GPU memory
Build and launch tested program on the GPU threads (th_id up to n-1): OUTPUT[th_id] = program( INPUT[th_id] )
Transfer OUTPUT[] to CPU memory
A. Rajan, S. Sharma, P. Schrammel, D. Kroening. Accelerated test execution using GPUs. In proceedings of ASE 2014, pages , Sweden, Nov 2014.
9 EXECUTE TESTS IN PARALLEL ON THE GPU THREADS
Test suite: test case 1, test case 2, test case 3, ..., test case n, each with its expected result (expected result 1 ... expected result n)
Read test cases: INPUT[] = {test case 1 ... test case n}
Transfer INPUT[] to GPU memory
Build and launch tested program on the GPU threads (th_id up to n-1): OUTPUT[th_id] = program( INPUT[th_id] )
Transfer OUTPUT[] to CPU memory
CHALLENGES: Usability, Scope, Performance?
A. Rajan, S. Sharma, P. Schrammel, D. Kroening. Accelerated test execution using GPUs. In proceedings of ASE 2014, pages , Sweden, Nov 2014.
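The per-thread step on this slide, OUTPUT[th_id] = program( INPUT[th_id] ), can be emulated on the CPU with a plain loop in which each iteration stands in for one GPU thread. The C sketch below is only an illustration: `program` is a hypothetical stand-in for the tested application, `run_tests` is a made-up name, and no GPU or OpenCL call is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tested program: any function of a single test input. */
int program(int input)
{
    return 2 * input + 1;
}

/* Emulates the GPU launch on the CPU. Iteration th_id plays the role of
 * GPU thread th_id: it reads INPUT[th_id] and writes OUTPUT[th_id].
 * No iteration touches another iteration's data, which is the property
 * that lets the test cases run in parallel. */
void run_tests(const int *INPUT, int *OUTPUT, size_t n)
{
    assert(INPUT != NULL && OUTPUT != NULL);
    for (size_t th_id = 0; th_id < n; ++th_id) {
        OUTPUT[th_id] = program(INPUT[th_id]);
    }
}
```

On a real device the loop disappears: the runtime launches n threads and each one executes the loop body once with its own th_id.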
10 INTRODUCING PARTECL Inputs: test cases (CSV format), unmodified source files, config file. Stages: ParTeCL CodeGen -> OpenCL -> ParTeCL Runtime -> Execution on the GPU
11 INPUTS
Example:
    #include <stdio.h>
    #include <stdlib.h>

    int c;  /* global scope variable */

    int addc(int a, int b){
      return a + b + c;
    }

    int main(int argc, char* argv[]){
      int a = atoi(argv[1]);  /* command line arguments */
      int b = atoi(argv[2]);
      c = 3;
      int sum = addc(a, b);
      printf("%d + %d + %d = %d\n", a, b, c, sum);
    }
Configuration:
    input: int a 1
    input: int b 2
    result: int sum variable: sum
Test cases:
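The configuration on this slide declares two integer inputs, `a` and `b`, taken from command-line positions 1 and 2. As a rough illustration of how one CSV test case could be turned into such inputs, the C sketch below parses a line into a struct. The line layout (`test_case_num,a,b`), the struct layout, the fixed `argc` value and the function name `parse_test_case` are all assumptions made for this example; the slides do not show the CSV format ParTeCL actually expects.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed layout: one field per "input:" line of the configuration,
 * plus the test case number and the original argument count. */
struct test_input {
    int test_case_num;
    int argc;
    int a;
    int b;
};

/* Parses one CSV line of the assumed form "test_case_num,a,b".
 * Returns 1 on success and 0 if the line does not hold three integers. */
int parse_test_case(const char *line, struct test_input *out)
{
    assert(line != NULL && out != NULL);
    int num, a, b;
    if (sscanf(line, "%d,%d,%d", &num, &a, &b) != 3) {
        return 0;
    }
    out->test_case_num = num;
    out->argc = 3;  /* assumption: program name plus the two inputs */
    out->a = a;
    out->b = b;
    return 1;
}
```

Calling `parse_test_case("1,5,7", &t)` fills `t` with test case 1 and inputs 5 and 7; a line with fewer than three integers is rejected.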
12 PARTECL CODEGEN
Example:
    #include <stdio.h>
    #include <stdlib.h>

    int c;

    int addc(int a, int b){
      return a + b + c;
    }

    int main(int argc, char* argv[]){
      int a = atoi(argv[1]);
      int b = atoi(argv[2]);
      c = 3;
      int sum = addc(a, b);
      printf("%d + %d + %d = %d\n", a, b, c, sum);
    }
OpenCL:
    #include "structs.h"
    //#include <stdio.h>
    //#include <stdlib.h>

    /*int c;*/

    int addc(int a, int b, int *c){
      return a + b + (*c);
    }

    kernel void main_kernel(
        global struct test_input* inputs,
        global struct test_result* results){
      int idx = get_global_id(0);
      struct test_input input_gen = inputs[idx];
      global struct test_result *result_gen = &results[idx];
      int argc = input_gen.argc;
      result_gen->test_case_num = input_gen.test_case_num;

      int c;
      int a = input_gen.a;
      int b = input_gen.b;
      c = 3;
      int sum = addc(a, b, &c);
      /*printf("%d + %d + %d = %d\n", a, b, c, sum);*/
      result_gen->sum = sum;
    }
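The generated kernel includes `structs.h`, which the slide does not show. The C sketch below guesses at its contents from the fields the kernel reads and writes, then re-runs the kernel body as an ordinary function so the transformation can be tried without an OpenCL device. The struct layouts and the name `main_kernel_emulated` are assumptions; `idx` is passed explicitly in place of `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed contents of structs.h, inferred from the fields used on the slide. */
struct test_input {
    int test_case_num;
    int argc;
    int a;
    int b;
};

struct test_result {
    int test_case_num;
    int sum;
};

/* As in the generated code: the former global c is now a pointer parameter. */
int addc(int a, int b, int *c)
{
    return a + b + (*c);
}

/* CPU stand-in for main_kernel; handles the single test case at index idx. */
void main_kernel_emulated(const struct test_input *inputs,
                          struct test_result *results,
                          size_t idx)
{
    assert(inputs != NULL && results != NULL);
    struct test_input input_gen = inputs[idx];
    struct test_result *result_gen = &results[idx];
    int argc = input_gen.argc;
    (void)argc;                 /* kept for parity with the slide; unused here */
    result_gen->test_case_num = input_gen.test_case_num;

    int c;                      /* per-test copy of the former global */
    int a = input_gen.a;        /* replaces atoi(argv[1]) */
    int b = input_gen.b;        /* replaces atoi(argv[2]) */
    c = 3;
    int sum = addc(a, b, &c);
    result_gen->sum = sum;      /* replaces the printf of the original */
}
```

Because `c` becomes a local variable of each invocation, test cases cannot interfere with one another through shared global state.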
13 CODE TRANSFORMATIONS: global scope variables; command line arguments; standard in/out; standard library (partial support): clclibc
14 PARTECL RUNTIME
Read test cases: INPUT[] = {test case 1 ... test case n}
Transfer INPUT[] to GPU memory
Build and launch tested program (automatically generated OpenCL) on the GPU threads (th_id up to n-1): OUTPUT[th_id] = program( INPUT[th_id] )
Transfer OUTPUT[] to CPU memory
15 CHALLENGES: Usability, Scope, Performance? Inputs: test cases (CSV format), unmodified source files, config file. Stages: ParTeCL CodeGen -> OpenCL -> ParTeCL Runtime -> Execution on the GPU
16 EVALUATION 1. Speedup against CPU 2. Data transfer overhead 3. Comparison to a multi-core CPU 4. Correctness
17 EXPERIMENT Subjects: EEMBC - Industry-standard benchmark suite for embedded software Hardware: GPU - NVidia Tesla K40m; CPU - Intel Xeon, 8 cores Test suite size: 130K
18 SPEEDUP AGAINST CPU
19 DATA TRANSFER OVERHEAD [Figure: nine panels, one per benchmark (viterb00, fbital00, a2time01, autcor00, tblook01, conven00, fft00, puwmod01, rspeed01). Each panel plots execution time [ms], split into input transfer, output transfer and kernel execution, against number of tests (log base 2 scale).]
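The panels split each run into input transfer, kernel execution and output transfer. One simple way to summarise such a breakdown is the fraction of total time spent moving data, sketched below in C. The struct and function names are hypothetical, and this is not necessarily the metric used in the evaluation.

```c
#include <assert.h>

/* The three components named in the figure, in milliseconds. */
struct gpu_timing {
    double input_transfer_ms;
    double kernel_ms;
    double output_transfer_ms;
};

/* Total time of one GPU run: both transfers plus kernel execution. */
double total_ms(struct gpu_timing t)
{
    return t.input_transfer_ms + t.kernel_ms + t.output_transfer_ms;
}

/* Share of the total spent on transfers, between 0 and 1. */
double transfer_fraction(struct gpu_timing t)
{
    double total = total_ms(t);
    assert(total > 0.0);
    return (t.input_transfer_ms + t.output_transfer_ms) / total;
}
```

With made-up timings of 1 ms input transfer, 2 ms kernel execution and 1 ms output transfer, the total is 4 ms and the transfer fraction is 0.5.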
20 DATA TRANSFER OVERHEAD
21 COMPARISON TO A MULTI-CORE CPU
22 CHALLENGES Usability Scope Performance
23 CORRECTNESS For all 9 benchmarks, testing results from the GPU are an exact match to the testing results from the CPU.
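An exact-match check of this kind amounts to comparing the two result arrays element by element. A minimal C sketch follows; the `test_result` layout and the function name `count_mismatches` are assumptions for illustration, not the comparison code used in the evaluation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed result record: test case number plus one integer result. */
struct test_result {
    int test_case_num;
    int sum;
};

/* Returns how many of the n results differ between the GPU and CPU runs.
 * A return value of 0 means the two runs match exactly. */
size_t count_mismatches(const struct test_result *gpu,
                        const struct test_result *cpu,
                        size_t n)
{
    assert(gpu != NULL && cpu != NULL);
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        if (gpu[i].test_case_num != cpu[i].test_case_num ||
            gpu[i].sum != cpu[i].sum) {
            ++mismatches;
        }
    }
    return mismatches;
}
```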
24 SUMMARY: automatic GPU code generation; automatic test execution on the GPU threads; speedup of up to 53x (avg 16x) on EEMBC benchmarks; correct testing results
25 SUMMARY: automatic GPU code generation; automatic test execution on the GPU threads; speedup of up to 53x (avg 16x) on EEMBC benchmarks; correct testing results. FUTURE WORK: extend evaluation & scope; analyse & improve performance
26 THANKS ParTeCL CodeGen: github.com/wyaneva/partecl-codegen; ParTeCL Runtime: github.com/wyaneva/partecl-runtime; clclibc: github.com/wyaneva/clclibc
28 C FEATURES
Out of the box: pure functions, function calls, double precision (for OpenCL 1.2)
With transformations: standard in/out, global scope variables, standard library calls (partial support)
Unsupported (yet): dynamic memory allocation, file I/O, recursion