Intel Array Building Blocks (Intel ArBB) Technical Presentation
- Giles Hopkins
- 6 years ago
1 Intel Array Building Blocks (Intel ArBB) Technical Presentation
Noah Clemons
Software and Services Group, Developer Products Division, Performance and Productivity Libraries
Copyright 2010, Intel Corporation. All rights reserved.
2 Agenda
- Understand the key ideas behind Intel ArBB
- Understand the syntax
- Code walkthroughs
- Intel Parallel Building Blocks
- Q & A
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
3 Intel Array Building Blocks - Benefits
- Generalized data-parallel programming model
- Supports a wide variety of patterns and collections
- Supports explicit dynamic generation and management of code
- Implementation targets both threads and vector code
- Machine-independent optimization
- Offload management
- Machine-specific code generation and optimizations
- Scalable threading runtime
[Figure: Intel ArBB architecture. Application -> C++ API calling ArBB APIs (other language bindings possible) -> Virtual Machine (virtual ISA, debug services, memory manager) -> backend JIT compiler and threading runtime -> CPU, accelerator, future targets.]
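4 How does it work?
- Intel ArBB kernels live inside a serial C++ application and keep sequentially consistent semantics.
- The application is built with a standard C++ compiler: ArBB is expressed through templates and overloaded operators, and the program links with a dynamic library.
- The Intel ArBB runtime contains a dynamic compiler and a threading and heterogeneous runtime, targeting the CPU today and other targets in the future.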
5 Interface: The API as a Language
- Syntax and semantics that extend C++
- Adds parallel collection objects and methods to C++
- Uses standard C++ features (templates and operator overloading) to create new types and operators
- Sequences of API calls are fused and optimized by a JIT compiler
- Works with standard C++ compilers: Intel C++ Compiler, Microsoft* Visual* C++ Compiler, GNU Compiler Collection*
- Express algorithms using mathematical notation
- Developers focus on what to do, not how to do it
6 Code Skeleton for Intel ArBB Applications
Use the following code skeleton for Intel ArBB applications:

    #include <cstdlib>
    #include <exception>
    #include <iostream>

    int main(int argc, char* argv[]) {
        int ret_code;
        try {
            // call into ArBB code
            ret_code = EXIT_SUCCESS;
        }
        catch (const std::exception& e) {
            std::cerr << "Error: " << e.what() << std::endl;
            ret_code = EXIT_FAILURE;
        }
        catch (...) {
            std::cerr << "Error: Unknown exception caught!" << std::endl;
            ret_code = EXIT_FAILURE;
        }
        return ret_code;
    }

- ArBB indicates runtime errors through C++ exceptions.
- arbb::exception inherits from std::exception, so the first handler catches ArBB errors.
7 A First Example: Vector Addition
Add two vectors a and b of length SIZE into vector c.
Plain C version:

    #define SIZE 1024

    void vecsum(float* a, float* b, float* c, int size) {
        for (int i = 0; i < size; i++) {
            c[i] = a[i] + b[i];
        }
    }

    int main(int argc, char** argv) {
        float a[SIZE];
        float b[SIZE];
        float c[SIZE];
        // ... fill a and b ...
        vecsum(a, b, c, SIZE);
        return 0;
    }
8 Step 1: Figure out Kernel Signature
Plain C signature:
    void vecsum(float* a, float* b, float* c, int size);
Intel ArBB signature: the inputs and the output become dense<f32> containers, the output is passed by reference, and the explicit size parameter is dropped:
    void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c);
9 Step 2: Prepare Data
Keep the C arrays a, b, and c of length SIZE, and bind an Intel ArBB dense container to each of them:
    dense<f32> va; bind(va, a, SIZE);
    dense<f32> vb; bind(vb, b, SIZE);
    dense<f32> vc; bind(vc, c, SIZE);
10 Step 3: Set up Bridge from C/C++ to Intel ArBB
Replace the direct C call
    vecsum(a, b, c, SIZE);
with an invocation through call(), passing the bound containers:
    call(vecsum)(va, vb, vc);
11 Step 4: Implement Kernel
The loop becomes a single whole-container expression. Complete Intel ArBB version:

    #include <arbb.hpp>
    using namespace arbb;

    #define SIZE 1024

    void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c) {
        c = a + b;
    }

    int main(int argc, char** argv) {
        float a[SIZE];
        float b[SIZE];
        float c[SIZE];
        // ... fill a and b ...
        dense<f32> va; bind(va, a, SIZE);
        dense<f32> vb; bind(vb, b, SIZE);
        dense<f32> vc; bind(vc, c, SIZE);
        call(vecsum)(va, vb, vc);   // replaces vecsum(a, b, c, SIZE)
        return 0;
    }
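For readers without an ArBB installation, the element-wise meaning of the kernel body c = a + b can be approximated in standard C++ with std::valarray, which also overloads + element by element. This is only an analogy under that assumption: std::valarray does no JIT compilation or threading, and the name vecsum_reference is hypothetical.

```cpp
#include <cassert>
#include <valarray>

// Hypothetical serial reference for the ArBB kernel body `c = a + b`:
// std::valarray's operator+ adds the two arrays element by element.
std::valarray<float> vecsum_reference(const std::valarray<float>& a,
                                      const std::valarray<float>& b) {
    assert(a.size() == b.size());  // precondition: equal lengths
    return a + b;
}
```

The assert only documents the equal-length precondition in debug builds; as in the slide, the caller is responsible for passing vectors of the same size.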
12 What Did We Learn from This Example?
- Create Intel ArBB data structures
- Encapsulate operations on those structures in functions
- Invoke Intel ArBB functions through call()
13 Intel ArBB Benefits Shown in this Example
- Ease of use
  - Dense containers implicitly express data parallelism
  - Simple syntax lets users focus on high-level algorithms
- Performance
  - The syntax does not allow aliasing, which permits aggressive optimization by the runtime
- Safe by design
  - Separate memory spaces for Intel ArBB and C/C++ objects
  - Intel ArBB objects can only be operated on by Intel ArBB functions
  - The call operator is needed to invoke an Intel ArBB function from inside a C/C++ function (see the next slide for a description of call)
14 Capturing Computation
A call() expression works like this:
- If it has never seen the function passed in before, it captures the function into a closure, then executes it.
- Otherwise, it executes the previously captured closure.
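The capture-once behavior of call() can be pictured as a small cache keyed by the function pointer. The sketch below is a hypothetical model (CallCache, Kernel, Closure and twice are illustrative names, not ArBB API). Its "capture" step merely wraps the function in std::function, whereas the real runtime records the operations and hands them to its JIT compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>

// Hypothetical kernel signature and closure type for this model only.
using Kernel = void (*)(const float*, float*, std::size_t);
using Closure = std::function<void(const float*, float*, std::size_t)>;

// Hypothetical cache: capture a function on first use, reuse it afterwards.
class CallCache {
public:
    void call(Kernel fn, const float* in, float* out, std::size_t n) {
        assert(fn != nullptr);
        auto it = cache_.find(fn);
        if (it == cache_.end()) {
            ++captures_;                                 // first sighting
            it = cache_.emplace(fn, Closure(fn)).first;  // "capture"
        }
        it->second(in, out, n);                          // execute the closure
    }
    std::size_t captures() const { return captures_; }

private:
    std::map<Kernel, Closure> cache_;
    std::size_t captures_ = 0;
};

// Example kernel: doubles each element.
void twice(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = 2.0f * in[i];
}
```

Calling the same kernel twice through one cache performs a single capture and two executions.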
15 Logically Separated Memory Spaces
[Figure: C/C++ space and Intel ArBB space. Data is copied in (copyin) from C/C++ code to ArBB code and copied back out (copyout).]
The runtime ensures that data copying happens only when required.
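A minimal sketch of "copy only when required", assuming a simple dirty-flag policy. BoundBuffer is a hypothetical class, not part of ArBB, and the real runtime's policy may differ: host_ plays the role of the C/C++ array and shadow_ the ArBB-side storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of two logically separate memory spaces:
// host_ stands for the C/C++ array, shadow_ for ArBB-side storage.
class BoundBuffer {
public:
    BoundBuffer(float* host, std::size_t n) : host_(host), shadow_(n) {
        assert(host != nullptr);
    }

    // Host code modified the C array, so the next kernel needs a copy-in.
    void mark_host_dirty() { host_dirty_ = true; }

    // Before a kernel runs: copy in only if the host data changed.
    std::vector<float>& device_view() {
        if (host_dirty_) {
            for (std::size_t i = 0; i < shadow_.size(); ++i) shadow_[i] = host_[i];
            host_dirty_ = false;
            ++copies_in_;
        }
        device_dirty_ = true;  // conservatively assume the kernel writes
        return shadow_;
    }

    // Before host code reads the C array: copy out only if needed.
    void sync_to_host() {
        if (device_dirty_) {
            for (std::size_t i = 0; i < shadow_.size(); ++i) host_[i] = shadow_[i];
            device_dirty_ = false;
            ++copies_out_;
        }
    }

    std::size_t copies_in() const { return copies_in_; }
    std::size_t copies_out() const { return copies_out_; }

private:
    float* host_;
    std::vector<float> shadow_;
    bool host_dirty_ = true;     // nothing has been copied in yet
    bool device_dirty_ = false;
    std::size_t copies_in_ = 0;
    std::size_t copies_out_ = 0;
};
```

Under this policy, three kernel invocations followed by two host reads trigger exactly one copy-in and one copy-out.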
16 Other Benefits of Intel ArBB
- Sequential semantics
  - Developers do not use threads, locks, or other lower-level constructs, and so avoid the associated complexity.
  - Programmers can reason about and debug the program as if it were serial.
- The ArBB dynamic execution model provides further advantages:
  - Performance transparency: seemingly sequential, scalar code is translated into highly efficient SIMD and parallel code, depending on the low-level architecture.
  - Forward scalability: automatic scaling to higher core counts and wider SIMD units in future Intel architectures (IA).
17 Two Code Samples
18 Dot Product of Vectors
Plain C version:

    #define SIZE 1024

    void dot_product(double* a, double* b, double* c, int size) {
        *c = 0.0;
        for (int i = 0; i < size; i++) {
            *c += a[i] * b[i];
        }
    }

    int main() {
        double a[SIZE], b[SIZE], c;
        for (int i = 0; i < SIZE; ++i) {
            a[i] = /* ... */; b[i] = /* ... */;
        }
        dot_product(a, b, &c, SIZE);
        return 0;
    }
19 Dot Product of Vectors
Intel ArBB version: Using arbb::bind

    #include <iostream>
    #include <arbb.hpp>
    using namespace arbb;

    #define SIZE 1024

    void dot_product(const dense<f64>& a, const dense<f64>& b, f64& c) {
        c = add_reduce(a * b);
    }

    int main() {
        ARBB_CPP_ALIGN(double a[SIZE]);
        ARBB_CPP_ALIGN(double b[SIZE]);
        for (int i = 0; i < SIZE; ++i) {
            a[i] = /* ... */; b[i] = /* ... */;
        }
        dense<f64> va, vb;
        bind(va, a, SIZE);
        bind(vb, b, SIZE);
        f64 c;                          // the result is a scalar, not a bound array
        call(dot_product)(va, vb, c);
        std::cout << value(c) << std::endl;
        return 0;
    }
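A serial reference for the kernel body c = add_reduce(a * b) is an element-wise product followed by a sum, which the standard library expresses as std::inner_product. This sketch assumes std::vector<double> storage and a hypothetical name, dot_product_reference; it is meant for checking results, not as a replacement for the ArBB kernel.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Serial reference for `c = add_reduce(a * b)`:
// multiply element-wise, then sum all products.
double dot_product_reference(const std::vector<double>& a,
                             const std::vector<double>& b) {
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```

The initial value 0.0 (a double, not the int 0) keeps the accumulation in floating point.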
20 Data Binding: arbb::bind()
- Binds Intel ArBB containers to C/C++ data
- The bound containers can be operated on by Intel ArBB functions
- Use arbb::bind() when your data is already allocated using C/C++ types
- You can bind 1D, 2D, or 3D dense containers
- You can bind containers of user-defined types
- You can bind a portion of a C/C++ array
- You can also bind non-consecutive elements of a C/C++ array
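Binding a portion of an array, or non-consecutive elements, amounts to an offset/count/stride addressing scheme. The StridedView struct below is a hypothetical illustration of that addressing only; it is not the signature of any arbb::bind() overload.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical view (not an ArBB type): addresses `count` elements of a C
// array starting at `offset` and stepping by `stride`. stride == 1 selects a
// contiguous portion; stride > 1 selects non-consecutive elements.
struct StridedView {
    double* base;
    std::size_t offset;
    std::size_t count;
    std::size_t stride;

    double& operator[](std::size_t k) const {
        assert(k < count && stride > 0);
        return base[offset + k * stride];
    }
};
```

With a 3 x 4 row-major matrix, offset 1 and stride 4 select column 1, while stride 1 selects a contiguous slice.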
21 Dot Product of Vectors
Intel ArBB version: Using arbb::range

    #include <iostream>
    #include <arbb.hpp>
    using namespace arbb;

    #define SIZE 1024

    void dot_product(const dense<f64>& a, const dense<f64>& b, f64& c) {
        c = add_reduce(a * b);
    }

    int main() {
        dense<f64> a(SIZE), b(SIZE);    // containers allocated in the ArBB space
        f64 c;
        range<f64> range_a = a.write_only_range();
        range<f64> range_b = b.write_only_range();
        for (int i = 0; i < SIZE; ++i) {
            range_a[i] = /* ... */; range_b[i] = /* ... */;
        }
        call(dot_product)(a, b, c);
        std::cout << value(c) << std::endl;
        return 0;
    }
22 Accessing Containers as a Range
- A range allows accessing containers as if they were plain C/C++ data
- Use a range when your data is allocated in the Intel ArBB space
- A range provides operator[] to index into the container
- A range also provides random-access iterators
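The two access styles of a range, indexing and iterators, can be modelled with a non-owning view whose iterators are plain pointers, which are random-access iterators and therefore work with standard algorithms. SimpleRange and fill_range are hypothetical names; this is a simplified analogue, not arbb::range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified range: a non-owning view over existing storage
// with operator[] and random-access iterators (plain pointers).
template <typename T>
class SimpleRange {
public:
    SimpleRange(T* data, std::size_t n) : data_(data), n_(n) {}

    T& operator[](std::size_t i) const {
        assert(i < n_);
        return data_[i];
    }
    T* begin() const { return data_; }
    T* end() const { return data_ + n_; }
    std::size_t size() const { return n_; }

private:
    T* data_;
    std::size_t n_;
};

// Fill through the iterators, in the style of the "Filling dense Containers" slide.
template <typename T>
void fill_range(SimpleRange<T> r, T value) {
    std::fill(r.begin(), r.end(), value);
}
```

Because the view is non-owning, copying it (as fill_range does) still writes to the original storage.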
23 Heat Dissipation (Algorithm)
[Figure: 2D grid showing the stencil, solution cells, and boundary cells.]
- Data structure: 2D grid (N x M cells), including boundary cells
- Algorithm: sweep over the grid
  - Update non-boundary cells
  - Read the cells N, S, E, and W of the current cell
  - Take the average of their values
- Boundary cells hold the boundary conditions
24 Heat Dissipation
Plain C version:

    void stencil(double** src, double** dst) {
        // for each interior grid cell, apply the stencil
        for (int i = 1; i < SIZE-1; i++) {
            for (int j = 1; j < SIZE-1; j++) {
                dst[i][j] = 0.25 * (src[i+1][j] + src[i-1][j] +
                                    src[i][j+1] + src[i][j-1]);
            }
        }
    }

    void apply_stencil(double** grid1, double** grid2) {
        // run ITER sweeps over the 2D grid
        for (int iter = 0; iter < ITER; iter++) {
            stencil(grid1, grid2);
            // after each sweep, swap source and destination grid
            double** tmp = grid1;
            grid1 = grid2;
            grid2 = tmp;
        }
    }
25 Heat Dissipation
Intel ArBB version:

    void stencil(f64 src, f64& dst) {
        arbb::array<usize, 2> coord;
        position(coord);
        usize x = coord[0], y = coord[1];
        // test for boundary cells, otherwise apply the stencil
        _if (x == 0 || y == 0 || x == WIDTH-1 || y == HEIGHT-1) {
            dst = src;
        } _else {
            dst = 0.25 * (neighbor(src, -1, 0) + neighbor(src, 1, 0) +
                          neighbor(src, 0, -1) + neighbor(src, 0, 1));
        } _end_if;
    }

    void apply_stencil(dense<f64, 2>& grid, dense<f64, 2>& swap) {
        // run ITER sweeps over the 2D grid, alternating source and destination
        _for (i32 i = 0, i < ITER, ++i) {
            _if (0 == (i & 1)) {
                map(stencil)(grid, swap);
            } _else {
                map(stencil)(swap, grid);
            } _end_if;
        } _end_for;
    }
26 Highlights from this Code
- Implement a kernel that operates on a single stencil
- Apply the kernel across all stencils using the arbb::map() operator
- Inside the kernel:
  - Use arbb::position() to get the coordinates of the current stencil
  - Use arbb::neighbor() to read the values of neighboring cells
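The map()/position()/neighbor() pattern described in the highlights can be modelled serially: a driver visits every cell, hands the kernel its coordinates, and gives it an accessor for cells at a relative offset. map2d and HeatCell are hypothetical names; the grid is assumed to be a row-major std::vector<double>, and, as in the two-grid ArBB version, the kernel reads src and writes dst.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial model of map() over a row-major 2D grid: the kernel
// receives its own coordinates (playing the role of position()) and reads
// other cells through an offset accessor (playing the role of neighbor()).
template <typename Kernel>
void map2d(const std::vector<double>& src, std::vector<double>& dst,
           std::size_t width, std::size_t height, Kernel kernel) {
    assert(src.size() == width * height && dst.size() == src.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            auto neighbor = [&](std::ptrdiff_t dx, std::ptrdiff_t dy) -> double {
                std::ptrdiff_t nx = static_cast<std::ptrdiff_t>(x) + dx;
                std::ptrdiff_t ny = static_cast<std::ptrdiff_t>(y) + dy;
                assert(nx >= 0 && ny >= 0);
                assert(nx < static_cast<std::ptrdiff_t>(width));
                assert(ny < static_cast<std::ptrdiff_t>(height));
                return src[static_cast<std::size_t>(ny) * width +
                           static_cast<std::size_t>(nx)];
            };
            kernel(x, y, width, height, neighbor, dst[y * width + x]);
        }
    }
}

// Kernel mirroring the two-grid stencil: boundary cells copy their value
// through; interior cells average their N, S, E and W neighbours.
struct HeatCell {
    template <typename Neighbor>
    void operator()(std::size_t x, std::size_t y, std::size_t w, std::size_t h,
                    Neighbor neighbor, double& dst) const {
        if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
            dst = neighbor(0, 0);
        } else {
            dst = 0.25 * (neighbor(-1, 0) + neighbor(1, 0) +
                          neighbor(0, -1) + neighbor(0, 1));
        }
    }
};
```

Because every read goes to src and every write to dst, the visiting order does not affect the result, which is what lets a runtime process cells in parallel.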
27 Heat Dissipation
Intel ArBB version: A better solution

    void stencil(f64& cell) {
        arbb::array<usize, 2> coord;
        position(coord);
        usize x = coord[0], y = coord[1];
        // test for boundary cells; apply the stencil to interior cells only
        _if (x != 0 && y != 0 && x != WIDTH-1 && y != HEIGHT-1) {
            cell = 0.25 * (neighbor(cell, -1, 0) + neighbor(cell, 1, 0) +
                           neighbor(cell, 0, -1) + neighbor(cell, 0, 1));
        } _end_if;
    }

    void apply_stencil(dense<f64, 2>& grid) {
        // run ITER sweeps over the 2D grid
        _for (i32 i = 0, i < ITER, ++i) {
            map(stencil)(grid);
        } _end_for;
    }
28 Highlights from this Code
Almost the same as the previous version, but:
- The stencil uses a single parameter for both input and output
- The ArBB runtime and memory manager take care of the shadow copy
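Why the shadow copy matters can be seen by writing one interior sweep in place twice: once naively, once from a snapshot. Both functions (and the helper idx) are hypothetical serial illustrations on a row-major std::vector<double>. Only the snapshot version reproduces the semantics of the single in/out stencil parameter, where every cell reads values from the previous sweep.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major index helper for a grid of width w.
inline std::size_t idx(std::size_t x, std::size_t y, std::size_t w) { return y * w + x; }

// One interior sweep updated in place WITHOUT a shadow copy: cells visited
// later already see this sweep's new values (Gauss-Seidel-like ordering).
void sweep_in_place_naive(std::vector<double>& g, std::size_t w, std::size_t h) {
    for (std::size_t y = 1; y + 1 < h; ++y)
        for (std::size_t x = 1; x + 1 < w; ++x)
            g[idx(x, y, w)] = 0.25 * (g[idx(x - 1, y, w)] + g[idx(x + 1, y, w)] +
                                      g[idx(x, y - 1, w)] + g[idx(x, y + 1, w)]);
}

// The same sweep reading from a snapshot: every cell sees the values from
// the start of the sweep, as the single in/out parameter requires.
void sweep_with_shadow(std::vector<double>& g, std::size_t w, std::size_t h) {
    assert(g.size() == w * h);
    const std::vector<double> shadow(g);  // copy of the previous sweep
    for (std::size_t y = 1; y + 1 < h; ++y)
        for (std::size_t x = 1; x + 1 < w; ++x)
            g[idx(x, y, w)] = 0.25 * (shadow[idx(x - 1, y, w)] + shadow[idx(x + 1, y, w)] +
                                      shadow[idx(x, y - 1, w)] + shadow[idx(x, y + 1, w)]);
}
```

On a 4 x 3 grid with a single hot boundary cell, the two versions already disagree after one sweep: the naive loop propagates heat two cells in one pass, the snapshot version only one.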
29 Heat Dissipation
Intel ArBB version: An even better solution

    void stencil(f64& cell) {
        // apply the stencil; no explicit testing for boundary cells
        cell = 0.25 * (neighbor(cell, -1, 0) + neighbor(cell, 1, 0) +
                       neighbor(cell, 0, -1) + neighbor(cell, 0, 1));
    }

    void apply_stencil(dense<f64, 2>& grid) {
        // run ITER sweeps over the 2D grid
        _for (i32 i = 0, i < ITER, ++i) {
            map(stencil)(grid);
        } _end_for;
    }

    void dissipation(dense<f64, 2>& grid) {
        arbb::array<usize, 2> sizes = grid.size();
        dense<f64, 2> interior = section(grid, 1, sizes[0] - 2,
                                               1, sizes[1] - 2);
        apply_stencil(interior);
        grid = replace(grid, 1, sizes[0] - 2, 1, sizes[1] - 2, interior);
    }
30 Highlights from this Code
- Very clean code
- No if-else to handle boundary cells
- This makes it easier for the Intel ArBB runtime to apply more optimizations
31 Key Points
1. High level of abstraction
2. Automatic threading and vectorization
3. Modularity of code
4. Guaranteed race-free, deterministic applications
5. Code as if for a serial, single-core machine
32 Register and Download Now
- Read the documentation
- Ask questions or share product usage on the forum
- Read the Knowledge Base article "How to get started"
- Attend the advanced Intel ArBB webinar (Nov)
33 Intel Parallel Building Blocks (Intel PBB)
- What is it? Performance unleashed: tools to optimize application performance for the latest CPU features
- What to use?
  - Intel Cilk Plus: compiler extensions to simplify task and data parallelism
  - Intel Threading Building Blocks: C++ template library for task parallelism
  - Intel Array Building Blocks: sophisticated library for data parallelism
- Benefit: mix and match to optimize your application's performance
34 Intel's Family of Parallel Models
[Figure: Intel's parallel models, grouped into Intel Parallel Building Blocks (PBB), Fixed Function Libraries, Established Standards, and Research and Exploration. Models shown: Intel Threading Building Blocks (TBB), Intel Array Building Blocks (ArBB), Intel Cilk Plus, Intel Math Kernel Library (MKL), Intel Integrated Performance Primitives (IPP), MPI, OpenMP*, Intel Concurrent Collections, OpenCL*.]
35 Other References
- Recorded webinars on Intel PBB
- Read more about parallelism
- Give us feedback: forum and beta survey
36 Questions?
Noah Clemons
37 Backup Slides
- Scalar types
- Container types
- Declaration and initialization
- Range, binding, copy_in/copy_out
- Flow controls
38 Scalar types
Scalar types provide functionality equivalent to the scalar types built into C/C++.

    Type                Description                                                C++ equivalents
    f32, f64            32/64-bit floating-point number                            float, double
    i8, i16, i32, i64   8/16/32/64-bit signed integer                              signed char, short, int, long long
    u8, u16, u32, u64   8/16/32/64-bit unsigned integer                            unsigned char/short/int/long long
    boolean             Boolean value (true or false)                              bool
    isize, usize        Signed/unsigned integer large enough to store an address   ptrdiff_t, size_t
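The widths in the scalar-types table can be checked at compile time. The aliases below are hypothetical illustrations that map each ArBB type name to a fixed-width standard C++ type; they are not ArBB's own definitions, and the f32/f64 checks assume IEEE-754 single and double precision.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical aliases: standard C++ types with the widths listed for the
// ArBB scalar types (illustrations only, not ArBB's definitions).
using f32 = float;
using f64 = double;
using i8 = std::int8_t;
using i16 = std::int16_t;
using i32 = std::int32_t;
using i64 = std::int64_t;
using u8 = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;
using u64 = std::uint64_t;
using boolean = bool;
using isize = std::ptrdiff_t;  // signed, address-sized (assumption)
using usize = std::size_t;     // unsigned, address-sized

// Compile-time checks of the claimed widths and signedness.
static_assert(sizeof(f32) == 4 && sizeof(f64) == 8, "float/double widths");
static_assert(sizeof(i64) == 8 && sizeof(u16) == 2, "fixed-width integers");
static_assert(std::is_signed<isize>::value && !std::is_signed<usize>::value,
              "isize is signed, usize is unsigned");
```

Using the <cstdint> fixed-width types avoids relying on the platform-dependent sizes of short, int, and long.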
39 Containers
- Regular containers: dense<T> (1D), dense<T, 2> (2D), dense<T, 3> (3D)
- Irregular containers: nested
- Also shown: array< > and dense<array< >>
40 Declaration and Construction

    Declaration                 Element type   Dimensionality   Size
    dense<f32> a;               f32            1                0
    dense<i32, 2> b;            i32            2                0, 0
    dense<f32> c(1000);         f32            1                1000
    dense<f32> d(c);            f32            1                1000
    dense<i8, 3> e(5, 3, 2);    i8             3                5, 3, 2
41 Moving Data into and out of Containers
Dense containers provide three ways to access data:
- Data copy operations
  - copy_in to copy data into the container
  - copy_out to copy data out of the container
- Iterators
  - read_only_range to read from the container
  - write_only_range to write into the container
  - read_write_range to read from and write to the container
- Binding
  - On construction, dense containers can be bound (associated) to a particular data location
  - Data is moved into and out of that location when required
42 Filling dense Containers

    // request write access to the container
    dense<f32> a(1024);
    range<f32> range_a = a.write_only_range();
    std::fill(range_a.begin(), range_a.end(), static_cast<f32>(1));

    // request read/write access to the container
    dense<f32> b(1024);
    range<f32> range_b = b.read_write_range();
    std::fill(range_b.begin(), range_b.end(), static_cast<f32>(2));
43 Loops
For loop:

    _for (begin, end, step) {   // note the use of commas, not semicolons!
        /* code */
    } _end_for;                 // note the termination keyword

Example:

    _for (i32 i = 0, i <= n, i++) {
        /* code */
    } _end_for;
44 Loops
While loop:

    _while (condition) {
        /* code */
    } _end_while;

Supporting statements:
- Exit the loop with _break
- Skip the remainder of the current iteration with _continue
45 Conditionals
if statement:

    _if (condition) {
        /* code */
    } _end_if;

if statement with else if:

    _if (condition) {
        /* code */
    } _else_if (condition) {
        /* code */
    } _else {
        /* code */
    } _end_if;

if statement with else:

    _if (condition) {
        /* code */
    } _else {
        /* code */
    } _end_if;
More informationC++ Quick Guide. Advertisements
C++ Quick Guide Advertisements Previous Page Next Page C++ is a statically typed, compiled, general purpose, case sensitive, free form programming language that supports procedural, object oriented, and
More informationSupporting Data Parallelism in Matcloud: Final Report
Supporting Data Parallelism in Matcloud: Final Report Yongpeng Zhang, Xing Wu 1 Overview Matcloud is an on-line service to run Matlab-like script on client s web browser. Internally it is accelerated by
More informationProgramming Models for Multi- Threading. Brian Marshall, Advanced Research Computing
Programming Models for Multi- Threading Brian Marshall, Advanced Research Computing Why Do Parallel Computing? Limits of single CPU computing performance available memory I/O rates Parallel computing allows
More informationFast Introduction to Object Oriented Programming and C++
Fast Introduction to Object Oriented Programming and C++ Daniel G. Aliaga Note: a compilation of slides from Jacques de Wet, Ohio State University, Chad Willwerth, and Daniel Aliaga. Outline Programming
More informationParallel Hybrid Computing F. Bodin, CAPS Entreprise
Parallel Hybrid Computing F. Bodin, CAPS Entreprise Introduction Main stream applications will rely on new multicore / manycore architectures It is about performance not parallelism Various heterogeneous
More informationComputer Science II Lecture 2 Strings, Vectors and Recursion
1 Overview of Lecture 2 Computer Science II Lecture 2 Strings, Vectors and Recursion The following topics will be covered quickly strings vectors as smart arrays Basic recursion Mostly, these are assumed
More informationc++ keywords: ( all lowercase ) Note: cin and cout are NOT keywords.
Chapter 1 File Extensions: Source code (cpp), Object code (obj), and Executable code (exe). Preprocessor processes directives and produces modified source Compiler takes modified source and produces object
More informationIntroduction to Internet of Things Prof. Sudip Misra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur
Introduction to Internet of Things Prof. Sudip Misra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Lecture - 23 Introduction to Arduino- II Hi. Now, we will continue
More informationMPI 1. CSCI 4850/5850 High-Performance Computing Spring 2018
MPI 1 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning Objectives
More informationCE221 Programming in C++ Part 1 Introduction
CE221 Programming in C++ Part 1 Introduction 06/10/2017 CE221 Part 1 1 Module Schedule There are two lectures (Monday 13.00-13.50 and Tuesday 11.00-11.50) each week in the autumn term, and a 2-hour lab
More informationOpenACC 2.6 Proposed Features
OpenACC 2.6 Proposed Features OpenACC.org June, 2017 1 Introduction This document summarizes features and changes being proposed for the next version of the OpenACC Application Programming Interface, tentatively
More informationAdvanced OpenMP Features
Christian Terboven, Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group {terboven,schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University Vectorization 2 Vectorization SIMD =
More informationKingdom of Saudi Arabia Princes Nora bint Abdul Rahman University College of Computer Since and Information System CS242 ARRAYS
Kingdom of Saudi Arabia Princes Nora bint Abdul Rahman University College of Computer Since and Information System CS242 1 ARRAYS Arrays 2 Arrays Structures of related data items Static entity (same size
More informationParallel Hybrid Computing Stéphane Bihan, CAPS
Parallel Hybrid Computing Stéphane Bihan, CAPS Introduction Main stream applications will rely on new multicore / manycore architectures It is about performance not parallelism Various heterogeneous hardware
More informationIntroduction. Program construction in C++ for Scientific Computing. School of Engineering Sciences. Introduction. Michael Hanke.
1 (63) School of Engineering Sciences construction in C++ for Scientific Computing 2 (63) Outline 1 2 3 4 5 6 3 (63) Motivations From Mathematical Formulae to Scientific Software Computer simulation of
More informationIntel Many Integrated Core (MIC) Architecture
Intel Many Integrated Core (MIC) Architecture Karl Solchenbach Director European Exascale Labs BMW2011, November 3, 2011 1 Notice and Disclaimers Notice: This document contains information on products
More informationShared Memory programming paradigm: openmp
IPM School of Physics Workshop on High Performance Computing - HPC08 Shared Memory programming paradigm: openmp Luca Heltai Stefano Cozzini SISSA - Democritos/INFM
More informationModule 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program
The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives
More informationIntroduction to OpenACC. Shaohao Chen Research Computing Services Information Services and Technology Boston University
Introduction to OpenACC Shaohao Chen Research Computing Services Information Services and Technology Boston University Outline Introduction to GPU and OpenACC Basic syntax and the first OpenACC program:
More informationeingebetteter Systeme
Praktikum: Entwicklung interaktiver eingebetteter Systeme C++-Tutorial (falk@cs.fau.de) 1 Agenda Classes Pointers and References Functions and Methods Function and Operator Overloading Template Classes
More informationHomework 4. Any questions?
CSE333 SECTION 8 Homework 4 Any questions? STL Standard Template Library Has many pre-build container classes STL containers store by value, not by reference Should try to use this as much as possible
More informationIntroduce C# as Object Oriented programming language. Explain, tokens,
Module 2 98 Assignment 1 Introduce C# as Object Oriented programming language. Explain, tokens, lexicals and control flow constructs. 99 The C# Family Tree C Platform Independence C++ Object Orientation
More informationTopics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP
Topics Lecture 11 Introduction OpenMP Some Examples Library functions Environment variables 1 2 Introduction Shared Memory Parallelization OpenMP is: a standard for parallel programming in C, C++, and
More informationProgramming Language Concepts: Lecture 2
Programming Language Concepts: Lecture 2 Madhavan Mukund Chennai Mathematical Institute madhavan@cmi.ac.in http://www.cmi.ac.in/~madhavan/courses/pl2011 PLC 2011, Lecture 2, 6 January 2011 Classes and
More informationPARALUTION - a Library for Iterative Sparse Methods on CPU and GPU
- a Library for Iterative Sparse Methods on CPU and GPU Dimitar Lukarski Division of Scientific Computing Department of Information Technology Uppsala Programming for Multicore Architectures Research Center
More informationWeiss Chapter 1 terminology (parenthesized numbers are page numbers)
Weiss Chapter 1 terminology (parenthesized numbers are page numbers) assignment operators In Java, used to alter the value of a variable. These operators include =, +=, -=, *=, and /=. (9) autoincrement
More informationCS201 Some Important Definitions
CS201 Some Important Definitions For Viva Preparation 1. What is a program? A program is a precise sequence of steps to solve a particular problem. 2. What is a class? We write a C++ program using data
More informationModel Viva Questions for Programming in C lab
Model Viva Questions for Programming in C lab Title of the Practical: Assignment to prepare general algorithms and flow chart. Q1: What is a flowchart? A1: A flowchart is a diagram that shows a continuous
More informationParallel Processing with the SMP Framework in VTK. Berk Geveci Kitware, Inc.
Parallel Processing with the SMP Framework in VTK Berk Geveci Kitware, Inc. November 3, 2013 Introduction The main objective of the SMP (symmetric multiprocessing) framework is to provide an infrastructure
More informationStream Computing using Brook+
Stream Computing using Brook+ School of Electrical Engineering and Computer Science University of Central Florida Slides courtesy of P. Bhaniramka Outline Overview of Brook+ Brook+ Software Architecture
More information... IntArray B (100); // IntArray with 100 elements, each initialized to 0
Types with External Resources A class constructor is invoked when an object comes into scope. The constructor prepares the object by creating an environment in which the member functions operate. For many
More informationGeneral Purpose GPU Programming (1) Advanced Operating Systems Lecture 14
General Purpose GPU Programming (1) Advanced Operating Systems Lecture 14 Lecture Outline Heterogenous multi-core systems and general purpose GPU programming Programming models Heterogenous multi-kernels
More informationPROFESSOR: DR.JALILI BY: MAHDI ESHAGHI
PROFESSOR: DR.JALILI BY: MAHDI ESHAGHI 1 2 Overview Distributed OZ Java RMI CORBA IDL IDL VS C++ CORBA VS RMI 3 Distributed OZ Oz Language Multi paradigm language, strong support for compositionality and
More informationQuestion No: 1 ( Marks: 1 ) - Please choose one One difference LISP and PROLOG is. AI Puzzle Game All f the given
MUHAMMAD FAISAL MIT 4 th Semester Al-Barq Campus (VGJW01) Gujranwala faisalgrw123@gmail.com MEGA File Solved MCQ s For Final TERM EXAMS CS508- Modern Programming Languages Question No: 1 ( Marks: 1 ) -
More informationHigh Performance Computing MPI and C-Language Seminars 2009
High Performance Computing - Seminar Plan Welcome to the High Performance Computing seminars for 2009. Aims: Introduce the C Programming Language. Basic coverage of C and programming techniques needed
More informationcalling a function - function-name(argument list); y = square ( z ); include parentheses even if parameter list is empty!
Chapter 6 - Functions return type void or a valid data type ( int, double, char, etc) name parameter list void or a list of parameters separated by commas body return keyword required if function returns
More informationMATLIP: MATLAB-Like Language for Image Processing
COMS W4115: Programming Languages and Translators MATLIP: MATLAB-Like Language for Image Processing Language Reference Manual Pin-Chin Huang (ph2249@columbia.edu) Shariar Zaber Kazi (szk2103@columbia.edu)
More information