IBM pseries Compiler Roadmap

Size: px
Start display at page:

Download "IBM pseries Compiler Roadmap"

Transcription

1 IBM pseries Compiler Roadmap Roch Archambault IBM Toronto Laboratory SciComp 10 August 13, 2004

2 Agenda The pseries Compiler Products Roadmaps XL Fortran XL C/C++ Performance Multiple compiler installation Performance Results Q&A

3 The pseries Compiler Products Latest Versions (All POWER4 enabled except Mac OS X) VisualAge C++ Version 6.0 for AIX VisualAge C++ Version 6.0 for Linux on pseries XL C/C++ Advanced Edition Version 6.0 for Mac OS X (PPC970 only) C for AIX V6.0 XL Fortran Version 8.1 for AIX (latest level is version 8.1.1) XL Fortran Version 8.1 for Linux on pseries (latest level is version 8.1.1) XL Fortran Advanced Edition Version 8.1 for Mac OS X (PPC970 only) Upcoming Versions (All POWER4, POWER5 and PPC970 enabled) XL C/C++ Enterprise Edition V7.0 for AIX, Linux XL C/C++ Advanced Edition V7.0 for Mac OS X (PPC970 only) XL Fortran Enterprise Edition V9.1 for AIX, Linux XL Fortran Advanced Edition V9.1 for Mac OS X (PPC970 only)

4 XL Fortran Roadmap: Strategic Priorities Premium Customer Service Continue to work closely with key ISVs and customers in scientific and technical computing industries Compliance to Language Standards and Industry Specifications OpenMP Fortran API V2.0 Fortran 77, 90 and 95 standards Emerging 2003 Standard Exploitation of Hardware Committed to maximum performance on POWER4, PPC970, POWER5 and successors Continue to work very closely with processor design teams

5 XL Fortran Version 8.1 for AIX Fully compliant Fortran 77/90/95 compiler Partial Fortran 2003 support Allocatable components, IEEE module, INTENT of F90 pointers Many industry extensions 128-bit float, 64-bit integer, Cray pointers, structure record, union map, BYTE, STATIC, AUTOMATIC, asynchronous I/O, SIZE intrinsic 32- and 64-bit support Optimized OpenMP Fortran V2.0 support Symbolic Debugging support TotalView and dbx/pdbx Full support for debugging of OpenMP programs Partial support for debugging of optimized code Portfolio of Optimizing Transformations Comprehensive path length reduction Whole program analysis Loop optimization for parallelism, locality and instruction scheduling Tuned support for all pseries processors (including POWER4) More info:

6 XL Fortran V8.1.1 for AIX (Update to V8.1) Fortran 2003 Stream I/O VALUE and PROTECTED attributes and statements Performance Customer benchmarks, SPECCPU2000 improvements, scalability for SPECOMP 64-bit TPO enablement (compiler component) Stream_unroll and Unroll_and_fuse directives Unroll directive extended to outer loops Intrinsic Performance improvements Customer Requirements -qsuppress suboption -qextname suboption Improved Debug support with TotalView Installation and operation on OS/400 PASE on iseries

7 XL Fortran Version 8.1 for Linux on pseries Based on XL Fortran Version 8.1 for AIX Product Leverage proven industry leading performance compiler technology Fully compliant Fortran 77/90/95 compiler Partial Fortran 2003 support Allocatable components, IEEE module, INTENT on F90 pointers Many industry extensions 128-bit float, 64-bit integer, Cray pointers, structure record, union map, BYTE, STATIC, AUTOMATIC, asynchronous I/O, SIZE intrinsic 32- and 64-bit support Optimized OpenMP Fortran V2.0 support Symbolic Debugging support with gdb Portfolio of Optimizing Transformations (same as XLF V8.1 for AIX) Supports SUSE Linux Enterprise Server 8 (SLES 8) Supports RedHat Enterprise Linux 3.0 (RHEL 3) Support for TurboLinux (TLES 8) and Conectiva

8 XL Fortran V9.1 And Beyond XL Fortran Enterprise Edition V9.1 for AIX (2004) XL Fortran Enterprise Edition V9.1 for Linux (2004) XL Fortran Advanced Edition V9.1 for Mac OS X Continued rollout of Fortran 2003 POWER5 support PPC970 and PPC970FX support including SIMD vectorization (VMX) Include MASS library in compiler products Improved performance Improved compile time and reduce space usage Improved debugging New directives and intrinsics New customer requirements Supported on SLES9 and RHEL 3 update 3 (Linux only) Supported on OS/400 (V5R2) and i5/os PASE (V5R3) All information subject to change without notice

9 Tentative rollout of Fortran 2003 (XLF V9.1) Available in XL Fortran V9.1 (2004) BIND attribute and statement, BIND on derived types, common blocks, and procedures PROCEDURE, IMPORT and FLUSH statements PUBLIC/PRIVATE on components of a type ISO_FORTRAN_ENV and ISO_C_BINDING intrinsic modules ASSOCIATE Construct Command Line Argument and NEW_LINE intrinsics Intrinsic attribute on USE IOMSG= specifier All information subject to change without notice

10 Prioritized rollout of Fortran 2003 (XLF V10.1) 1. OO extensions 2. ENUM/ENUMERATOR 3. Derived type I/O 4. Asynchronous I/O additions 5. Allow character on MAX/MIN family of intrinsics 6. Procedure pointers 7. I/O specifiers (DECIMAL=, SIGN=, ROUND=, and PAD=) 8. Complex literal 9. Derived type parameters 10. Abstract interface 11. Enhance structure constructors 12. Enhanced array constructors 13.Pointer assignment enhancement (specify bounds) 14. Support for International Usage All information subject to change without notice

11 C/C++ Roadmap: Strategic Priorities Premium Customer Service Compliance to Language Standards and Industry Specifications ANSI / ISO C and C++ Standards OpenMP C/C++ API V2.0 Exploitation of Hardware Committed to maximum performance on POWER4, PPC970, POWER5 and successors Continue to work very closely with processor design teams Exploitation of OS and Middleware Synergies with operating system and middleware ISVs (performance, specialized function) Committed to AIX Linux affinity strategy and to Linux on pseries Reduced Emphasis on proprietary Tooling Affinity with GNU toolchain

12 VisualAge C++ Version 6.0 for AIX C for AIX Version 6.0 Fully compliant ISO C 1999 and ISO C standards AIX 5.2 required for ISO C 1999 complex runtime support Partial GNU C/C++ language and options compatibility 32- and 64-bit support Interprocedural Analysis (IPA) including specialized C++ optimizations for templates, exception handling and virtual functions Automatic Parallelization for C/C++ Optimized OpenMP C/C++ API V1.0 support Template instantiation improvements Reuse of template instantiations to reduce code size Parsing templates at instantiation time Improved automatic template instantiations Symbolic Debugging Support TotalView, IBM Distributed Debugger and dbx/pdbx Full support for debugging of OpenMP programs Partial support for debugging of optimized code Runtime memory debug support

13 VisualAge C++ Version 6.0 for AIX (continued) C for AIX Version 6.0 (continued) Portfolio of optimizing transformations Comprehensive path length reduction Whole program analysis Loop optimization for parallelism, locality and instruction scheduling Tuned support for all pseries processors (including POWER4) More info:

14 VisualAge C++ Version 6.0 for Linux on pseries Based on VisualAge C++ Version 6.0 for AIX Product Leverage proven industry leading performance compiler technology Fully compliant ISO C 1999 and ISO C standards Partial GNU C/C++ language and options compatibility 32- and 64-bit support Automatic Parallelization for C/C++ Optimized OpenMP C/C++ API V1.0 support Portfolio of Optimizing Transformations (same as C/C++ V6.0 for AIX) Supports SUSE Linux Enterprise Server 8 (SLES 8) Supports RedHat Enterprise Linux 3.0 (RHEL 3) Support for TurboLinux (TLES 3) and Conectiva

15 XL C/C++ V7.0 And Beyond XL C/C++ Enterprise Edition Version 7.0 for AIX (2004) XL C/C++ Enterprise Edition Version 7.0 for Linux (2004) XL C/C++ Advanced Edition Version 7.0 for Mac OS X Addenda to ISO C/C++ standards (C is 1999, C++ is 1998 with TC1) Support for some C99 features in C++ compiler: Complex, Hex Float and Variable Length Arrays, Compound Literal Further GNU C/C++ compatibility features OpenMP C/C++ API V2.0 POWER5 support PPC970 and PPC970FX support including SIMD vectorization (VMX) Improved performance 64-bit TPO enablement (compiler component) on AIX and Linux Boost and STLport libraries (as they become better accepted by the standard committee) All information subject to change without notice

16 XL C/C++ V7.0 And Beyond (Continued) XL C/C++ Enterprise Edition Version 7.0 for AIX (2004) XL C/C++ Enterprise Edition Version 7.0 for Linux (2004) XL C/C++ Advanced Edition Version 7.0 for Mac OS X IOStream Performance improvements New pragmas / options Xcode enabled and support for Frameworks (Mac OS X only) Supported on SLES9 and RHEL 3 update 3 (Linux only) Supported on OS/400 (V5R2) and i5/os PASE (V5R3) New compiler drivers (gxlc and gxlc) for compatibility with gcc and g++ Support for UPC language (AIX only) Support for Objective-C (Mac OS X only) Include MASS library in compiler products All information subject to change without notice

17 Tentative GNU C/C++ Compatibility Enhancements Labels as values / computed goto Nested functions Naming types Conditionals with omitted operands Zero length arrays Labeled elements (C only) Case ranges (C only) Cast to union (C only) Function Attributes Support Noinline, always_inline, format, format_arg, section Accept and ignore used All information subject to change without notice

18 Tentative GNU C/C++ Compatibility Enhancements (Continued) Variable Attributes Support Nocommon, transparent_union Type Attributes Support Aligned, packed Accept and ignore Transparent_union extension Incomplete enums Function names as strings Partial Asm support All information subject to change without notice

19 Performance Improvements Delivered in 2003 Included in XLF V8.1.1 release OpenMP Much faster barrier implementation (over 5x faster on 32-way p690) Much faster uncontended atomic (over 4x faster on 32-way p690) Improved parallel region startup (25%) POWER4 Expanded scope and precision of instruction scheduling More precise loop unrolling for pipelining and parallelization of reductions Use of dcbz to optimize store streams Loop unrolling for prefetch stream utilization Loop Optimization Aggressive loop fusion to exploit data reuse Index set splitting and gather/scatter to move branches out of loops Loop peeling for automatic parallelization Improved loop interchange for automatic parallelization Strip-mining of vectorized loops Vector calls to specialized routines for POWER4 (vsqrt, vrsqrt, vtan, vlog, vexp)

20 Expected Performance Improvements in 2004 Included in XLF V9.1 and XL C/C++ V7 on all platforms (AIX, Linux, Mac OS X) POWER5 Modified scheduling machine model Usage of improved prefetch facilities Usage of new instructions PPC970 Automatic generation of VMX code on Linux and Mac OS X (SIMD vectorization) Interprocedural pointer alignment propagation OpenMP Tuned support for 64-way SMP Continued improvements in overhead reduction Intrinsic functions (Fortran Only) MATMUL, TRANSFER, INDEX, TRANSPOSE

21 Expected Performance Improvements in 2004 Included in XLF V9.1 and XL C/C++ V7 on all platforms (AIX, Linux, Mac OS X) Tuning assists : BLOCK_LOOP and LOOPID directives to specify which set of loops to tile, interchange or strip-mine NOVECTOR and NOSIMD directive to tell compiler not to vectorize or simdize loop Builtin functions for generating software divides (full double precision on POWER5) Thread binding (set via XLSMPOPTS env variable) Environment variable (intrinthds) to control number of threads used by MATMUL and RANDOM_NUMBER (added in XLF V8.1.1 PTF) View and manipulate information gathered by profile directed feedback (-qpdf1/-qpdf2) via showpdf and mergepdf tools New prefetch directives for POWER5

22 Expected Performance Improvements in 2004 Included in XLF V9.1 and XL C/C++ V7 on all platforms (AIX, Linux, Mac OS X) Loop Optimizations : Modulo scheduling of loops which contain branches Further improvements to loop fusion for data reuse (e.g. loop alignment) Perform vectorization on all platforms (including Linux and Mac OS X) Enhancement of vectorization (additional functions, loop versioning, vector merging) Tiling for BLAS-like and streaming loop nests Predictive Commoning (common subexpression elimination across loop iterations) Improved data dependence analysis Automatic generation of software divides on POWER5 Automatic generation of stream prefetch instructions on POWER5

23 Expected Performance Improvements in 2004 Included in XLF V9.1 and XL C/C++ V7 on all platforms (AIX, Linux, Mac OS X) Other Optimizations : Interprocedural Strength reduction Interprocedural Register Allocation Split array of structures into multiple arrays for better exploitation of hardware streams and smaller d-cache footprint Use Profile Directed Feedback (PDF) information to: Specialize calls to malloc/calloc to use pools of small objects Specialize memcpy and memset with small lengths Specialize integer divide and modulo Form Super blocks for better instruction scheduling

24 Expected Performance Improvements in 2005 Loop Optimizations: Sparse Vectorization Recognize BLAS loop and replace with call to ESSL Exploitation of SMT threads in parallel programs Improved dependence analysis Array Data flow analysis for array privatization and contraction Automatic array padding Improved automatic parallelization (wavefront, multidimensional reductions) Other Optimizations: Outline cold fields of data structures for smaller d-cache footprint (using pdf) Outline cold regions of code for smaller i-cache footprint (using pdf) All information subject to change without notice

25 Installation of Multiple Compiler Versions Installation of multiple compiler versions is supported The vacppndi and xlfndi scripts shipped with VisualAge C and XL Fortran 8.1 allow the installation of a given compiler release or update into a non-default directory The configuration file can be used to direct compilation to a specific version of the compiler Example: xlf_v8r1 c foo.f May direct compilation to use components in a non-default directory Care must be taken when multiple runtimes are installed on the same machine (details on next slide)

26 Coexistence of Multiple Compiler Runtimes Backward compatibility C, C++ and Fortran runtimes support backward compatibility. Executables generated by an earlier release of a compiler will work with a later version of the run-time environment. Concurrent installation Multiple versions of a compiler and run-time environment can be installed on the same machine Full support in xlfndi and vacppndi scripts is now available Limited support for coexistence LIBPATH must be used to ensure that a compatible runtime version is used with a given executable Only one runtime version can be used in a given process. Renaming a compiler library is not allowed. Take care in statically linking compiler libraries or in the use of dlopen or load. Details in the compiler FAQ

27 Documentation An information center containing the documentation for the current versions of the AIX compilers (both Fortran and C/C++) is available at: Similar documentation for the current version of Mac OS X compilers (both Fortran and C/C++) is available at: This information center contains all the html documentation shipped with the compilers. It is completely searchable. Please send any comments or suggestions on this information center or about the existing C, C++ or Fortran documentation shipped with the products to compinfo@ca.ibm.com.

28 Software Divide Improvements On POWER5 % Imrovement vs hardware divide SP SWDIV SP SWDIV_NOCHK Software Divide Instrinsics DP SWDIV DP SWDIV -qnostrict DP SWDIV_NOCHK DP SWDIV_NOCHK -qnostrict

29 SPEC FP Base Improvements From Compiler On POWER4 XLF V9.1 & XLC V7 Versus XLF V8.1.1 & XLC V6 % improvement SPECFP Benchmarks 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi SPEC FP + 18%

30 SPEC FP Base Improvements From Compiler On POWER5 XLF V9.1 & XLC V7 Versus XLF V8.1.1 & XLC V6 % Improvement SPECFP Benchmarks 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi SPEC FP + 20%

31 SPEC FP Base Auto-Parallelization (2 CPUs, POWER4) XLF V9.1 & XLC V7 2-CPU versus 1-CPU % Improvement SPECFP Benchmarks 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi SPEC FP + 8%

32 90 70 SPEC FP Peak Differences Between IT2 (1.5 GHZ) and POWER5 P5 / IT2 % difference GHZ 1.9 GHZ 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi SPECFP Peak

33 32-way EPCC results on AIX 5.2 p690 system (1.1 GHZ) XLF XLF PAR DO PDO BAR SING CRIT LOCK ORD ATM REDC Time in micro-seconds - lower is better

34 SPECOMP Base Improvements From Compiler On POWER4 (32-way) XLF V9.1 & XLC V7 Versus XLF V8.1.1 & XLC V6 % improvement SPECOMP Benchmarks 310.wupwise_m 312.swim_m 314.mgrid_m 316.applu_m 318.galgel_m 320.equake_m 324.apsi_m 326.gafort_m 328.fma3d_m 330.art_m 332.ammp_m SPECOMP + 8%

35 SPECOMP Scalability On POWER4 (16 vs 32 CPUs) Scalability Factor 1.0 implies perfect scaling SPECOMP Benchmarks 310.wupwise_m 312.swim_m 314.mgrid_m 316.applu_m 318.galgel_m 320.equake_m 324.apsi_m 326.gafort_m 328.fma3d_m 330.art_m 332.ammp_m

36 SPEC OMPM2001 Base Versus Competition 40 Thousands IBM 32x1.7 HP-I 32x1.5 SGI-I 32x1.5 HP-A 64x1.15 HP-I 64x1.5 SGI-I 64x1.5 FUJ 64x1.3 IBM P *1.9 0 SPECMARK

37 Himeno Benchmark Improvement on POWER XLF V8.1.1 XLF V CPU 2-CPU 4-CPU 8-CPU Small Data Problem Measured in MFLOPS on 1.1GHZ p690

IBM pseries Compiler Roadmap

IBM pseries Compiler Roadmap IBM pseries Compiler Roadmap Roch Archambault IBM Toronto Laboratory archie@ca.ibm.com SCICOMP 12 July 20, 2006 Agenda The pseries Compiler Products Roadmaps Common Features & Compiler Architecture XL

More information

IBM POWER Systems Compiler Roadmap

IBM POWER Systems Compiler Roadmap IBM POWER Systems Compiler Roadmap Roch Archambault IBM Toronto Laboratory archie@ca.ibm.com SCICOMP-14 May 22, 2008 Agenda Overall Roadmap The POWER Systems Compiler Products Detailed Roadmaps Common

More information

IBM System p Compiler Roadmap

IBM System p Compiler Roadmap IBM System p Compiler Roadmap Roch Archambault IBM Toronto Laboratory archie@ca.ibm.com SCICOMP-13 July 19, 2007 Agenda Overall Roadmap The System p Compiler Products Detailed Roadmaps Common Features

More information

Architecture Cloning For PowerPC Processors. Edwin Chan, Raul Silvera, Roch Archambault IBM Toronto Lab Oct 17 th, 2005

Architecture Cloning For PowerPC Processors. Edwin Chan, Raul Silvera, Roch Archambault IBM Toronto Lab Oct 17 th, 2005 Architecture Cloning For PowerPC Processors Edwin Chan, Raul Silvera, Roch Archambault edwinc@ca.ibm.com IBM Toronto Lab Oct 17 th, 2005 Outline Motivation Implementation Details Results Scenario Previously,

More information

IBM XL Fortran Advanced Edition V8.1 for Mac OS X A new platform supported in the IBM XL Fortran family

IBM XL Fortran Advanced Edition V8.1 for Mac OS X A new platform supported in the IBM XL Fortran family Software Announcement January 13, 2004 IBM XL Fortran Advanced Edition V8.1 for Mac OS X A new platform supported in the IBM XL Fortran family Overview IBM extends the XL Fortran family to the Apple Mac

More information

Upgrading XL Fortran Compilers

Upgrading XL Fortran Compilers Upgrading XL Fortran Compilers Oeriew Upgrading to the latest IBM XL Fortran compilers makes good business sense. Upgrading puts new capabilities into the hands of your programmers making them and your

More information

IBM. IBM XL C/C++ and XL Fortran compilers on Power architectures overview

IBM. IBM XL C/C++ and XL Fortran compilers on Power architectures overview IBM IBM XL C/C++ and XL Fortran compilers on Power architectures overview December 2017 References in this document to IBM products, programs, or services do not imply that IBM intends to make these available

More information

XL C/C++ Advanced Edition V6.0 for Mac OS X A new platform for the IBM family of C/C++ compilers

XL C/C++ Advanced Edition V6.0 for Mac OS X A new platform for the IBM family of C/C++ compilers Software Announcement January 13, 2004 XL C/C++ Advanced Edition V6.0 for Mac OS X A new platform for the IBM family of C/C++ compilers Overview XL C/C++ Advanced Edition for Mac OS X is an optimizing,

More information

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

AMD S X86 OPEN64 COMPILER. Michael Lai AMD AMD S X86 OPEN64 COMPILER Michael Lai AMD CONTENTS Brief History AMD and Open64 Compiler Overview Major Components of Compiler Important Optimizations Recent Releases Performance Applications and Libraries

More information

IBM XL C/C++ Enterprise Edition supports POWER5 architecture

IBM XL C/C++ Enterprise Edition supports POWER5 architecture Software Announcement August 31, 2004 IBM XL C/C++ Enterprise Edition supports POWER5 architecture Overview IBM XL C/C++ Enterprise Edition V7.0 for AIX is an optimizing, standards-based compiler for the

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Data Prefetch and Software Pipelining. Stanford University CS243 Winter 2006 Wei Li 1

Data Prefetch and Software Pipelining. Stanford University CS243 Winter 2006 Wei Li 1 Data Prefetch and Software Pipelining Wei Li 1 Data Prefetch Software Pipelining Agenda 2 Why Data Prefetching Increasing Processor Memory distance Caches do work!!! IF Data set cache-able, able, accesses

More information

GCC Developers Summit Ottawa, Canada, June 2006

GCC Developers Summit Ottawa, Canada, June 2006 OpenMP Implementation in GCC Diego Novillo dnovillo@redhat.com Red Hat Canada GCC Developers Summit Ottawa, Canada, June 2006 OpenMP Language extensions for shared memory concurrency (C, C++ and Fortran)

More information

IBM XL Fortran Advanced Edition V10.1 for Linux now supports Power5+ architecture

IBM XL Fortran Advanced Edition V10.1 for Linux now supports Power5+ architecture Software Announcement December 6, 2005 IBM now supports Power5+ architecture Overview IBM XL Fortran is a standards-based, highly optimized compiler. With IBM XL Fortran technology, you have a powerful

More information

15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 10: Runahead and MLP Prof. Onur Mutlu Carnegie Mellon University Last Time Issues in Out-of-order execution Buffer decoupling Register alias tables Physical

More information

A Framework for Safe Automatic Data Reorganization

A Framework for Safe Automatic Data Reorganization Compiler Technology A Framework for Safe Automatic Data Reorganization Shimin Cui (Speaker), Yaoqing Gao, Roch Archambault, Raul Silvera IBM Toronto Software Lab Peng Peers Zhao, Jose Nelson Amaral University

More information

Introduction. No Optimization. Basic Optimizations. Normal Optimizations. Advanced Optimizations. Inter-Procedural Optimizations

Introduction. No Optimization. Basic Optimizations. Normal Optimizations. Advanced Optimizations. Inter-Procedural Optimizations Introduction Optimization options control compile time optimizations to generate an application with code that executes more quickly. Absoft Fortran 90/95 is an advanced optimizing compiler. Various optimizers

More information

IBM XL Fortran Advanced Edition for Linux, V11.1 exploits the capabilities of the IBM POWER6 processors

IBM XL Fortran Advanced Edition for Linux, V11.1 exploits the capabilities of the IBM POWER6 processors IBM United States Announcement 207-158, dated July 24, 2007 IBM, V11.1 exploits the capabilities of the IBM POWER6 processors...2 Product positioning... 4 Offering Information...5 Publications... 5 Technical

More information

Code optimization with the IBM XL compilers on Power architectures IBM

Code optimization with the IBM XL compilers on Power architectures IBM Code optimization with the IBM XL compilers on Power architectures IBM December 2017 References in this document to IBM products, programs, or services do not imply that IBM intends to make these available

More information

Inserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL. Inserting Prefetches IA-32 Execution Layer - 1

Inserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL. Inserting Prefetches IA-32 Execution Layer - 1 I Inserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL Inserting Prefetches IA-32 Execution Layer - 1 Agenda IA-32EL Brief Overview Prefetching in Loops IA-32EL Prefetching in

More information

Getting Started with XL C

Getting Started with XL C IBM XL C for AIX, V12.1 Getting Started with XL C Version 12.1 SC14-7323-00 IBM XL C for AIX, V12.1 Getting Started with XL C Version 12.1 SC14-7323-00 Note Before using this information and the product

More information

MULTI-CORE PROGRAMMING. Dongrui She December 9, 2010 ASSIGNMENT

MULTI-CORE PROGRAMMING. Dongrui She December 9, 2010 ASSIGNMENT MULTI-CORE PROGRAMMING Dongrui She December 9, 2010 ASSIGNMENT Goal of the Assignment 1 The purpose of this assignment is to Have in-depth understanding of the architectures of real-world multi-core CPUs

More information

Performance Tools and Environments Carlo Nardone. Technical Systems Ambassador GSO Client Solutions

Performance Tools and Environments Carlo Nardone. Technical Systems Ambassador GSO Client Solutions Performance Tools and Environments Carlo Nardone Technical Systems Ambassador GSO Client Solutions The Stack Applications Grid Management Standards, Open Source v. Commercial Libraries OS Management MPI,

More information

Impact of Cache Coherence Protocols on the Processing of Network Traffic

Impact of Cache Coherence Protocols on the Processing of Network Traffic Impact of Cache Coherence Protocols on the Processing of Network Traffic Amit Kumar and Ram Huggahalli Communication Technology Lab Corporate Technology Group Intel Corporation 12/3/2007 Outline Background

More information

Scalability issues : HPC Applications & Performance Tools

Scalability issues : HPC Applications & Performance Tools High Performance Computing Systems and Technology Group Scalability issues : HPC Applications & Performance Tools Chiranjib Sur HPC @ India Systems and Technology Lab chiranjib.sur@in.ibm.com Top 500 :

More information

November IBM XL C/C++ Compilers Insights on Improving Your Application

November IBM XL C/C++ Compilers Insights on Improving Your Application November 2010 IBM XL C/C++ Compilers Insights on Improving Your Application Page 1 Table of Contents Purpose of this document...2 Overview...2 Performance...2 Figure 1:...3 Figure 2:...4 Exploiting the

More information

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example 1 Which is the best? 2 Lecture 05 Performance Metrics and Benchmarking 3 Measuring & Improving Performance (if planes were computers...) Plane People Range (miles) Speed (mph) Avg. Cost (millions) Passenger*Miles

More information

IBM z/os V1R13 XL C/C++

IBM z/os V1R13 XL C/C++ IBM z/os V1R13 XL C/C++ Enable high-performing z/os XL C/C++ programs for workload optimized business software solutions Highlights v Enhances system programming capabilities by adding advanced optimization

More information

Chapter-5 Memory Hierarchy Design

Chapter-5 Memory Hierarchy Design Chapter-5 Memory Hierarchy Design Unlimited amount of fast memory - Economical solution is memory hierarchy - Locality - Cost performance Principle of locality - most programs do not access all code or

More information

Barbara Chapman, Gabriele Jost, Ruud van der Pas

Barbara Chapman, Gabriele Jost, Ruud van der Pas Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology

More information

Cell SDK and Best Practices

Cell SDK and Best Practices Cell SDK and Best Practices Stefan Lutz Florian Braune Hardware-Software-Co-Design Universität Erlangen-Nürnberg siflbrau@mb.stud.uni-erlangen.de Stefan.b.lutz@mb.stud.uni-erlangen.de 1 Overview - Introduction

More information

Aries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX

Aries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX Aries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX Keerthi Bhushan Rajesh K Chaurasia Hewlett-Packard India Software Operations 29, Cunningham Road Bangalore 560 052 India +91-80-2251554

More information

ARMv8-A Software Development

ARMv8-A Software Development ARMv8-A Software Development Course Description ARMv8-A software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop software for

More information

C6000 Compiler Roadmap

C6000 Compiler Roadmap C6000 Compiler Roadmap CGT v7.4 CGT v7.3 CGT v7. CGT v8.0 CGT C6x v8. CGT Longer Term In Development Production Early Adopter Future CGT v7.2 reactive Current 3H2 4H 4H2 H H2 Future CGT C6x v7.3 Control

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

Post-K: Building the Arm HPC Ecosystem

Post-K: Building the Arm HPC Ecosystem Post-K: Building the Arm HPC Ecosystem Toshiyuki Shimizu FUJITSU LIMITED Nov. 14th, 2017 Exhibitor Forum, SC17, Nov. 14, 2017 0 Post-K: Building up Arm HPC Ecosystem Fujitsu s approach for HPC Approach

More information

Progress on OpenMP Specifications

Progress on OpenMP Specifications Progress on OpenMP Specifications Wednesday, November 13, 2012 Bronis R. de Supinski Chair, OpenMP Language Committee This work has been authored by Lawrence Livermore National Security, LLC under contract

More information

IBM Parallel Environment for Linux, V4.3 delivers enhancements for IBM System x clusters, including a parallel debugger

IBM Parallel Environment for Linux, V4.3 delivers enhancements for IBM System x clusters, including a parallel debugger Software Announcement IBM Parallel Environment for Linux, V4.3 delivers enhancements for IBM System x clusters, including a parallel debugger Overview Parallel Environment for Linux Multiplatform (IBM

More information

IBM. Getting Started with XL Fortran for Little Endian Distributions. IBM XL Fortran for Linux, V Version 15.1.

IBM. Getting Started with XL Fortran for Little Endian Distributions. IBM XL Fortran for Linux, V Version 15.1. IBM XL Fortran for Linux, V15.1.6 IBM Getting Started with XL Fortran for Little Endian Distributions Version 15.1.6 SC27-6620-05 IBM XL Fortran for Linux, V15.1.6 IBM Getting Started with XL Fortran

More information

Fortran 2003 and Beyond. Bill Long, Cray Inc. 17-May-2005

Fortran 2003 and Beyond. Bill Long, Cray Inc. 17-May-2005 Fortran 2003 and Beyond Bill Long, Cray Inc. 17-May-2005 Fortran 2003 and Fortran 2008 Fortran is now ISO/IEC 1539:1-2004 Common name is Fortran 2003 or f03 This cancels and replaces the previous standard,

More information

Introduction to Programming Using Java (98-388)

Introduction to Programming Using Java (98-388) Introduction to Programming Using Java (98-388) Understand Java fundamentals Describe the use of main in a Java application Signature of main, why it is static; how to consume an instance of your own class;

More information

Intel C++ Compiler Professional Edition 11.1 for Linux* In-Depth

Intel C++ Compiler Professional Edition 11.1 for Linux* In-Depth Intel C++ Compiler Professional Edition 11.1 for Linux* In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Linux*.... 3 Intel C++ Compiler Professional Edition Components:......... 3 s...3

More information

Processors, Performance, and Profiling

Processors, Performance, and Profiling Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode

More information

Getting Started with XL C/C++

Getting Started with XL C/C++ IBM XL C/C++ Enterprise Edition V8.0 for AIX Getting Started with XL C/C++ SC09-7997-00 IBM XL C/C++ Enterprise Edition V8.0 for AIX Getting Started with XL C/C++ SC09-7997-00 Note! Before using this

More information

Fahad Zafar, Dibyajyoti Ghosh, Lawrence Sebald, Shujia Zhou. University of Maryland Baltimore County

Fahad Zafar, Dibyajyoti Ghosh, Lawrence Sebald, Shujia Zhou. University of Maryland Baltimore County Accelerating a climate physics model with OpenCL Fahad Zafar, Dibyajyoti Ghosh, Lawrence Sebald, Shujia Zhou University of Maryland Baltimore County Introduction The demand to increase forecast predictability

More information

The Cray Programming Environment. An Introduction

The Cray Programming Environment. An Introduction The Cray Programming Environment An Introduction Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent

More information

Using Java for Scientific Computing. Mark Bul EPCC, University of Edinburgh

Using Java for Scientific Computing. Mark Bul EPCC, University of Edinburgh Using Java for Scientific Computing Mark Bul EPCC, University of Edinburgh markb@epcc.ed.ac.uk Java and Scientific Computing? Benefits of Java for Scientific Computing Portability Network centricity Software

More information

Kampala August, Agner Fog

Kampala August, Agner Fog Advanced microprocessor optimization Kampala August, 2007 Agner Fog www.agner.org Agenda Intel and AMD microprocessors Out Of Order execution Branch prediction Platform, 32 or 64 bits Choice of compiler

More information

Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth

Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth Contents Intel C++ Compiler Professional Edition for Linux*...3 Intel C++ Compiler Professional Edition Components:...3 Features...3 New

More information

Supporting the new IBM z13 mainframe and its SIMD vector unit

Supporting the new IBM z13 mainframe and its SIMD vector unit Supporting the new IBM z13 mainframe and its SIMD vector unit Dr. Ulrich Weigand Senior Technical Staff Member GNU/Linux Compilers & Toolchain Date: Apr 13, 2015 2015 IBM Corporation Agenda IBM z13 Vector

More information

Compilation for Heterogeneous Platforms

Compilation for Heterogeneous Platforms Compilation for Heterogeneous Platforms Grid in a Box and on a Chip Ken Kennedy Rice University http://www.cs.rice.edu/~ken/presentations/heterogeneous.pdf Senior Researchers Ken Kennedy John Mellor-Crummey

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

Overview: The OpenMP Programming Model

Overview: The OpenMP Programming Model Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world.

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world. Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world. Supercharge your PS3 game code Part 1: Compiler internals.

More information

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2 Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2 This release of the Intel C++ Compiler 16.0 product is a Pre-Release, and as such is 64 architecture processor supporting

More information

IBM Lotus Domino Product Roadmap

IBM Lotus Domino Product Roadmap IBM Lotus Domino Product Roadmap Your Name Your Title Today s agenda Domino Strategy What s coming in Domino 8? What s planned beyond Domino 8? Lotus Domino Strategy The integrated messaging & collaboration

More information

Register Packing Exploiting Narrow-Width Operands for Reducing Register File Pressure

Register Packing Exploiting Narrow-Width Operands for Reducing Register File Pressure Register Packing Exploiting Narrow-Width Operands for Reducing Register File Pressure Oguz Ergin*, Deniz Balkan, Kanad Ghose, Dmitry Ponomarev Department of Computer Science State University of New York

More information

Optimising with the IBM compilers

Optimising with the IBM compilers Optimising with the IBM Overview Introduction Optimisation techniques compiler flags compiler hints code modifications Optimisation topics locals and globals conditionals data types CSE divides and square

More information

WIND RIVER DIAB COMPILER

WIND RIVER DIAB COMPILER AN INTEL COMPANY WIND RIVER DIAB COMPILER Boost application performance, reduce memory footprint, and produce high-quality, standards-compliant object code for embedded systems with Wind River Diab Compiler.

More information

Workloads, Scalability and QoS Considerations in CMP Platforms

Workloads, Scalability and QoS Considerations in CMP Platforms Workloads, Scalability and QoS Considerations in CMP Platforms Presenter Don Newell Sr. Principal Engineer Intel Corporation 2007 Intel Corporation Agenda Trends and research context Evolving Workload

More information

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 CHERRY: CHECKPOINTED EARLY RESOURCE RECYCLING José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 1 2 3 MOTIVATION Problem: Limited processor resources Goal: More

More information

Make Your C/C++ and PL/I Code FLY With the Right Compiler Options

Make Your C/C++ and PL/I Code FLY With the Right Compiler Options Make Your C/C++ and PL/I Code FLY With the Right Compiler Options Visda Vokhshoori/Peter Elderon IBM Corporation Session 13790 Insert Custom Session QR if Desired. WHAT does good application performance

More information

IBM. Getting Started with XL Fortran for Little Endian Distributions. IBM XL Fortran for Linux, V Version 15.1.

IBM. Getting Started with XL Fortran for Little Endian Distributions. IBM XL Fortran for Linux, V Version 15.1. IBM XL Fortran for Linux, V15.1.1 IBM Getting Started with XL Fortran for Little Endian Distributions Version 15.1.1 SC27-6620-00 IBM XL Fortran for Linux, V15.1.1 IBM Getting Started with XL Fortran

More information

Administrivia. CMSC 411 Computer Systems Architecture Lecture 14 Instruction Level Parallelism (cont.) Control Dependencies

Administrivia. CMSC 411 Computer Systems Architecture Lecture 14 Instruction Level Parallelism (cont.) Control Dependencies Administrivia CMSC 411 Computer Systems Architecture Lecture 14 Instruction Level Parallelism (cont.) HW #3, on memory hierarchy, due Tuesday Continue reading Chapter 3 of H&P Alan Sussman als@cs.umd.edu

More information

Code optimization techniques

Code optimization techniques & Alberto Bertoldo Advanced Computing Group Dept. of Information Engineering, University of Padova, Italy cyberto@dei.unipd.it May 19, 2009 The Four Commandments 1. The Pareto principle 80% of the effects

More information

OpenACC Course. Office Hour #2 Q&A

OpenACC Course. Office Hour #2 Q&A OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle

More information

Overview Implicit Vectorisation Explicit Vectorisation Data Alignment Summary. Vectorisation. James Briggs. 1 COSMOS DiRAC.

Overview Implicit Vectorisation Explicit Vectorisation Data Alignment Summary. Vectorisation. James Briggs. 1 COSMOS DiRAC. Vectorisation James Briggs 1 COSMOS DiRAC April 28, 2015 Session Plan 1 Overview 2 Implicit Vectorisation 3 Explicit Vectorisation 4 Data Alignment 5 Summary Section 1 Overview What is SIMD? Scalar Processing:

More information

Cluster Computing. Performance and Debugging Issues in OpenMP. Topics. Factors impacting performance. Scalable Speedup

Cluster Computing. Performance and Debugging Issues in OpenMP. Topics. Factors impacting performance. Scalable Speedup Topics Scalable Speedup and Data Locality Parallelizing Sequential Programs Breaking data dependencies Avoiding synchronization overheads Performance and Debugging Issues in OpenMP Achieving Cache and

More information

IBM. Code optimization with the IBM z/os XL C/C++ compiler. Introduction. z/os XL C/C++ compiler

IBM. Code optimization with the IBM z/os XL C/C++ compiler. Introduction. z/os XL C/C++ compiler IBM Code optimization with the IBM z/os XL C/C++ compiler Introduction The IBM z/os XL C/C++ compiler is built on an industry-wide reputation for robustness, versatility, and high level of standards compliance.

More information

OpenMP 4.0/4.5. Mark Bull, EPCC

OpenMP 4.0/4.5. Mark Bull, EPCC OpenMP 4.0/4.5 Mark Bull, EPCC OpenMP 4.0/4.5 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all

More information

Cortex-R5 Software Development

Cortex-R5 Software Development Cortex-R5 Software Development Course Description Cortex-R5 software development is a three days ARM official course. The course goes into great depth, and provides all necessary know-how to develop software

More information

IBM PSSC Montpellier Customer Center. Blue Gene/P ASIC IBM Corporation

IBM PSSC Montpellier Customer Center. Blue Gene/P ASIC IBM Corporation Blue Gene/P ASIC Memory Overview/Considerations No virtual Paging only the physical memory (2-4 GBytes/node) In C, C++, and Fortran, the malloc routine returns a NULL pointer when users request more memory

More information

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth Intel C++ Compiler Professional Edition 11.1 for Mac OS* X In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. 3 Intel C++ Compiler Professional Edition 11.1 Components:...3 Features...3

More information

OpenACC 2.6 Proposed Features

OpenACC 2.6 Proposed Features OpenACC 2.6 Proposed Features OpenACC.org June, 2017 1 Introduction This document summarizes features and changes being proposed for the next version of the OpenACC Application Programming Interface, tentatively

More information

PowerEdge 3250 Features and Performance Report

PowerEdge 3250 Features and Performance Report Performance Brief Jan 2004 Revision 3.2 Executive Summary Dell s Itanium Processor Strategy and 1 Product Line 2 Transitioning to the Itanium Architecture 3 Benefits of the Itanium processor Family PowerEdge

More information

Low-Complexity Reorder Buffer Architecture*

Low-Complexity Reorder Buffer Architecture* Low-Complexity Reorder Buffer Architecture* Gurhan Kucuk, Dmitry Ponomarev, Kanad Ghose Department of Computer Science State University of New York Binghamton, NY 13902-6000 http://www.cs.binghamton.edu/~lowpower

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

ECE 574 Cluster Computing Lecture 10

ECE 574 Cluster Computing Lecture 10 ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular

More information

Exploring Parallelism At Different Levels

Exploring Parallelism At Different Levels Exploring Parallelism At Different Levels Balanced composition and customization of optimizations 7/9/2014 DragonStar 2014 - Qing Yi 1 Exploring Parallelism Focus on Parallelism at different granularities

More information

A generic approach to the definition of low-level components for multi-architecture binary analysis

A generic approach to the definition of low-level components for multi-architecture binary analysis A generic approach to the definition of low-level components for multi-architecture binary analysis Cédric Valensi PhD advisor: William Jalby University of Versailles Saint-Quentin-en-Yvelines, France

More information

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program and Code Improvement Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program Review Front end code Source code analysis Syntax tree Back end code Target code

More information

Base Vectors: A Potential Technique for Micro-architectural Classification of Applications

Base Vectors: A Potential Technique for Micro-architectural Classification of Applications Base Vectors: A Potential Technique for Micro-architectural Classification of Applications Dan Doucette School of Computing Science Simon Fraser University Email: ddoucett@cs.sfu.ca Alexandra Fedorova

More information

41st Cray User Group Conference Minneapolis, Minnesota

41st Cray User Group Conference Minneapolis, Minnesota 41st Cray User Group Conference Minneapolis, Minnesota (MSP) Technical Lead, MSP Compiler The Copyright SGI Multi-Stream 1999, SGI Processor We know Multi-level parallelism experts for 25 years Multiple,

More information

S Comparing OpenACC 2.5 and OpenMP 4.5

S Comparing OpenACC 2.5 and OpenMP 4.5 April 4-7, 2016 Silicon Valley S6410 - Comparing OpenACC 2.5 and OpenMP 4.5 James Beyer, NVIDIA Jeff Larkin, NVIDIA GTC16 April 7, 2016 History of OpenMP & OpenACC AGENDA Philosophical Differences Technical

More information

IBM. Data Sheet. Enterprise COBOL for z/os. Version 6.2

IBM. Data Sheet. Enterprise COBOL for z/os. Version 6.2 Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Third edition (January 2018) This edition applies to Version 6 Release 2 of IBM Enterprise COBOL

More information

CISC 662 Graduate Computer Architecture Lecture 18 - Cache Performance. Why More on Memory Hierarchy?

CISC 662 Graduate Computer Architecture Lecture 18 - Cache Performance. Why More on Memory Hierarchy? CISC 662 Graduate Computer Architecture Lecture 18 - Cache Performance Michela Taufer Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture, 4th edition ---- Additional

More information

PGI Accelerator Programming Model for Fortran & C

PGI Accelerator Programming Model for Fortran & C PGI Accelerator Programming Model for Fortran & C The Portland Group Published: v1.3 November 2010 Contents 1. Introduction... 5 1.1 Scope... 5 1.2 Glossary... 5 1.3 Execution Model... 7 1.4 Memory Model...

More information

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can

More information

Simultaneous Multithreading on Pentium 4

Simultaneous Multithreading on Pentium 4 Hyper-Threading: Simultaneous Multithreading on Pentium 4 Presented by: Thomas Repantis trep@cs.ucr.edu CS203B-Advanced Computer Architecture, Spring 2004 p.1/32 Overview Multiple threads executing on

More information

Introduction to CELL B.E. and GPU Programming. Agenda

Introduction to CELL B.E. and GPU Programming. Agenda Introduction to CELL B.E. and GPU Programming Department of Electrical & Computer Engineering Rutgers University Agenda Background CELL B.E. Architecture Overview CELL B.E. Programming Environment GPU

More information

NOW Handout Page # Why More on Memory Hierarchy? CISC 662 Graduate Computer Architecture Lecture 18 - Cache Performance

NOW Handout Page # Why More on Memory Hierarchy? CISC 662 Graduate Computer Architecture Lecture 18 - Cache Performance CISC 66 Graduate Computer Architecture Lecture 8 - Cache Performance Michela Taufer Performance Why More on Memory Hierarchy?,,, Processor Memory Processor-Memory Performance Gap Growing Powerpoint Lecture

More information

Intel Knights Landing Hardware

Intel Knights Landing Hardware Intel Knights Landing Hardware TACC KNL Tutorial IXPUG Annual Meeting 2016 PRESENTED BY: John Cazes Lars Koesterke 1 Intel s Xeon Phi Architecture Leverages x86 architecture Simpler x86 cores, higher compute

More information

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy

More information

Many Cores, One Thread: Dean Tullsen University of California, San Diego

Many Cores, One Thread: Dean Tullsen University of California, San Diego Many Cores, One Thread: The Search for Nontraditional Parallelism University of California, San Diego There are some domains that feature nearly unlimited parallelism. Others, not so much Moore s Law and

More information

Performance Best Practices Paper for IBM Tivoli Directory Integrator v6.1 and v6.1.1

Performance Best Practices Paper for IBM Tivoli Directory Integrator v6.1 and v6.1.1 Performance Best Practices Paper for IBM Tivoli Directory Integrator v6.1 and v6.1.1 version 1.0 July, 2007 Table of Contents 1. Introduction...3 2. Best practices...3 2.1 Preparing the solution environment...3

More information

Intel Parallel Studio XE 2015

Intel Parallel Studio XE 2015 2015 Create faster code faster with this comprehensive parallel software development suite. Faster code: Boost applications performance that scales on today s and next-gen processors Create code faster:

More information

Implementation of Parallelization

Implementation of Parallelization Implementation of Parallelization OpenMP, PThreads and MPI Jascha Schewtschenko Institute of Cosmology and Gravitation, University of Portsmouth May 9, 2018 JAS (ICG, Portsmouth) Implementation of Parallelization

More information

High Performance Computing: Tools and Applications

High Performance Computing: Tools and Applications High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 8 Processor-level SIMD SIMD instructions can perform

More information

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems Ed Hinkel Senior Sales Engineer Agenda Overview - Rogue Wave & TotalView GPU Debugging with TotalView Nvdia CUDA Intel Phi 2

More information