IBM pseries Compiler Roadmap
|
|
- Rolf Jenkins
- 5 years ago
- Views:
Transcription
1 IBM pseries Compiler Roadmap Roch Archambault IBM Toronto Laboratory SciComp 10 August 13, 2004
2 Agenda The pseries Compiler Products Roadmaps XL Fortran XL C/C++ Performance Multiple compiler installation Performance Results Q&A
3 The pseries Compiler Products Latest Versions (All POWER4 enabled except Mac OS X) VisualAge C++ Version 6.0 for AIX VisualAge C++ Version 6.0 for Linux on pseries XL C/C++ Advanced Edition Version 6.0 for Mac OS X (PPC970 only) C for AIX V6.0 XL Fortran Version 8.1 for AIX (latest level is version 8.1.1) XL Fortran Version 8.1 for Linux on pseries (latest level is version 8.1.1) XL Fortran Advanced Edition Version 8.1 for Mac OS X (PPC970 only) Upcoming Versions (All POWER4, POWER5 and PPC970 enabled) XL C/C++ Enterprise Edition V7.0 for AIX, Linux XL C/C++ Advanced Edition V7.0 for Mac OS X (PPC970 only) XL Fortran Enterprise Edition V9.1 for AIX, Linux XL Fortran Advanced Edition V9.1 for Mac OS X (PPC970 only)
4 XL Fortran Roadmap: Strategic Priorities Premium Customer Service Continue to work closely with key ISVs and customers in scientific and technical computing industries Compliance to Language Standards and Industry Specifications OpenMP Fortran API V2.0 Fortran 77, 90 and 95 standards Emerging 2003 Standard Exploitation of Hardware Committed to maximum performance on POWER4, PPC970, POWER5 and successors Continue to work very closely with processor design teams
5 XL Fortran Version 8.1 for AIX Fully compliant Fortran 77/90/95 compiler Partial Fortran 2003 support Allocatable components, IEEE module, INTENT of F90 pointers Many industry extensions 128-bit float, 64-bit integer, Cray pointers, structure record, union map, BYTE, STATIC, AUTOMATIC, asynchronous I/O, SIZE intrinsic 32- and 64-bit support Optimized OpenMP Fortran V2.0 support Symbolic Debugging support TotalView and dbx/pdbx Full support for debugging of OpenMP programs Partial support for debugging of optimized code Portfolio of Optimizing Transformations Comprehensive path length reduction Whole program analysis Loop optimization for parallelism, locality and instruction scheduling Tuned support for all pseries processors (including POWER4) More info:
6 XL Fortran V8.1.1 for AIX (Update to V8.1) Fortran 2003 Stream I/O VALUE and PROTECTED attributes and statements Performance Customer benchmarks, SPECCPU2000 improvements, scalability for SPECOMP 64-bit TPO enablement (compiler component) Stream_unroll and Unroll_and_fuse directives Unroll directive extended to outer loops Intrinsic Performance improvements Customer Requirements -qsuppress suboption -qextname suboption Improved Debug support with TotalView Installation and operation on OS/400 PASE on iseries
7 XL Fortran Version 8.1 for Linux on pseries Based on XL Fortran Version 8.1 for AIX Product Leverage proven industry leading performance compiler technology Fully compliant Fortran 77/90/95 compiler Partial Fortran 2003 support Allocatable components, IEEE module, INTENT on F90 pointers Many industry extensions 128-bit float, 64-bit integer, Cray pointers, structure record, union map, BYTE, STATIC, AUTOMATIC, asynchronous I/O, SIZE intrinsic 32- and 64-bit support Optimized OpenMP Fortran V2.0 support Symbolic Debugging support with gdb Portfolio of Optimizing Transformations (same as XLF V8.1 for AIX) Supports SUSE Linux Enterprise Server 8 (SLES 8) Supports RedHat Enterprise Linux 3.0 (RHEL 3) Support for TurboLinux (TLES 8) and Conectiva
8 XL Fortran V9.1 And Beyond XL Fortran Enterprise Edition V9.1 for AIX (2004) XL Fortran Enterprise Edition V9.1 for Linux (2004) XL Fortran Advanced Edition V9.1 for Mac OS X Continued rollout of Fortran 2003 POWER5 support PPC970 and PPC970FX support including SIMD vectorization (VMX) Include MASS library in compiler products Improved performance Improved compile time and reduce space usage Improved debugging New directives and intrinsics New customer requirements Supported on SLES9 and RHEL 3 update 3 (Linux only) Supported on OS/400 (V5R2) and i5/os PASE (V5R3) All information subject to change without notice
9 Tentative rollout of Fortran 2003 (XLF V9.1) Available in XL Fortran V9.1 (2004) BIND attribute and statement, BIND on derived types, common blocks, and procedures PROCEDURE, IMPORT and FLUSH statements PUBLIC/PRIVATE on components of a type ISO_FORTRAN_ENV and ISO_C_BINDING intrinsic modules ASSOCIATE Construct Command Line Argument and NEW_LINE intrinsics Intrinsic attribute on USE IOMSG= specifier All information subject to change without notice
10 Prioritized rollout of Fortran 2003 (XLF V10.1) 1. OO extensions 2. ENUM/ENUMERATOR 3. Derived type I/O 4. Asynchronous I/O additions 5. Allow character on MAX/MIN family of intrinsics 6. Procedure pointers 7. I/O specifiers (DECIMAL=, SIGN=, ROUND=, and PAD=) 8. Complex literal 9. Derived type parameters 10. Abstract interface 11. Enhance structure constructors 12. Enhanced array constructors 13.Pointer assignment enhancement (specify bounds) 14. Support for International Usage All information subject to change without notice
11 C/C++ Roadmap: Strategic Priorities Premium Customer Service Compliance to Language Standards and Industry Specifications ANSI / ISO C and C++ Standards OpenMP C/C++ API V2.0 Exploitation of Hardware Committed to maximum performance on POWER4, PPC970, POWER5 and successors Continue to work very closely with processor design teams Exploitation of OS and Middleware Synergies with operating system and middleware ISVs (performance, specialized function) Committed to AIX Linux affinity strategy and to Linux on pseries Reduced Emphasis on proprietary Tooling Affinity with GNU toolchain
12 VisualAge C++ Version 6.0 for AIX C for AIX Version 6.0 Fully compliant ISO C 1999 and ISO C standards AIX 5.2 required for ISO C 1999 complex runtime support Partial GNU C/C++ language and options compatibility 32- and 64-bit support Interprocedural Analysis (IPA) including specialized C++ optimizations for templates, exception handling and virtual functions Automatic Parallelization for C/C++ Optimized OpenMP C/C++ API V1.0 support Template instantiation improvements Reuse of template instantiations to reduce code size Parsing templates at instantiation time Improved automatic template instantiations Symbolic Debugging Support TotalView, IBM Distributed Debugger and dbx/pdbx Full support for debugging of OpenMP programs Partial support for debugging of optimized code Runtime memory debug support
13 VisualAge C++ Version 6.0 for AIX (continued) C for AIX Version 6.0 (continued) Portfolio of optimizing transformations Comprehensive path length reduction Whole program analysis Loop optimization for parallelism, locality and instruction scheduling Tuned support for all pseries processors (including POWER4) More info:
14 VisualAge C++ Version 6.0 for Linux on pseries Based on VisualAge C++ Version 6.0 for AIX Product Leverage proven industry leading performance compiler technology Fully compliant ISO C 1999 and ISO C standards Partial GNU C/C++ language and options compatibility 32- and 64-bit support Automatic Parallelization for C/C++ Optimized OpenMP C/C++ API V1.0 support Portfolio of Optimizing Transformations (same as C/C++ V6.0 for AIX) Supports SUSE Linux Enterprise Server 8 (SLES 8) Supports RedHat Enterprise Linux 3.0 (RHEL 3) Support for TurboLinux (TLES 3) and Conectiva
15 XL C/C++ V7.0 And Beyond XL C/C++ Enterprise Edition Version 7.0 for AIX (2004) XL C/C++ Enterprise Edition Version 7.0 for Linux (2004) XL C/C++ Advanced Edition Version 7.0 for Mac OS X Addenda to ISO C/C++ standards (C is 1999, C++ is 1998 with TC1) Support for some C99 features in C++ compiler: Complex, Hex Float and Variable Length Arrays, Compound Literal Further GNU C/C++ compatibility features OpenMP C/C++ API V2.0 POWER5 support PPC970 and PPC970FX support including SIMD vectorization (VMX) Improved performance 64-bit TPO enablement (compiler component) on AIX and Linux Boost and STLport libraries (as they become better accepted by the standard committee) All information subject to change without notice
16 XL C/C++ V7.0 And Beyond (Continued) XL C/C++ Enterprise Edition Version 7.0 for AIX (2004) XL C/C++ Enterprise Edition Version 7.0 for Linux (2004) XL C/C++ Advanced Edition Version 7.0 for Mac OS X IOStream Performance improvements New pragmas / options Xcode enabled and support for Frameworks (Mac OS X only) Supported on SLES9 and RHEL 3 update 3 (Linux only) Supported on OS/400 (V5R2) and i5/os PASE (V5R3) New compiler drivers (gxlc and gxlc) for compatibility with gcc and g++ Support for UPC language (AIX only) Support for Objective-C (Mac OS X only) Include MASS library in compiler products All information subject to change without notice
17 Tentative GNU C/C++ Compatibility Enhancements Labels as values / computed goto Nested functions Naming types Conditionals with omitted operands Zero length arrays Labeled elements (C only) Case ranges (C only) Cast to union (C only) Function Attributes Support Noinline, always_inline, format, format_arg, section Accept and ignore used All information subject to change without notice
18 Tentative GNU C/C++ Compatibility Enhancements (Continued) Variable Attributes Support Nocommon, transparent_union Type Attributes Support Aligned, packed Accept and ignore Transparent_union extension Incomplete enums Function names as strings Partial Asm support All information subject to change without notice
19 Performance Improvements Delivered in 2003 Included in XLF V8.1.1 release OpenMP Much faster barrier implementation (over 5x faster on 32-way p690) Much faster uncontended atomic (over 4x faster on 32-way p690) Improved parallel region startup (25%) POWER4 Expanded scope and precision of instruction scheduling More precise loop unrolling for pipelining and parallelization of reductions Use of dcbz to optimize store streams Loop unrolling for prefetch stream utilization Loop Optimization Aggressive loop fusion to exploit data reuse Index set splitting and gather/scatter to move branches out of loops Loop peeling for automatic parallelization Improved loop interchange for automatic parallelization Strip-mining of vectorized loops Vector calls to specialized routines for POWER4 (vsqrt, vrsqrt, vtan, vlog, vexp)
20 Expected Performance Improvements in 2004 Included in XLF V9.1 and XL C/C++ V7 on all platforms (AIX, Linux, Mac OS X) POWER5 Modified scheduling machine model Usage of improved prefetch facilities Usage of new instructions PPC970 Automatic generation of VMX code on Linux and Mac OS X (SIMD vectorization) Interprocedural pointer alignment propagation OpenMP Tuned support for 64-way SMP Continued improvements in overhead reduction Intrinsic functions (Fortran Only) MATMUL, TRANSFER, INDEX, TRANSPOSE
21 Expected Performance Improvements in 2004 Included in XLF V9.1 and XL C/C++ V7 on all platforms (AIX, Linux, Mac OS X) Tuning assists : BLOCK_LOOP and LOOPID directives to specify which set of loops to tile, interchange or strip-mine NOVECTOR and NOSIMD directive to tell compiler not to vectorize or simdize loop Builtin functions for generating software divides (full double precision on POWER5) Thread binding (set via XLSMPOPTS env variable) Environment variable (intrinthds) to control number of threads used by MATMUL and RANDOM_NUMBER (added in XLF V8.1.1 PTF) View and manipulate information gathered by profile directed feedback (-qpdf1/-qpdf2) via showpdf and mergepdf tools New prefetch directives for POWER5
22 Expected Performance Improvements in 2004 Included in XLF V9.1 and XL C/C++ V7 on all platforms (AIX, Linux, Mac OS X) Loop Optimizations : Modulo scheduling of loops which contain branches Further improvements to loop fusion for data reuse (e.g. loop alignment) Perform vectorization on all platforms (including Linux and Mac OS X) Enhancement of vectorization (additional functions, loop versioning, vector merging) Tiling for BLAS-like and streaming loop nests Predictive Commoning (common subexpression elimination across loop iterations) Improved data dependence analysis Automatic generation of software divides on POWER5 Automatic generation of stream prefetch instructions on POWER5
23 Expected Performance Improvements in 2004 Included in XLF V9.1 and XL C/C++ V7 on all platforms (AIX, Linux, Mac OS X) Other Optimizations : Interprocedural Strength reduction Interprocedural Register Allocation Split array of structures into multiple arrays for better exploitation of hardware streams and smaller d-cache footprint Use Profile Directed Feedback (PDF) information to: Specialize calls to malloc/calloc to use pools of small objects Specialize memcpy and memset with small lengths Specialize integer divide and modulo Form Super blocks for better instruction scheduling
24 Expected Performance Improvements in 2005 Loop Optimizations: Sparse Vectorization Recognize BLAS loop and replace with call to ESSL Exploitation of SMT threads in parallel programs Improved dependence analysis Array Data flow analysis for array privatization and contraction Automatic array padding Improved automatic parallelization (wavefront, multidimensional reductions) Other Optimizations: Outline cold fields of data structures for smaller d-cache footprint (using pdf) Outline cold regions of code for smaller i-cache footprint (using pdf) All information subject to change without notice
25 Installation of Multiple Compiler Versions Installation of multiple compiler versions is supported The vacppndi and xlfndi scripts shipped with VisualAge C and XL Fortran 8.1 allow the installation of a given compiler release or update into a non-default directory The configuration file can be used to direct compilation to a specific version of the compiler Example: xlf_v8r1 c foo.f May direct compilation to use components in a non-default directory Care must be taken when multiple runtimes are installed on the same machine (details on next slide)
26 Coexistence of Multiple Compiler Runtimes Backward compatibility C, C++ and Fortran runtimes support backward compatibility. Executables generated by an earlier release of a compiler will work with a later version of the run-time environment. Concurrent installation Multiple versions of a compiler and run-time environment can be installed on the same machine Full support in xlfndi and vacppndi scripts is now available Limited support for coexistence LIBPATH must be used to ensure that a compatible runtime version is used with a given executable Only one runtime version can be used in a given process. Renaming a compiler library is not allowed. Take care in statically linking compiler libraries or in the use of dlopen or load. Details in the compiler FAQ
27 Documentation An information center containing the documentation for the current versions of the AIX compilers (both Fortran and C/C++) is available at: Similar documentation for the current version of Mac OS X compilers (both Fortran and C/C++) is available at: This information center contains all the html documentation shipped with the compilers. It is completely searchable. Please send any comments or suggestions on this information center or about the existing C, C++ or Fortran documentation shipped with the products to compinfo@ca.ibm.com.
28 Software Divide Improvements On POWER5 % Imrovement vs hardware divide SP SWDIV SP SWDIV_NOCHK Software Divide Instrinsics DP SWDIV DP SWDIV -qnostrict DP SWDIV_NOCHK DP SWDIV_NOCHK -qnostrict
29 SPEC FP Base Improvements From Compiler On POWER4 XLF V9.1 & XLC V7 Versus XLF V8.1.1 & XLC V6 % improvement SPECFP Benchmarks 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi SPEC FP + 18%
30 SPEC FP Base Improvements From Compiler On POWER5 XLF V9.1 & XLC V7 Versus XLF V8.1.1 & XLC V6 % Improvement SPECFP Benchmarks 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi SPEC FP + 20%
31 SPEC FP Base Auto-Parallelization (2 CPUs, POWER4) XLF V9.1 & XLC V7 2-CPU versus 1-CPU % Improvement SPECFP Benchmarks 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi SPEC FP + 8%
32 90 70 SPEC FP Peak Differences Between IT2 (1.5 GHZ) and POWER5 P5 / IT2 % difference GHZ 1.9 GHZ 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi SPECFP Peak
33 32-way EPCC results on AIX 5.2 p690 system (1.1 GHZ) XLF XLF PAR DO PDO BAR SING CRIT LOCK ORD ATM REDC Time in micro-seconds - lower is better
34 SPECOMP Base Improvements From Compiler On POWER4 (32-way) XLF V9.1 & XLC V7 Versus XLF V8.1.1 & XLC V6 % improvement SPECOMP Benchmarks 310.wupwise_m 312.swim_m 314.mgrid_m 316.applu_m 318.galgel_m 320.equake_m 324.apsi_m 326.gafort_m 328.fma3d_m 330.art_m 332.ammp_m SPECOMP + 8%
35 SPECOMP Scalability On POWER4 (16 vs 32 CPUs) Scalability Factor 1.0 implies perfect scaling SPECOMP Benchmarks 310.wupwise_m 312.swim_m 314.mgrid_m 316.applu_m 318.galgel_m 320.equake_m 324.apsi_m 326.gafort_m 328.fma3d_m 330.art_m 332.ammp_m
36 SPEC OMPM2001 Base Versus Competition 40 Thousands IBM 32x1.7 HP-I 32x1.5 SGI-I 32x1.5 HP-A 64x1.15 HP-I 64x1.5 SGI-I 64x1.5 FUJ 64x1.3 IBM P *1.9 0 SPECMARK
37 Himeno Benchmark Improvement on POWER XLF V8.1.1 XLF V CPU 2-CPU 4-CPU 8-CPU Small Data Problem Measured in MFLOPS on 1.1GHZ p690
IBM pseries Compiler Roadmap
IBM pseries Compiler Roadmap Roch Archambault IBM Toronto Laboratory archie@ca.ibm.com SCICOMP 12 July 20, 2006 Agenda The pseries Compiler Products Roadmaps Common Features & Compiler Architecture XL
More informationIBM POWER Systems Compiler Roadmap
IBM POWER Systems Compiler Roadmap Roch Archambault IBM Toronto Laboratory archie@ca.ibm.com SCICOMP-14 May 22, 2008 Agenda Overall Roadmap The POWER Systems Compiler Products Detailed Roadmaps Common
More informationIBM System p Compiler Roadmap
IBM System p Compiler Roadmap Roch Archambault IBM Toronto Laboratory archie@ca.ibm.com SCICOMP-13 July 19, 2007 Agenda Overall Roadmap The System p Compiler Products Detailed Roadmaps Common Features
More informationArchitecture Cloning For PowerPC Processors. Edwin Chan, Raul Silvera, Roch Archambault IBM Toronto Lab Oct 17 th, 2005
Architecture Cloning For PowerPC Processors Edwin Chan, Raul Silvera, Roch Archambault edwinc@ca.ibm.com IBM Toronto Lab Oct 17 th, 2005 Outline Motivation Implementation Details Results Scenario Previously,
More informationIBM XL Fortran Advanced Edition V8.1 for Mac OS X A new platform supported in the IBM XL Fortran family
Software Announcement January 13, 2004 IBM XL Fortran Advanced Edition V8.1 for Mac OS X A new platform supported in the IBM XL Fortran family Overview IBM extends the XL Fortran family to the Apple Mac
More informationUpgrading XL Fortran Compilers
Upgrading XL Fortran Compilers Oeriew Upgrading to the latest IBM XL Fortran compilers makes good business sense. Upgrading puts new capabilities into the hands of your programmers making them and your
More informationIBM. IBM XL C/C++ and XL Fortran compilers on Power architectures overview
IBM IBM XL C/C++ and XL Fortran compilers on Power architectures overview December 2017 References in this document to IBM products, programs, or services do not imply that IBM intends to make these available
More informationXL C/C++ Advanced Edition V6.0 for Mac OS X A new platform for the IBM family of C/C++ compilers
Software Announcement January 13, 2004 XL C/C++ Advanced Edition V6.0 for Mac OS X A new platform for the IBM family of C/C++ compilers Overview XL C/C++ Advanced Edition for Mac OS X is an optimizing,
More informationAMD S X86 OPEN64 COMPILER. Michael Lai AMD
AMD S X86 OPEN64 COMPILER Michael Lai AMD CONTENTS Brief History AMD and Open64 Compiler Overview Major Components of Compiler Important Optimizations Recent Releases Performance Applications and Libraries
More informationIBM XL C/C++ Enterprise Edition supports POWER5 architecture
Software Announcement August 31, 2004 IBM XL C/C++ Enterprise Edition supports POWER5 architecture Overview IBM XL C/C++ Enterprise Edition V7.0 for AIX is an optimizing, standards-based compiler for the
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationData Prefetch and Software Pipelining. Stanford University CS243 Winter 2006 Wei Li 1
Data Prefetch and Software Pipelining Wei Li 1 Data Prefetch Software Pipelining Agenda 2 Why Data Prefetching Increasing Processor Memory distance Caches do work!!! IF Data set cache-able, able, accesses
More informationGCC Developers Summit Ottawa, Canada, June 2006
OpenMP Implementation in GCC Diego Novillo dnovillo@redhat.com Red Hat Canada GCC Developers Summit Ottawa, Canada, June 2006 OpenMP Language extensions for shared memory concurrency (C, C++ and Fortran)
More informationIBM XL Fortran Advanced Edition V10.1 for Linux now supports Power5+ architecture
Software Announcement December 6, 2005 IBM now supports Power5+ architecture Overview IBM XL Fortran is a standards-based, highly optimized compiler. With IBM XL Fortran technology, you have a powerful
More information15-740/ Computer Architecture Lecture 10: Runahead and MLP. Prof. Onur Mutlu Carnegie Mellon University
15-740/18-740 Computer Architecture Lecture 10: Runahead and MLP Prof. Onur Mutlu Carnegie Mellon University Last Time Issues in Out-of-order execution Buffer decoupling Register alias tables Physical
More informationA Framework for Safe Automatic Data Reorganization
Compiler Technology A Framework for Safe Automatic Data Reorganization Shimin Cui (Speaker), Yaoqing Gao, Roch Archambault, Raul Silvera IBM Toronto Software Lab Peng Peers Zhao, Jose Nelson Amaral University
More informationIntroduction. No Optimization. Basic Optimizations. Normal Optimizations. Advanced Optimizations. Inter-Procedural Optimizations
Introduction Optimization options control compile time optimizations to generate an application with code that executes more quickly. Absoft Fortran 90/95 is an advanced optimizing compiler. Various optimizers
More informationIBM XL Fortran Advanced Edition for Linux, V11.1 exploits the capabilities of the IBM POWER6 processors
IBM United States Announcement 207-158, dated July 24, 2007 IBM, V11.1 exploits the capabilities of the IBM POWER6 processors...2 Product positioning... 4 Offering Information...5 Publications... 5 Technical
More informationCode optimization with the IBM XL compilers on Power architectures IBM
Code optimization with the IBM XL compilers on Power architectures IBM December 2017 References in this document to IBM products, programs, or services do not imply that IBM intends to make these available
More informationInserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL. Inserting Prefetches IA-32 Execution Layer - 1
I Inserting Data Prefetches into Loops in Dynamically Translated Code in IA-32EL Inserting Prefetches IA-32 Execution Layer - 1 Agenda IA-32EL Brief Overview Prefetching in Loops IA-32EL Prefetching in
More informationGetting Started with XL C
IBM XL C for AIX, V12.1 Getting Started with XL C Version 12.1 SC14-7323-00 IBM XL C for AIX, V12.1 Getting Started with XL C Version 12.1 SC14-7323-00 Note Before using this information and the product
More informationMULTI-CORE PROGRAMMING. Dongrui She December 9, 2010 ASSIGNMENT
MULTI-CORE PROGRAMMING Dongrui She December 9, 2010 ASSIGNMENT Goal of the Assignment 1 The purpose of this assignment is to Have in-depth understanding of the architectures of real-world multi-core CPUs
More informationPerformance Tools and Environments Carlo Nardone. Technical Systems Ambassador GSO Client Solutions
Performance Tools and Environments Carlo Nardone Technical Systems Ambassador GSO Client Solutions The Stack Applications Grid Management Standards, Open Source v. Commercial Libraries OS Management MPI,
More informationImpact of Cache Coherence Protocols on the Processing of Network Traffic
Impact of Cache Coherence Protocols on the Processing of Network Traffic Amit Kumar and Ram Huggahalli Communication Technology Lab Corporate Technology Group Intel Corporation 12/3/2007 Outline Background
More informationScalability issues : HPC Applications & Performance Tools
High Performance Computing Systems and Technology Group Scalability issues : HPC Applications & Performance Tools Chiranjib Sur HPC @ India Systems and Technology Lab chiranjib.sur@in.ibm.com Top 500 :
More informationNovember IBM XL C/C++ Compilers Insights on Improving Your Application
November 2010 IBM XL C/C++ Compilers Insights on Improving Your Application Page 1 Table of Contents Purpose of this document...2 Overview...2 Performance...2 Figure 1:...3 Figure 2:...4 Exploiting the
More informationWhich is the best? Measuring & Improving Performance (if planes were computers...) An architecture example
1 Which is the best? 2 Lecture 05 Performance Metrics and Benchmarking 3 Measuring & Improving Performance (if planes were computers...) Plane People Range (miles) Speed (mph) Avg. Cost (millions) Passenger*Miles
More informationIBM z/os V1R13 XL C/C++
IBM z/os V1R13 XL C/C++ Enable high-performing z/os XL C/C++ programs for workload optimized business software solutions Highlights v Enhances system programming capabilities by adding advanced optimization
More informationChapter-5 Memory Hierarchy Design
Chapter-5 Memory Hierarchy Design Unlimited amount of fast memory - Economical solution is memory hierarchy - Locality - Cost performance Principle of locality - most programs do not access all code or
More informationBarbara Chapman, Gabriele Jost, Ruud van der Pas
Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology
More informationCell SDK and Best Practices
Cell SDK and Best Practices Stefan Lutz Florian Braune Hardware-Software-Co-Design Universität Erlangen-Nürnberg siflbrau@mb.stud.uni-erlangen.de Stefan.b.lutz@mb.stud.uni-erlangen.de 1 Overview - Introduction
More informationAries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX
Aries: Transparent Execution of PA-RISC/HP-UX Applications on IPF/HP-UX Keerthi Bhushan Rajesh K Chaurasia Hewlett-Packard India Software Operations 29, Cunningham Road Bangalore 560 052 India +91-80-2251554
More informationARMv8-A Software Development
ARMv8-A Software Development Course Description ARMv8-A software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop software for
More informationC6000 Compiler Roadmap
C6000 Compiler Roadmap CGT v7.4 CGT v7.3 CGT v7. CGT v8.0 CGT C6x v8. CGT Longer Term In Development Production Early Adopter Future CGT v7.2 reactive Current 3H2 4H 4H2 H H2 Future CGT C6x v7.3 Control
More informationMartin Kruliš, v
Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal
More informationPost-K: Building the Arm HPC Ecosystem
Post-K: Building the Arm HPC Ecosystem Toshiyuki Shimizu FUJITSU LIMITED Nov. 14th, 2017 Exhibitor Forum, SC17, Nov. 14, 2017 0 Post-K: Building up Arm HPC Ecosystem Fujitsu s approach for HPC Approach
More informationProgress on OpenMP Specifications
Progress on OpenMP Specifications Wednesday, November 13, 2012 Bronis R. de Supinski Chair, OpenMP Language Committee This work has been authored by Lawrence Livermore National Security, LLC under contract
More informationIBM Parallel Environment for Linux, V4.3 delivers enhancements for IBM System x clusters, including a parallel debugger
Software Announcement IBM Parallel Environment for Linux, V4.3 delivers enhancements for IBM System x clusters, including a parallel debugger Overview Parallel Environment for Linux Multiplatform (IBM
More informationIBM. Getting Started with XL Fortran for Little Endian Distributions. IBM XL Fortran for Linux, V Version 15.1.
IBM XL Fortran for Linux, V15.1.6 IBM Getting Started with XL Fortran for Little Endian Distributions Version 15.1.6 SC27-6620-05 IBM XL Fortran for Linux, V15.1.6 IBM Getting Started with XL Fortran
More informationFortran 2003 and Beyond. Bill Long, Cray Inc. 17-May-2005
Fortran 2003 and Beyond Bill Long, Cray Inc. 17-May-2005 Fortran 2003 and Fortran 2008 Fortran is now ISO/IEC 1539:1-2004 Common name is Fortran 2003 or f03 This cancels and replaces the previous standard,
More informationIntroduction to Programming Using Java (98-388)
Introduction to Programming Using Java (98-388) Understand Java fundamentals Describe the use of main in a Java application Signature of main, why it is static; how to consume an instance of your own class;
More informationIntel C++ Compiler Professional Edition 11.1 for Linux* In-Depth
Intel C++ Compiler Professional Edition 11.1 for Linux* In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Linux*.... 3 Intel C++ Compiler Professional Edition Components:......... 3 s...3
More informationProcessors, Performance, and Profiling
Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode
More informationGetting Started with XL C/C++
IBM XL C/C++ Enterprise Edition V8.0 for AIX Getting Started with XL C/C++ SC09-7997-00 IBM XL C/C++ Enterprise Edition V8.0 for AIX Getting Started with XL C/C++ SC09-7997-00 Note! Before using this
More informationFahad Zafar, Dibyajyoti Ghosh, Lawrence Sebald, Shujia Zhou. University of Maryland Baltimore County
Accelerating a climate physics model with OpenCL Fahad Zafar, Dibyajyoti Ghosh, Lawrence Sebald, Shujia Zhou University of Maryland Baltimore County Introduction The demand to increase forecast predictability
More informationThe Cray Programming Environment. An Introduction
The Cray Programming Environment An Introduction Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent
More informationUsing Java for Scientific Computing. Mark Bul EPCC, University of Edinburgh
Using Java for Scientific Computing Mark Bul EPCC, University of Edinburgh markb@epcc.ed.ac.uk Java and Scientific Computing? Benefits of Java for Scientific Computing Portability Network centricity Software
More informationKampala August, Agner Fog
Advanced microprocessor optimization Kampala August, 2007 Agner Fog www.agner.org Agenda Intel and AMD microprocessors Out Of Order execution Branch prediction Platform, 32 or 64 bits Choice of compiler
More informationIntel C++ Compiler Professional Edition 11.0 for Linux* In-Depth
Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth Contents Intel C++ Compiler Professional Edition for Linux*...3 Intel C++ Compiler Professional Edition Components:...3 Features...3 New
More informationSupporting the new IBM z13 mainframe and its SIMD vector unit
Supporting the new IBM z13 mainframe and its SIMD vector unit Dr. Ulrich Weigand Senior Technical Staff Member GNU/Linux Compilers & Toolchain Date: Apr 13, 2015 2015 IBM Corporation Agenda IBM z13 Vector
More informationCompilation for Heterogeneous Platforms
Compilation for Heterogeneous Platforms Grid in a Box and on a Chip Ken Kennedy Rice University http://www.cs.rice.edu/~ken/presentations/heterogeneous.pdf Senior Researchers Ken Kennedy John Mellor-Crummey
More informationShort Notes of CS201
#includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system
More informationOverview: The OpenMP Programming Model
Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP
More informationCS201 - Introduction to Programming Glossary By
CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with
More informationUnder the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world.
Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world. Supercharge your PS3 game code Part 1: Compiler internals.
More informationIntel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2
Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2 This release of the Intel C++ Compiler 16.0 product is a Pre-Release, and as such is 64 architecture processor supporting
More informationIBM Lotus Domino Product Roadmap
IBM Lotus Domino Product Roadmap Your Name Your Title Today s agenda Domino Strategy What s coming in Domino 8? What s planned beyond Domino 8? Lotus Domino Strategy The integrated messaging & collaboration
More informationRegister Packing Exploiting Narrow-Width Operands for Reducing Register File Pressure
Register Packing Exploiting Narrow-Width Operands for Reducing Register File Pressure Oguz Ergin*, Deniz Balkan, Kanad Ghose, Dmitry Ponomarev Department of Computer Science State University of New York
More informationOptimising with the IBM compilers
Optimising with the IBM Overview Introduction Optimisation techniques compiler flags compiler hints code modifications Optimisation topics locals and globals conditionals data types CSE divides and square
More informationWIND RIVER DIAB COMPILER
AN INTEL COMPANY WIND RIVER DIAB COMPILER Boost application performance, reduce memory footprint, and produce high-quality, standards-compliant object code for embedded systems with Wind River Diab Compiler.
More informationWorkloads, Scalability and QoS Considerations in CMP Platforms
Workloads, Scalability and QoS Considerations in CMP Platforms Presenter Don Newell Sr. Principal Engineer Intel Corporation 2007 Intel Corporation Agenda Trends and research context Evolving Workload
More informationJosé F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2
CHERRY: CHECKPOINTED EARLY RESOURCE RECYCLING José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 1 2 3 MOTIVATION Problem: Limited processor resources Goal: More
More informationMake Your C/C++ and PL/I Code FLY With the Right Compiler Options
Make Your C/C++ and PL/I Code FLY With the Right Compiler Options Visda Vokhshoori/Peter Elderon IBM Corporation Session 13790 Insert Custom Session QR if Desired. WHAT does good application performance
More informationIBM. Getting Started with XL Fortran for Little Endian Distributions. IBM XL Fortran for Linux, V Version 15.1.
IBM XL Fortran for Linux, V15.1.1 IBM Getting Started with XL Fortran for Little Endian Distributions Version 15.1.1 SC27-6620-00 IBM XL Fortran for Linux, V15.1.1 IBM Getting Started with XL Fortran
More informationAdministrivia. CMSC 411 Computer Systems Architecture Lecture 14 Instruction Level Parallelism (cont.) Control Dependencies
Administrivia CMSC 411 Computer Systems Architecture Lecture 14 Instruction Level Parallelism (cont.) HW #3, on memory hierarchy, due Tuesday Continue reading Chapter 3 of H&P Alan Sussman als@cs.umd.edu
More informationCode optimization techniques
& Alberto Bertoldo Advanced Computing Group Dept. of Information Engineering, University of Padova, Italy cyberto@dei.unipd.it May 19, 2009 The Four Commandments 1. The Pareto principle 80% of the effects
More informationOpenACC Course. Office Hour #2 Q&A
OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle
More informationOverview Implicit Vectorisation Explicit Vectorisation Data Alignment Summary. Vectorisation. James Briggs. 1 COSMOS DiRAC.
Vectorisation James Briggs 1 COSMOS DiRAC April 28, 2015 Session Plan 1 Overview 2 Implicit Vectorisation 3 Explicit Vectorisation 4 Data Alignment 5 Summary Section 1 Overview What is SIMD? Scalar Processing:
More informationCluster Computing. Performance and Debugging Issues in OpenMP. Topics. Factors impacting performance. Scalable Speedup
Topics Scalable Speedup and Data Locality Parallelizing Sequential Programs Breaking data dependencies Avoiding synchronization overheads Performance and Debugging Issues in OpenMP Achieving Cache and
More informationIBM. Code optimization with the IBM z/os XL C/C++ compiler. Introduction. z/os XL C/C++ compiler
IBM Code optimization with the IBM z/os XL C/C++ compiler Introduction The IBM z/os XL C/C++ compiler is built on an industry-wide reputation for robustness, versatility, and high level of standards compliance.
More informationOpenMP 4.0/4.5. Mark Bull, EPCC
OpenMP 4.0/4.5 Mark Bull, EPCC OpenMP 4.0/4.5 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all
More informationCortex-R5 Software Development
Cortex-R5 Software Development Course Description Cortex-R5 software development is a three days ARM official course. The course goes into great depth, and provides all necessary know-how to develop software
More informationIBM PSSC Montpellier Customer Center. Blue Gene/P ASIC IBM Corporation
Blue Gene/P ASIC Memory Overview/Considerations No virtual Paging only the physical memory (2-4 GBytes/node) In C, C++, and Fortran, the malloc routine returns a NULL pointer when users request more memory
More informationIntel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth
Intel C++ Compiler Professional Edition 11.1 for Mac OS* X In-Depth Contents Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. 3 Intel C++ Compiler Professional Edition 11.1 Components:...3 Features...3
More informationOpenACC 2.6 Proposed Features
OpenACC 2.6 Proposed Features OpenACC.org June, 2017 1 Introduction This document summarizes features and changes being proposed for the next version of the OpenACC Application Programming Interface, tentatively
More informationPowerEdge 3250 Features and Performance Report
Performance Brief Jan 2004 Revision 3.2 Executive Summary Dell s Itanium Processor Strategy and 1 Product Line 2 Transitioning to the Itanium Architecture 3 Benefits of the Itanium processor Family PowerEdge
More informationLow-Complexity Reorder Buffer Architecture*
Low-Complexity Reorder Buffer Architecture* Gurhan Kucuk, Dmitry Ponomarev, Kanad Ghose Department of Computer Science State University of New York Binghamton, NY 13902-6000 http://www.cs.binghamton.edu/~lowpower
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationExploring Parallelism At Different Levels
Exploring Parallelism At Different Levels Balanced composition and customization of optimizations 7/9/2014 DragonStar 2014 - Qing Yi 1 Exploring Parallelism Focus on Parallelism at different granularities
More informationA generic approach to the definition of low-level components for multi-architecture binary analysis
A generic approach to the definition of low-level components for multi-architecture binary analysis Cédric Valensi PhD advisor: William Jalby University of Versailles Saint-Quentin-en-Yvelines, France
More informationBuilding a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano
Building a Runnable Program and Code Improvement Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program Review Front end code Source code analysis Syntax tree Back end code Target code
More informationBase Vectors: A Potential Technique for Micro-architectural Classification of Applications
Base Vectors: A Potential Technique for Micro-architectural Classification of Applications Dan Doucette School of Computing Science Simon Fraser University Email: ddoucett@cs.sfu.ca Alexandra Fedorova
More information41st Cray User Group Conference Minneapolis, Minnesota
41st Cray User Group Conference Minneapolis, Minnesota (MSP) Technical Lead, MSP Compiler The Copyright SGI Multi-Stream 1999, SGI Processor We know Multi-level parallelism experts for 25 years Multiple,
More informationS Comparing OpenACC 2.5 and OpenMP 4.5
April 4-7, 2016 Silicon Valley S6410 - Comparing OpenACC 2.5 and OpenMP 4.5 James Beyer, NVIDIA Jeff Larkin, NVIDIA GTC16 April 7, 2016 History of OpenMP & OpenACC AGENDA Philosophical Differences Technical
More informationIBM. Data Sheet. Enterprise COBOL for z/os. Version 6.2
Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Third edition (January 2018) This edition applies to Version 6 Release 2 of IBM Enterprise COBOL
More informationCISC 662 Graduate Computer Architecture Lecture 18 - Cache Performance. Why More on Memory Hierarchy?
CISC 662 Graduate Computer Architecture Lecture 18 - Cache Performance Michela Taufer Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture, 4th edition ---- Additional
More informationPGI Accelerator Programming Model for Fortran & C
PGI Accelerator Programming Model for Fortran & C The Portland Group Published: v1.3 November 2010 Contents 1. Introduction... 5 1.1 Scope... 5 1.2 Glossary... 5 1.3 Execution Model... 7 1.4 Memory Model...
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationSimultaneous Multithreading on Pentium 4
Hyper-Threading: Simultaneous Multithreading on Pentium 4 Presented by: Thomas Repantis trep@cs.ucr.edu CS203B-Advanced Computer Architecture, Spring 2004 p.1/32 Overview Multiple threads executing on
More informationIntroduction to CELL B.E. and GPU Programming. Agenda
Introduction to CELL B.E. and GPU Programming Department of Electrical & Computer Engineering Rutgers University Agenda Background CELL B.E. Architecture Overview CELL B.E. Programming Environment GPU
More informationNOW Handout Page # Why More on Memory Hierarchy? CISC 662 Graduate Computer Architecture Lecture 18 - Cache Performance
CISC 66 Graduate Computer Architecture Lecture 8 - Cache Performance Michela Taufer Performance Why More on Memory Hierarchy?,,, Processor Memory Processor-Memory Performance Gap Growing Powerpoint Lecture
More informationIntel Knights Landing Hardware
Intel Knights Landing Hardware TACC KNL Tutorial IXPUG Annual Meeting 2016 PRESENTED BY: John Cazes Lars Koesterke 1 Intel s Xeon Phi Architecture Leverages x86 architecture Simpler x86 cores, higher compute
More informationParallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model
Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy
More informationMany Cores, One Thread: Dean Tullsen University of California, San Diego
Many Cores, One Thread: The Search for Nontraditional Parallelism University of California, San Diego There are some domains that feature nearly unlimited parallelism. Others, not so much Moore s Law and
More informationPerformance Best Practices Paper for IBM Tivoli Directory Integrator v6.1 and v6.1.1
Performance Best Practices Paper for IBM Tivoli Directory Integrator v6.1 and v6.1.1 version 1.0 July, 2007 Table of Contents 1. Introduction...3 2. Best practices...3 2.1 Preparing the solution environment...3
More informationIntel Parallel Studio XE 2015
2015 Create faster code faster with this comprehensive parallel software development suite. Faster code: Boost applications performance that scales on today s and next-gen processors Create code faster:
More informationImplementation of Parallelization
Implementation of Parallelization OpenMP, PThreads and MPI Jascha Schewtschenko Institute of Cosmology and Gravitation, University of Portsmouth May 9, 2018 JAS (ICG, Portsmouth) Implementation of Parallelization
More informationHigh Performance Computing: Tools and Applications
High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 8 Processor-level SIMD SIMD instructions can perform
More informationAddressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer
Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems Ed Hinkel Senior Sales Engineer Agenda Overview - Rogue Wave & TotalView GPU Debugging with TotalView Nvdia CUDA Intel Phi 2
More information