AMD'S X86 OPEN64 COMPILER
Michael Lai, AMD

CONTENTS
- Brief History
- AMD and Open64
- Compiler Overview
- Major Components of Compiler
- Important Optimizations
- Recent Releases
- Performance
- Applications and Libraries
- Heterogeneous Computing
- More Information

AMD's x86 Open64 Compiler, June 2011

BRIEF HISTORY
- Started as the SGI MIPSpro/Pro64 compiler in the 1990s
- Open-sourced in 2000 as the Pro64 compiler; later renamed the Open64 compiler
- Has been retargeted to many architectures (MIPS, IA-64, x86-64, ARM, ...)
- Popular among industry and academia; used for both production and research
- Open64 Steering Group (with members from industry and universities)
- Major contributors include AMD, Intel, HP, PathScale, Tsinghua University, Chinese Academy of Sciences, University of Houston, University of Delaware, SimpLight, ...

AMD AND OPEN64
AMD's x86 Open64 Compiler:
- Pull down from www.open64.net (leverage the open source community)
- Work on bug fixes, new development and infrastructure, advanced optimizations
- Keep in sync with www.open64.net
- Check changes back into www.open64.net (contribute to the open source community)
http://developer.amd.com:
- First AMD release was version 4.2.2 in April 2009
- Most recent AMD release was version 4.2.5 in April 2011
Active participant in the open source community:
- Member of the Open64 Steering Group (OSG)
- Many AMD global and local gatekeepers (design and code discussions and reviews)
- Release management and testing
- Presentations at workshops, tutorials, forums

COMPILER OVERVIEW
Language standards
- ANSI C99, ISO C++98
  - Conforms to ISO/IEC 9899:1999, Programming Languages - C standard
  - Conforms to ISO/IEC 14882:1998(E), Programming Languages - C++ standard
  - Compatible with gcc
- Fortran 77, 90, 95
  - Conforms to ISO/IEC 1539-1:1997, Programming Languages - Fortran
- Inter-language calling support
- IEEE 754 floating-point support
- OpenMP 2.5 for shared-memory systems
Platform highlights
- x86 32-bit and x86 64-bit code generation
- Large file support on 32-bit systems
- Vector and scalar SSE* code generation
- AVX, XOP, FMA4 code generation
- Optimized C/C++ and math libraries
- Optimized AMD Core Math Library (ACML)
- MPICH2 for distributed and shared-memory systems
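As a minimal sketch of the OpenMP 2.5 support listed above, here is a C99 dot product parallelized with a work-sharing loop and a reduction clause. The function name `dot_product` is hypothetical; the pragma itself is standard OpenMP 2.5. If the code is compiled without OpenMP enabled, the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>

/* Hypothetical example: dot product using an OpenMP 2.5 work-sharing
 * loop. The reduction clause gives each thread a private partial sum
 * and adds the partial sums together when the loop ends. Without
 * OpenMP enabled, the pragma is ignored and the loop runs serially. */
double dot_product(const double *x, const double *y, long n)
{
    double sum = 0.0;
    long i;  /* OpenMP 2.5 requires a signed integer loop variable */

    assert(n >= 0);

#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

Floating-point addition is not associative, so for general inputs a multi-threaded run can differ from a serial run in the last bits.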

COMPILER OVERVIEW
- Global optimizations, for example:
  - Partial redundancy elimination
  - Constant propagation and code motion
  - Strength reduction and expression simplification
  - Dead code elimination and common subexpression elimination
- Loop-nest optimizations, for example:
  - Loop fusion and distribution
  - Loop interchange and cache locality optimization
  - Vectorization for SSE*/AVX code generation
  - Software prefetching
- Code generation and optimizations, for example:
  - Advanced register allocation
  - Loop unrolling, peephole optimizations
  - Instruction selection and scheduling
- Feedback-directed optimizations, for example:
  - Code layout
  - Function inlining and de-virtualization
  - Register allocation
  - Value specialization
- Interprocedural analyses and optimizations, for example:
  - Function inlining and cloning
  - Alias analysis
  - Data re-layout optimizations for structures and arrays
  - Constant propagation and dead code elimination
- Multi-core scalability optimizations
  - OpenMP support and automatic parallelization
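To make one of the global optimizations above concrete, here is strength reduction written out by hand as a before/after pair in C. The compiler performs this rewrite on its intermediate representation, not on source code, and the function names are hypothetical.

```c
#include <assert.h>

/* Before: every iteration computes i * stride with a multiply. */
void fill_scaled(long *out, long n, long stride)
{
    for (long i = 0; i < n; i++)
        out[i] = i * stride;
}

/* After strength reduction (done by hand here): the multiply is
 * replaced by a running value that is advanced by an addition, which
 * is what an optimizer derives from the induction variable i. */
void fill_scaled_reduced(long *out, long n, long stride)
{
    long value = 0;  /* invariant: value == i * stride at the top of each iteration */

    assert(n >= 0);
    for (long i = 0; i < n; i++) {
        out[i] = value;
        value += stride;
    }
}
```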

MAJOR COMPONENTS OF COMPILER
- Frontend: generates a WHIRL file from each input source file
- Backend: generates an object file from each WHIRL file
- Linker: generates an executable file from the object files
- IPA
  - Pass 1: ipl
  - Pass 2: ipa_link

Compilation flow without IPA:
source -> frontend -> WHIRL -> backend -> .o  (one chain per source file)
all .o files -> linker -> a.out

Compilation flow with IPA:
source -> frontend -> WHIRL -> ipl -> .o  (one chain per source file)
all .o files -> ipa_link -> WHIRL -> backend -> .o  (one chain per WHIRL file emitted by ipa_link)
all .o files -> linker -> a.out

IMPORTANT OPTIMIZATIONS
Backend
- LNO (loop nest optimization)
  - Traditional loop transformations such as loop blocking, interchange, fusion, distribution
  - Software prefetching
  - Vectorization
- WOPT (global optimization)
  - Build control flow graphs
  - Data flow analysis
  - Traditional global scalar optimizations such as constant folding, partial redundancy elimination, etc.
- CG (code generation)
  - Instruction selection and scheduling
  - Machine-dependent optimizations such as address mode optimization and peephole optimization
  - Emit instructions for the target machine
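Loop blocking is the first LNO transformation listed. As a rough illustration of what it does, here is a hand-blocked matrix transpose in C. The function names and the `block` parameter are hypothetical; a compiler picks its own tile size instead of taking one from the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Naive transpose of an n x n row-major matrix: out = in^T.
 * Consecutive writes to out are n elements apart in memory. */
void transpose_naive(double *out, const double *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            out[j * n + i] = in[i * n + j];
}

/* Blocked (tiled) transpose: the same iterations regrouped into
 * block x block tiles, so cache lines touched by one tile are reused
 * before being evicted. Edge tiles are clipped when n is not a
 * multiple of block. */
void transpose_blocked(double *out, const double *in, size_t n, size_t block)
{
    assert(block > 0);  /* a zero tile size would never advance */
    for (size_t ii = 0; ii < n; ii += block)
        for (size_t jj = 0; jj < n; jj += block) {
            size_t i_end = (ii + block < n) ? ii + block : n;
            size_t j_end = (jj + block < n) ? jj + block : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    out[j * n + i] = in[i * n + j];
        }
}
```

Both functions perform exactly the same assignments; only the order changes, which is why blocking is legal here without any dependence concerns.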

IMPORTANT OPTIMIZATIONS
IPA (interprocedural analysis)
- Pass 1: ipl
  - Local analysis
- Pass 2: ipa_link
  - Whole program analysis
  - Data layout optimizations
  - Function inlining, cloning
  - Constant propagation
  - Dead function elimination
Profile feedback directed optimization
- -fb-create
- -fb-opt
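Function cloning combined with constant propagation can be pictured as follows: if whole-program analysis sees that every call site passes the same constant, it can create a specialized copy with that constant folded in and redirect the calls to it. The clone below is written by hand and all names are hypothetical; ipa_link works on the intermediate representation rather than on C source.

```c
#include <assert.h>

/* Generic routine: scale is a run-time parameter. */
double scale_sum(const double *v, int n, double scale)
{
    double s = 0.0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        s += v[i] * scale;
    return s;
}

/* Hand-written clone specialized for scale == 2.0, the kind of copy
 * IPA could create when every call site passes that constant. */
double scale_sum_by2(const double *v, int n)
{
    double s = 0.0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        s += v[i] * 2.0;
    return s;
}

/* After cloning, the call is redirected to the specialized copy. */
double caller(const double *v, int n)
{
    return scale_sum_by2(v, n);  /* originally: scale_sum(v, n, 2.0) */
}
```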

RECENT RELEASES
Release 4.2.2 (April 2009)
- Support for 2 MB huge pages
- Improved loop fusion (proactive loop fusion) and loop unrolling
- Improved head/tail duplication, if-merging, scalar replacement and constant folding optimizations
- Improved interprocedural alias analysis
- Improved partial inlining and inlining of virtual functions
- More advanced re-layout optimization for structure members
- Improved instruction selection and instruction scheduling
- Improved tuning of library functions
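Loop fusion merges adjacent loops over the same iteration space, so intermediate values are reused while still in registers or cache instead of being reloaded in a second pass. A hand-fused sketch in C with hypothetical names; the `restrict` qualifiers state the assumption that the arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Before fusion: two passes; b[i] is written by the first loop and
 * read back from memory by the second. */
void two_loops(double *restrict b, double *restrict c,
               const double *restrict a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] * 2.0;
    for (size_t i = 0; i < n; i++)
        c[i] = b[i] + 1.0;
}

/* After fusion: one pass. Legal because iteration i of the second loop
 * depends only on iteration i of the first, and the arrays are assumed
 * not to overlap. */
void fused_loop(double *restrict b, double *restrict c,
                const double *restrict a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        b[i] = a[i] * 2.0;
        c[i] = b[i] + 1.0;
    }
}
```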

RECENT RELEASES
Release 4.2.3 (December 2009)
- Improved interprocedural analysis to include structure array copy optimization and array remapping optimization
- Improved loop optimizations: loop unrolling, loop unroll and jam, triangular loops, proactive loop interchange, loop distribution, loop peeling
- Improved redundancy elimination optimizations for stores and memory initialization; better integration of reassociation and common subexpression elimination; enhanced expression factorization
- Improved instruction selection and addressing code generation
- Improved vectorization
- Extended prefetching to include arrays with inductive base addresses
- Enhanced loop multi-versioning
- Improved OpenMP and auto-parallelization code generation
- Improved tuning of OpenMP and parallel runtime library functions
- Introduced advanced optimizations to improve scalability and bandwidth utilization on multi-core processors (-mso)
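Loop multi-versioning generates more than one version of a loop and picks one at run time. A common case is an aliasing check: if the arrays do not overlap, a version the compiler may vectorize freely runs; otherwise the original element-by-element order is kept. The sketch below writes both versions and the dispatch by hand; the names and the exact overlap test are illustrative assumptions, not the code the compiler emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Version for possibly overlapping ranges: keeps the original
 * sequential semantics for any aliasing. */
static void add_scalar(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Version for disjoint ranges: restrict tells the compiler there is no
 * aliasing, so it is free to vectorize. */
static void add_noalias(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Run-time dispatch between the two versions. Comparing addresses as
 * uintptr_t is a common practical idiom; ISO C does not define ordering
 * between pointers into unrelated objects. */
void add_arrays(float *dst, const float *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    size_t bytes = n * sizeof(float);

    if (d + bytes <= s || s + bytes <= d)
        add_noalias(dst, src, n);
    else
        add_scalar(dst, src, n);
}
```

With overlapping input such as `add_arrays(buf + 1, buf, 4)`, the scalar path turns `{1, 2, 3, 4, 5}` into the running sums `{1, 3, 6, 10, 15}`, which a naively vectorized loop would not produce.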

RECENT RELEASES
Release 4.2.4 (June 2010)
- Improved function inlining heuristics and enhanced inline expansion of library functions
- Enhanced framework for multi-versioning
- Improved inductive expression simplification and if-merging optimization
- Improved code generation for the % operator
- Improved interprocedural analysis for indirect function calls, virtual functions, and functions with the "noreturn" attribute
- Optimized exception handling
- Optimized processing of Fortran 90 temporary arrays
- Improved processor affinity mapping in the OpenMP and parallel runtime library
- Added support for 1 GB huge pages
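The slide does not say which `%` cases were improved in 4.2.4, so the following is only a generic illustration of one classic rewrite: for an unsigned operand and a power-of-two divisor, the remainder can be computed with a single AND instead of a division. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Generic remainder: in general this needs a division instruction
 * (or a multiply-based sequence when the divisor is a known constant). */
uint32_t rem_generic(uint32_t x, uint32_t d)
{
    assert(d != 0);
    return x % d;
}

/* For unsigned x and a power-of-two divisor d, x % d == x & (d - 1). */
uint32_t rem_pow2(uint32_t x, uint32_t d)
{
    assert(d != 0 && (d & (d - 1)) == 0);  /* d must be a power of two */
    return x & (d - 1);
}
```

The identity holds only for unsigned (or provably non-negative) operands. For a negative signed operand, C's `%` returns a negative result, so a compiler has to add correction instructions around the mask.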

RECENT RELEASES
Release 4.2.5 (March 2011)
- Optimized code generation for the new AMD Opteron Family 15h processors ("Bulldozer" core), including the SSE*, AVX, XOP and FMA4 instruction groups (-march=bdver1)
- Support for iso_c_binding, a Fortran 2003 feature
- Enhanced framework to support better vectorization
- Improved vectorization for outer loops and loops containing conditionals
- Enhanced framework to support better aliasing
- Modified -O3 to enable more powerful floating-point optimizations by default
- Improved compatibility with newer versions of gcc for function prototype definitions under OpenMP
- Compiler build infrastructure enhanced to be similar to other Linux application builds involving configure, make and make install
- Incremental improvements to many generic optimizations such as loop fusion, dead code elimination, if-merging, if-conversion, function inlining, register pressure tuning, structure splitting, etc.
- Incremental improvements for C++ applications such as function de-virtualization, exception handling, etc.
- General correctness improvements, including bug fixes for problems in Fortran intrinsics, the Fortran frontend, Fortran I/O, x86 alignment, and OpenMP
- General improvements to reduce the compilation times of large C++/Fortran applications
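Vectorizing a loop that contains a conditional typically relies on if-conversion: the branch is replaced by computing a value on every iteration and selecting the result, which maps onto SSE/AVX compare-and-select or min operations. A hand-written sketch with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Original form: a data-dependent branch in the loop body. */
void clamp_branchy(float *v, size_t n, float limit)
{
    for (size_t i = 0; i < n; i++) {
        if (v[i] > limit)
            v[i] = limit;
    }
}

/* If-converted form: every iteration does the same work and the
 * condition only selects the stored value, which a vectorizer can map
 * to a vector compare plus a blend. The store now happens on every
 * iteration; a compiler may only introduce such stores when it knows
 * the write is safe. */
void clamp_selected(float *v, size_t n, float limit)
{
    for (size_t i = 0; i < n; i++) {
        float x = v[i];
        v[i] = (x > limit) ? limit : x;
    }
}
```

Both forms treat NaN the same way: `x > limit` is false for NaN, so the element is left unchanged.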

PERFORMANCE
- Used in benchmark submissions, for example by HP, Dell, IBM, Sun (Oracle), SGI
- Performance on AMD platforms:
  - Best performing compiler
  - Both integer and floating-point benchmark suites
- Performance on Intel platforms:
  - Among the best performing compilers
  - Both integer and floating-point benchmark suites

APPLICATIONS AND LIBRARIES
Libraries and utilities, for example:
- ACML (Fortran)
- BLAST (C/C++)
- Charm++ (C++)
- CLHEP (C++)
- FFTW (C)
- Goto BLAS (Fortran)
- MPICH/MPICH2 (Fortran, C/C++)
- NetCDF (Fortran, C/C++)
- LAM/MPI (Fortran, C/C++)
- OpenMPI (Fortran, C/C++)
- GSL (C/C++)

APPLICATIONS AND LIBRARIES
Large applications, for example:
- GEANT4 (C/C++)
- GROMACS (Fortran, C/C++)
- NAMD (C/C++)
- NWChem (Fortran, C/C++)
- POP (Fortran)
- POV-Ray (C++)
- WRF (Fortran)
Benchmarks, for example:
- HPCC (Fortran, C/C++)
- SPEC CPU2006 (Fortran, C/C++)
- SPEC OMP2001 (Fortran, C/C++)

HETEROGENEOUS COMPUTING
- Existing optimizations: vectorization, register allocation, IPA, ...
- New types of optimizations, for example:
  - Pointer class analysis
  - Variance analysis
  - Multi-versioning
- Framework and infrastructure already present

MORE INFORMATION
http://developer.amd.com
- Downloads
  - Source code and binaries
- Documentation
  - Quick reference guide
  - User's guide and developer's guide
  - White papers and videos
  - Knowledge base articles
- Support
  - Online help
  - Forum
  - AMD Developer Central Help Request

QUESTIONS

Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes. NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.
© 2011 Advanced Micro Devices, Inc. All rights reserved.