The SGI Pro64 Compiler Infrastructure - A Tutorial

Guang R. Gao (U of Delaware), J. Dehnert (SGI), J. N. Amaral (U of Alberta), R. Towle (SGI)

Acknowledgement
- The SGI Compiler Development Teams
- The MIPSpro/Pro64 Development Team
- University of Delaware CAPSL Compiler Team
These individuals contributed directly to this tutorial:
A. Douillet (Udel), F. Chow (Equator), S. Chan (Intel), W. Ho (Routefree), Z. Hu (Udel), K. Lesniak (SGI), S. Liu (HP), R. Lo (Routefree), S. Mantripragada (SGI), C. Murthy (SGI), M. Murphy (SGI), G. Pirocanac (SGI), D. Stephenson (SGI), D. Whitney (SGI), H. Yang (Udel)
/Gao/Pro64-Intro 2

What is Pro64?
- A suite of optimizing compiler tools for Linux/Intel IA-64 systems
- C, C++ and Fortran90/95 compilers
- Conforming to the IA-64 Linux ABI and API standards
- Open to all researchers/developers in the community
- Compatible with HP Native User Environment

Who Might Want to Use Pro64?
- Researchers: test new compiler analysis and optimization algorithms
- Developers: retarget to another architecture/system
- Educators: a compiler teaching platform

Outline
- Background and Motivation
- Part I: An overview of the SGI Pro64 compiler infrastructure
- Part II: The Pro64 code generator design
- Part III: Using Pro64 in compiler research & development
- SGI Pro64 support
- Summary

PART I: Overview of the Pro64 Compiler

Outline
- Logical compilation model and component flow
- WHIRL Intermediate Representation
- Inter-Procedural Analysis (IPA)
- Loop Nest Optimizer (LNO) and Parallelization
- Global optimization (WOPT)
- Feedback
- Design for debuggability and testability

Logical Compilation Model
- driver (sgicc/sgif90/sgiCC)
- front end + IPA (gfec/gfecc/mfef90): Src (.c/.C/.f) -> WHIRL (.B/.I)
- back end (be, as): WHIRL (.B/.I) -> obj (.o)
- linker (ld): obj (.o) -> a.out/.so
[Figure legend: Data Path; Fork and Exec]

Components of Pro64
- Front end
- Interprocedural Analysis and Optimization
- Loop Nest Optimization and Parallelization
- Global Optimization
- Code Generation

Data Flow Relationship Between Modules
[Figure: flow diagram; only its labels survive extraction, listed here as extracted]
- Front ends: gfec, gfecc, f90; file suffixes .B and .i
- lower I/O (only for f90)
- WHIRL levels: Very high WHIRL, High WHIRL, Mid WHIRL, Low WHIRL
- -IPA: Local IPA, Main IPA; Inliner ("Take either path")
- Lower to High W.
- -O3: LNO
- -O2/-O3: Main opt
- Lower Mid W
- -O0: Lower all
- -phase: w=off
- CG
- WHIRL C: .w2c.c, .w2c.h
- WHIRL fortran: .w2f.f

Front Ends
- C front end based on gcc
- C++ front end based on g++
- Fortran90/95 front end from MIPSpro

Intermediate Representation
- IR is called WHIRL
- Tree structured, with references to symbol table
- Maps used for local or sparse annotation
- Common interface between components
- Multiple languages, multiple targets
- Same IR, 5 levels of representation
- Continuous lowering during compilation
- Optimization strategy tied to level
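The ideas of a tree-structured IR that refers to a symbol table, and of lowering from one level to the next, can be sketched with a toy example. This is not WHIRL: the operator names (ARRAY, ADDR, VAR, LOAD, ADD, MUL, CONST), the Node class and the SYMTAB layout are all invented for illustration. The sketch lowers a high-level array reference into explicit address arithmetic.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical symbol table: index -> (name, element size in bytes).
SYMTAB = {0: ("a", 4), 1: ("i", 4)}

@dataclass
class Node:
    op: str                                   # invented operator name, not a real WHIRL opcode
    kids: list = field(default_factory=list)  # child subtrees
    sym: Optional[int] = None                 # index into SYMTAB, not a copy of the symbol
    value: Optional[int] = None               # payload for CONST nodes

def lower(node):
    """Return a lower-level tree: ARRAY nodes become ADD/MUL address arithmetic."""
    kids = [lower(k) for k in node.kids]
    if node.op == "ARRAY":
        base, index = kids
        elem_size = SYMTAB[base.sym][1]
        scaled = Node("MUL", [index, Node("CONST", value=elem_size)])
        return Node("ADD", [base, scaled])
    return Node(node.op, kids, node.sym, node.value)

def show(node):
    """Render a tree as a nested expression string."""
    if node.op == "CONST":
        return str(node.value)
    if node.sym is not None and not node.kids:
        return f"{node.op}({SYMTAB[node.sym][0]})"
    return f"{node.op}(" + ", ".join(show(k) for k in node.kids) + ")"

# High level: load a[i]. Low level: load from address(a) + i * 4.
high = Node("LOAD", [Node("ARRAY", [Node("ADDR", sym=0), Node("VAR", sym=1)])])
low = lower(high)
```

Here show(high) renders the array reference as a single ARRAY node, while show(low) exposes the multiply and add, which is the kind of detail a later, lower-level optimization phase would want to see.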

IPA Main Stage
- Analysis
  - alias analysis
  - array section
  - code layout
- Optimization (fully integrated)
  - inlining
  - cloning
  - dead function and variable elimination
  - constant propagation
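As an illustration of one item above, dead function elimination can be viewed as a reachability problem on the whole-program call graph. The sketch below is a minimal version under strong assumptions: the call graph is given as a plain dictionary, "main" is the only root, and no function has its address taken or is exported from a DSO. The function names are hypothetical and this is not the Pro64 implementation.

```python
def reachable_functions(call_graph, roots):
    """Return the set of functions reachable from the roots via call edges."""
    live = set()
    worklist = list(roots)
    while worklist:
        fn = worklist.pop()
        if fn in live:
            continue
        live.add(fn)
        worklist.extend(call_graph.get(fn, ()))
    return live

def eliminate_dead_functions(call_graph, roots=("main",)):
    """Keep only the functions that some root can reach."""
    live = reachable_functions(call_graph, roots)
    return {fn: callees for fn, callees in call_graph.items() if fn in live}

# Hypothetical program: debug_dump and old_run are never called from main.
call_graph = {
    "main": ["parse", "run"],
    "parse": ["error"],
    "run": ["step"],
    "step": ["step"],          # self-recursive, still live
    "error": [],
    "debug_dump": ["error"],
    "old_run": ["step"],
}
pruned = eliminate_dead_functions(call_graph)
```

The analysis needs the call graph of every module at once, which is why it belongs in an interprocedural phase rather than in per-file compilation.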

IPA Design Features
- User transparent
  - No makefile changes
  - Handles DSOs, unanalyzed objects
- Provide info (e.g. alias analysis, procedure properties) smoothly to:
  - loop nest optimizer
  - main optimizer
  - code generator

Loop Nest Optimizer/Parallelizer
- All languages (including OpenMP)
- Loop level dependence analysis
- Uniprocessor loop level transformations
- Automatic parallelization

Loop Level Transformations
- Based on unified cost model
- Heuristics integrated with software pipelining
- Loop vector dependency info passed to CG
- Loop Fission
- Loop Fusion
- Loop Unroll and Jam
- Loop Interchange
- Loop Peeling
- Loop Tiling
- Vector Data Prefetching
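To make one of the listed transformations concrete, the sketch below applies loop tiling (combined with an interchange of the j and k loops) to a matrix multiply by hand. The function names and the tile size of 2 are arbitrary choices for illustration; a real optimizer would derive the tile size from its cost model and cache parameters. Integer data is used so that reordering the additions cannot change the result.

```python
def matmul_naive(a, b, n):
    """Reference i-j-k matrix multiply on n x n lists of ints."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_tiled(a, b, n, tile=2):
    """Same computation, iterated tile by tile to reuse data while it is hot."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):              # tile loops
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # min() handles partial tiles when tile does not divide n
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c

n = 5
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i * j - 1 for j in range(n)] for i in range(n)]
same = matmul_tiled(a, b, n) == matmul_naive(a, b, n)
```

Both versions execute the same multiply-add operations; only the iteration order differs, which is what makes the transformation legal here and what changes its cache behavior on real hardware.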

Parallelization
- Automatic
  - Array privatization
  - Doacross parallelization
  - Array section analysis
- Directive based
  - OpenMP
  - Integrated with automatic methods
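Array privatization can be illustrated with a loop that reuses one scratch array in every iteration. The shared scratch array creates a dependence between iterations even though no value flows from one to the next; giving each iteration a private copy removes it, after which the iterations can run concurrently. The sketch below uses Python threads only to show the independence (it is not meant as a performance demonstration), and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def row_sums_sequential(rows):
    """Sum of squares per row, with one scratch array shared by all iterations."""
    tmp = [0] * len(rows[0])       # shared scratch: serializes the iterations
    out = []
    for row in rows:
        for j, x in enumerate(row):
            tmp[j] = x * x
        out.append(sum(tmp))
    return out

def row_sums_privatized(rows, workers=4):
    """Same result; each iteration owns its scratch array, so order is free."""
    def body(row):
        tmp = [0] * len(row)       # private copy per iteration
        for j, x in enumerate(row):
            tmp[j] = x * x
        return sum(tmp)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, rows))   # map preserves input order

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sequential = row_sums_sequential(rows)
parallel = row_sums_privatized(rows)
```

Privatization is only valid when every element read from the scratch array in an iteration was written earlier in that same iteration, which holds in this example.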

Global Optimization Phase
- SSA is unifying technology
- Use only SSA as program representation
- All traditional global optimizations implemented
- Every optimization preserves SSA form
- Can reapply each optimization as needed
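For readers new to SSA, the defining property is that every name is assigned exactly once. The sketch below renames a straight-line sequence of assignments into that form. It is deliberately limited: there is no control flow, so no phi functions are needed, and the tuple encoding, the to_ssa name and the "_N" version suffix are assumptions made for this example rather than anything taken from Pro64.

```python
def to_ssa(code):
    """Rename straight-line code so that each variable is assigned once.

    code is a list of (target, operator, operands); operands are variable
    names (str) or integer constants. Version 0 denotes a value that is
    live on entry, i.e. used before any assignment in this sequence.
    """
    version = {}

    def current(operand):
        if not isinstance(operand, str):
            return operand                     # constants are left untouched
        return f"{operand}_{version.setdefault(operand, 0)}"

    ssa = []
    for target, op, operands in code:
        renamed = tuple(current(x) for x in operands)   # uses see the old version
        version[target] = version.get(target, 0) + 1    # each definition gets a new one
        ssa.append((f"{target}_{version[target]}", op, renamed))
    return ssa

code = [
    ("x", "add", ("a", "b")),
    ("y", "mul", ("x", 2)),
    ("x", "add", ("x", "y")),    # redefinition of x
    ("z", "add", ("x", "y")),
]
ssa = to_ssa(code)
```

After renaming, the second assignment to x becomes x_2 and the final use refers to x_2 unambiguously, so the definition reaching each use can be read directly from the name.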

Pro64 Extensions to SSA
- Representing aliases and indirect memory operations (Chow et al., CC '96)
- Integrated partial redundancy elimination (Chow et al., PLDI '97; Kennedy et al., CC '98, TOPLAS '99)
- Support for speculative code motion
- Register promotion via load and store placement (Lo et al., PLDI '98)
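The effect of register promotion via load and store placement can be seen on a small accumulation loop. In the sketch below, a hypothetical CountingMemory class stands in for memory and counts accesses; the "promoted" version keeps the value in a local variable (playing the role of a register), with one load placed before the loop and one store placed after it. The example assumes nothing else reads or writes the same location during the loop, which is the aliasing question a real compiler must answer first.

```python
class CountingMemory:
    """Toy memory that counts how many loads and stores it serves."""

    def __init__(self, cells):
        self.cells = dict(cells)
        self.loads = 0
        self.stores = 0

    def load(self, addr):
        self.loads += 1
        return self.cells[addr]

    def store(self, addr, value):
        self.stores += 1
        self.cells[addr] = value

def accumulate_in_memory(mem, addr, values):
    """Unoptimized form: one load and one store per iteration."""
    for v in values:
        mem.store(addr, mem.load(addr) + v)

def accumulate_promoted(mem, addr, values):
    """Promoted form: the running value lives in a local for the whole loop."""
    reg = mem.load(addr)           # load placed before the loop
    for v in values:
        reg += v
    mem.store(addr, reg)           # store placed after the loop

values = [1, 2, 3, 4]
plain = CountingMemory({"total": 10})
promoted = CountingMemory({"total": 10})
accumulate_in_memory(plain, "total", values)
accumulate_promoted(promoted, "total", values)
```

Both runs leave the same value in memory, but the promoted version performs a single load and a single store regardless of the trip count.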

Feedback
- Used throughout the compiler
- Instrumentation can be added at any stage
- Explicit instrumentation data incorporated where inserted
- Instrumentation data maintained and checked for consistency through program transformations
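A minimal picture of feedback: counters are inserted on the two arms of a branch, a training run fills them in, and the resulting profile is then consulted and checked for consistency. Everything in the sketch below (the counter keys, the function names, the training input) is hypothetical and only illustrates the idea, not Pro64's feedback file format or instrumentation scheme.

```python
from collections import Counter

def classify(x, profile):
    """A branch with one instrumentation counter on each arm."""
    if x < 0:
        profile["then"] += 1       # inserted counter: branch taken
        return "negative"
    profile["else"] += 1           # inserted counter: branch not taken
    return "non-negative"

def hot_arm(profile):
    """Pick the arm a later compile could favor, e.g. for code layout."""
    return profile.most_common(1)[0][0]

def is_consistent(profile, entry_count):
    """Arm counts must add up to the number of times the branch was reached."""
    return profile["then"] + profile["else"] == entry_count

profile = Counter()
training_input = [3, 5, -1, 8, 0]
results = [classify(x, profile) for x in training_input]
```

With this training input the "else" arm dominates, and the consistency check mirrors, in miniature, the kind of invariant that has to keep holding as transformations move or duplicate the instrumented code.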

Design for Debuggability (DFD) and Testability (DFT)
- DFD and DFT built-in from start
- Can build with extra validity checks
- Simple option specification used to:
  - Substitute components known to be good
  - Enable/disable full components or specific optimizations
  - Invoke alternative heuristics
  - Trace individual phases

Where to Obtain Pro64 Compiler and its Support
- SGI Source download: http://oss.sgi.com/projects/pro64/
- University of Delaware Pro64 Support Group: http://www.capsl.udel.edu/~pro64 (pro64@capsl.udel.edu)