Code Shape, Comp 412 (COMP 412, Fall 2017; Chapters 4, 5, 6 & 7 in EaC2e)


COMP 412, Fall 2017: Code Shape

[Diagram: source code -> Front End -> IR -> Optimizer -> IR -> Back End -> target code]

Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Chapters 4, 5, 6 & 7 in EaC2e

Code Generation
In a modern, multi-IR compiler, code generation happens several times.

[Diagram: source code -> Front End (SDT) -> IR 1 -> Optimizer -> IR 2 -> Optimizer -> IR 3 -> Back End -> target code, with a symbol table (ST) carried alongside each IR and lowering passes between the stages]

- Generate an IR straight out of the parser. It might be an AST, or it might be some high-level (abstract) linear form. It is almost always accompanied by a symbol table of some form.
- The code then passes through one or more lowering passes. A lowering pass takes the code and decreases ("lowers") its level of abstraction: it expands complex operations (e.g., call or mvcl) and makes control flow explicit.

The problems are, essentially, the same at each stage. The mechanisms are, likely, quite different.
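To make the idea of a lowering pass concrete, here is a small sketch added to this transcript (not from the slides): the same loop written in a high-level, structured form and in a lower-level form with the control flow and address arithmetic made explicit, using C labels and gotos to stand in for a linear IR. The function and variable names are invented for the example.

    /* Higher-level form: structured control flow, implicit addressing. */
    int sum_high(int *a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum = sum + a[i];
        return sum;
    }

    /* Lower-level form: explicit test, branches, and address arithmetic,
       roughly the shape a linear IR such as ILOC would expose. */
    int sum_low(int *a, int n) {
        int sum = 0;
        int i = 0;
    test:
        if (!(i < n)) goto done;   /* explicit loop test          */
        sum = sum + *(a + i);      /* explicit address arithmetic */
        i = i + 1;                 /* explicit induction update   */
        goto test;                 /* explicit back edge          */
    done:
        return sum;
    }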

Mechanism versus Content
In the lectures on SDT, we focused on how to generate code. In the lectures on code shape, we will focus on what code to generate.
- Different instruction sequences for the same source code produce significantly different performance. When two sequences produce the same result, choose the fastest one.
- The compiler makes fundamental choices that are hard to change.
- Tradeoffs in performance often involve tradeoffs in generality: code that handles the general case is robust and, often, slower; code tailored to specific cases is more brittle and, often, faster.
Code optimization is, to a large extent, the art of capitalizing on context.

Performance versus Generality
Consider copying characters from src to dst.

Generic version (len loads & stores):

    char src[256], dst[256];
    int len, i;
    if (len < 256) {
      i = 0;
      while (i < len) {
        dst[i] = src[i];
        i++;
      }
    }

Version for aligned strings whose length is a multiple of 8 bytes (len/8 loads & stores):

    char src[256], dst[256];
    int64 *p, *q;
    int len, i;
    if (len < 256 && len % 8 == 0) {
      i = 0;
      p = (int64 *) &src[0];
      q = (int64 *) &dst[0];
      while (i < len) {
        *q++ = *p++;
        i += 8;
      }
    }

Now, imagine the version that uses 8-byte loads as long as it can, then gets the last (up to 7) bytes with 4-, 2-, & 1-byte loads, as long as the strings do not overlap and start on an aligned boundary. (A sketch of that version follows.)

Context: The declarations ensure that src and dst have the necessary storage alignment. Separate declarations mean that src and dst do not overlap.
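A minimal sketch of that mixed-width copy, added here as an illustration (it is not the slides' code): it assumes src and dst do not overlap and are 8-byte aligned, uses standard C fixed-width types, and ignores strict-aliasing concerns for the sake of brevity. The name copy_mixed is invented.

    #include <stdint.h>
    #include <stddef.h>

    /* Copy len bytes with the widest loads/stores that still fit,
       assuming non-overlapping, 8-byte-aligned src and dst. */
    static void copy_mixed(char *dst, const char *src, size_t len) {
        size_t i = 0;
        while (i + 8 <= len) {                     /* 8-byte chunks      */
            *(uint64_t *)(dst + i) = *(const uint64_t *)(src + i);
            i += 8;
        }
        if (i + 4 <= len) {                        /* one 4-byte chunk   */
            *(uint32_t *)(dst + i) = *(const uint32_t *)(src + i);
            i += 4;
        }
        if (i + 2 <= len) {                        /* one 2-byte chunk   */
            *(uint16_t *)(dst + i) = *(const uint16_t *)(src + i);
            i += 2;
        }
        if (i < len)                               /* final byte, if any */
            dst[i] = src[i];
    }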

Code Shape (Chapter 7)
Definition: all those nebulous properties of the code that affect performance. It includes the code itself, the approach taken for different constructs, cost, storage requirements & mapping, & the choice of operations. Code shape is the end product of many decisions (big & small).
Impact:
- Code shape influences algorithm choice & results.
- Code shape can encode important facts, or hide them.
Rule of thumb: expose as much derived information as possible.
- Example: explicit branch targets in ILOC simplify analysis.
- Example: the hierarchy of memory operations in ILOC (EaC2e, p. 251).
See Bob Morgan's book for more ILOC examples [268 in EaC2e Bibliography].

Code Shape
My favorite code shape example: evaluating x + y + z. Three possible shapes:

    t1 ← x + y        t1 ← x + z        t1 ← y + z
    t2 ← t1 + z       t2 ← t1 + y       t2 ← t1 + x

What if x is 2 and z is 3? What if y + z is evaluated earlier?
The best shape for x + y + z depends on contextual knowledge.
- There may be several conflicting options.
- Addition is commutative & associative for integers.
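For instance (an illustration added to the transcript, following the slide's questions): if the compiler knows that x is 2 and z is 3, the shape that adds x and z first lets it fold the constants,

    t1 ← x + z     becomes     t1 ← 5
    t2 ← t1 + y                t2 ← t1 + y

so only one addition remains at run time. Similarly, if y + z is already computed elsewhere, the shape t1 ← y + z; t2 ← t1 + x lets the compiler reuse that earlier value instead of recomputing it.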

Code Shape
Another example: implementing the case statement.
- Implement it as cascaded if-then-else statements. Cost depends on where your case actually occurs: O(number of cases).
- Implement it as a binary search. Needs a dense set of conditions to search. Uniform O(log n) cost.
- Implement it as a jump table. Needs a compact & computable set of case labels. Look up the address in a table & jump to it. Uniform (constant) cost. (A sketch of this approach follows the list.)
Performance depends on the order of the cases in the final code; the compiler should reorder them based on frequency!
The compiler must choose the best implementation strategy. No amount of massaging or transforming in the optimizer will convert one strategy into another.
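A minimal sketch of the jump-table strategy, added here as an illustration (not from the slides), using a hand-built table of function pointers; the names handle_add, handle_sub, handle_mul, and dispatch are invented for the example.

    #include <stdio.h>

    /* Hypothetical case bodies. */
    static void handle_add(void) { puts("add"); }
    static void handle_sub(void) { puts("sub"); }
    static void handle_mul(void) { puts("mul"); }
    static void handle_other(void) { puts("other"); }

    /* Jump table: works because the case labels 0, 1, 2 form a compact,
       computable set.  Dispatch is one bounds check, one load, and one
       indirect jump, so the cost is uniform per case. */
    static void (* const table[3])(void) = { handle_add, handle_sub, handle_mul };

    static void dispatch(int op) {
        if (op >= 0 && op < 3)
            table[op]();        /* look up the address & jump to it */
        else
            handle_other();     /* default case */
    }

    int main(void) {
        dispatch(1);            /* prints "sub" */
        return 0;
    }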

Code Shape
Why worry about code shape? Can't we just trust the optimizer and the back end?
- The optimizer and back end approximate the answers to many hard problems; the compiler's individual passes must run quickly.
- It often pays to encode useful information into the IR: the shape of an expression or a control structure, a value kept in a register rather than in memory.
- Deriving such information later may be expensive, when it is possible at all; recording it explicitly in the IR is often easier and cheaper.
A good optimizer can tune the engine, but it cannot make a Volkswagen Beetle into a Porsche.

Code Shape
The best code shape depends on context.
Context in the code being compiled:
- What are the common subexpressions? The known constants?
- Do the case labels form a compact set?
- What values in registers are live after a call?
Context in the compiler itself:
- In SDT and optimization, shape the code so that it optimizes well.
- In the back end, shape the code so that it runs quickly.
And, again, code shape is a content issue, not a mechanism issue: neither the case statement nor the string copy implementation changes between SDT, a tree-walk lowering pass, and a pattern-matching instruction selector.
Code shape is one of the places where compilation resembles art.

Evaluation Order: Code Shape & Instruction Scheduling
Consider a + b + c + d + e + f + g + h.

Left-associative tree, and the corresponding code:

    t1 ← a + b
    t2 ← t1 + c
    t3 ← t2 + d
    t4 ← t3 + e
    t5 ← t4 + f
    t6 ← t5 + g
    t7 ← t6 + h

Sequential dependences constrain the execution order: each operation needs the result of the one before it.
A left-recursive expression grammar (e.g., with an LR parser) would produce a left-associative tree. Right recursion and right-associative trees have a symmetric problem.

Evaluation Order: Code Shape & Instruction Scheduling
Consider a + b + c + d + e + f + g + h again.

Balanced tree, and the corresponding code:

    t1 ← a + b
    t2 ← c + d
    t3 ← e + f
    t4 ← g + h
    t5 ← t1 + t2
    t6 ← t3 + t4
    t7 ← t5 + t6

The sequential dependences are much more relaxed: independent additions can execute in parallel.
No rational parser will produce this tree. It results from an explicit reordering pass; the best approach might be a post-optimization, pre-scheduling pass (a sketch of such a pass follows). See Section 8.4.2 in EaC2e (pages 428ff).
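As an illustration of what such a reordering pass might do, here is a sketch added to this transcript (it is not the book's algorithm): it pairs up the operands of a chain of integer adds level by level, which is safe because integer addition is commutative and associative. The names emit_add and balance_add_chain are invented, and emit_add here is a toy stand-in for the compiler's IR builder.

    #include <stdio.h>
    #include <stddef.h>

    /* Toy stand-in for the IR builder: "emits" tN <- lhs + rhs and
       returns the number N of the new virtual register.  The leaves of
       the chain are assumed to occupy t1..t8, so temps start at t9. */
    static int next_temp = 8;
    static int emit_add(int lhs, int rhs) {
        int t = ++next_temp;
        printf("t%d <- t%d + t%d\n", t, lhs, rhs);
        return t;
    }

    /* Combine the operands of an add chain pairwise, level by level,
       producing a balanced tree of adds.  Returns the root's register. */
    static int balance_add_chain(int *ops, size_t n) {
        while (n > 1) {
            size_t out = 0;
            for (size_t i = 0; i + 1 < n; i += 2)
                ops[out++] = emit_add(ops[i], ops[i + 1]);
            if (n % 2 == 1)              /* carry an odd operand up a level */
                ops[out++] = ops[n - 1];
            n = out;
        }
        return ops[0];
    }

    int main(void) {
        int leaves[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };   /* a..h in t1..t8 */
        balance_add_chain(leaves, 8);
        return 0;
    }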

Evaluation Order: Code Shape & Instruction Scheduling
Schedules for the balanced tree versus the left-associative tree, assuming two functional units:

    Cycle | Balanced tree                     | Left-associative tree
          | Unit 0          Unit 1            | Unit 0          Unit 1
      1   | t1 ← a + b      t2 ← c + d        | t1 ← a + b      nop
      2   | t3 ← e + f      t4 ← g + h        | t2 ← t1 + c     nop
      3   | t5 ← t1 + t2    t6 ← t3 + t4      | t3 ← t2 + d     nop
      4   | t7 ← t5 + t6    (idle)            | t4 ← t3 + e     nop
      5   |                                   | t5 ← t4 + f     nop
      6   |                                   | t6 ← t5 + g     nop
      7   |                                   | t7 ← t6 + h     nop

Multi-unit parallelism is called instruction-level parallelism, or ILP.

Evaluation Order: Code Shape and Instruction Scheduling
However, deep trees may need fewer registers than broad trees.

    Left-associative (deep) tree:      Balanced (broad) tree:

    t1 ← a + b                         t1 ← a + b
    t2 ← t1 + c                        t2 ← c + d
    t3 ← t2 + d                        t3 ← e + f
    t4 ← t3 + e                        t4 ← g + h
    t5 ← t4 + f                        t5 ← t1 + t2
    t6 ← t5 + g                        t6 ← t3 + t4
    t7 ← t6 + h                        t7 ← t5 + t6

In the deep tree, all of these virtual registers (VRs) can become the same physical register (PR). In the broad tree, the VRs need multiple PRs, because several intermediate results are live at once.
- Deep trees may lead to less spilling.
- Broad trees may lead to more instruction-level parallelism.
As with most code shape decisions, there is not a simple answer.

Evaluation Order
Order of evaluation for expression trees: the compiler can use commutativity & associativity to improve the code, but this problem is truly hard. Commuting the operands at a single operation is much easier:
- The 1st operand must be preserved while the 2nd is evaluated, which takes an extra register for the 2nd operand.
- So evaluate the more demanding subtree first. This is a local strategy, guided by the actual operations and the actual flow of values (or dependences) in the current tree (Ershov in the 1950s, Sethi in the 1970s).
Taken to its logical conclusion, this idea creates the Sethi-Ullman scheme for register allocation (a sketch of the labeling step follows).
See Sethi & Ullman, [311] in EaC2e. See Floyd's 1961 CACM paper, [150] in EaC2e, also on the website.
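As a rough illustration of the labeling step behind that scheme, here is a sketch added to this transcript (it is not the slides' or the paper's code): it computes, for each node of a binary expression tree, the minimum number of registers needed to evaluate it without spilling (often called its Ershov number), and identifies which subtree to evaluate first. The Node type and function names are invented for the example.

    #include <stddef.h>

    /* Minimal expression-tree node for the sketch. */
    typedef struct Node {
        struct Node *left, *right;   /* NULL for a leaf (variable or constant) */
        int need;                    /* registers needed to evaluate this node */
    } Node;

    /* A leaf needs 1 register.  If the two subtrees need the same amount k,
       the node needs k + 1; otherwise it needs the larger of the two,
       provided the more demanding subtree is evaluated first. */
    static int label(Node *n) {
        if (n->left == NULL && n->right == NULL)
            return n->need = 1;
        int l = label(n->left);
        int r = label(n->right);
        n->need = (l == r) ? l + 1 : (l > r ? l : r);
        return n->need;
    }

    /* For an interior node, code generation evaluates the subtree with the
       larger label first, holding its result in one register while the
       other subtree uses the remaining ones. */
    static Node *evaluate_first(Node *n) {
        return (n->right->need > n->left->need) ? n->right : n->left;
    }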