Instruction Selection: Peephole Matching. Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Similar documents
Instruction Selection and Scheduling

Instruction Selection: Preliminaries. Comp 412

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

Agenda. CSE P 501 Compilers. Big Picture. Compiler Organization. Intermediate Representations. IR for Code Generation. CSE P 501 Au05 N-1

Topic 6 Basic Back-End Optimization

CS 406/534 Compiler Construction Instruction Selection and Global Register Allocation

Lecture 12: Compiler Backend

Instruction Selection, II Tree-pattern matching

CSE 504: Compiler Design. Instruction Selection

CS415 Compilers. Intermediate Represeation & Code Generation

Introduction to Compiler

CS415 Compilers Register Allocation. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Code Shape II Expressions & Assignment

CS415 Compilers. Instruction Scheduling and Lexical Analysis

CS 406/534 Compiler Construction Putting It All Together

The ILOC Simulator User Documentation

The ILOC Simulator User Documentation

CS 406/534 Compiler Construction Instruction Scheduling

Implementing Control Flow Constructs Comp 412

The ILOC Simulator User Documentation

CS415 Compilers. Lexical Analysis

CS 314 Principles of Programming Languages Fall 2017 A Compiler and Optimizer for tinyl Due date: Monday, October 23, 11:59pm

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412

Local Register Allocation (critical content for Lab 2) Comp 412

The View from 35,000 Feet

Intermediate Code Generation (ICG)

Local Optimization: Value Numbering The Desert Island Optimization. Comp 412 COMP 412 FALL Chapter 8 in EaC2e. target code

CS5363 Final Review. cs5363 1

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Intermediate Representations

Intermediate Representations

Compiler Design. Code Shaping. Hwansoo Han

Review. Pat Morin COMP 3002

CS 406/534 Compiler Construction Instruction Scheduling beyond Basic Block and Code Generation

Generating Code for Assignment Statements back to work. Comp 412 COMP 412 FALL Chapters 4, 6 & 7 in EaC2e. source code. IR IR target.

Context-sensitive Analysis. Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Code Merge. Flow Analysis. bookkeeping

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Code Shape Comp 412 COMP 412 FALL Chapters 4, 5, 6 & 7 in EaC2e. source code. IR IR target. code. Front End Optimizer Back End

Lab 3, Tutorial 1 Comp 412

The structure of a compiler

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Compiler Optimisation

Compiler Design. Register Allocation. Hwansoo Han

Intermediate Code & Local Optimizations. Lecture 20

Handling Assignment Comp 412

Optimization. ASU Textbook Chapter 9. Tsan-sheng Hsu.

The Software Stack: From Assembly Language to Machine Code

Parsing II Top-down parsing. Comp 412

Code Optimization. Code Optimization

Grammars. CS434 Lecture 15 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

CS 132 Compiler Construction

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

Middle End. Code Improvement (or Optimization) Analyzes IR and rewrites (or transforms) IR Primary goal is to reduce running time of the compiled code

Combining Optimizations: Sparse Conditional Constant Propagation

Just-In-Time Compilers & Runtime Optimizers

The ILOC Virtual Machine (Lab 1 Background Material) Comp 412

The Processor Memory Hierarchy

Syntax Analysis, III Comp 412

CS153: Compilers Lecture 15: Local Optimization

Alternatives for semantic processing

CST-402(T): Language Processors

Data Structures and Algorithms in Compiler Optimization. Comp314 Lecture Dave Peixotto

An Overview of Compilation

Syntax Analysis, III Comp 412

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

6. Intermediate Representation

Intermediate Code & Local Optimizations

Front End. Hwansoo Han

CS311 Lecture: The Architecture of a Simple Computer

Procedure and Function Calls, Part II. Comp 412 COMP 412 FALL Chapter 6 in EaC2e. target code. source code Front End Optimizer Back End

Runtime Support for Algol-Like Languages Comp 412

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

Overview of a Compiler

Context-sensitive Analysis

Lecture Outline. Intermediate code Intermediate Code & Local Optimizations. Local optimizations. Lecture 14. Next time: global optimizations

Background: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation

Operator Strength Reduc1on

CS 406/534 Compiler Construction Parsing Part I

High-level View of a Compiler

CS 432 Fall Mike Lam, Professor. Compilers. Advanced Systems Elective

Compiler Optimization and Code Generation

Generation of High Performance Domain- Specific Languages from Component Libraries. Ken Kennedy Rice University

CS415 Compilers. Procedure Abstractions. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Compiler Design (40-414)

Class Information ANNOUCEMENTS

Optimization Prof. James L. Frankel Harvard University

Introduction. L25: Modern Compiler Design

Compilers and Interpreters

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013

CS 432 Fall Mike Lam, Professor. Code Generation

The MARIE Architecture

5. Semantic Analysis!

CS 432 Fall Mike Lam, Professor. Compilers. Advanced Systems Elective

(just another operator) Ambiguous scalars or aggregates go into memory

Transcription:

Instruction Selection: Peephole Matching Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

The Problem Writing a compiler is a lot of work Would like to reuse components whenever possible Would like to automate construction of components Front End Middle End Back End Today s lecture: Automating Instruction Selection Infrastructure Front end construction is largely automated Middle is largely hand crafted (Parts of back end can be automated

Definitions Instruction selection Mapping IR into assembly code Assumes a fixed storage mapping & code shape Combining operations, using address modes Instruction scheduling Reordering operations to hide latencies Assumes a fixed program (set of operations Changes demand for registers Register allocation Deciding which values will reside in registers Changes the storage mapping, may add false sharing Concerns about placement of data & memory operations

The Problem Modern computers (still have many ways to do anything Consider register-to-register copy Obvious operation is i2i r i r j Many others exist addi r i,0 r j subi r i,0 r j lshifti r i,0 r j multi r i,1 r j divi r i,1 r j rshifti r i,0 r j ori r i,0 r j xori r i,0 r j and others Human would ignore all of these Algorithm must look at all of them & find low-cost encoding Take context into account (busy functional unit? And this is an overly-simplified example

The Goal Want to automate generation of instruction selectors Front End Middle End Back End Infrastructure Machine description Back-end Generator Tables Pattern Matching Engine Description-based retargeting Machine description should also help with scheduling & allocation

The Big Picture Need pattern matching techniques Must produce good code (some metric for good Must run quickly Our treewalk (visitor code generator ran quickly How good was the code? Tree Treewalk Code Desired Code IDENT <a,4> x IDENT <b,8> loadi 4 r 5 loada r 5 r 6 loadi 8 r 7 loada r 7 r 8 mult r 6,r 8 r 9 loadai 4 r 5 loadai 8 r 6 mult r 5,r 6 r 7

The Big Picture Need pattern matching techniques Must produce good code (some metric for good Must run quickly Our treewalk code generator ran quickly How good was the code? Tree Treewalk Code Desired Code IDENT <a,4> x IDENT <b,8> loadi 4 r 5 loada r 5 r 6 loadi 8 r 7 loada r 7 r 8 mult r 6,r 8 r 9 loadai 4 r 5 loadai 8 r 6 mult r 5,r 6 r 7

The Big Picture Need pattern matching techniques Must produce good code (some metric for good Must run quickly Our treewalk code generator ran quickly How good was the code? Tree Treewalk Code Desired Code IDENT <a,4> x NUMBER <2> loadi 4 r 5 loada r 5 r 6 loadi 2 r 7 mult r 6,r 7 r 8 loada 4 r 5 multi r 5,2 r 7

The Big Picture Need pattern matching techniques Must produce good code (some metric for good Must run quickly Our treewalk code generator ran quickly How good was the code? Tree Treewalk Code Desired Code IDENT <a,arp,4> x NUMBER <2> loadi 4 r 5 loada r 5 r 6 loadi 2 r 7 mult r 6,r 7 r 8 loadai 4 r 5 multi r 5,2 r 7 Must combine these This is a nonlocal problem

The Big Picture Need pattern matching techniques Must produce good code (some metric for good Must run quickly Our treewalk code generator ran quickly How good was the code? Tree Treewalk Code Desired Code IDENT <c,@g,4> x IDENT <d,@h,4> loadi @G r 5 loadi 4 r 6 loadao r 5,r 6 r 7 loadi @H r 7 loadi 4 r 8 loadao r 8,r 9 mult r 7, r 11 loadi 4 r 5 loadai r 5,@G r 6 loadai r 5,@H r 7 mult r 6,r 7 r 8

The Big Picture Need pattern matching techniques Must produce good code (some metric for good Must run quickly Our treewalk code generator met the second criteria How did it do on the first? Tree Treewalk Code Desired Code IDENT <c,@g,4> x IDENT <d,@h,4> Common offset loadi @G r 5 loadi 4 r 6 loadao r 5,r 6 r 7 loadi @H r 7 loadi 4 r 8 loadao r 8,r 9 mult r 7, r 11 loadi 4 r 5 loadai r 5,@G r 6 loadai r 5,@H r 7 mult r 6,r 7 r 8 Again, a nonlocal problem

How do we perform this kind of matching? Tree-oriented IR suggests pattern matching on trees Tree-patterns as input, matcher as output Each pattern maps to a target-machine instruction sequence Use dynamic programming or bottom-up rewrite systems Linear IR suggests using some sort of string matching Strings as input, matcher as output Each string maps to a target-machine instruction sequence Use text matching (Aho-Corasick or peephole matching In practice, both work well; matchers are quite different

Peephole Matching Basic idea Compiler can discover local improvements locally Look at a small set of adjacent operations Move a peephole over code & search for improvement Classic example was store followed by load Original code Improved code storeai r 1 8 storeai r 1 8 loadai 8 i2i r 1

Peephole Matching Basic idea Compiler can discover local improvements locally Look at a small set of adjacent operations Move a peephole over code & search for improvement Classic example was store followed by load Simple algebraic identities Original code Improved code addi r 2,0 r 7 mult r 4,r 7 mult r 4,r 2

Peephole Matching Basic idea Compiler can discover local improvements locally Look at a small set of adjacent operations Move a peephole over code & search for improvement Classic example was store followed by load Simple algebraic identities Jump to a jump Original code Improved code jumpi L 10 L 10 : jumpi L 11 L 10 : jumpi L 11

Peephole Matching Implementing it Early systems used limited set of hand-coded patterns Window size ensured quick processing Modern peephole instruction selectors Break problem into three tasks IR Expander LLIR Simplifier LLIR Matcher ASM IR LLIR LLIR LLIR LLIR ASM Apply symbolic interpretation & simplification systematically

Peephole Matching Expander Turns IR code into a low-level IR (LLIR such as RTL* Operation-by-operation, template-driven rewriting LLIR form includes all direct effects Significant, albeit constant, expansion of size IR Expander LLIR Simplifier LLIR Matcher ASM IR LLIR LLIR LLIR LLIR ASM *RTL = Register transfer language

Peephole Matching Simplifier Looks at LLIR through window and rewrites is Uses forward substitution, algebraic simplification, local constant propagation, and dead-effect elimination Performs local optimization within window IR Expander LLIR Simplifier LLIR Matcher ASM IR LLIR LLIR LLIR LLIR ASM This is the heart of the peephole system Benefit of peephole optimization shows up in this step

Peephole Matching Matcher Compares simplified LLIR against a library of patterns Picks low-cost pattern that captures effects Must preserve LLIR effects, may add new ones (e.g., set cc Generates the assembly code output IR Expander LLIR Simplifier LLIR Matcher ASM IR LLIR LLIR LLIR LLIR ASM

Example Original IR Code OP Arg 1 Arg 2 Result mult 2 Y t 1 sub x t 1 w Expand LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM(

Example LLIR Code r 11 @y + r 11 MEM( x + MEM( + Simplify MEM(r 0 + @w LLIR Code MEM(r 0 + @y x MEM(r 0 + @x MEM(

Example MEM(r 0 + @w LLIR Code MEM(r 0 + @y x MEM(r 0 + @x Match ILOC Code loadai r 0,@y multi 2 x loadai r 0,@x sub storeai,@w Introduced all memory operations & temporary names Turned out pretty good code

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM(

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + r 11 @y + r 11 MEM(

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM( r 11 @y + r 11 + @y MEM(

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM( + @y MEM( MEM(r 0 + @y x

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM(r 0 + @y x 1 st op rolling out of window MEM(r 0 + @y x MEM(

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM( MEM(r 0 + @y x x +

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM( x + x + @x MEM(

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM( x + @x MEM( x MEM(r 0 +@x

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM( x MEM(r 0 +@x MEM(r 0 +@x

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM( MEM(r 0 +@x +

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM( + + @w MEM(

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM( + @w MEM( MEM(r 0 + @w

Steps of the Simplifier (3-operation window LLIR Code r 11 @y + r 11 MEM( x + MEM( + MEM(r 0 + @w MEM(

Example LLIR Code r 11 @y + r 11 MEM( x + MEM( + Simplify MEM(r 0 + @w LLIR Code MEM(r 0 + @y x MEM(r 0 + @x MEM(

Making It All Work Details LLIR is largely machine independent (RTL Target machine described as LLIR ASM pattern Actual pattern matching Use a hand-coded pattern matcher (gcc Turn patterns into grammar & use LR parser Several important compilers use this technology It seems to produce good portable instruction selectors (VPO Key strength appears to be late low-level optimization

Next lecure Instruction selection Tree-based pattern matching