Tutorial: Building a backend in 24 hours. Anton Korobeynikov

Similar documents
Tutorial: Building a backend in 24 hours. Anton Korobeynikov

15-411: LLVM. Jan Hoffmann. Substantial portions courtesy of Deby Katz

LLVM MC in Practice. Jim Grosbach and Owen Anderson

Welcome to the Back End: The LLVM Machine Representation. Matthias Braun, Apple

LLVM, Clang and Embedded Linux Systems. Bruno Cardoso Lopes University of Campinas

Assignment 1c: Compiler organization and backend programming

FTL WebKit s LLVM based JIT

Apple LLVM GPU Compiler: Embedded Dragons. Charu Chandrasekaran, Apple Marcello Maggioni, Apple

Lecture 3 Overview of the LLVM Compiler

Targeting LLVM IR. LLVM IR, code emission, assignment 4

Object Code Emission & llvm-mc

MoviCompile : An LLVM based compiler for heterogeneous SIMD code generation Diken, E.; Jordans, R.; O'Riordan, M.

Challenges of mixed-width vector code generation and static scheduling in LLVM (for VLIW Architectures)

Intermediate Code & Local Optimizations

Lecture 2 Overview of the LLVM Compiler

Course Administration

Writing LLVM Backends Construindo Backends para o LLVM. Bruno Cardoso Lopes

MIPS Functions and Instruction Formats

The structure of LLVM backends

Translating JVM Code to MIPS Code 1 / 43

GlobalISel LLVM s Latest Instruction Selection Framework

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment

Intermediate Code Generation (ICG)

Instruction Set Principles and Examples. Appendix B

Practical Malware Analysis

Register Allocation. Global Register Allocation Webs and Graph Coloring Node Splitting and Other Transformations

Instructions: Language of the Computer

IRIX is moving in the n32 direction, and n32 is now the default, but the toolchain still supports o32. When we started supporting native mode o32 was

Please refer to the turn-in procedure document on the website for instructions on the turn-in procedure.

Lecture 3 Overview of the LLVM Compiler

Processor design - MIPS

Getting started. Roel Jordans

CODE GENERATION AND ADAPTIVE CONTROL DIVERGENCE MANAGEMENT FOR LIGHT WEIGHT SIMT PROCESSORS

Computer Science and Engineering 331. Midterm Examination #1. Fall Name: Solutions S.S.#:

Chapter 1. Computer Abstractions and Technology. Lesson 3: Understanding Performance

CS356: Discussion #6 Assembly Procedures and Arrays. Marco Paolieri

Outline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006

Adventures in Fuzzing Instruction Selection. 1 EuroLLVM 2017 Justin Bogner

LECTURE 19. Subroutines and Parameter Passing

Building Binary Optimizer with LLVM

Intermediate Representations

Machine Language Instructions Introduction. Instructions Words of a language understood by machine. Instruction set Vocabulary of the machine

Low-Level Issues. Register Allocation. Last lecture! Liveness analysis! Register allocation. ! More register allocation. ! Instruction scheduling

G Programming Languages - Fall 2012

Memory Usage 0x7fffffff. stack. dynamic data. static data 0x Code Reserved 0x x A software convention

Accelerating Ruby with LLVM

Lecture 2. Instructions: Language of the Computer (Chapter 2 of the textbook)

Optimization Prof. James L. Frankel Harvard University

LLVM LEON Backend Final Presentation

The structure of a compiler

Lecture 4: Instruction Set Architecture

Register Allocation. Introduction Local Register Allocators

Instruction Set Architectures Part I: From C to MIPS. Readings:

Connecting the EDG front-end to LLVM. Renato Golin, Evzen Muller, Jim MacArthur, Al Grant ARM Ltd.

Math 230 Assembly Programming (AKA Computer Organization) Spring 2008

Compiler Design. Homework 1. Due Date: Thursday, January 19, 2006, 2:00

Code Generation. Lecture 19

Generation System Advanced Seminar on Compilation and Run-time Systems for Object-Oriented Languages. David Bélanger

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world.

Intermediate Representations

CS415 Compilers. Intermediate Represeation & Code Generation

Instructions: Language of the Computer

a translator to convert your AST representation to a TAC intermediate representation; and

SSA Construction. Daniel Grund & Sebastian Hack. CC Winter Term 09/10. Saarland University

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM.

CNIT 127: Exploit Development. Ch 1: Before you begin. Updated

System Software Assignment 1 Runtime Support for Procedures

T-EMU 2.0: The Next Generation LLVM Based Micro-Processor Emulator. Dr. Mattias Holm

Measuring the User Debugging Experience. Greg Bedwell Sony Interactive Entertainment

Intermediate Representation (IR)

COSC 6385 Computer Architecture. Instruction Set Architectures

55:132/22C:160, HPCA Spring 2011

Advanced C Programming

Hello I am Zheng, Hongbin today I will introduce our LLVM based HLS framework, which is built upon the Target Independent Code Generator.

COE608: Computer Organization and Architecture

CS 61c: Great Ideas in Computer Architecture

The X86 Assembly Language Instruction Nop Means

ELEC / Computer Architecture and Design Fall 2013 Instruction Set Architecture (Chapter 2)

Compilers and Code Optimization EDOARDO FUSELLA

LLVM IR Code Generations Inside YACC. Li-Wei Kuo

llvmc2 - New LLVM Compiler Driver

Chapter 2A Instructions: Language of the Computer

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Compiler Construction: LLVMlite

CS 406/534 Compiler Construction Putting It All Together

Runtime management. CS Compiler Design. The procedure abstraction. The procedure abstraction. Runtime management. V.

Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany

Lecture 4: Instruction Set Design/Pipelining

231 Spring Final Exam Name:

Computer Organization MIPS Architecture. Department of Computer Science Missouri University of Science & Technology

Supporting the new IBM z13 mainframe and its SIMD vector unit

Dealing with Register Hierarchies

MIPS and QTSPIM Recap. MIPS Architecture. Cptr280. Dr Curtis Nelson. Cptr280, Autumn

ECE 486/586. Computer Architecture. Lecture # 7

Lectures 3-4: MIPS instructions

CS4617 Computer Architecture

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University

Compiler Optimization

Code Generation. Lecture 12

Agenda. CSE P 501 Compilers. Big Picture. Compiler Organization. Intermediate Representations. IR for Code Generation. CSE P 501 Au05 N-1

Transcription:

Tutorial: Building a backend in 24 hours Anton Korobeynikov anton@korobeynikov.info

Outline 1. From IR to assembler: codegen pipeline 2. MC 3. Parts of a backend 4. Example step-by-step

The Pipeline LLVM IR IR Passes Lower DAG Combine Legalize DAG Combine ISel IR SDAG MI MC Pre-RA RA Post-RA MC Streamers Object File Assembler Binary Code

LLVM IR IR Passes Lower DAG Combine Legalize DAG Combine ISel Pre-RA RA Post-RA MC Streamers Object File Assembler Binary Code

IR Level Passes Why? Some things are easier to do at IR level Simplifies codegen Safer (pass pipeline is much more fixed)

IR Level Passes Why? Some things are easier to do at IR level Simplifies codegen Safer (pass pipeline is much more fixed) What is done? Late opts (LSR, elimination of dead BBs) IR-level lowering: GC, EH, stack protector Custom pre-isel passes CodeGenPrepare

EH Lowering Why? To simplify codegen

EH Lowering Why? To simplify codegen What is done? Lowering of EH intrinsics to unwinding runtime constructs (e.g. sjlj stuff)

CodeGenPrepare Why? To workaround BB-at-a-time codegen

CodeGenPrepare Why? To workaround BB-at-a-time codegen What is done? Addressing mode-related simplifications Inline asm simplification (e.g. bswap patterns) Move debug stuff closer to defs

LLVM IR IR Passes Lower DAG Combine Legalize DAG Combine ISel Pre-RA RA Post-RA MC Streamers Object File Assembler Binary Code

Selection DAG First strictly backend IR Even lower level than LLVM IR Use-def chains + additional stuff to keep things in order Built on per-bb basis

DAG-level Passes Lowering Combine Legalize Combine Instruction Selection

DAG Combiner Optimizations on DAG Close to target Runs twice - before and after legalize Used to cleanup / handle optimization opportunities exposed by targets

DAG Legalization Turn non-legal operations into legal one

DAG Legalization Turn non-legal operations into legal one Examples: Software floating point Scalarization of vectors Widening of funky types (e.g. i42)

Instruction Selection Turns SDAGs into MIs Uses target-defined patters to select instructions and operands Does bunch of magic and crazy pattern-matching Target can provide fast but crude isel for -O0 (fallbacks to standard one if cannot isel something)

LLVM IR IR Passes Lower DAG Combine Legalize DAG Combine ISel Pre-RA RA Post-RA MC Streamers Object File Assembler Binary Code

Machine* Yet another set of IR: MachineInst + MachineBB + MachineFunction Close to target code Pretty explicit: set of impdef regs, basic block live in / live out regs, etc. Used as IR for all post-isel passes

Pre-RA Passes Pre-RA tail duplication PHI optimization MachineLICM, CSE, DCE More peephole opts

Pre-RA Passes Pre-RA tail duplication PHI optimization MachineLICM, CSE, DCE More peephole opts Code is still in SSA form!

Register Allocator Fast Greedy (default) PBQP

Post-RA Passes 1. Prologue / Epilogue Insertion & Abstract Frame Indexes Elimination 2. Branch Folding & Simplification 3. Tail duplication 4. Reg-reg copy propagation 5. Post-RA scheduler 6. BB placement to optimize hot paths

LLVM IR IR Passes Lower DAG Combine Legalize DAG Combine ISel Pre-RA RA Post-RA MC Streamers Object File Assembler Binary Code

Assembler Printing Lower MI-level constructs to MCInst Let MCStreamer decide what to do next: emit assembler, object file or binary code into memory

Customization Target can insert its own passes in specific points in the pipeline (e.g. after isel or before scheduler)

Customization Target can insert its own passes in specific points in the pipeline (e.g. after isel or before scheduler) Examples: IT block formation, load-store optimization on ARM Delay slot filling on MIPS or Sparc

The Backend Standalone library Mixed C++ code + TableGen TableGen is a special DSL used to describe register sets, calling conventions, instruction patterns, etc. Inheritance and overloading are used to augment necessary target bits into target-independent codegen classes

Stub Backend How much code we need to create no-op backend?

Stub Backend How much code we need to create no-op backend? Some decent amount...

Stub Backend How much code we need to create no-op backend? Some decent amount: 15 classes around 1k LOC (both C++ and TableGen)

FooTargetMachine Central class in each backend Glues (almost) all the backend classes Controls the backend pipeline

FooSubtarget Several subtargets inside one target Usually used to model different instruction sets, platform-specific things, etc. Done via subtarget features

FooRegisterInfo Provides various information about register sets: 1. Callee saved regs 2. Reserved (non-allocable) regs 3. Register allocation order 4. Register classes for cross-class copying & coalescing

FooRegisterInfo Provides various information about register sets: 1. Callee saved regs 2. Reserved (non-allocable) regs 3. Register allocation order 4. Register classes for cross-class copying & coalescing Partly autogenerated from FooRegisterInfo.td

FooRegisterInfo.td TableGen description of: 1. Registers, 2. Sub-registers (and aliasing sets for regs) 3. Register classes

FooISelLowering Central class for target-aware lowering Turns target-neutral SelectionDAG in target-aware (suitable for instruction selection) Something can be lowered (albeit not efficiently) in generic way Some cases (e.g. argument lowering) always require custom lowering

FooCallingConv.td Describes the calling convention: 1. What & where & in which order should be passed 2. Not self-containing: used to simplify custom lowering routines 3. Autogenerate set of callee-save registers

FooISelDAGToDAG Does most of instruction selection Most of C++ code is autogenerated from instruction patterns Custom instruction selection code: Complex addressing modes Instructions which require additional care

FooInstrInfo Hooks used by codegen to: 1. Emit reg-reg copies 2. Save / restore values on stack 3. Branch-related operations 4. Determine instruction sizes

FooInstrInfo.td Defines the instruction patterns: DAG: level of input & output operands MI: Instruction Encoding ASM: Assembler printing strings

FooInstrInfo.td Defines the instruction patterns: DAG: level of input & output operands MI: Instruction Encoding ASM: Assembler printing strings TableGen magic can autogenerate many things

FooInstInfo.td def REV : AMiscA1I<0b01101011, 0b0011, (outs GPR:$Rd), (ins GPR:$Rm), IIC_iUNAr, "rev", "\t$rd, $Rm", [(set GPR:$Rd, (bswap GPR:$Rm))]>, Requires<[IsARM, HasV6]>;

FooInstInfo.td def REV : AMiscA1I<0b01101011, 0b0011, (outs GPR:$Rd), (ins GPR:$Rm), IIC_iUNAr, "rev", "\t$rd, $Rm", [(set GPR:$Rd, (bswap GPR:$Rm))]>, Requires<[IsARM, HasV6]>;

FooInstInfo.td def REV : AMiscA1I<0b01101011, 0b0011, (outs GPR:$Rd), (ins GPR:$Rm), IIC_iUNAr, "rev", "\t$rd, $Rm", [(set GPR:$Rd, (bswap GPR:$Rm))]>, Requires<[IsARM, HasV6]>;

FooInstInfo.td def REV : AMiscA1I<0b01101011, 0b0011, (outs GPR:$Rd), (ins GPR:$Rm), IIC_iUNAr, "rev", "\t$rd, $Rm", [(set GPR:$Rd, (bswap GPR:$Rm))]>, Requires<[IsARM, HasV6]>;

FooInstInfo.td def REV : AMiscA1I<0b01101011, 0b0011, (outs GPR:$Rd), (ins GPR:$Rm), IIC_iUNAr, "rev", "\t$rd, $Rm", [(set GPR:$Rd, (bswap GPR:$Rm))]>, Requires<[IsARM, HasV6]>;

FooFrameLowering Hooks connected with function stack frames: 1. Prologue & epilogue expansion 2. Function call frame formation 3. Spilling & restoring of callee saved regs

FooMCInstPrinter Target part of generic assembler printing code

FooMCInstPrinter Target part of generic assembler printing code Specifies how a given MCInst should be represented as an assembler string: 1. Instruction opcodes, operands 2. Encoding of immediate values, 3. Workarounds for assembler bugs :)

What s not covered? MC-level stuff: MC{Asm,Instr,Reg}Info Assemblers and disassemblers Direct object code emission MI-level (post-ra) scheduler

OpenRISC IP core, not a real CPU chip Straightforward 32-bit RISC CPU 32 regs 3 address instructions Rich instruction set

The Goal Make the following IR to yield the valid assembler: define void @foo() { entry: ret void }

Triple Make sure the desired target triple is recognized: include/adt/triple.h & lib/ Support/Triple.cpp Add or32 entry Add or32 openrisc backend mapping

Stub classes Provide stub implementations of all necessary 15 backend classes :( Hook them into build system

Stub classes Provide stub implementations of all necessary 15 backend classes :( Hook them into build system Maybe it s a good idea to add stub backend to the tree

Registers Define all registers and register classes: class OpenRISCReg<string n> : Register<n> { } let Namespace = "OpenRISC"; def ZERO : OpenRISCReg<"r0">; def SP : OpenRISCReg<"r1">;... def R31 : OpenRISCReg<"r31">; def GR32 : RegisterClass<"OpenRISC", [i32], 32, (add (sequence "R%u", 3, 8), (sequence "R%u", 10, 31), LR, SP, FP, ZERO)>;

Calling Convention def CC_OpenRISC : CallingConv<[ // Promote i8 arguments to i32. CCIfType<[i8], CCPromoteToType<i32>>, // Promote i8 arguments to i32. CCIfType<[i16], CCPromoteToType<i32>>, // The first 6 integer arguments of non-varargs functions are passed in // integer registers. CCIfNotVarArg<CCIfType<[i32], CCAssignToReg<[R3, R4, R5, R6, R7, R8]>>>, // Integer values get stored in stack slots that are 4 bytes in // size and 4-byte aligned. CCIfType<[i32], CCAssignToStack<4, 4>> ]>;

Some hooks copyphysreg() blank emitprologue() / emitepilogue() hasfp() getreservedregs() getcalleesavedregs()

Some boilerplate ADJCALLSTACKUP / ADJCALLSTACKDOWN pseudo instructions Make sure lowering knows about native integer type and register classes

LowerFormalArguments 1. Assign locations to all incoming arguments (depending on their type) 2. Copy arguments passed in registers 3. Create frame index objects for arguments passed on stack 4. Create SelectionDAG nodes for loading of stack arguments

MI to MC The lowering MI to MC is straightforward: void OpenRISCMCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const { OutMI.setOpcode(MI->getOpcode()); for (unsigned i = 0, e = MI->getNumOperands(); i!= e; ++i) { const MachineOperand &MO = MI->getOperand(i); } MCOperand MCOp; switch (MO.getType()) { case MachineOperand::MO_Immediate: MCOp = MCOperand::CreateImm(MO.getImm()); break;... } OutMI.addOperand(MCOp);

MCInst Printing Printing is easy as well: void OpenRISCInstPrinter::printInst(const MCInst *MI, raw_ostream &O, StringRef Annot) { printinstruction(mi, O); printannotation(o, Annot); } void OpenRISCInstPrinter::printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O, const char *Modifier) { const MCOperand &Op = MI->getOperand(OpNo); if (Op.isReg()) { O << getregistername(op.getreg()); } else if (Op.isImm()) { O << Op.getImm(); } else assert(0 && "Unknown operand in printoperand"); }

First Instruction Add pattern for function return instruction: def return : SDNode<"OpenRISCISD::RET", SDTNone, [SDNPHasChain, SDNPOptInGlue]>; let isreturn = 1, isterminator = 1, isbarrier = 1 in { def RET : OpenRISCInst<(outs), (ins), "l.jr r9 # FIXME: delay slot", [(return)]>; }

Clang Want to write tescases in C?

Clang Want to write tescases in C? Hook in clang! One has to provide TargetInfo (pretending the toolchain looks binutils-ish) Detailed toolchain description can be added later

Next steps Reg-reg arithmetic instructions Loads / stores: matching address modes Proper function frame formation Delay slot filling (with NOPs for now) Branch folding hooks...

Q & A