Formal Languages and Compiler (CSE322)
Introduction to Compiler
Jungsik Choi, chjs@khu.ac.kr
2018. 3. 8
Traditional Two-pass Compiler
(Figure: Source -> Front End -> IR -> Back End -> Target, with both ends inside the Compiler)
High-level functions
- Recognize legal programs and generate correct code (code that the OS and linker can accept)
- Manage the storage of all variables and code
Two passes
- Use an intermediate representation (IR)
- The front end maps legal source code into IR in O(n) or O(n log n) time
- The back end maps IR into target machine code; its problems are typically NP-complete
- Admits multiple front ends and multiple passes (for better code)
Front End
(Figure: Source -> Scanner -> tokens -> Parser -> IR)
Responsibilities
- Recognize legal (and illegal) programs
- Report errors in a useful way
- Produce IR and a preliminary storage map
- Shape the code for the back end
Much of front end construction can be automated.
Front End - Scanner
(Figure: Source -> Scanner -> tokens -> Parser)
Scanner (lexical analysis)
- Maps the character stream into words, the basic units of syntax: sum=x+y; becomes <id,sum> <eq,=> <id,x> <add,+> <id,y> <sc,;>
- In each pair, the word is the lexeme and its part of speech is the token type
- Typical tokens include number, identifier, keyword, and operator
- The scanner eliminates white space and comments
- Produced by an automatic scanner generator
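To make the word/part-of-speech mapping concrete, here is a minimal hand-written scanner sketch in Python that reproduces the token stream for sum=x+y;. Python's `re` module stands in for the automaton a scanner generator would build from the same regular expressions. The token names `sub` and `kw`, the keyword set, and the `//` comment syntax are assumptions for illustration only; `scan` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Token categories. "id", "eq", "add", "sc", "number" follow the slide;
# "sub", "kw", and "//" line comments are assumptions for this sketch.
TOKEN_SPEC = [
    ("skip",   r"[ \t\n]+|//[^\n]*"),   # white space and comments: discarded
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("eq",     r"="),
    ("add",    r"\+"),
    ("sub",    r"-"),
    ("sc",     r";"),
]
KEYWORDS = {"if", "else", "while"}      # hypothetical keyword set
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text: str) -> List[Tuple[str, str]]:
    """Map a character stream to a list of (token type, lexeme) pairs."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "id" and lexeme in KEYWORDS:
            kind = "kw"                 # keywords look like identifiers
        if kind != "skip":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens


print(scan("sum=x+y;"))
# [('id', 'sum'), ('eq', '='), ('id', 'x'), ('add', '+'), ('id', 'y'), ('sc', ';')]
```

Listing `number` before `id` and matching whole runs of letters and digits keeps each lexeme intact; a generated scanner would resolve the same choices with its longest-match rule.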
Front End - Parser
(Figure: Source -> Scanner -> tokens -> Parser -> IR)
Parser
- Recognizes context-free syntax
- Guides context-sensitive ("semantic") analysis, e.g. type checking
- Builds IR for the source program
- Produced by automatic parser generators
Front End Example (1)
Context-free syntax can be put to better use.
1. goal -> expr
2. expr -> expr op term
3.       | term
4. term -> number
5.       | id
6. op   -> +
7.       | -
S = goal
T = {number, id, +, -}
NT = {goal, expr, term, op}
P = {1, 2, 3, 4, 5, 6, 7}
This grammar defines simple expressions with addition and subtraction over number and id. Like many grammars, it falls in the class called context-free grammars, abbreviated CFG.
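A minimal recognizer sketch for this grammar, working on token types (the terminals in T) rather than raw text. Rules 2 and 3 are left-recursive, which would send a naive recursive-descent parser into an infinite loop, so the sketch uses the iterative form term (op term)*, which accepts the same language. The function name `accepts` is hypothetical; a parser generator would handle this for you.

```python
from typing import List


def accepts(tokens: List[str]) -> bool:
    """Return True if the token-type sequence is a sentence of the grammar."""
    pos = 0

    def term() -> bool:
        # term -> number | id   (rules 4 and 5)
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("number", "id"):
            pos += 1
            return True
        return False

    # expr -> expr op term | term  (rules 2 and 3) is left-recursive; this
    # loop recognizes the same language as  term (op term)*
    if not term():
        return False
    while pos < len(tokens) and tokens[pos] in ("+", "-"):  # op -> + | -
        pos += 1
        if not term():
            return False
    # goal -> expr (rule 1): the whole input must be consumed
    return pos == len(tokens)


print(accepts(["id", "+", "number", "-", "id"]))   # True: x + 2 - y
print(accepts(["id", "+"]))                        # False: missing term
```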
Front End Example (2)
A parse can be represented by a tree (parse tree or syntax tree). The parse tree for x + 2 - y under the grammar above:
goal
  expr
    expr
      expr
        term
          <id,x>
      op
        +
      term
        <number,2>
    op
      -
    term
      <id,y>
This contains a lot of unneeded information.
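The slide shows only the tree. One leftmost derivation that produces it is sketched below as a cross-check; the subscript on each arrow is the number of the grammar rule applied, and each step rewrites the leftmost nonterminal.

```latex
\begin{aligned}
\textit{goal} &\Rightarrow_{1} \textit{expr} \\
 &\Rightarrow_{2} \textit{expr}\;\textit{op}\;\textit{term} \\
 &\Rightarrow_{2} \textit{expr}\;\textit{op}\;\textit{term}\;\textit{op}\;\textit{term} \\
 &\Rightarrow_{3} \textit{term}\;\textit{op}\;\textit{term}\;\textit{op}\;\textit{term} \\
 &\Rightarrow_{5} \langle\textit{id},x\rangle\;\textit{op}\;\textit{term}\;\textit{op}\;\textit{term} \\
 &\Rightarrow_{6} \langle\textit{id},x\rangle + \textit{term}\;\textit{op}\;\textit{term} \\
 &\Rightarrow_{4} \langle\textit{id},x\rangle + \langle\textit{number},2\rangle\;\textit{op}\;\textit{term} \\
 &\Rightarrow_{7} \langle\textit{id},x\rangle + \langle\textit{number},2\rangle - \textit{term} \\
 &\Rightarrow_{5} \langle\textit{id},x\rangle + \langle\textit{number},2\rangle - \langle\textit{id},y\rangle
\end{aligned}
```

Every rule application becomes an interior node of the tree, which is exactly the derivation detail the AST drops.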
Front End Example (3)
Compilers often use an abstract syntax tree (AST). An AST summarizes grammatical structure without including detail about the derivation. The AST for x + 2 - y:
-
  +
    <id,x>
    <number,2>
  <id,y>
This is much more concise. ASTs are one kind of intermediate representation (IR).
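A sketch of building this AST directly from the token stream, so the goal, expr, term, and op nodes of the parse tree never appear. The tuple encoding (leaves are (type, lexeme) tokens, interior nodes are (operator, left, right)) and the name `build_ast` are assumptions for illustration. Folding each new "op term" into the tree built so far gives the left associativity encoded by rule 2, so x + 2 - y becomes (x + 2) - y.

```python
from typing import List, Tuple, Union

Token = Tuple[str, str]                        # e.g. ("id", "x")
AST = Union[Token, Tuple[str, "AST", "AST"]]   # leaf or (op, left, right)


def build_ast(tokens: List[Token]) -> AST:
    """Build a left-associative AST for  expr -> expr op term | term."""
    def leaf(i: int) -> Token:
        # term -> number | id
        if i >= len(tokens) or tokens[i][0] not in ("number", "id"):
            raise SyntaxError(f"expected number or id at token {i}")
        return tokens[i]

    tree: AST = leaf(0)
    i = 1
    while i < len(tokens):
        op = tokens[i][1]
        if op not in ("+", "-"):
            raise SyntaxError(f"expected + or - at token {i}")
        # The tree built so far becomes the left operand of the new node.
        tree = (op, tree, leaf(i + 1))
        i += 2
    return tree


toks = [("id", "x"), ("add", "+"), ("number", "2"), ("sub", "-"), ("id", "y")]
print(build_ast(toks))
# ('-', ('+', ('id', 'x'), ('number', '2')), ('id', 'y'))
```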
Back End
(Figure: IR -> Instruction Selection -> Instruction Scheduling -> Register Allocation -> Target)
Responsibilities
- Translate IR into target machine code
- Choose instructions to implement each operation
- Decide which values to keep in registers
- Find an optimal order of instruction execution
- Ensure conformance with system interfaces
Automation has been less successful in the back end.
Back End - Instruction Selection
(Figure: back-end pipeline, Instruction Selection stage)
Instruction Selection
- Produce fast, compact code
- Take advantage of target features such as addressing modes
- Usually viewed as a pattern-matching problem: ad hoc methods, pattern matching, dynamic programming
Back End - Instruction Scheduling
(Figure: back-end pipeline, Instruction Scheduling stage)
Instruction Scheduling
- Avoid hardware stalls and interlocks
- Use all functional units productively
- Can increase the lifetimes of variables (changing the allocation)
- Optimal scheduling is NP-complete in nearly all cases
- Heuristic techniques are well developed
Back End - Register Allocation
(Figure: back-end pipeline, Register Allocation stage)
Register Allocation
- Have each value in a register when it is used
- Manage a limited set of resources
- Can change instruction choices and insert LOADs and STOREs
- Optimal allocation is NP-complete
- Compilers approximate solutions to NP-complete problems
Optimizing Compiler
(Figure: Source -> Front End -> IR -> Middle End -> IR -> Back End -> Target, inside the Compiler)
Code Optimizations
- Analyze and rewrite (or transform) IR
- The primary goal is to reduce execution time, space usage, power consumption, …
- Must preserve the meaning of the code
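One simple example of an IR-to-IR rewrite that preserves meaning is constant folding. The sketch below applies it to a tuple-encoded AST (leaves are (type, lexeme), interior nodes are (operator, left, right)); the encoding and the name `fold` are assumptions for illustration. It evaluates with Python's unbounded integers, so a real compiler would also have to reproduce the target's overflow behavior for the rewrite to preserve meaning.

```python
from typing import Tuple, Union

# Leaves are ("number", "2") or ("id", "x"); interior nodes are (op, left, right).
AST = Union[Tuple[str, str], Tuple[str, "AST", "AST"]]


def fold(node: AST) -> AST:
    """Constant folding: evaluate + and - nodes whose operands are both numbers."""
    if len(node) == 2:                     # leaf: nothing to fold
        return node
    op, left, right = node
    left, right = fold(left), fold(right)  # fold children first (bottom-up)
    if left[0] == "number" and right[0] == "number" and op in ("+", "-"):
        a, b = int(left[1]), int(right[1])
        return ("number", str(a + b if op == "+" else a - b))
    return (op, left, right)               # operands not both constant: keep


print(fold(("-", ("+", ("number", "3"), ("number", "4")), ("id", "y"))))
# ('-', ('number', '7'), ('id', 'y'))
```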
Instruction Selection Example
- Simple treewalk for the initial code
- Peephole matching for the desired code
Tree: MULT with children IDENT <a,ARP,4> and IDENT <b,ARP,8>
Treewalk code:
loadI 4 => r5
loadAO r0,r5 => r6
loadI 8 => r7
loadAO r0,r7 => r8
mult r6,r8 => r9
Desired code:
loadAI r0,4 => r5
loadAI r0,8 => r6
mult r5,r6 => r7
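A sketch of the single peephole rule that turns the treewalk code into the desired code: within a two-instruction window, loadI c => rx followed by loadAO rb,rx => ry becomes loadAI rb,c => ry. The tuple encoding and the name `peephole` are assumptions; the check that rx is not read again only looks inside this fragment, so the sketch assumes rx is not live after it.

```python
from typing import List, Tuple

# (opcode, operands, destination register): an encoding chosen for this sketch
Instr = Tuple[str, Tuple[str, ...], str]


def peephole(code: List[Instr]) -> List[Instr]:
    """Rewrite  loadI c => rx ; loadAO rb,rx => ry  into  loadAI rb,c => ry."""
    out: List[Instr] = []
    i = 0
    while i < len(code):
        op1, args1, d1 = code[i]
        if op1 == "loadI" and i + 1 < len(code):
            op2, args2, d2 = code[i + 1]
            # rx must not be read again, or deleting the loadI changes meaning
            used_later = any(d1 in args for _, args, _ in code[i + 2:])
            if op2 == "loadAO" and args2[1] == d1 and args2[0] != d1 and not used_later:
                out.append(("loadAI", (args2[0], args1[0]), d2))
                i += 2
                continue
        out.append(code[i])
        i += 1
    return out


TREEWALK = [
    ("loadI", ("4",), "r5"),
    ("loadAO", ("r0", "r5"), "r6"),
    ("loadI", ("8",), "r7"),
    ("loadAO", ("r0", "r7"), "r8"),
    ("mult", ("r6", "r8"), "r9"),
]
for op, args, dst in peephole(TREEWALK):
    print(op, ",".join(args), "=>", dst)
```

The result (loadAI r0,4 => r6; loadAI r0,8 => r8; mult r6,r8 => r9) matches the desired code up to register names; renumbering the registers to r5, r6, r7 is a separate step.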
Instruction Scheduling Example
Schedule instructions considering latency and dependences, to generate fast-executing code.
The code:
a: loadAI r0,@w => r1
b: add r1,r1 => r1
c: loadAI r0,@x => r2
d: mult r1,r2 => r1
The precedence graph: a -> b, b -> d, c -> d
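A sketch of greedy list scheduling, one common heuristic, applied to the four instructions and precedence graph above. The latencies (3 cycles for loadAI, 1 for add, 2 for mult) are assumptions, since the slide does not give them. Priority is the latency-weighted length of the longest path from an instruction to the end of the graph, and one instruction issues per cycle; `list_schedule` is a hypothetical name.

```python
from typing import Dict, List

# Assumed latencies in cycles (hypothetical values, not from the slide).
LATENCY = {"loadAI": 3, "add": 1, "mult": 2}

# The code and precedence graph from the slide: a -> b -> d and c -> d.
OPS = {"a": "loadAI", "b": "add", "c": "loadAI", "d": "mult"}
SUCCS = {"a": ["b"], "b": ["d"], "c": ["d"], "d": []}


def list_schedule(ops: Dict[str, str], succs: Dict[str, List[str]]) -> Dict[str, int]:
    """Greedy list scheduling; returns the issue cycle of each instruction."""
    preds: Dict[str, List[str]] = {n: [] for n in ops}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)

    # Priority: latency-weighted longest path from n to a leaf of the graph.
    prio: Dict[str, int] = {}

    def path_len(n: str) -> int:
        if n not in prio:
            prio[n] = LATENCY[ops[n]] + max((path_len(s) for s in succs[n]), default=0)
        return prio[n]

    for n in ops:
        path_len(n)

    ready_at: Dict[str, int] = {}   # cycle at which each result is available
    start: Dict[str, int] = {}      # cycle at which each instruction issues
    cycle = 1
    while len(start) < len(ops):
        ready = [n for n in ops if n not in start
                 and all(p in ready_at and ready_at[p] <= cycle for p in preds[n])]
        if ready:                   # if nothing is ready, this cycle stalls
            n = max(ready, key=lambda k: prio[k])
            start[n] = cycle
            ready_at[n] = cycle + LATENCY[ops[n]]
        cycle += 1
    return start


print(list_schedule(OPS, SUCCS))
```

Under these assumed latencies the scheduler issues a, c, b, d at cycles 1, 2, 4, 5, with cycle 3 stalled waiting for a. Issuing in the original order a, b, c, d would delay d until cycle 8, because d must wait for c's load.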
Register Allocation Example
- Instruction selection assumes an infinite number of registers (virtual registers)
- Allocation maps virtual registers to physical registers
- Sometimes register spill/fill code is needed
Before (6 virtual registers):
loadI 4 => r1
loadAO r0,r1 => r2
loadI 8 => r3
loadAO r0,r3 => r4
mult r2,r4 => r5
After (3 physical registers):
loadI 4 => r1
loadAO r0,r1 => r1
loadI 8 => r2
loadAO r0,r2 => r2
mult r1,r2 => r2
Only r5 is used later; r5 is renamed to r2.
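A sketch of a simple local allocator for a single block: it frees a physical register at the last use of the value it holds and hands out registers from a LIFO free list. The names `allocate`, `pinned`, and `live_out` are hypothetical; r0 is assumed to hold the ARP and stays pinned, and no spilling is implemented. Production allocators (graph coloring, linear scan) are considerably more involved.

```python
from typing import Dict, List, Optional, Set, Tuple

# (opcode, immediate or None, source registers, destination register)
Instr = Tuple[str, Optional[int], List[str], str]


def allocate(code: List[Instr], pinned: Dict[str, str],
             free_regs: List[str], live_out: Set[str]) -> List[Instr]:
    """Map virtual registers to physical ones, reusing registers of dead values."""
    free = list(free_regs)                   # stack of free physical registers
    last_use: Dict[str, int] = {}
    for i, (_, _, srcs, _) in enumerate(code):
        for s in srcs:
            last_use[s] = i
    mapping: Dict[str, str] = dict(pinned)   # virtual -> physical
    out: List[Instr] = []
    for i, (op, imm, srcs, dst) in enumerate(code):
        phys_srcs = [mapping[s] for s in srcs]
        # Release registers whose values die at this instruction.
        for s in dict.fromkeys(srcs):
            if last_use[s] == i and s not in pinned and s not in live_out:
                free.append(mapping[s])
        if not free:
            raise RuntimeError("out of registers: spill code would be needed")
        mapping[dst] = free.pop()
        out.append((op, imm, phys_srcs, mapping[dst]))
    return out


VIRTUAL = [
    ("loadI", 4, [], "r1"),
    ("loadAO", None, ["r0", "r1"], "r2"),
    ("loadI", 8, [], "r3"),
    ("loadAO", None, ["r0", "r3"], "r4"),
    ("mult", None, ["r2", "r4"], "r5"),
]
for op, imm, srcs, dst in allocate(VIRTUAL, {"r0": "r0"}, ["r2", "r1"], {"r5"}):
    operands = ([str(imm)] if imm is not None else []) + srcs
    print(op, ",".join(operands), "=>", dst)
```

Called with r0 pinned and the free list [r2, r1], it reproduces the allocation on the slide, including r5 becoming r2. A free list that is too short raises an error at the point where a real allocator would insert spill code.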
Summary
(Figure: Source -> Front End -> Middle End -> Back End -> Target, inside the Compiler)
- Front End: process the high-level programming language
- Middle End: apply optimizations for speed, power, space, …
- Back End: produce machine-level assembly code