An Overview of Compilation (www.cse.iitb.ac.in/ uday) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay January 2014
cs306 Compilation Overview: Outline 1/18 Outline Introduction compilation sequence compilation models
Part 1 Introduction to Compilation
cs306 Compilation Overview: Introduction to Compilation 2/18 Binding Nothing is known except the problem No.of unbound objects Time
cs306 Compilation Overview: Introduction to Compilation 2/18 Binding Overall strategy, algorithm, data structures etc. No.of unbound objects Conceptualisation Time
cs306 Compilation Overview: Introduction to Compilation 2/18 Binding No.of unbound objects Functions, variables, their types etc. Conceptualisation Coding Time
cs306 Compilation Overview: Introduction to Compilation 2/18 Binding No.of unbound objects Machine instructions, registers etc. Conceptualisation Coding Compiling Time
cs306 Compilation Overview: Introduction to Compilation 2/18 Binding No.of unbound objects Addresses of functions, external data etc. Conceptualisation Coding Compiling Linking Time
cs306 Compilation Overview: Introduction to Compilation 2/18 Binding No.of unbound objects Actual addresses of code and data Conceptualisation Coding Compiling Linking Loading Time
cs306 Compilation Overview: Introduction to Compilation 2/18 Binding No.of unbound objects Values of variables Conceptualisation Coding Compiling Linking Loading Execution Time
cs306 Compilation Overview: Introduction to Compilation 2/18 Binding No.of unbound objects We will look at different binding times related to compiling Conceptualisation Coding Compiling Linking Loading Execution Time
cs306 Compilation Overview: Introduction to Compilation 3/18 Implementation Mechanisms Source Program Translator Target Program Machine
cs306 Compilation Overview: Introduction to Compilation 3/18 Implementation Mechanisms Source Program Translator Input Data Target Program Machine
cs306 Compilation Overview: Introduction to Compilation 3/18 Implementation Mechanisms Source Program Input Data Source Program Translator Target Program Machine Interpreter Machine
cs306 Compilation Overview: Introduction to Compilation 4/18 Implementation Mechanisms as Bridges Gap between the levels of program specification and execution Program Specification Machine
cs306 Compilation Overview: Introduction to Compilation 4/18 Implementation Mechanisms as Bridges Gap between the levels of program specification and execution Program Specification Translation Machine
cs306 Compilation Overview: Introduction to Compilation 4/18 Implementation Mechanisms as Bridges Gap between the levels of program specification and execution Program Specification Translation Interpretation Machine
cs306 Compilation Overview: Introduction to Compilation 4/18 Implementation Mechanisms as Bridges Gap between the levels of program specification and execution Program Specification State : Variables Operations: Expressions, Control Flow Translation Interpretation Machine State : Memory, Registers Operations: Machine Instructions
cs306 Compilation Overview: Introduction to Compilation 5/18 High and Low Level Abstractions Input C statement a = b<10?b:c; Spim Assembly Equivalent lw $t0, 4($fp) ; t0 <- b # Is b smaller slti $t0, $t0, 10 ; t0 <- t0 < 10 # than 10? not $t0, $t0 ; t0 <-!t0 bgtz $t0, L0: ; if t0>0 goto L0 lw $t0, 4($fp) ; t0 <- b # YES b L1: ; goto L1 L0: lw $t0, 8($fp) ;L0: t0 <- c # NO L1: sw 0($fp), $t0 ;L1: a <- t0
cs306 Compilation Overview: Introduction to Compilation 5/18 High and Low Level Abstractions Input C statement a = b<10?b:c; Spim Assembly Equivalent Conditional jump Condition False Part True Part Fall through lw $t0, 4($fp) ; t0 <- b # Is b smaller slti $t0, $t0, 10 ; t0 <- t0 < 10 # than 10? not $t0, $t0 ; t0 <-!t0 bgtz $t0, L0: ; if t0>0 goto L0 lw $t0, 4($fp) ; t0 <- b # YES b L1: ; goto L1 L0: lw $t0, 8($fp) ;L0: t0 <- c # NO L1: sw 0($fp), $t0 ;L1: a <- t0
cs306 Compilation Overview: Introduction to Compilation 5/18 High and Low Level Abstractions NOT Condition Input C statement a = b<10?b:c; Spim Assembly Equivalent True Part False Part lw $t0, 4($fp) ; t0 <- b # Is b smaller slti $t0, $t0, 10 ; t0 <- t0 < 10 # than 10? not $t0, $t0 ; t0 <-!t0 bgtz $t0, L0: ; if t0>0 goto L0 lw $t0, 4($fp) ; t0 <- b # YES b L1: ; goto L1 L0: lw $t0, 8($fp) ;L0: t0 <- c # NO L1: sw 0($fp), $t0 ;L1: a <- t0
cs306 Compilation Overview: Introduction to Compilation 5/18 High and Low Level Abstractions Input C statement a = b<10?b:c; Spim Assembly Equivalent Conditional jump NOT Condition True Part False Part Fall through lw $t0, 4($fp) ; t0 <- b # Is b smaller slti $t0, $t0, 10 ; t0 <- t0 < 10 # than 10? not $t0, $t0 ; t0 <-!t0 bgtz $t0, L0: ; if t0>0 goto L0 lw $t0, 4($fp) ; t0 <- b # YES b L1: ; goto L1 L0: lw $t0, 8($fp) ;L0: t0 <- c # NO L1: sw 0($fp), $t0 ;L1: a <- t0
cs306 Compilation Overview: Introduction to Compilation 6/18 Implementation Mechanisms Translation = Analysis + Synthesis Interpretation = Analysis + Execution
cs306 Compilation Overview: Introduction to Compilation 6/18 Implementation Mechanisms Translation = Analysis + Synthesis Interpretation = Analysis + Execution Translation Instructions Equivalent Instructions
cs306 Compilation Overview: Introduction to Compilation 6/18 Implementation Mechanisms Translation = Analysis + Synthesis Interpretation = Analysis + Execution Translation Instructions Equivalent Instructions Interpretation Instructions Actions Implied by Instructions
cs306 Compilation Overview: Introduction to Compilation 7/18 Language Implementation Models Synthesis Compilation Analysis Execution Interpretation
cs306 Compilation Overview: Introduction to Compilation 8/18 Language Processor Models Front End Optimizer Back End Virtual Machine C,C++ Java, C#
Part 2 An Overview of Compilation Phases
cs306 Compilation Overview: An Overview of Compilation Phases 9/18 The Structure of a Simple Compiler Parser Scanner Source Program Semantic Analyser Symtab Handler
cs306 Compilation Overview: An Overview of Compilation Phases 9/18 The Structure of a Simple Compiler Parser AST Instruction Selector Insn Assembly Emitter Scanner Semantic Analyser Symtab Handler Register Allocator Assembly Program Source Program
cs306 Compilation Overview: An Overview of Compilation Phases 9/18 The Structure of a Simple Compiler Front End Back End Parser AST Instruction Selector Insn Assembly Emitter Scanner Semantic Analyser Symtab Handler Register Allocator Assembly Program Source Program
cs306 Compilation Overview: An Overview of Compilation Phases 10/18 Translation Sequence in Our Compiler: Parsing a=b<10?b:c; Input
cs306 Compilation Overview: An Overview of Compilation Phases 10/18 Translation Sequence in Our Compiler: Parsing a=b<10?b:c; Input AsgnStmnt Lhs = E ; E? E : E E < E Issues: num Parse Tree Grammar rules, terminals, non-terminals Order of application of grammar rules eg. is it (a = b<10?) followed by (b:c)? Values of terminal symbols eg. string 10 vs. integer number 10.
cs306 Compilation Overview: An Overview of Compilation Phases 11/18 Translation Sequence in Our Compiler: Semantic Analysis a=b<10?b:c; Input AsgnStmnt Lhs = E ; E? E : E E < E num Parse Tree
cs306 Compilation Overview: An Overview of Compilation Phases 11/18 Translation Sequence in Our Compiler: Semantic Analysis a=b<10?b:c; Input AsgnStmnt Lhs = E ; (a,int) =?: (int) E < E E? E : E num Parse Tree < (bool) (b,int) (c,int) num (b,int) (10,int) Abstract Syntax Tree (with attributes) Issues: Symbol tables Have variables been declared? What are their types? What is their scope? Type consistency of operators and operands The result of computing b<10? is bool and not int
cs306 Compilation Overview: An Overview of Compilation Phases 12/18 Translation Sequence in Our Compiler: IR Generation a=b<10?b:c; Input AsgnStmnt Lhs = E E < E E? E : E num Parse Tree ; (a,int) = < (bool)?: (int) (b,int) (c,int) num (b,int) (10,int) Abstract Syntax Tree (with attributes)
cs306 Compilation Overview: An Overview of Compilation Phases 12/18 Translation Sequence in Our Compiler: IR Generation a=b<10?b:c; Input T 0 Tree List = Not < b 10 IfGoto T 0 = T 1 L0: T 1 L1: a Goto L1: = = L0: b c T 1 AsgnStmnt Lhs = E E < E Issues: E? E : E num Parse Tree ; (a,int) = < (bool)?: (int) (b,int) (c,int) num (b,int) (10,int) Abstract Syntax Tree (with attributes) Convert to maximal trees which can be implemented without altering control flow Simplifies instruction selection and scheduling, register allocation etc. Linearise control flow by flattening nested control constructs
cs306 Compilation Overview: An Overview of Compilation Phases 13/18 Translation Sequence in Our Compiler: Instruction Selection a=b<10?b:c; Input T 0 Tree List = Not < b 10 IfGoto T 0 = L0: AsgnStmnt Lhs = E E < E E? E : E num Parse Tree ; (a,int) = < (bool)?: (int) (b,int) (c,int) num (b,int) (10,int) Abstract Syntax Tree (with attributes) T 1 L0: T 1 L1: a Goto L1: = = b c T 1
cs306 Compilation Overview: An Overview of Compilation Phases 13/18 Translation Sequence in Our Compiler: Instruction Selection a=b<10?b:c; Input T 0 Tree List = Not < b 10 IfGoto T 0 = T 1 L0: T 1 L1: a Goto L1: = = L0: b c T 1 AsgnStmnt Lhs = E E < E E? E : E num Parse Tree ; Instruction List T 0 b T 0 T 0 < 10 T 0! T 0 if T 0 > 0 goto L0: T 1 b goto L1: L0: T 1 c L1: a T 1 Issues: (a,int) = < (bool)?: (int) (b,int) (c,int) num (b,int) (10,int) Abstract Syntax Tree (with attributes) Cover trees with as few machine instructions as possible Use temporaries and local registers
cs306 Compilation Overview: An Overview of Compilation Phases 13/18 Translation Sequence in Our Compiler: Instruction Selection a=b<10?b:c; Input T 0 Tree List = Not < b 10 IfGoto T 0 = T 1 L0: T 1 L1: a Goto L1: = = L0: b c T 1 AsgnStmnt Lhs = E E < E E? E : E num Parse Tree ; Instruction List T 0 b T 0 T 0 < 10 T 0! T 0 if T 0 > 0 goto L0: T 1 b goto L1: L0: T 1 c L1: a T 1 Issues: (a,int) = < (bool)?: (int) (b,int) (c,int) num (b,int) (10,int) Abstract Syntax Tree (with attributes) Cover trees with as few machine instructions as possible Use temporaries and local registers
cs306 Compilation Overview: An Overview of Compilation Phases 14/18 Translation Sequence in Our Compiler: Emitting Instructions a=b<10?b:c; Input T 0 Tree List = Not < b 10 IfGoto T 0 = T 1 L0: T 1 L1: a Goto L1: = = L0: b c T 1 AsgnStmnt Lhs = E E < E E? E : E num Parse Tree ; Instruction List T 0 b T 0 T 0 < 10 T 0! T 0 if T 0 > 0 goto L0: T 1 b goto L1: L0: T 1 c L1: a T 1 (a,int) = < (bool)?: (int) (b,int) (c,int) num (b,int) (10,int) Abstract Syntax Tree (with attributes)
cs306 Compilation Overview: An Overview of Compilation Phases 14/18 Translation Sequence in Our Compiler: Emitting Instructions a=b<10?b:c; Input T 0 Tree List = Not < b 10 IfGoto T 0 = T 1 L0: T 1 L1: a Goto L1: = = L0: b c T 1 AsgnStmnt Lhs Issues: = E Offsets E of? E variables : E in the stack frame E < E Actual register numbers andnum assembly mnemonics Code Parseto Tree construct and discard activation records Instruction List T 0 b T 0 T 0 < 10 T 0! T 0 if T 0 > 0 goto L0: T 1 b goto L1: L0: T 1 c L1: a T 1 ; (a,int) = < (bool)?: (int) (b,int) (c,int) num (b,int) (10,int) Abstract Syntax Tree (with attributes) Assembly Code lw $t0, 4($fp) slti $t0, $t0, 10 not $t0, $t0 bgtz $t0, L0: lw $t0, 4($fp) b L1: L0: lw $t0, 8($fp) L1: sw 0($fp), $t0
Part 3 Compilation Models
cs306 Compilation Overview: Compilation Models 15/18 Aho Ullman Model Compilation Models Davidson Fraser Model
cs306 Compilation Overview: Compilation Models 15/18 Aho Ullman Model Front End Compilation Models Input Source Program Davidson Fraser Model AST
cs306 Compilation Overview: Compilation Models 15/18 Aho Ullman Model Front End Compilation Models Input Source Program Davidson Fraser Model AST Optimizer Target Indep. IR
cs306 Compilation Overview: Compilation Models 15/18 Aho Ullman Model Front End Compilation Models Input Source Program Davidson Fraser Model AST Optimizer Target Indep. IR Code Generator Target Program
cs306 Compilation Overview: Compilation Models 15/18 Aho Ullman Model Front End Compilation Models Input Source Program Davidson Fraser Model Front End AST AST Optimizer Target Indep. IR Code Generator Target Program
cs306 Compilation Overview: Compilation Models 15/18 Aho Ullman Model Front End Compilation Models Input Source Program Davidson Fraser Model Front End AST Optimizer Target Indep. IR AST Expander Register Transfers Code Generator Target Program
cs306 Compilation Overview: Compilation Models 15/18 Aho Ullman Model Front End Compilation Models Input Source Program Davidson Fraser Model Front End AST Optimizer Target Indep. IR Code Generator AST Expander Register Transfers Optimizer Register Transfers Target Program
cs306 Compilation Overview: Compilation Models 15/18 Aho Ullman Model Front End Compilation Models Input Source Program Davidson Fraser Model Front End AST Optimizer Target Indep. IR Code Generator Target Program AST Expander Register Transfers Optimizer Register Transfers Recognizer Target Program
cs306 Compilation Overview: Compilation Models 15/18 Aho Ullman Model Front End AST Optimizer Target Indep. IR Code Generator Target Program Compilation Models Aho Ullman: Instruction selection over optimized IR using cost based tree tiling matching Davidson Fraser: Instruction selection over AST using simple full tree matching based algorithms that generate naive code which is target dependent, and is optimized subsequently Davidson Fraser Model Front End AST Expander Register Transfers Optimizer Register Transfers Recognizer Target Program
cs306 Compilation Overview: Compilation Models 16/18 Typical Front Ends Parser
cs306 Compilation Overview: Compilation Models 16/18 Typical Front Ends Source Program Parser Tokens Scanner
cs306 Compilation Overview: Compilation Models 16/18 Typical Front Ends Source Program Scanner Tokens Parser Parse Tree AST AST or Linear IR + Symbol Table Semantic Analyzer
cs306 Compilation Overview: Compilation Models 16/18 Typical Front Ends Source Program Scanner Tokens Parser Parse Tree AST AST or Linear IR + Symbol Table Semantic Analyzer Symtab Handler Error Handler
cs306 Compilation Overview: Compilation Models 17/18 Typical Back Ends in Aho Ullman Model m/c Ind. IR m/c Ind. Optimizer m/c Ind. IR Compile time evaluations Eliminating redundant computations
cs306 Compilation Overview: Compilation Models 17/18 Typical Back Ends in Aho Ullman Model m/c Ind. IR m/c Ind. Optimizer m/c Ind. IR Code Generator m/c Dep. IR Compile time evaluations Eliminating redundant computations Instruction Selection Local Reg Allocation Choice of Order of Evaluation
cs306 Compilation Overview: Compilation Models 17/18 Typical Back Ends in Aho Ullman Model m/c Ind. IR m/c Ind. Optimizer m/c Ind. IR Code Generator m/c Dep. IR m/c Dep. Optimizer Compile time evaluations Eliminating redundant computations Instruction Selection Local Reg Allocation Choice of Order of Evaluation Assembly Code
cs306 Compilation Overview: Compilation Models 17/18 Typical Back Ends in Aho Ullman Model m/c Ind. IR Register Allocator m/c Ind. Optimizer m/c Ind. IR Code Generator m/c Dep. IR Instruction Scheduler Compile time evaluations Eliminating redundant computations Instruction Selection Local Reg Allocation Choice of Order of Evaluation Peephole Optimizer Assembly Code
cs306 Compilation Overview: Compilation Models 18/18 Retargetability in Aho Ullman and Davidson Fraser Models Instruction Selection Aho Ullman Model Davidson Fraser Model Machine independent IR is expressed in the form of trees Machine instructions are described in the form of trees Trees in the IR are covered using the instruction trees Optimization
cs306 Compilation Overview: Compilation Models 18/18 Retargetability in Aho Ullman and Davidson Fraser Models Instruction Selection Aho Ullman Model Davidson Fraser Model Machine independent IR is expressed in the form of trees Machine instructions are described in the form of trees Trees in the IR are covered using the instruction trees Cost based tree pattern matching Optimization
cs306 Compilation Overview: Compilation Models 18/18 Retargetability in Aho Ullman and Davidson Fraser Models Instruction Selection Aho Ullman Model Davidson Fraser Model Machine independent IR is expressed in the form of trees Machine instructions are described in the form of trees Trees in the IR are covered using the instruction trees Cost based tree pattern Structural tree pattern matching matching Optimization
cs306 Compilation Overview: Compilation Models 18/18 Retargetability in Aho Ullman and Davidson Fraser Models Instruction Selection Aho Ullman Model Davidson Fraser Model Machine independent IR is expressed in the form of trees Machine instructions are described in the form of trees Trees in the IR are covered using the instruction trees Cost based tree pattern Structural tree pattern matching matching Optimization Machine independent
cs306 Compilation Overview: Compilation Models 18/18 Retargetability in Aho Ullman and Davidson Fraser Models Instruction Selection Aho Ullman Model Davidson Fraser Model Machine independent IR is expressed in the form of trees Machine instructions are described in the form of trees Trees in the IR are covered using the instruction trees Cost based tree pattern Structural tree pattern matching matching Machine dependent Optimization Machine independent
cs306 Compilation Overview: Compilation Models 18/18 Retargetability in Aho Ullman and Davidson Fraser Models Instruction Selection Aho Ullman Model Davidson Fraser Model Machine independent IR is expressed in the form of trees Machine instructions are described in the form of trees Trees in the IR are covered using the instruction trees Cost based tree pattern Structural tree pattern matching matching Machine dependent Optimization Machine independent Key Insight: Register transfers are target specific but their form is target independent