Compiler Design, Spring 2018
Thomas R. Gross
Computer Science Department, ETH Zurich, Switzerland
Logistics
- Lecture: Tuesdays 10:15 -- 11:55, Thursdays 10:15 -- 11:55, in ETF E1
- Recitation: to be announced; watch the lecture website and your ETH email
- Lecture website: via www.lst.inf.ethz.ch
  - Lecture slides
  - Homework assignments
- Questions related to assignments: contact the assistants (mailing list)
- Questions related to the lecture: write to me
Rules
- Rule #1: Peace in the lecture hall
What do you want to get out of this class?
- Please fill out the entry questionnaire
- It's anonymous
What I hope to teach you in this class
1. Compiler design: structure of a simple compiler
   - Simple: 2-3K lines of Java code (maybe a bit more)
   - Industry: the C1 compiler in the HotSpot VM is considered simple: 30K lines of C/C++/assembly code
2. Software engineering: how to design a large(r) software system
   - Sometimes there is no right or wrong
   - Sometimes there is
3. Programming
   - What the programming language design document should tell you
   - How to use that information
Course structure
- You will not learn the material from lectures alone
- Homework is essential!
Homework
- Core element of the course
- You will build a compiler
- More on this topic (organization, constraints) later
Compiler design and implementation
- What is your favorite compiler?
- Please talk to your neighbor and tell him/her which compiler(s) you have used and whether you have a favorite compiler. Why? Justify your answer.
- Can you and your neighbor agree on what matters to you in a compiler?
Observations
- Languages are important
  - Source language L_1
  - Target language L_2
  - Host language L_H
- Programs can be executed
  - A program is a sequence of expressions E_1, E_2, ...
  - A processor contains state
  - Execution of expressions: each expression E_i may read state, modify state, and determine the next expression to execute, E_j
  - A special expression E_stop indicates that program execution stops
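The execution model on this slide can be sketched in a few lines. This is only an illustration under assumptions: expressions are modelled as Python functions, the state as a dictionary, and `STOP`, `run`, `e1`, `e2`, `e3` are hypothetical names that do not come from the lecture.

```python
# Hypothetical model: a program is a list of expressions; each expression may
# read and modify the state and returns the index of the next expression.
STOP = -1  # stands in for the special expression E_stop


def run(program, state):
    """Execute expressions until one of them selects STOP."""
    pc = 0
    while pc != STOP:
        pc = program[pc](state)
    return state


# Example program: add n, n-1, ..., 1 into state["acc"].
def e1(state):
    state["acc"] = 0
    return 1              # continue with e2


def e2(state):
    return 2 if state["n"] > 0 else STOP


def e3(state):
    state["acc"] += state["n"]
    state["n"] -= 1
    return 1              # back to the test in e2


result = run([e1, e2, e3], {"n": 4})
```

Each function reads state, modifies state, and determines its successor, which is all the model asks of an expression.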
Program execution
- Execution ("elaboration") of expressions E_1, E_2, ... by some machine M
  - M realized by hardware: physical processor
  - M defined by software: virtual machine
  - Other possibilities
- Expressions E_1, E_2, ... are also referred to as statements or operations
- Elaboration is sometimes referred to as interpretation
  - The word "interpretation" sometimes hints at direct execution
Issues
- Languages: choices for L_1 and L_2
Languages
- Please talk to your neighbor and find at least three languages that could serve as either source language L_1 or target language L_2 for a compiler.
- Think about compilers you have used (or would have liked to use).
Languages L_1 and L_2
- L_1: C, ASM, LLVM, Java, C#, Scala, JavaScript, Python
- L_2: machine instructions, ASM, C, LLVM, Java byte code, JavaScript
(More) languages L_1 and L_2
- PHP, HTML, PDF, DVI, LaTeX, TeX, VHDL, SQL, Lisp, Haskell, Prolog
Issues (continued)
- Languages: choices for L_1 and L_2
  - A program written in L_1 (P_L1) is translated into a program written in L_2 (P_L2): P_L1 → P_L2
- Aspects of the translation of programs P_L1 → P_L2
  - What does it mean that P_L2 is a translation of P_L1?
  - P_L2 should produce the same result as P_L1
Semantics
- Describes the meaning of programs
  - The meaning of a program is defined by the meaning of its statements or operations
- Formal specification
  1. Operational semantics
     - Abstract machine A
     - Sequences of steps are interpreted ("elaboration")
     - The effect on A determines the meaning
  2. Denotational semantics
     - A mathematical construct describes the effect
     - Can be manipulated (composition, projection, ...)
  3. Axiomatic semantics
     - Assertions on program state and rules that describe the effect of operations
- Other ways: natural language, reference implementation
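As a small worked example, here is how the three formal styles might each describe a single assignment statement. The notation is an assumption, not taken from the lecture: σ is a program state, [[e]]σ is the value of expression e in σ, and P is an assertion on states.

```latex
% Assumed notation: \sigma is a state, [\![ e ]\!]\sigma the value of e in \sigma,
% \sigma[x \mapsto v] the state \sigma updated so that x holds v.
\begin{align*}
\text{Operational:}\quad
  & \langle x := e,\ \sigma \rangle \;\rightarrow\; \sigma[x \mapsto [\![ e ]\!]\sigma] \\
\text{Denotational:}\quad
  & [\![ x := e ]\!](\sigma) \;=\; \sigma[x \mapsto [\![ e ]\!]\sigma] \\
\text{Axiomatic:}\quad
  & \{\, P[e/x] \,\}\ \ x := e\ \ \{\, P \,\}
\end{align*}
```

The operational rule describes one step of an abstract machine, the denotational equation defines a function from states to states, and the axiomatic rule (the Hoare assignment axiom) relates assertions before and after the statement.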
Semantics
- The translated (target) program P_L2 has the same meaning as the (source) program P_L1
  - At least: computes the same result(s) for all legal inputs
  - "Same" must be defined...
- What about illegal inputs?
- What about non-functional properties?
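A small sketch of why "legal inputs" matters. The two functions below are hypothetical stand-ins for a source program and its translation: the source divides by 2 rounding toward zero (as C does), and the translation uses an arithmetic right shift. The assumption that only non-negative inputs are legal is made up for this example.

```python
def source_program(x):
    """Hypothetical source program: divide by 2, rounding toward zero."""
    return x // 2 if x >= 0 else -((-x) // 2)


def target_program(x):
    """Hypothetical translation: division replaced by an arithmetic right shift."""
    return x >> 1


def same_results(inputs):
    """True if both programs compute the same result for every given input."""
    return all(source_program(x) == target_program(x) for x in inputs)


legal_inputs = range(0, 1000)           # assumption: only x >= 0 is legal
agree_on_legal = same_results(legal_inputs)
agree_on_minus_3 = same_results([-3])   # source yields -1, translation yields -2
```

Comparing results on sample inputs can expose a wrong translation, but it is not a proof that the translation preserves meaning.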
Reasons for translation
- A compiler translates a program written in language L_1 into language L_2.
- Reasons to translate P_L1 → P_L2
  - Faster execution of P_L2
  - No real machine to run P_L1
  - No abstract machine (virtual machine) to run P_L1
  - P_L2 can be realized (in hardware)
  - If L_1 == L_2: P_L2 is more readable/optimized/stable
    - Special case: L_1 = asm, a binary rewriting tool adds bounds checks
  - P_L1 cannot be edited (by humans)
    - Compiler: Java byte code to Java
  - P_L2 requires less energy
Complications
- L_1 and L_2 have different resource models
  - L_1: no limit on resources, flexible description
  - L_2: finite resources, inflexible description, hardware-based
Complications
- L_1: no limit on resources
  - Number of variables
  - Lines of code
  - Number of methods
  - Data space
  - Nesting
  - Characters in variable names
- L_2: finite resources
  - Fixed number of registers
  - Limited storage
  - Finite representation
- Machine properties matter
  - Caches
  - TLBs
  - NUMA
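To make the mismatch concrete, here is a minimal sketch of mapping an unbounded set of source variables onto a fixed set of registers. The first-come policy and the names `assign_locations`, `r0`, `r1`, and `stack[...]` are assumptions for illustration; real allocators also consider lifetimes and usage frequency.

```python
def assign_locations(variables, registers):
    """Give each variable a register while any remain, otherwise a stack slot.

    A deliberately naive, first-come policy used only for illustration.
    """
    free = list(registers)
    locations = {}
    next_slot = 0
    for var in variables:
        if var in locations:
            continue                      # already placed
        if free:
            locations[var] = free.pop(0)  # a register is still available
        else:
            locations[var] = f"stack[{next_slot}]"  # spill to memory
            next_slot += 1
    return locations


placement = assign_locations(["a", "b", "c", "d"], ["r0", "r1"])
```

With four variables and two registers, the last two variables end up in memory; deciding which variables deserve the registers is part of the compiler's resource management.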
Compiler task: translate P_L1 → P_L2
- Management of resources
- Preservation of semantics
  - Is the meaning defined? For all possible inputs?
- Check constraints on P_L1
  - Bailout: not every program can be translated
  - Not every aspect can be checked by the compiler
  - Escape: the compiler inserts code into P_L2 to check properties of the program during execution ("at runtime")
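The difference between checking at compile time and inserting a runtime check can be sketched for an array access. Everything here is hypothetical: `translate_load` and the pseudo-instructions `load`, `check`, and `trap` are invented for this example and do not correspond to a real instruction set.

```python
def translate_load(array, index, length):
    """Emit pseudo-instructions for array[index]; length is known statically.

    index is an int (a constant the compiler can check itself) or a str
    (a variable name whose value is only known during execution).
    """
    if isinstance(index, int):
        if not 0 <= index < length:
            # Bailout: the compiler refuses to translate this program.
            raise ValueError(f"index {index} out of bounds for {array}")
        return [f"load {array}[{index}]"]
    # Escape: the property cannot be checked now, so emit a runtime check.
    return [
        f"check 0 <= {index} < {length} else trap",
        f"load {array}[{index}]",
    ]


constant_access = translate_load("a", 2, 4)
variable_access = translate_load("a", "i", 4)
```

The constant access needs no check in the generated code, while the variable access pays for one on every execution.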
1.1 Simple compiler model
1.1 Simple and realistic compiler model
- Simple: can be handled in one semester, 8 credits
  - Two persons work on the same project (more about teams later)
- Realistic: experience problems encountered by real compilers
  - Mirrors the structure of many compilers
Compiler model
Source program → Compiler → ASM file → Assembler → Object file
Compiler model
- Compilation prior to execution: AOT, "Ahead of (Execution) Time" compilation
  - Commonly used for languages without language-specific execution environments (e.g., C, C++)
  - Available in Java as well (IBM J9, Oracle HotSpot)
- Other model: continuous compilation: JIT, "Just in Time" compilation
  - Usually: optimization of methods that are frequently invoked ("hot")
  - Commonly used with language virtual machines (e.g., Java VM)
  - E.g., the HotSpot JVM has two JIT compilers (C1 and C2)
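The idea of compiling only "hot" methods can be sketched with an invocation counter. This is a toy model under assumptions: the `Method` class and the threshold value are invented for the example, and setting a flag stands in for actually running a JIT compiler. It does not describe how HotSpot decides what to compile.

```python
HOT_THRESHOLD = 3  # assumed value; real VMs use much larger, tuned thresholds


class Method:
    """Toy model of a method that starts interpreted and may become compiled."""

    def __init__(self, name):
        self.name = name
        self.invocations = 0
        self.compiled = False

    def invoke(self):
        """Count the call and report which execution mode handled it."""
        self.invocations += 1
        if not self.compiled and self.invocations > HOT_THRESHOLD:
            self.compiled = True   # stands in for running the JIT compiler
        return "compiled" if self.compiled else "interpreted"


m = Method("f")
modes = [m.invoke() for _ in range(5)]
```

The first calls are handled by the interpreter; once the counter exceeds the threshold, the method is considered hot and later calls use the compiled version.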
Compiler model
Source program → [Compiler: Frontend → IR → Back-end] → ASM file → Assembler → Object file
- Frontend: read input, transform
- IR: intermediate representation
- Back-end: manage machine resources, generate code
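The frontend / IR / back-end split can be sketched for arithmetic expressions. Several shortcuts are assumptions of this example: the source language is a tiny subset of Python expressions, Python's standard `ast` module serves as the parser, the IR is a list of three-address tuples, and the "assembly" syntax is made up.

```python
import ast

OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def frontend(source):
    """Read an arithmetic expression and return three-address IR tuples."""
    ir = []

    def lower(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = lower(node.left), lower(node.right)
            temp = f"t{len(ir)}"            # fresh temporary for the result
            ir.append((OPS[type(node.op)], temp, left, right))
            return temp
        raise SyntaxError("unsupported construct")

    lower(ast.parse(source, mode="eval").body)
    return ir


def backend(ir):
    """Turn each IR tuple into one line of made-up assembly."""
    return [f"{op} {dst}, {a}, {b}" for op, dst, a, b in ir]


asm = backend(frontend("a + b * 2"))
```

The back-end never sees source text and the frontend never sees assembly syntax; the IR is the only interface between them.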
Compiler model
Source program → [Compiler: several Frontends → IR → Back-end] → ASM file → Assembler → Object file
IR
- Intermediate representation
- Compiler-internal representation
  - E.g., the compiler must distinguish between names in different scopes
  - E.g., many programs work with variables, computers work with locations
- Must express all language constructs/concepts
- Code generator maps IR to assembly code
  - Machine code is another option
- No best IR: all are compromises
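The two "e.g." points can be illustrated with a small scoped symbol table that maps each declared name to a unique location. The class `ScopedNames` and the location names `loc0`, `loc1` are hypothetical, and this is only one of many ways an IR might keep names in different scopes apart.

```python
class ScopedNames:
    """Resolve source names to unique locations, innermost scope first."""

    def __init__(self):
        self.scopes = [{}]        # stack of scopes; the last one is innermost
        self.next_location = 0

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name):
        """Bind name in the innermost scope to a fresh location."""
        location = f"loc{self.next_location}"
        self.next_location += 1
        self.scopes[-1][name] = location
        return location

    def resolve(self, name):
        """Find the location of name, searching from the innermost scope out."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(name)


names = ScopedNames()
outer_x = names.declare("x")
names.enter_scope()
inner_x = names.declare("x")
seen_inside = names.resolve("x")
names.exit_scope()
seen_outside = names.resolve("x")
```

After resolution, the two variables called `x` are simply two different locations, so later compiler phases no longer need to reason about scopes.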
Compiler model
Source program → [Compiler: Frontend → IR → Optimizer → Back-end] → ASM file → Assembler → Native code
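As one example of what an optimizer working on the IR might do, here is a sketch of constant folding. The IR format (tuples of operation, destination, and two operands) and the `const` operation are assumptions of this example, not the IR used in the course.

```python
import operator

FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(ir):
    """Replace operations on two integer constants by a constant move."""
    optimized = []
    for op, dst, a, b in ir:
        if op in FOLDABLE and isinstance(a, int) and isinstance(b, int):
            # Both operands are known now, so compute the result at compile time.
            optimized.append(("const", dst, FOLDABLE[op](a, b), None))
        else:
            optimized.append((op, dst, a, b))
    return optimized


ir = [("mul", "t0", 6, 7), ("add", "t1", "x", "t0")]
optimized = fold_constants(ir)
```

The optimizer reads IR and produces IR, which is why it can be placed between the frontend and the back-end without changing either.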