Compiler Construction
Introduction and overview
Görel Hedin
Revised 2013-01-22
Compiler Construction 2013 F01
Agenda
  - Course registration, structure, etc.
  - Course overview
Course registration
  - You need to confirm your registration by signing the Registration Form.
  - To unregister, email me.
Prerequisites
  - Object-oriented programming and Java
  - Algorithms and data structures (recursion, trees, lists, hash tables, ...)
Course information
  - Web page: http://cs.lth.se/eda180 (will be updated during the course)
  - Literature
    - Course material will be made available on the web site: lectures, seminars, labs, project, articles. It is not handed out; print it yourself.
    - Textbook: A.W. Appel, Modern Compiler Implementation in Java, 2nd Edition, Cambridge University Press, 2002, ISBN 0-521-82060-X. Available as an e-book via http://www.lub.lu.se/en/search/lubsearch.html
Course structure
  - 14 lectures
    - Tuesday 15-17, MA:020
    - Wednesday 10-12, E:A
  - 5 seminars (give extra points on the exam)
    - Thursday 10-12, MA:026
    - Start this week
  - 6 computer assignments / lab sessions (mandatory)
    - Thursday 13-15 or Friday 10-12 (E:Hacke)
    - Sign up by Thursday Jan 24 (see course web)
    - Start next week
    - Must be completed before exam and course project
  - Written exam, March 13
  - Course project, VT2
People helping with the course
  - Lectures: Görel Hedin, Emma Söderberg
  - Guest lectures: Roger Henriksson (automatic memory management), Jonas Skeppstedt (code optimization)
  - Seminars: Emma Söderberg
  - Programming assignments and lab sessions: Niklas Fors, Jesper Öqvist
Seminars
  - Active participation gives extra points at the exam.
  - Before the seminar
    - Try to solve the problems at home.
    - Write down your solutions, bring them with you, and be prepared to present them at the seminar.
  - At the seminar
    - Mark the problems you are willing to present solutions for.
    - The seminar leader selects some students to present their solutions.
    - Discussion of the solutions.
  - At the written exam (Note! Only for exams this year)
    - Your markings will give you a maximum of 10% extra points at the written exam.
Programming assignments / Lab sessions
  - Work in pairs. Use the lecture break to form pairs!
  - Make the preparations before each lab session.
  - Even if you complete the assignment in advance, you must still attend the lab session to get it approved.
Examination
  - Exams take place
    - Wednesday, March 13, 8-13, Sparta:D
    - Friday, August 30, 8-13, Victoriastadion 1A (1 week advance registration required)
  - Prerequisites
    - Completed programming assignments
Project
  - Standard project
    - Design of a small procedural language
    - Implementation of a compiler from source text to Intel assembly code
    - Work in pairs
  - Deadlines
    - Intermediate deadlines given later
    - Final deadline for completed and approved project: May 3rd
Project outcome
  source code -> compiler -> assembly code
  Source code:
    csum = a + b + 1;
  Assembly code:
    movl a, %eax
    addl b, %eax
    addl $1, %eax
    movl %eax, csum
What happens after compilation?
  source code -> compiler -> assembly code -> assembler -> object code
  object code + library object code -> linker -> machine code -> loader -> machine memory
A closer look at the compiler
  source code -> lexical analysis -> syntactic analysis -> semantic analysis
    -> intermediate code generation -> optimization -> machine code generation -> machine code
Intermediate representations
  Analysis:
    source code -> lexical analysis -> tokens -> syntactic analysis -> AST -> semantic analysis -> attributed AST
  Synthesis:
    attributed AST -> intermediate code generation -> intermediate code -> optimization -> intermediate code -> machine code generation -> machine code
Front and back end
  source code -> lexical analysis -> syntactic analysis -> semantic analysis
    -> intermediate code generation -> optimization -> machine code generation -> machine code
  - Front end: the phases that produce the intermediate code
  - Back end: the phases that turn the intermediate code into machine code
Intermediate code
  FrontEnd L -> intermediate code -> BackEnd Intel
Several front and back ends
  - Front ends: FrontEnd L, FrontEnd C, FrontEnd PL0
  - All front ends produce the same intermediate code
  - Back ends: BackEnd Intel, BackEnd MIPS, Interpreter
Why?
  - It is more rational to implement m front ends + n back ends than m * n compilers.
  - Many optimizations are best performed on intermediate code.
  - It may be easier to debug the front end using an interpreter than a target machine.
Compilation and interpretation
  - A compiler translates a high-level program to low-level or machine code.
  - An interpreter executes a high-level or low-level program by calling one procedure for each program construct.
  - An interpreter may use a JIT (just-in-time) compiler to compile all or parts of the program into machine code during execution.
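The idea of "one procedure for each program construct" can be sketched as a small tree-walking interpreter. This is only an illustration under stated assumptions: the class names, the `eval`/`execute` methods, and the use of a `Map` as variable store are all hypothetical and are not taken from the course material.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node interfaces: each construct gets its own eval/execute procedure.
interface Expr { int eval(Map<String, Integer> env); }
interface Stmt { void execute(Map<String, Integer> env); }

class Int implements Expr {
    private final int value;
    Int(int value) { this.value = value; }
    public int eval(Map<String, Integer> env) { return value; }
}

class Id implements Expr {
    private final String name;
    Id(String name) { this.name = name; }
    public int eval(Map<String, Integer> env) {
        Integer v = env.get(name);
        if (v == null) throw new IllegalStateException("undefined variable " + name);
        return v;
    }
}

class Add implements Expr {
    private final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    public int eval(Map<String, Integer> env) { return left.eval(env) + right.eval(env); }
}

class LessEqual implements Expr {
    private final Expr left, right;
    LessEqual(Expr left, Expr right) { this.left = left; this.right = right; }
    // Assumption of this sketch: truth values are encoded as 1 (true) and 0 (false).
    public int eval(Map<String, Integer> env) { return left.eval(env) <= right.eval(env) ? 1 : 0; }
}

class Assignment implements Stmt {
    private final String target;
    private final Expr value;
    Assignment(String target, Expr value) { this.target = target; this.value = value; }
    public void execute(Map<String, Integer> env) { env.put(target, value.eval(env)); }
}

class WhileStmt implements Stmt {
    private final Expr cond;
    private final Stmt body;
    WhileStmt(Expr cond, Stmt body) { this.cond = cond; this.body = body; }
    public void execute(Map<String, Integer> env) {
        while (cond.eval(env) != 0) body.execute(env);
    }
}

class CompoundStmt implements Stmt {
    private final Stmt[] stmts;
    CompoundStmt(Stmt... stmts) { this.stmts = stmts; }
    public void execute(Map<String, Integer> env) {
        for (Stmt s : stmts) s.execute(env);
    }
}

class InterpreterDemo {
    // Runs: n=4; k=1; sum=0; while (k<=n) { sum=sum+k; k=k+1; }
    static Map<String, Integer> run() {
        Stmt program = new CompoundStmt(
            new Assignment("n", new Int(4)),
            new Assignment("k", new Int(1)),
            new Assignment("sum", new Int(0)),
            new WhileStmt(new LessEqual(new Id("k"), new Id("n")),
                new CompoundStmt(
                    new Assignment("sum", new Add(new Id("sum"), new Id("k"))),
                    new Assignment("k", new Add(new Id("k"), new Int(1))))));
        Map<String, Integer> env = new HashMap<>();
        program.execute(env);
        return env;
    }

    public static void main(String[] args) {
        System.out.println(run().get("sum")); // 1+2+3+4 = 10
    }
}
```

No machine code is produced here: the program is executed directly by the Java methods, which is the difference from a compiler.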
Program representations
  source code -> lexical analysis -> tokens -> syntactic analysis -> AST -> semantic analysis -> attributed AST
    -> intermediate code generation -> intermediate code -> optimization -> intermediate code -> machine code generation -> machine code
Lexical analysis (scanning)
  Source text          Tokens
  while (k<=n) {       WHILE LPAR ID(k) LEQ ID(n) RPAR LBRA
    sum=sum+k;         ID(sum) EQ ID(sum) PLUS ID(k) SEMI
    k=k+1;             ID(k) EQ ID(k) PLUS INT(1) SEMI
  }                    RBRA
  - A token is a symbolic name, sometimes with an attribute.
  - A lexeme is a string corresponding to a token.
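The token sequence above can be reproduced with a small hand-written scanner. This is a sketch under stated assumptions: it uses `java.util.regex` with one named group per token kind, whereas the course generates scanners with a tool; the class name `SketchScanner` and the exact regular expressions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written scanner; token names follow the slide.
class SketchScanner {
    // One named group per token kind. Order matters: WHILE before ID, LEQ before EQ.
    private static final Pattern TOKENS = Pattern.compile(
        "(?<WS>\\s+)|(?<WHILE>while\\b)|(?<ID>[A-Za-z][A-Za-z0-9]*)|(?<INT>[0-9]+)"
        + "|(?<LEQ><=)|(?<EQ>=)|(?<PLUS>\\+)|(?<SEMI>;)"
        + "|(?<LPAR>\\()|(?<RPAR>\\))|(?<LBRA>\\{)|(?<RBRA>\\})");

    private static final String[] KINDS = {
        "WHILE", "ID", "INT", "LEQ", "EQ", "PLUS", "SEMI", "LPAR", "RPAR", "LBRA", "RBRA"};

    static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKENS.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            pos = m.end();
            if (m.group("WS") != null) continue; // whitespace produces no token
            for (String kind : KINDS) {
                String lexeme = m.group(kind);
                if (lexeme != null) {
                    // Only ID and INT carry their lexeme as an attribute.
                    boolean hasAttribute = kind.equals("ID") || kind.equals("INT");
                    tokens.add(hasAttribute ? kind + "(" + lexeme + ")" : kind);
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", scan("while (k<=n) { sum=sum+k; k=k+1; }")));
    }
}
```

Here the lexeme is the matched string (for example `sum`), and the token is the symbolic name it maps to (`ID`, with the lexeme kept as its attribute).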
Syntactic analysis (parsing)
Abstract Syntax Tree (AST)
  - used for program representation inside tools
  - very similar to the parse tree, but
    - contains only essential tokens
    - has a simpler, more natural structure
  - often represented by a typed object-oriented model
    - abstract classes (statements, expressions, ...)
    - concrete classes (while, if, add, subtract, ...)
Parse tree
  - spans all tokens
Abstract syntax tree
  - only essential structure and tokens
AST class hierarchy
  - Create class hierarchies for statements and expressions!
  - Invent names for suitable abstract classes!
  - Which methods are required to traverse the AST?
Draw the class hierarchy
  Stmt
    WhileStmt: getexpr(), getstmt()
    Assignment: getid(), getexpr()
    CompoundStmt: getnrofstmts(), getstmt(int)
  Expr
    Add: getexpr1(), getexpr2()
    LessEqual: getexpr1(), getexpr2()
    Id: getid()
    Int: getint()
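One possible way to write this hierarchy in Java is sketched below. The class names come from the slide; the getter names are the slide's names adapted to Java camelCase, and the constructors, field types, and the `AstDemo.unparse` traversal are assumptions added for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Abstract classes for the two syntactic categories.
abstract class Stmt {}
abstract class Expr {}

class WhileStmt extends Stmt {
    private final Expr expr;
    private final Stmt stmt;
    WhileStmt(Expr expr, Stmt stmt) { this.expr = expr; this.stmt = stmt; }
    Expr getExpr() { return expr; }
    Stmt getStmt() { return stmt; }
}

class Assignment extends Stmt {
    private final Id id;
    private final Expr expr;
    Assignment(Id id, Expr expr) { this.id = id; this.expr = expr; }
    Id getId() { return id; }
    Expr getExpr() { return expr; }
}

class CompoundStmt extends Stmt {
    private final List<Stmt> stmts;
    CompoundStmt(Stmt... stmts) { this.stmts = Arrays.asList(stmts); }
    int getNrOfStmts() { return stmts.size(); }
    Stmt getStmt(int i) { return stmts.get(i); }
}

class Add extends Expr {
    private final Expr expr1, expr2;
    Add(Expr expr1, Expr expr2) { this.expr1 = expr1; this.expr2 = expr2; }
    Expr getExpr1() { return expr1; }
    Expr getExpr2() { return expr2; }
}

class LessEqual extends Expr {
    private final Expr expr1, expr2;
    LessEqual(Expr expr1, Expr expr2) { this.expr1 = expr1; this.expr2 = expr2; }
    Expr getExpr1() { return expr1; }
    Expr getExpr2() { return expr2; }
}

class Id extends Expr {
    private final String id;
    Id(String id) { this.id = id; }
    String getId() { return id; }
}

class Int extends Expr {
    private final int value;
    Int(int value) { this.value = value; }
    int getInt() { return value; }
}

// Hypothetical traversal: rebuilds source text using only the getters above.
class AstDemo {
    static String unparse(Expr e) {
        if (e instanceof Add) {
            Add a = (Add) e;
            return unparse(a.getExpr1()) + "+" + unparse(a.getExpr2());
        }
        if (e instanceof LessEqual) {
            LessEqual l = (LessEqual) e;
            return unparse(l.getExpr1()) + "<=" + unparse(l.getExpr2());
        }
        if (e instanceof Id) return ((Id) e).getId();
        if (e instanceof Int) return Integer.toString(((Int) e).getInt());
        throw new IllegalArgumentException("unknown expression node");
    }

    static String unparse(Stmt s) {
        if (s instanceof WhileStmt) {
            WhileStmt w = (WhileStmt) s;
            return "while (" + unparse(w.getExpr()) + ") " + unparse(w.getStmt());
        }
        if (s instanceof Assignment) {
            Assignment a = (Assignment) s;
            return unparse(a.getId()) + "=" + unparse(a.getExpr()) + ";";
        }
        if (s instanceof CompoundStmt) {
            CompoundStmt c = (CompoundStmt) s;
            StringBuilder sb = new StringBuilder("{ ");
            for (int i = 0; i < c.getNrOfStmts(); i++) sb.append(unparse(c.getStmt(i))).append(" ");
            return sb.append("}").toString();
        }
        throw new IllegalArgumentException("unknown statement node");
    }

    // AST for: while (k<=n) { sum=sum+k; k=k+1; }
    static Stmt example() {
        return new WhileStmt(new LessEqual(new Id("k"), new Id("n")),
            new CompoundStmt(
                new Assignment(new Id("sum"), new Add(new Id("sum"), new Id("k"))),
                new Assignment(new Id("k"), new Add(new Id("k"), new Int(1)))));
    }

    public static void main(String[] args) {
        System.out.println(unparse(example()));
    }
}
```

Note that parentheses, braces, and semicolons are not stored in the tree; the traversal regenerates them from the node types, which is what "only essential tokens" means in practice.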
Semantic analysis
  Analyze the AST, e.g.
  - Which variable corresponds to which declaration?
  - What is the type of an expression?
  - Are there compile-time errors in the program?
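The three questions above can be illustrated with a minimal type checker over expression nodes. This is a sketch under stated assumptions, not the attribute-grammar approach the course uses: the `Type` enum, the `Context` class with its declaration table, and the error messages are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum Type { INT, BOOLEAN, UNKNOWN }

// Hypothetical analysis context: declared variables and collected error messages.
class Context {
    final Map<String, Type> declarations = new HashMap<>();
    final List<String> errors = new ArrayList<>();
}

abstract class Expr {
    // Computes the type of this expression and records compile-time errors in ctx.
    abstract Type type(Context ctx);
}

class Int extends Expr {
    private final int value;
    Int(int value) { this.value = value; }
    Type type(Context ctx) { return Type.INT; }
}

class Id extends Expr {
    private final String name;
    Id(String name) { this.name = name; }
    Type type(Context ctx) {
        // Name binding: look up the declaration that this use refers to.
        Type declared = ctx.declarations.get(name);
        if (declared == null) {
            ctx.errors.add("undeclared variable: " + name);
            return Type.UNKNOWN; // UNKNOWN avoids follow-up errors for the same mistake
        }
        return declared;
    }
}

class Add extends Expr {
    private final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    Type type(Context ctx) {
        Type l = left.type(ctx);
        Type r = right.type(ctx);
        if (l == Type.BOOLEAN || r == Type.BOOLEAN) ctx.errors.add("operands of + must be int");
        return Type.INT;
    }
}

class LessEqual extends Expr {
    private final Expr left, right;
    LessEqual(Expr left, Expr right) { this.left = left; this.right = right; }
    Type type(Context ctx) {
        Type l = left.type(ctx);
        Type r = right.type(ctx);
        if (l == Type.BOOLEAN || r == Type.BOOLEAN) ctx.errors.add("operands of <= must be int");
        return Type.BOOLEAN;
    }
}

class SemanticDemo {
    // Checks an expression in a context where k and n are declared as int.
    static List<String> check(Expr e) {
        Context ctx = new Context();
        ctx.declarations.put("k", Type.INT);
        ctx.declarations.put("n", Type.INT);
        e.type(ctx);
        return ctx.errors;
    }

    public static void main(String[] args) {
        Expr ok = new LessEqual(new Id("k"), new Id("n"));
        Expr bad = new Add(ok, new Id("x"));
        System.out.println(check(ok));  // no errors
        System.out.println(check(bad)); // x is undeclared, and + gets a boolean operand
    }
}
```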
Formalisms we will cover
  - Regular expressions
    - for defining tokens
    - automatic generation of scanners
  - Context-free grammars
    - for defining concrete syntax trees
    - automatic generation of parsers
  - Abstract grammars
    - for defining abstract syntax trees
    - automatic generation of Java classes
  - Attribute grammars
    - for defining properties of AST nodes
    - automatic evaluation of the attributes
  - Aspect modules
    - for defining fields, methods, and attributes in separate modules
    - automatic weaving into Java classes
Compiler tools we will use
  - JavaCC (Sun/Open source)
    - scanner and parser generator
  - JJTree (Sun/Open source)
    - adds AST building to JavaCC
    - implements the Visitor pattern for ASTs
  - JastAdd (LTH/Open source)
    - generates Java classes
    - supports static aspect-oriented programming
    - supports attribute grammars
  - as (GNU/Open source)
    - translates assembly code to machine code
Other tools we will use
  - Ant (Apache/Open source)
    - software system builder
  - JUnit (Object Mentor/Open source)
    - testing framework
  - Gdb (GNU/Open source)
    - debugger
Synthesis
  - Runtime systems
    - How are variables accessed and procedures called?
    - How are objects and classes represented?
    - How is memory reused?
  - Intermediate code generation
    - Straightforward mapping from AST
    - Use unlimited number of registers (temporaries)
  - Optimization
    - Only brief overview (see EDA230 for detailed treatment)
  - Machine code generation
    - Instruction selection
    - Register allocation
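The "straightforward mapping from AST" with an unlimited supply of temporaries can be sketched as follows. The three-address text format, the `CodeBuffer` class, and the `gen` method are assumptions made for this illustration; the course's actual intermediate code may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical output buffer that also hands out fresh temporaries t0, t1, ...
class CodeBuffer {
    private final List<String> lines = new ArrayList<>();
    private int nextTemp = 0;
    String newTemp() { return "t" + (nextTemp++); }
    void emit(String line) { lines.add(line); }
    List<String> lines() { return lines; }
}

abstract class Expr {
    // Emits code that computes this expression and returns the operand holding the result.
    abstract String gen(CodeBuffer out);
}

class Id extends Expr {
    private final String name;
    Id(String name) { this.name = name; }
    String gen(CodeBuffer out) { return name; } // a variable is already an operand
}

class Int extends Expr {
    private final int value;
    Int(int value) { this.value = value; }
    String gen(CodeBuffer out) { return Integer.toString(value); }
}

class Add extends Expr {
    private final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    String gen(CodeBuffer out) {
        String l = left.gen(out);
        String r = right.gen(out);
        String t = out.newTemp(); // never reused: register allocation comes later
        out.emit(t + " = " + l + " + " + r);
        return t;
    }
}

class CodeGenDemo {
    // Generates code for: csum = a + b + 1, with the right-hand side parsed as (a + b) + 1.
    static List<String> generate() {
        Expr rhs = new Add(new Add(new Id("a"), new Id("b")), new Int(1));
        CodeBuffer out = new CodeBuffer();
        String result = rhs.gen(out);
        out.emit("csum = " + result);
        return out.lines();
    }

    public static void main(String[] args) {
        for (String line : generate()) System.out.println(line);
        // t0 = a + b
        // t1 = t0 + 1
        // csum = t1
    }
}
```

Each AST node emits at most one instruction and allocates a new temporary for its result, so the generator needs no knowledge of how many machine registers exist. Mapping the temporaries onto real registers is left to the register allocation step.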
Paradigms
  - Imperative programming
    - Procedural
    - Object oriented
  - Declarative programming
    - Functional
    - Logical
    - Constraint
    - Regular expressions
    - Context-free grammars
    - Attribute grammars
  - Hybrid languages
    - JastAdd
    - Scala
    - ...
Applications of compiler construction
  - Traditional compilers from source to assembly
  - Source-to-source translators, preprocessors
  - Interpreters and virtual machines
  - Integrated programming environments
  - Analysis tools
  - Refactoring tools
  - Domain-specific languages
Related research at LTH
  - Extensible compiler tools (Görel Hedin)
  - Real-time garbage collection (Roger Henriksson)
  - Code optimization for multiprocessors (Jonas Skeppstedt)
  - Natural language processing (Pierre Nugues)
  - Constraint solvers (Krzysztof Kuchcinski)
  - Data-flow languages (Jörn Janneck)
  - Languages for pervasive systems (Boris Magnusson)
  - Languages for physical modeling (Johan Åkesson)
Course goals
  After this course...
  - You should be able to use
    - regular expressions
    - context-free grammars
    - abstract grammars
  - You should be able to describe
    - attribute grammars
    - runtime systems and garbage collection
    - some code optimizations
  - You should be able to build a compiler, where you
    - use a parser generator
    - perform semantic analysis
    - do code generation
  - You should be able to program, using
    - static aspect-oriented programming
    - the visitor pattern
Readings
  - F1: Introduction
    - Appel, chapter 1-1.2
  - F2: Regular expressions
    - Appel, chapter 2
    - Appel, recommended exercises: 2.1 2.8
    - Try to solve the problems in Seminar 1
Review questions
  - Which are the major compilation phases?
  - What is the difference between the analysis and synthesis phases?
  - Why do we use intermediate code?
  - What is the advantage of separating the front and back ends?
  - What is a lexeme, a token, a parse tree, an abstract syntax tree, intermediate code, assembly code?