CS202 Compiler Construction
Christian Skalka
www.cs.uvm.edu/~skalka/202
CS 202-1: Introduction

Expectations
Course prerequisites:
- CS101: must be able to read assembly
- CS103: understand tree operations, basic grammars
- CS104: basic data structures
- CS243: helpful (more grammars and finite automata)
Solid programming skills are a must.

Admin
- My specialty: programming language design, type systems
- Office hours: Tuesday 3:00PM-5:00PM, Votey 379
- Can schedule extra hours as needed
- skalka@cs.uvm.edu

Admin
- GTF: Torgeir Lien
- Lab hours: Monday 12:15PM-2:15PM, Votey 369
- Please see Torgeir for programming issues, me for conceptual issues

Admin
Weekly assignments:
- Generally include both written problems and programming
- Assignments out every Tuesday in class, due the next Tuesday at midnight
- Electronic submission via the submit program
- NO LATE ASSIGNMENTS

Programming
- Programming is central to the course
- Programming is in Java
- Program assignments accumulate; you end up with a working compiler
  - Compiles a subset of C
  - Generates SPARC code
- Significant code is already written; you complete templates

Programming
- Discussing assignments is permitted
- Everyone writes their own code
- See: http://www.cs.uvm.edu/ugradinfo/academic_honesty.html

Grading
- 13 assignments during course: 75% grad, 85% undergrad
  - Will drop lowest assignment
- Midterm exam: 5%
- Final exam: 10%
- Project: 10%
  - Undergrads: either final or project
  - Grads: final and project

Lectures
Lectures will cover:
- Background theory
- Compiler implementations
- Material for assignments
Slides on course web site:
- Posted in time to print out for class notes
Readings for each lecture:
- Recommended; good book, provides insights

Updates
- Watch the schedule page on the class web site; I will update it as the semester goes on
- Please give me your emba username (needed for the submit program)
- Please give me your effective email address; I will post announcements, comments, and hints to the mailing list

Motivation
- Course is a lot of work
- Only about 100-200 working compiler writers in the world at a time
- A highly specialized craft
- Why take this class?

Why
- To fulfill a requirement
- To be well-rounded
  - Component techniques are applicable to a wide range of problems
- Programmers work with compilers
  - Get the most out of your tools
  - Understand error messages
  - Optimize low-level code

Classic bin/cc
The C compiler (cc)
Traditionally, the bin/cc command

    cc myprog.c -o myprog

executes 5 programs.

bin/cc programs
- cpp: C preprocessor; resolves #if, #define, #include
- ccom: the C compiler itself; produces an assembly source file
- copt (or c2): C optimizer; from assembly to assembly
- as: assembler; converts assembly to machine instrs
- ld: linker; links machine instrs, libs

ccom internals
ccom consists of multiple pieces:
1. Lexer
2. Parser/type checking
3. Register allocation
4. Code generation

Classic compiler architecture
  source file
     | chars
  lexer
     | words (tokens)
  parser
     | trees
  type check
     | trees
  code gen
     | machine instrs
  optim
     | machine instrs
  emitter
     |
  .o file

Lake compiler
Our compiler (lake compiler) will include:
- Lexer
- Parser + parse tree construction
- Type checking
- Intermediate code generation
- Canonicalization
- Code generation
- Data flow analysis
- Register allocation
- Optimization
- Code emit + runtime support

Tools
Two very useful tools:
- jlex (equivalent to lex/flex)
  - Constructs a lexer from regular expressions; only need to write regexps
- jcup (equivalent to yacc/bison)
  - Constructs a parser from grammars; only need to write grammars

Lexing
Lexing: first step in compiler
- Turns stream of characters into stream of tokens
- Ignores many things:
  - Comments
  - White space

Tokens
Tokens can have attributes:
- id or symbol: what token is this
- Value(s): string, float, int
- Position: where in the source code
Tokens can be ints (lex) or simple objects (jlex)

Regular expressions
Tokens defined by regular expressions
- Each class of tokens is described by a separate regular expression
- Token is the longest string matching any regular expression
- Also need expressions for portions that will be ignored (e.g. comments)
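
The token attributes listed above (which token, value, position) could be carried by a small object, in the spirit of the "simple objects" a jlex lexer returns. This is only a minimal sketch: the names Token, Kind, and TokenDemo and the field layout are illustrative assumptions, not the classes used in the course code.

```java
// Hypothetical token representation; all names here are illustrative only.
final class Token {
    // The "id or symbol": which class of token this is.
    enum Kind { IF, ID, NUM }

    final Kind kind;
    final Object value;  // attribute value: String for ID, Integer for NUM, null for keywords
    final int line;      // position in the source: 1-based line
    final int column;    // position in the source: 1-based column

    Token(Kind kind, Object value, int line, int column) {
        this.kind = kind;
        this.value = value;
        this.line = line;
        this.column = column;
    }

    @Override
    public String toString() {
        String v = (value == null) ? "" : "(" + value + ")";
        return kind + v + "@" + line + ":" + column;
    }
}

class TokenDemo {
    public static void main(String[] args) {
        // Tokens a lexer might produce for the source text "if x1 42".
        Token[] tokens = {
            new Token(Token.Kind.IF, null, 1, 1),
            new Token(Token.Kind.ID, "x1", 1, 4),
            new Token(Token.Kind.NUM, 42, 1, 7),
        };
        for (Token t : tokens) {
            System.out.println(t);
        }
    }
}
```

Keeping the position in every token is what later lets the parser and type checker report errors against the right place in the source.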

Regular expressions
5 basic operations for reg exps:
1. primitive (character)
2. alternatives
3. concatenation
4. ε (nothing)
5. repetition

Notation
Common notation:
  a        the character stands for itself
  ε        (nothing)
  a | b    alternatives
  ab       concatenation
  a*       zero or more
  a+       one or more

Notation
More common notation:
  a?       zero or one times
  [a-z]    any character from a to z
  [^a-c]   any character except a to c
  "abc-g"  literals
  .        any character (except new line)
  \x       x, even if x is a special character
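
Most of the notation above can be tried out directly with java.util.regex, which uses the same operators for alternatives, repetition, character classes, the dot, and backslash escapes. This is a small sketch for experimenting, not jlex itself: the class name RegexNotationDemo and the helper matches are made up for the example, and jlex's quoting rules differ from Java's in detail.

```java
import java.util.regex.Pattern;

class RegexNotationDemo {
    // True if the whole string s matches the regular expression re.
    static boolean matches(String re, String s) {
        return Pattern.matches(re, s);
    }

    public static void main(String[] args) {
        System.out.println(matches("a|b", "b"));     // alternatives
        System.out.println(matches("ab", "ab"));     // concatenation
        System.out.println(matches("a*", ""));       // zero or more: accepts the empty string
        System.out.println(matches("a+", ""));       // one or more: rejects the empty string
        System.out.println(matches("a?", "a"));      // zero or one times
        System.out.println(matches("[a-z]", "q"));   // any character from a to z
        System.out.println(matches("[^a-c]", "b"));  // any character except a to c
        System.out.println(matches(".", "\n"));      // dot does not match a new line
        System.out.println(matches("\\*", "*"));     // \x matches x even if x is special
    }
}
```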

Examples
Example regular expressions:
  if
  [0-9]+
  [a-zA-Z_][a-zA-Z0-9_]*
  "//"[^\n]*\n

Longest match
Only consider the longest match. if8 could be lexed as:
- 2 identifiers and a number (i, f, 8)
- if and a number (if, 8)
- 2 identifiers (i, f8)
- 1 identifier (if8)
1 identifier (if8) is the right choice
- Always the longest possible match
- Order of regexp defns matters: when two regexps match the same longest string, the one defined first wins (so if alone is a keyword, not an identifier)

Errors
Programs have 3 kinds of statically detectable errors:
- Lexical: character stream matches no tokens
- Syntactic: token stream cannot be parsed into a legal abstract syntax tree (AST)
- Semantic: trees are semantically (e.g. type) inconsistent
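
The longest-match rule, the tie-break by definition order, and the lexical error case can be sketched as a small hand-written loop. This is an illustration under assumptions, not what jlex generates: the class name LongestMatchLexer, the rule table, and the "NAME:text" output format are invented for the example, and it relies on each pattern's greedy match being its longest match, which holds for these simple patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchLexer {
    // Rule names and patterns in priority order (the earlier rule wins ties).
    // A null name marks text that is matched but ignored.
    static final String[] NAMES = { "IF", "NUM", "ID", null, null };
    static final Pattern[] RULES = {
        Pattern.compile("if"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*"),
        Pattern.compile("//[^\n]*\n"),   // comment up to the end of the line
        Pattern.compile("[ \t\n]+"),     // white space
    };

    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<String>();
        int pos = 0;
        while (pos < src.length()) {
            int best = -1;
            int bestLen = 0;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(src).region(pos, src.length());
                // lookingAt: the match must start at pos but need not reach the end.
                // Strict '>' keeps the earlier rule when two matches have equal length.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = i;
                    bestLen = m.end() - pos;
                }
            }
            if (best < 0) {
                // Lexical error: no rule matches at this position.
                throw new IllegalArgumentException(
                    "illegal character '" + src.charAt(pos) + "' at offset " + pos);
            }
            if (NAMES[best] != null) {
                out.add(NAMES[best] + ":" + src.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if8 if 8 // note\n"));
    }
}
```

On the input in main, if8 comes out as a single identifier because the ID rule matches three characters while the IF rule matches only two; the bare if is a keyword because both rules match two characters and IF is listed first.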

Lexical errors
Lexer must account for lexical errors
- Recognize common errors and report them
  - Missing close quotes are the most common
- Final regexp to catch the next few chars
- Error reporting, error recovery

Assignment 1
Assignment 1 is very easy
- Mostly to get used to Java
- Refresh your memory of recursively defined data structures
- Download the .tar.gz file from the web site
  - Includes Java code for simple parse tree classes

ParseTree classes
ParseTree: top-level abstract class
Concrete subclasses:
- UnaryExp
- BinaryExp
- ConstExp
- CallExp

argcount
ParseTree defines function argcount:
- Returns the maximum number of arguments to any function call in the input tree
- You need to implement argcount in each subclass

javadoc
We also include javadoc .html files
- They provide simplistic documentation for the classes
- We suggest starting with ParseTree.html or AllNames.html
Java overview is the beginning of next lecture
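
The pattern behind a method like argcount, one abstract method on the top-level class with a recursive implementation in each concrete subclass, can be sketched on a stand-in hierarchy. Everything below is hypothetical: the class names, fields, and constructors are assumptions (the real ones are described in the javadoc files), and the method shown, size, counts tree nodes rather than arguments, so argcount itself remains the exercise.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the assignment's classes; fields are assumed.
class ParseTreeSketch {
    static abstract class Node {
        // Number of nodes in this tree, defined recursively in each subclass.
        abstract int size();
    }

    static class Const extends Node {
        final int value;
        Const(int value) { this.value = value; }
        int size() { return 1; }  // leaf: just this node
    }

    static class Unary extends Node {
        final String op;
        final Node operand;
        Unary(String op, Node operand) { this.op = op; this.operand = operand; }
        int size() { return 1 + operand.size(); }
    }

    static class Binary extends Node {
        final String op;
        final Node left;
        final Node right;
        Binary(String op, Node left, Node right) {
            this.op = op; this.left = left; this.right = right;
        }
        int size() { return 1 + left.size() + right.size(); }
    }

    static class Call extends Node {
        final String name;
        final List<Node> args;
        Call(String name, List<Node> args) { this.name = name; this.args = args; }
        int size() {
            int n = 1;  // the call node itself
            for (Node a : args) {
                n += a.size();
            }
            return n;
        }
    }

    // Builds the tree for the expression f(1, -2) + 3.
    static Node example() {
        Node call = new Call("f",
            Arrays.<Node>asList(new Const(1), new Unary("-", new Const(2))));
        return new Binary("+", call, new Const(3));
    }

    public static void main(String[] args) {
        System.out.println(example().size());
    }
}
```

Each subclass only combines the results of its children, so no code ever needs to inspect which kind of node it is looking at; dynamic dispatch on size does that work.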