CS453 Compiler Construction

    Original Design: Michelle Strout

    Instructor: Wim Bohm
        wim.bohm@gmail.com, bohm@cs.colostate.edu
        Computer Science Building 344
        Office hour: Monday 1-2pm

    TA: Andy Stone
        aistone@gmail.com, stonea@cs.colostate.edu

    URL: http://www.cs.colostate.edu/~cs453

    Send around a sheet to collect email addresses and CS Linux account names.

CS453 Lecture Introduction

Course Logistics (highlights; see the web page for more detail)

    Progress Page and Home/News
        Read both of these daily.
    Syllabus and Grading
    Professional Conduct
        Do your own work.
        Act like a professional in the lab.
    Participate
        Come to class and recitation.
        Come to lab and office hours.
        Provide anonymous feedback online.

Plan for Today

    Course Logistics
    A scanner/parser/interpreter for a simple expression language
    What is the difference between a compiler and an interpreter?
    Compilers class and reality
    Why study compilers?
    Interpreter and Compiler Structure, or Software Architecture
    Overview of Programming Assignments
    The MeggyJava compiler we will be building

Structure of a Typical Compiler

    Analysis
        character stream
          -> lexical analysis   -> tokens ("words")
          -> syntactic analysis -> AST ("sentences")
          -> semantic analysis  -> annotated AST
          -> interpreter

    Synthesis
        annotated AST
          -> IR code generation -> IR
          -> optimization       -> IR
          -> code generation    -> target language
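
To make the two paths out of the annotated AST concrete, here is a minimal sketch in Java. It is an illustration only, not course code: the class names (`AstDemo`, `Exp`, `Num`, `BinOp`) and the PUSH/ADD/SUB/MUL stack-machine instructions are hypothetical. The same AST is either evaluated directly (the interpreter path) or translated into instructions to be run later (the code generation path).

```java
import java.util.ArrayList;
import java.util.List;

/** Driver: builds the AST for 2+3*4 by hand, then interprets and compiles it. */
class AstDemo {
    static Exp sample() {
        return new BinOp('+', new Num(2), new BinOp('*', new Num(3), new Num(4)));
    }

    /** Collects the instructions emitted for an expression. */
    static List<String> compile(Exp e) {
        List<String> code = new ArrayList<>();
        e.emit(code);
        return code;
    }

    public static void main(String[] args) {
        Exp ast = sample();
        System.out.println(ast.eval());    // interpreter path: computes the value now
        System.out.println(compile(ast));  // compiler path: instructions for later
    }
}

/** AST node for a tiny expression language (hypothetical, for illustration). */
interface Exp {
    int eval();                    // interpreter: compute the value immediately
    void emit(List<String> code);  // compiler: append instructions for a stack machine
}

class Num implements Exp {
    final int value;
    Num(int value) { this.value = value; }
    public int eval() { return value; }
    public void emit(List<String> code) { code.add("PUSH " + value); }
}

class BinOp implements Exp {
    final char op;
    final Exp left, right;

    BinOp(char op, Exp left, Exp right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    public int eval() {
        int l = left.eval();
        int r = right.eval();
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            default: throw new IllegalStateException("unknown operator " + op);
        }
    }

    public void emit(List<String> code) {
        // Post-order: operands first, then the operator (made-up instruction names).
        left.emit(code);
        right.emit(code);
        switch (op) {
            case '+': code.add("ADD"); break;
            case '-': code.add("SUB"); break;
            case '*': code.add("MUL"); break;
            default: throw new IllegalStateException("unknown operator " + op);
        }
    }
}
```

Running `AstDemo` prints 14 for the interpreter path and `[PUSH 2, PUSH 3, PUSH 4, MUL, ADD]` for the compiler path. A real compiler would target an actual instruction set rather than this made-up one.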

Simple example: Expression Interpreter

    character stream (print 2+3*4; ...)
      -> lexical analysis (scanner)
      -> tokens
      -> syntactic analysis plus calls to evaluate and print (parser plus interpreter)
      -> text (14, ...)

Expression Language: tokens

    Tokens:
        keyword(s): print
        operators/delimiters: +, -, *, ;
        integer literals: 0, 1, 2, ..., 10, 11, ..., 100, ...

    Symbols (token + optional value) are formed by a scanner performing
    lexical analysis while reading from a character stream, e.g.:
        PRINT token + null, SEMI token + null, NUMBER token + Integer value

    Valid tokens are defined by regular expressions, e.g.:
        NUMBER: [0-9]+
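
As a rough illustration of what a scanner for these tokens does, here is a hand-written sketch in Java. It is not the generated scanner used in recitation. The names `ExprScanner`, `Kind`, and `Symbol` are hypothetical, as are the token names PLUS, MINUS, and TIMES (the slide only names PRINT, SEMI, and NUMBER). It also assumes that whitespace separates tokens and is otherwise ignored.

```java
import java.util.ArrayList;
import java.util.List;

class ExprScanner {
    /** Turns a character stream (here simply a String) into a list of symbols. */
    static List<Symbol> scan(String input) {
        List<Symbol> symbols = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // assumption: whitespace is skipped
            } else if (isDigit(c)) {
                int start = i;                         // NUMBER: [0-9]+
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                symbols.add(new Symbol(Kind.NUMBER, Integer.valueOf(input.substring(start, i))));
            } else if (input.startsWith("print", i)) {
                symbols.add(new Symbol(Kind.PRINT, null));
                i += "print".length();
            } else {
                symbols.add(new Symbol(operatorKind(c, i), null));
                i++;
            }
        }
        return symbols;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static Kind operatorKind(char c, int position) {
        switch (c) {
            case '+': return Kind.PLUS;
            case '-': return Kind.MINUS;
            case '*': return Kind.TIMES;
            case ';': return Kind.SEMI;
            default:
                throw new IllegalArgumentException("illegal character '" + c + "' at " + position);
        }
    }

    public static void main(String[] args) {
        System.out.println(scan("print 2+3*4;"));
    }
}

/** Token kinds; PRINT, SEMI and NUMBER come from the slide, the rest are assumed names. */
enum Kind { PRINT, PLUS, MINUS, TIMES, SEMI, NUMBER }

/** A symbol: a token kind plus an optional value (null for everything except NUMBER). */
final class Symbol {
    final Kind kind;
    final Integer value;

    Symbol(Kind kind, Integer value) {
        this.kind = kind;
        this.value = value;
    }

    @Override
    public String toString() {
        return value == null ? kind.toString() : kind + "(" + value + ")";
    }
}
```

Running `ExprScanner` prints `[PRINT, NUMBER(2), PLUS, NUMBER(3), TIMES, NUMBER(4), SEMI]`. A scanner generator produces equivalent logic from the regular expressions, so this loop does not have to be written by hand.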

Expression Language: sentences

    Sentences: program sentences (statements here) are recognized by a parser.

    Valid programs are defined by a grammar:

        Program ::= stmts
        stmts   ::= stmts stmt
                  | <empty>
        stmt    ::= PRINT exp SEMI
        exp     ::= exp + exp
                  | exp - exp
                  | exp * exp
                  | NUMBER

    In our case, the parser evaluates the expressions and prints their values,
    i.e. the parser interprets the language.

    In this week's recitation you will practice with this language and use
    tools to generate a scanner and a parser/interpreter.
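
The idea of a parser that evaluates while it parses can be sketched by hand in Java. This is not the generated parser from recitation. Two assumptions are made here: the `exp` rule as written is ambiguous, so the sketch gives `*` higher precedence than `+` and `-` and makes all operators left-associative (consistent with `print 2+3*4;` producing 14), and scanning is folded into the parser to keep the example short. The name `ExprInterpreter` and the helper rule `term` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ExprInterpreter {
    private final String src;
    private int pos = 0;
    private final List<Integer> printed = new ArrayList<>();

    private ExprInterpreter(String src) {
        this.src = src;
    }

    /** Program ::= stmts. Returns the printed values in order. */
    static List<Integer> run(String src) {
        ExprInterpreter p = new ExprInterpreter(src);
        p.stmts();
        return p.printed;
    }

    // stmts ::= stmts stmt | <empty>     (the left recursion becomes a loop)
    private void stmts() {
        skipSpace();
        while (pos < src.length()) {
            stmt();
            skipSpace();
        }
    }

    // stmt ::= PRINT exp SEMI            (evaluate, then print: the "interpreter" part)
    private void stmt() {
        expect("print");
        int value = exp();
        expect(";");
        printed.add(value);
        System.out.println(value);
    }

    // exp ::= term (('+' | '-') term)*   (assumed precedence: lowest level)
    private int exp() {
        int value = term();
        while (true) {
            if (accept("+")) value += term();
            else if (accept("-")) value -= term();
            else return value;
        }
    }

    // term ::= NUMBER ('*' NUMBER)*      (assumed precedence: '*' binds tighter)
    private int term() {
        int value = number();
        while (accept("*")) value *= number();
        return value;
    }

    // NUMBER: [0-9]+
    private int number() {
        skipSpace();
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + start);
        return Integer.parseInt(src.substring(start, pos));
    }

    /** Consumes s if it comes next (after whitespace); reports whether it did. */
    private boolean accept(String s) {
        skipSpace();
        if (src.startsWith(s, pos)) {
            pos += s.length();
            return true;
        }
        return false;
    }

    private void expect(String s) {
        if (!accept(s)) throw new IllegalArgumentException("'" + s + "' expected at " + pos);
    }

    private void skipSpace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        run("print 2+3*4; print 10-2-3;");
    }
}
```

Running `ExprInterpreter` prints 14 and then 5. With a parser generator, the ambiguity is typically resolved by precedence declarations instead of by rewriting the grammar as done here.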

A LOT OF CONCEPTS, TOOLS, and CODE

    Compilers are large and complex software structures.

    In this course you will learn a lot of concepts:
        regular and context-free grammars, visitor pattern, architecture

    In this course you will use A LOT of tools:
        scanner generators and parser generators (recitation this week)
        Eclipse + version control
        Makefiles
        jar files
        assemblers
        (Meggy) hardware

    In this course you will write a lot of code:
        1000s of lines

    Don't get behind! It will be hard, if not impossible, to catch up.

Example MeggyJava program

    MeggyJava: a Java subset for the Meggy toy we are playing with in this course.

    Example code:

        import meggy.Meggy;

        class PA3Flower {
            public static void main(String[] whatever){
                {
                    // Upper left petal, clockwise
                    Meggy.setPixel( (byte)1, (byte)1, Meggy.Color.WHITE );
                    Meggy.setPixel( (byte)2, (byte)1, Meggy.Color.WHITE );
                }
            }
        }

Structure of the MeggyJava Compiler

    Analysis
        character stream
          -> lexical analysis   -> tokens ("words")
          -> syntactic analysis -> AST ("sentences")
          -> semantic analysis  -> AST and symbol table

    Synthesis
        AST and symbol table
          -> code gen -> Atmel assembly code

    Programming assignments
        PA1: Write test cases in C++ and MeggyJava, and Atmel warmup
        PA2: MeggyJava scanner
        PA3: setPixel compiler (no AST/symtab)
        PA4: add control flow (AST/symtab)
        PA5: add methods (calls)
        PA6: add variables and objects
        PA7: add arrays