CSE 431S Final Review. Washington University Spring 2013


What You Should Know The six stages of a compiler and what each stage does. The input to and output of each compilation stage (especially the back-end). Context-free languages. Definition of a context-free grammar (including the formal definition). Leftmost and rightmost derivations and parse trees. Ambiguity.

What You Should Know Bottom-up (shift-reduce) parsing. LR(0) parser construction. SLR conflict resolution. LR(1) parser construction. Abstract syntax trees. L-value vs. R-value. Static type checking. Symbol tables.

What You Should Know CUP actions. Jasmin basics. Code generation. Call stack (function activation). Stack-based vs heap-based memory allocation. Parameter passing mechanisms. Register allocation (graph coloring).

Context-Free Languages Recall right-linear grammars, with a restricted right-hand side: X → a Y | b. Context-free grammars allow anything on the right-hand side: A → ( A ) | x

Context-Free Grammars A grammar is a 4-tuple (Σ, V, S, P): Σ: set of terminals. V: set of nonterminals. S: start nonterminal. P: set of productions (rewrite rules). For a grammar to be context-free, all productions must be of the form A → α, where α is any sequence of symbols (terminals and nonterminals).

Ambiguity What about: E → E + E | a? There are two syntax trees for the string a + a + a. [Figure: the two parse trees, one grouping the first two a's under the left E + E, the other grouping the last two a's under the right E + E.]

Ambiguity If there are multiple parse trees (or, equivalently, multiple leftmost derivations) for some string, then the grammar is ambiguous. Note that it is the grammar that is ambiguous, not the language: there may exist an unambiguous grammar for the same language.

Bottom-Up Parsing Instead of starting from a start nonterminal and producing the parse tree, start from the leaves and build the tree bottom up. The start nonterminal is now the goal nonterminal.

Sample Grammar
1. S → E $
2. E → E + T
3. E → T
4. T → a
5. T → ( E )

LR(0) Item The dot represents the current parse state (i.e., what has been seen so far). The initial set of items is called the kernel. The non-kernel items are generated by the closure operation and represent the productions of any nonterminal appearing immediately after the dot.

LR(0) Parse States I0 = START: S → • E $

LR(0) Parse States
I0 = START:
    S → • E $        (goto 1 on E)
    E → • E + T      (goto 1 on E)
    E → • T          (goto 9 on T)
    T → • a          (goto 5 on a)
    T → • ( E )      (goto 6 on "(")
The closure operation adds all of the rules for a nonterminal to the immediate right of the dot (here, close on E, then on T). The number in the square indicates which state to go to on the symbol to the right of the dot. The parser must go to a single state for each symbol (deterministic).

LR(0) Parse States
I0 = START:
    S → • E $
    E → • E + T
    E → • T
    T → • a
    T → • ( E )
I1 = GOTO(I0, E):
    S → E • $        (goto 2 on $)
    E → E • + T      (goto 3 on +)
There must be only one state with a given kernel, i.e., no identical states.
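The closure computation above can be written directly as a worklist algorithm. Here is a minimal sketch for the sample grammar; items are encoded as "production:dotPosition" strings, an encoding chosen just for this demo:

```java
import java.util.*;

// Sketch of the LR(0) closure operation for the sample grammar
// S -> E $, E -> E + T | T, T -> a | ( E ).
public class Closure {
    // Each production: LHS followed by its RHS symbols.
    static final String[][] PRODS = {
        {"S", "E", "$"},
        {"E", "E", "+", "T"},
        {"E", "T"},
        {"T", "a"},
        {"T", "(", "E", ")"},
    };

    static boolean isNonterminal(String sym) {
        return sym.equals("S") || sym.equals("E") || sym.equals("T");
    }

    // An item "p:d" is production p with the dot after d RHS symbols.
    static Set<String> closure(Set<String> kernel) {
        Set<String> items = new LinkedHashSet<>(kernel);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String item : new ArrayList<>(items)) {
                String[] parts = item.split(":");
                int p = Integer.parseInt(parts[0]);
                int d = Integer.parseInt(parts[1]);
                if (d + 1 < PRODS[p].length) {       // dot is not at the end
                    String next = PRODS[p][d + 1];   // symbol just after the dot
                    if (isNonterminal(next)) {
                        // Add every production for that nonterminal, dot at 0.
                        for (int q = 0; q < PRODS.length; q++) {
                            if (PRODS[q][0].equals(next)) {
                                changed |= items.add(q + ":0");
                            }
                        }
                    }
                }
            }
        }
        return items;
    }

    public static void main(String[] args) {
        // Kernel of I0: S -> . E $
        System.out.println(closure(Set.of("0:0")));
    }
}
```

Running it on the kernel { S → • E $ } yields all five items of I0 above: the kernel item plus the four closure items with the dot at the left end.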

Example
1. S → A C $
2. A → a B C d
3. A → B Q
4. A → λ
5. B → b B
6. B → d
7. C → c
8. C → λ
9. Q → q

LR(0) Parse States [Diagram: the LR(0) item sets I0 through I10 for this grammar, with their goto transitions. I0 contains the kernel S → • A C $ plus its closure (A → • a B C d, A → • B Q, A → •, B → • b B, B → • d); the remaining states are reached by goto on A, C, B, Q, and the terminals a, b, c, and d.]

LR(0) Parse States (cont.) [Diagram: item sets I11 = { B → d • }, I12 = { A → B • Q, Q → • q }, I13 = { A → B Q • }, I14 = { Q → q • }.] The grammar is not LR(0) parsable: there are shift/reduce conflicts in states 0, 1, and 6.

SLR(1) Create the LR(0) states. If there are no conflicts, then we are done. For states with conflicts, try to use Follow sets to resolve the conflicts. If all conflicts can be resolved using the Follow sets, then the grammar is SLR(1).
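The Follow sets used for SLR conflict resolution can be computed by a standard fixpoint iteration over the productions. A sketch for the example grammar above (encoding productions as string arrays is just a convenience for this demo):

```java
import java.util.*;

// Fixpoint computation of nullable, FIRST, and FOLLOW for the grammar
// S -> A C $ ; A -> a B C d | B Q | lambda ; B -> b B | d ; C -> c | lambda ; Q -> q
public class FollowSets {
    static final String[][] PRODS = {
        {"S", "A", "C", "$"},
        {"A", "a", "B", "C", "d"},
        {"A", "B", "Q"},
        {"A"},              // A -> lambda
        {"B", "b", "B"},
        {"B", "d"},
        {"C", "c"},
        {"C"},              // C -> lambda
        {"Q", "q"},
    };
    static final Set<String> NONTERMS = Set.of("S", "A", "B", "C", "Q");

    static Map<String, Set<String>> compute() {
        Set<String> nullable = new HashSet<>();
        Map<String, Set<String>> first = new HashMap<>();
        Map<String, Set<String>> follow = new HashMap<>();
        for (String n : NONTERMS) {
            first.put(n, new HashSet<>());
            follow.put(n, new HashSet<>());
        }
        // nullable and FIRST, to a fixpoint.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODS) {
                boolean allNullable = true;  // prefix seen so far is nullable
                for (int i = 1; i < p.length; i++) {
                    if (allNullable) changed |= first.get(p[0]).addAll(firstOf(p[i], first));
                    if (!NONTERMS.contains(p[i]) || !nullable.contains(p[i])) allNullable = false;
                }
                if (allNullable) changed |= nullable.add(p[0]);
            }
        }
        // FOLLOW, to a fixpoint: everything that can appear after each nonterminal.
        changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODS) {
                for (int i = 1; i < p.length; i++) {
                    if (!NONTERMS.contains(p[i])) continue;
                    boolean tailNullable = true;  // suffix after position i so far
                    for (int j = i + 1; j < p.length; j++) {
                        if (tailNullable) changed |= follow.get(p[i]).addAll(firstOf(p[j], first));
                        if (!NONTERMS.contains(p[j]) || !nullable.contains(p[j])) tailNullable = false;
                    }
                    if (tailNullable) changed |= follow.get(p[i]).addAll(follow.get(p[0]));
                }
            }
        }
        return follow;
    }

    static Set<String> firstOf(String s, Map<String, Set<String>> first) {
        return NONTERMS.contains(s) ? first.get(s) : Set.of(s);  // FIRST(terminal) = {terminal}
    }

    public static void main(String[] args) {
        Map<String, Set<String>> follow = compute();
        System.out.println("Follow(A) = " + follow.get("A"));  // contains c and $
        System.out.println("Follow(C) = " + follow.get("C"));  // contains d and $
    }
}
```

This reproduces Follow(A) = { c, $ } and Follow(C) = { d, $ }, the sets used below to resolve the conflicts in states 0, 1, and 6.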

SLR(1) Shift/Reduce conflict: we need to make sure that every terminal to the immediate right of a dot is not in the Follow set of the nonterminal of the reduction rule. I1: S → A • C $, C → • c, C → •. I6: A → a B • C d, C → • c, C → •. States 1 and 6: Follow(C) = { d, $ }, so c is not an element of Follow(C).

SLR(1) I0: S → • A C $, A → • a B C d, A → • B Q, A → •, B → • b B, B → • d. State 0: Follow(A) = { c, $ }, so a, b, and d are not elements of Follow(A). All conflicts can be resolved using the Follow sets, so the grammar is SLR parsable.

SLR(1) State Table (columns: a b c d q $ A B C Q S; empty cells are omitted)
State 0: S5 S9 R4 S12 R4 S1 S13 Done
State 1: S4 R8 R8 S2
State 2: S3
State 3: R1
State 4: R7
State 5: S9 S12 S6
State 6: S4 R8 R8 S7
State 7: S8
State 8: R2
State 9: S9 S11 S10
State 10: R5
State 11: R6
State 12: S14 S13
State 13: R3
State 14: R9

Sample Parse (Stack | Action | Remaining Input)
- 0 | S5 | a b b d d c $
- 0 a 5 | S9 | b b d d c $
- 0 a 5 b 9 | S9 | b d d c $
- 0 a 5 b 9 b 9 | S11 | d d c $
- 0 a 5 b 9 b 9 d 11 | R6 | d c $
- 0 a 5 b 9 b 9 | S10 | B d c $
- 0 a 5 b 9 b 9 B 10 | R5 | d c $

Sample Parse (cont.) (Stack | Action | Remaining Input)
- 0 a 5 b 9 | S10 | B d c $
- 0 a 5 b 9 B 10 | S9 | d c $
- 0 a 5 | R5 | B d c $
- 0 a 5 B 6 | S6 | d c $
- 0 a 5 B 6 | R8 | C d c $
- 0 a 5 B 6 C 7 | S7 | d c $
- 0 a 5 B 6 C 7 d 8 | S8 | c $

Sample Parse (cont.) (Stack | Action | Remaining Input)
- 0 | R2 | A c $
- 0 A 1 | S1 | c $
- 0 A 1 c 4 | S4 | $
- 0 A 1 | R7 | C $
- 0 A 1 C 2 | S2 | $
- 0 A 1 C 2 $ 3 | S3 |
- 0 | R1 | S
- 0 S | Done |

Syntax Trees Concrete: the actual parse tree. Abstract: eliminates unnecessary nodes, structures the tree appropriately for evaluation, and serves as the basis for code generation.

Concrete vs. Abstract

Construction Java code is added to productions. The most common action is to build a new tree node and assign it to RESULT, which attaches it to the left-hand nonterminal. Values for the nonterminals on the right-hand side are usually child tree nodes.

Stmt ::= id:id assign E:e
             {: RESULT = new AssignmentNode(id, e); :}
       | if lparen E:pr rparen Stmt:s fi
             {: RESULT = new IfNode(pr, s); :}
       | if lparen E:pr rparen Stmt:s1 else Stmt:s2 fi
             {: RESULT = new IfNode(pr, s1, s2); :}
       ;

Construction
Stmt ::= begin Stmts:block end
             {: RESULT = block; :}
       ;
Stmts ::= Stmts:block semi Stmt:stmt
             {: block.add(stmt); RESULT = block; :}
        | Stmt:s
             {: RESULT = new BlockNode(s); :}
        ;

Construction Alternate construction of BlockNode:
Stmt ::= begin Stmts:list end
             {: RESULT = new BlockNode(list); :}
       ;
Stmts ::= Stmts:list semi Stmt:stmt
             {: list.add(stmt); RESULT = list; :}
        | Stmt:s
             {: RESULT = new ArrayList(); RESULT.add(s); :}
        ;

Left and Right Values x = y x is the L-value Refers to the location of x, not its value y is the R-value Refers to the value of y, not its location

Example Note that there is an error in this figure. The deref in the tree for example b should not be there.

Type Checking When are types checked? Statically, at compile time: the compiler does type checking during compilation, ideally eliminating runtime checks. Dynamically, at runtime: the compiler generates code to do type checking at runtime. Compare JavaScript vs. Java; even Java still does a large amount of runtime type checking. We'll focus on static typing for basic types.

Expression Types For every operator we need to know the allowed types of its operands and the resulting type. An implicit coercion changes the representation, not the data (short to long). An implicit conversion may change the data (int to float). An explicit cast may lose information (float to int, int to short).

What are the types? [Tree: the expression x = y + 3.14, where x and y are int and 3.14 is float; the types of the = and + nodes are still to be determined.]

Determining Types Make sure the type combination is allowed (int + float). Assign the resultant type to the operator (float). Generate any necessary coercion(s) or conversion(s): most hardware has (int + int) and (float + float), but not (int + float).
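In Java source terms, the three cases look like the following small self-contained demo:

```java
// Demonstration of the conversions and casts discussed above: the compiler
// inserts an int-to-float conversion for mixed arithmetic, while narrowing
// casts must be written explicitly and can lose information.
public class Coercion {
    public static void main(String[] args) {
        int x = 2;
        float y = 3.14f;

        float sum = x + y;        // x is implicitly converted: (float) x + y
        int truncated = (int) y;  // explicit cast: the fraction is discarded -> 3
        short s = (short) 70000;  // explicit narrowing: 70000 doesn't fit a short

        System.out.println(sum + " " + truncated + " " + s);
    }
}
```

Note that the narrowing cast keeps only the low 16 bits, so the value silently changes, which is exactly the "may lose information" case above.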

Adding Coercion [Tree: the same expression with an int2float conversion node inserted above y, so the + has two float operands and result type float; the type of = is still unresolved, since x is int.]

Explicit Casting [Tree: the expression with an explicit float2int cast node above the +, in addition to the int2float conversion above y; the = node is now int, matching x.]

Symbol Table [Diagram: a Proc node with children Dcls (int i; float j;) and Body (i = 3; j = i * 3.14;). Symbol info is synthesized from the declarations and inherited by the body.]

Symbol Table Persists the synthesized information as a side effect of the translation Maps a name and environment to information Environment is the scope Scope is static Basic actions Establish a mapping Retrieve a mapping

public class Car {
    int id;
    int color;
    int GetType() { String id; }
    public class Wheel {
        Object id;
        int GetType() { float id; }
    }
}

Name                Scope               Info
id                  Car                 int
color               Car                 int
id                  Car:GetType         String
id                  Car:Wheel           Object
id                  Car:Wheel:GetType   float

Scopes Scopes are static Scopes are nested LIFO (last in, first out) Car scope GetType scope Wheel scope GetType scope

Possible Implementations Option 1: Keep all information available at all times. Option 2: Use LIFO and process one scope at a time.

Name                Scope               Info
id                  Car                 int
color               Car                 int
id                  Car:GetType         String
id                  Car:Wheel           Object
id                  Car:Wheel:GetType   float

LIFO Scopes Symbol table will be a stack of maps of name to information One map per scope (environment) Four basic operations Enter Scope Leave Scope Add Symbol Lookup Symbol

Implementation Scopes are LIFO so using a stack makes sense For each scope, use a map since we lookup names to retrieve info about them Typically use a hash map
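A minimal sketch of this design is a deque of hash maps, one map per scope. The String payload here stands in for whatever symbol info a real compiler would store:

```java
import java.util.*;

// A minimal LIFO symbol table: a stack of hash maps, one map per scope.
public class SymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    public void enterScope() {
        scopes.push(new HashMap<>());
    }

    public void leaveScope() {
        scopes.pop();  // discards all symbols declared in that scope
    }

    public void addSymbol(String name, String info) {
        scopes.peek().put(name, info);
    }

    // Search the innermost scope first, then the enclosing scopes.
    public String lookupSymbol(String name) {
        for (Map<String, String> scope : scopes) {
            String info = scope.get(name);
            if (info != null) return info;
        }
        return null;  // undeclared
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.enterScope();                 // Car
        table.addSymbol("id", "int");
        table.addSymbol("color", "int");
        table.enterScope();                 // Car:GetType
        table.addSymbol("id", "String");
        System.out.println(table.lookupSymbol("id"));     // String (inner wins)
        System.out.println(table.lookupSymbol("color"));  // int (from Car)
        table.leaveScope();
        System.out.println(table.lookupSymbol("id"));     // int again
    }
}
```

Lookup walks outward from the innermost scope, which is exactly how the nested "id" declarations in the Car/Wheel example shadow each other.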

Hello World :: Source
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

Hello World :: Jasmin
.class public HelloWorld
.super java/lang/Object
;
; standard initializer (calls java.lang.Object's initializer)
;
.method public <init>()V
    aload_0
    invokenonvirtual java/lang/Object/<init>()V
    return
.end method
;
; main() - prints out Hello World
;
.method public static main([Ljava/lang/String;)V
    .limit stack 2   ; up to two items can be pushed
    ; push System.out onto the stack
    getstatic java/lang/System/out Ljava/io/PrintStream;
    ; push a string onto the stack
    ldc "Hello World!"
    ; call the PrintStream.println() method.
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
    ; done
    return
.end method

Source to AST
Source:
if (i > 431) { a = b + c; }
AST:
IF_STATEMENT
    GREATER_THAN
        VAR_USE
            IDENTIFIER (i) (SymbolInfo: INT, lv = 0)
        INTEGER_LITERAL (431)
    BLOCK
        EXPRESSION_STATEMENT
            ASSIGN
                IDENTIFIER (a) (SymbolInfo: INT, lv = 1)
                ADDITION
                    VAR_USE
                        IDENTIFIER (b) (SymbolInfo: INT, lv = 2)
                    VAR_USE
                        IDENTIFIER (c) (SymbolInfo: INT, lv = 3)

AST to Code
AST: the IF_STATEMENT tree shown above.
Code:
    iload 0
    ldc 431
    if_icmpgt label3
    iconst_0
    goto label4
label3:
    iconst_1
label4:
    ifeq label1
    iload 2
    iload 3
    iadd
    istore 1
    goto label2
label1:
label2:

Break It Down IF_STATEMENT node:
1. Create two labels (they will be needed later).
2. Visit the first child: code for the boolean test expression is generated; it should leave 0 (for false) or 1 (for true) on top of the stack.
3. Output code that compares the top of stack to 0 and jumps to the label for the else block (to be output later) if 0.
4. Visit the second child: code for the then block is generated.
5. Output code that jumps over the else block, then output the label that starts the else block.
6. Visit the third child (if it exists): code for the else block is generated.
7. Output the label at the end of the else block.

IF_STATEMENT
private void visitIfStatementNode(ASTNode node) throws Exception {
    String elseLabel = generateLabel();
    String endLabel = generateLabel();
    node.getChild(0).accept(this);        // visit first child
    stream.println("    ifeq " + elseLabel);
    node.getChild(1).accept(this);        // visit second child
    stream.println("    goto " + endLabel);
    stream.println(elseLabel + ":");
    ASTNode elseBlock = node.getChild(2);
    if (elseBlock != null) {
        elseBlock.accept(this);           // visit third child
    }
    stream.println(endLabel + ":");
}

Run-time System The run-time system consists of everything needed at run-time to support the execution of a process. This includes memory management, call-stack management, system call API, etc.

Function Calls Invoke f at runtime. What happens? 1. Parameters are transmitted. 2. Local storage is allocated. 3. Local storage is initialized. 4. The body of f executes. 5. Return values are prepared. 6. Storage is freed. 7. Control returns to the caller.

Function Calls Each invocation of f is a new activation What is the lifetime of f?

Lifetime [Diagram: activation lifetimes of a and b, shown once overlapping (b nested inside a) and once disjoint (b ends before the next activation of a begins).]

Activation Use a stack to represent activations No activation specific info survives death No activation specific info required for birth Each activation pushes a new activation record onto the run-time stack What will we record in it?

Activation Record Return address Storage information Local storage Parameters Access to non-locals

Parameter Passing Call by value Argument is R-value Value of arguments are copied into the function swap(x, y) won t change the value of x or y Call by reference Argument is L-value Variable in function points to the same location as the argument swap(x, y) would change the value of x and y Most modern languages use call-by-value semantics

Parameter Passing Java uses call-by-value semantics It is sometimes said that Java uses call-by-value for primitives and call-by-reference for object types, but that is not quite true. Java is call-by-value for everything, except that it does not copy objects but rather copies references to the objects. That is, the caller and callee both have references to the same object.

Parameter Passing Does not work in Java: primitive parameters are copied.
void swap(int x, int y) {
    int t = x;
    x = y;
    y = t;
}

Parameter Passing Still does not work in Java: references to objects are copied.
void swap(Integer x, Integer y) {
    Integer t = x;
    x = y;
    y = t;
}

Parameter Passing Cannot swap the objects, but can change the internal state of the objects.
void swap(ModInteger x, ModInteger y) {
    int t = x.getValue();
    x.setValue(y.getValue());
    y.setValue(t);
}
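The three variants can be exercised side by side in a runnable demo. ModInteger below is a minimal stand-in written for this sketch; the course's own class is not shown here:

```java
// Side-by-side demonstration of the three swap attempts under Java's
// call-by-value semantics.
public class SwapDemo {
    static class ModInteger {
        private int value;
        ModInteger(int v) { value = v; }
        int getValue() { return value; }
        void setValue(int v) { value = v; }
    }

    static void swapInts(int x, int y) {          // copies of the values
        int t = x; x = y; y = t;
    }

    static void swapRefs(Integer x, Integer y) {  // copies of the references
        Integer t = x; x = y; y = t;
    }

    static void swapState(ModInteger x, ModInteger y) {  // mutate shared objects
        int t = x.getValue();
        x.setValue(y.getValue());
        y.setValue(t);
    }

    public static void main(String[] args) {
        int a = 1, b = 2;
        swapInts(a, b);
        System.out.println(a + " " + b);  // 1 2 -- unchanged

        Integer p = 1, q = 2;
        swapRefs(p, q);
        System.out.println(p + " " + q);  // 1 2 -- unchanged

        ModInteger m = new ModInteger(1), n = new ModInteger(2);
        swapState(m, n);
        System.out.println(m.getValue() + " " + n.getValue());  // 2 1 -- swapped
    }
}
```

Only the third variant has an observable effect, because both caller and callee hold references to the same two objects.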

Register Allocation Most architectures have only a handful of registers to use for calculations Values need to be copied from memory into registers when needed, and then copied back to memory when a register is needed for something else For performance, we want to minimize the number of copies to/from memory

Register Allocation Can build an interference graph to determine what variables are live at the same time First, determine the live ranges of variables based on their "use" and "def" A def is an assignment to a variable (L-value) A use is the use of the value of a variable (R-value)

Live Ranges [Diagram: live ranges of x, y, and z, each running from its def (x =, y =, z =) to its last use (= x, = z, = y).] Variables with ranges that overlap are live at the same time and therefore must use different registers to avoid extra copying in and out of memory.

Interference Graph Each variable is a vertex in the graph. An edge in the graph indicates that those two variables are live at the same time. So the edges indicate which variables cannot share a register. [Diagram: interference graph over x, y, and z.]

Graph Coloring The problem of allocating registers now becomes one of coloring the interference graph. We want to color the vertices of the graph so that no two adjacent vertices have the same color. The maximum number of colors we can use is equal to the number of available registers. A coloring that uses at most k colors is called a k-coloring. But k-coloring a graph is NP-complete, and we need it to be fast, so we use a heuristic algorithm.

Graph Coloring
1. Find a vertex whose edge count is < k.
2. Push the vertex onto a stack and remove it from the graph.
3. Repeat until there are no vertices left in the graph, or no vertex in the graph has an edge count < k.
If all vertices have been removed from the graph, then the graph can be k-colored:
4. Pop a vertex from the stack and add it back to the graph.
5. Color the vertex a different color from any of its neighbors currently in the graph. (How can we know that there is an available color?)
6. Repeat until the stack is empty.
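The push/pop heuristic above can be sketched as follows. The interference graph used in main is a made-up four-vertex example, not the A-G graph from the slides:

```java
import java.util.*;

// Sketch of the simplify/select graph-coloring heuristic for register
// allocation described above.
public class GraphColoring {
    // Returns a vertex -> color map (colors 0..k-1), or null if the heuristic
    // gets stuck, which is when spilling would be considered.
    static Map<String, Integer> color(Map<String, Set<String>> graph, int k) {
        // Work on a mutable copy so vertices can be removed.
        Map<String, Set<String>> work = new HashMap<>();
        for (var e : graph.entrySet()) work.put(e.getKey(), new HashSet<>(e.getValue()));

        // Simplify: repeatedly push and remove a vertex of degree < k.
        Deque<String> stack = new ArrayDeque<>();
        while (!work.isEmpty()) {
            String pick = null;
            for (String v : work.keySet()) {
                if (work.get(v).size() < k) { pick = v; break; }
            }
            if (pick == null) return null;  // heuristic failed
            stack.push(pick);
            work.remove(pick);
            for (Set<String> nbrs : work.values()) nbrs.remove(pick);
        }

        // Select: pop vertices and give each the lowest color unused by its
        // already-colored neighbors. Having had degree < k when removed
        // guarantees some color in 0..k-1 is free.
        Map<String, Integer> colors = new HashMap<>();
        while (!stack.isEmpty()) {
            String v = stack.pop();
            Set<Integer> used = new HashSet<>();
            for (String n : graph.get(v)) {
                if (colors.containsKey(n)) used.add(colors.get(n));
            }
            int c = 0;
            while (used.contains(c)) c++;
            colors.put(v, c);
        }
        return colors;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = Map.of(
            "A", Set.of("B", "D"),
            "B", Set.of("A", "C", "D"),
            "C", Set.of("B", "D"),
            "D", Set.of("A", "B", "C"));
        System.out.println(color(g, 3));  // a valid 3-coloring
    }
}
```

With k = 3, every vertex in this graph has degree below 3 at some point during simplification, so the heuristic succeeds and the select phase assigns each vertex a register distinct from all of its neighbors.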

Graph Coloring Try k = 3. [Diagram: an interference graph with vertices A, B, C, D, E, F, and G.]

Graph Coloring [Diagram: the same graph during the coloring process.]

Graph Coloring Note that if, while removing vertices from the graph, we reach a point where all of the remaining vertices have an edge count >= k, it does not necessarily mean the graph cannot be k-colored; it just means the heuristic algorithm failed. We could try a different algorithm, but it could also be that the graph really is not k-colorable. Then we will need to spill registers: at some point, copy the registers out to memory so we can use them to hold other variables.

Parsers
LR(0): 0 symbols of look ahead when creating the parse table.
SLR: Simple LR; resolves conflicts using global grammar Follow sets.
LALR: Look-Ahead LR; combines some states based on follow-set information.
LR(k): most powerful of those where parse states are created ahead of time.

Yet Another Example
1. P → S $
2. S → A B A C
3. S → a a c
4. A → a a
5. B → b
6. B → λ
7. C → c
8. C → λ

LR(1) Parse States (kernel items plus closure, each with its lookahead set)
I0: P → • S $, {}; S → • A B A C, {$}; S → • a a c, {$}; A → • a a, {b, a} (goto 12 on S, 4 on A, 1 on a)
I1: S → a • a c, {$}; A → a • a, {b, a} (goto 2 on a)
I2: S → a a • c, {$}; A → a a •, {b, a} (goto 3 on c)
I3: S → a a c •, {$}
I4: S → A • B A C, {$}; B → • b, {a}; B → •, {a} (goto 6 on B, 5 on b)
I5: B → b •, {a}
I6: S → A B • A C, {$}; A → • a a, {c, $} (goto 9 on A, 7 on a)

LR(1) Parse States (cont.)
I7: A → a • a, {c, $} (goto 8 on a)
I8: A → a a •, {c, $}
I9: S → A B A • C, {$}; C → • c, {$}; C → •, {$} (goto 11 on C, 10 on c)
I10: C → c •, {$}
I11: S → A B A C •, {$}
I12: P → S • $, {} (goto 13 on $)
I13: P → S $ •, {}