CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis Hall

Phases of a compiler: syntactic structure. [Figure 1.6, page 5 of text]

Recap. Lexical analysis: LEX/FLEX (regex -> lexer). Syntactic analysis: YACC/BISON (grammar -> parser).

Continuing from Friday. With the precedence rule forcing an expression like 2+3*4 to be interpreted as 2+(3*4), how can we modify the grammar to allow (2+3)*4 as a valid expression?
    <expr> -> <expr> + <term> | <term>
    <term> -> <term> * <factor> | <factor>
    <factor> -> <variable> | <constant> | '(' <expr> ')'

Lecture discussion. There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to read a syntax description to be able to write well-formed programs in the language. Understanding at least a little of what a compiler does in translating a program from high-level to low-level forms deepens your understanding of why programming languages are designed the way they are, and equips you to better diagnose subtle bugs in programs. The next slide shows the evaluation order remark in the C++ language reference, which alludes to the order being left unspecified to allow a compiler to optimize the code during translation.

Shown on Visualizer: C++ Programming Language, 3rd edition. Bjarne Stroustrup. (c) 1997. Page 122.

A compiler translates high-level language statements into a much larger number of low-level statements, and then applies optimizations. The entire translation process, including optimizations, must preserve the semantics of the original high-level program. The next slides show that different phases of compilation can apply different types of optimizations (some target-independent, some target-dependent). By not specifying the order in which subexpressions are evaluated (left-to-right or right-to-left), a C++ compiler can potentially reorder the resulting low-level instructions to give a better result.

Returning to an earlier question. A few lectures back the question was asked whether there are context-free languages which are not regular.

[Figure: syntactic structure and lexical structure.] SOURCE: https://openi.nlm.nih.gov/detailedresult.php?img=pmc3367694_rstb20120103-g2&req=4 AUTHORS: Fitch WT, Friederici AD - Philos. Trans. R. Soc. Lond., B, Biol. Sci. (2012) LICENSE: http://creativecommons.org/licenses/by/3.0/

RL $\subseteq$ CFL proof sketch. Given a regular language $L$ we can always construct a context-free grammar $G$ such that $L = L(G)$. For every regular language $L$ there is an NFA $M = (S, \Sigma, \delta, F, s_0)$ such that $L = L(M)$. Build $G = (N, T, P, S_0)$ as follows:
    $N = \{ N_s \mid s \in S \}$
    $T = \{ t \mid t \in \Sigma \}$
    If $\delta(i, a) = j$, then add $N_i \to a\, N_j$ to $P$
    If $i \in F$, then add $N_i \to \varepsilon$ to $P$
    $S_0 = N_{s_0}$

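Example: $(a \mid b)^* abb$. [Figure: NFA with states 0, 1, 2, 3. State 0 loops on a and b, 0 goes to 1 on a, 1 goes to 2 on b, 2 goes to 3 on b; state 3 is accepting.]
    $G = (\{A_0, A_1, A_2, A_3\},\ \{a, b\},\ \{A_0 \to a A_0,\ A_0 \to b A_0,\ A_0 \to a A_1,\ A_1 \to b A_2,\ A_2 \to b A_3,\ A_3 \to \varepsilon\},\ A_0)$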
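The construction above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `nfa_to_grammar`, the triple encoding of transitions, and the `A<state>` naming of non-terminals are all hypothetical choices, not part of the lecture.

```python
def nfa_to_grammar(transitions, accepting, start):
    """Build right-linear productions from an NFA.

    transitions: iterable of (state, symbol, next_state) triples.
    Returns (start_nonterminal, productions); each production is a
    (head, body) pair, and an empty body stands for epsilon.
    """
    def name(state):
        return f"A{state}"  # hypothetical naming scheme for non-terminals

    # delta(i, a) = j  gives  A_i -> a A_j
    productions = [(name(i), (a, name(j))) for (i, a, j) in transitions]
    # i in F  gives  A_i -> epsilon
    productions += [(name(f), ()) for f in sorted(accepting)]
    return name(start), productions


def format_productions(productions):
    """Render productions as strings such as 'A0 -> a A1'."""
    return [f"{head} -> {' '.join(body) if body else 'ε'}"
            for head, body in productions]


# NFA for (a|b)*abb: state 0 loops on a and b, then a, b, b leads to state 3.
TRANSITIONS = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
START, PRODUCTIONS = nfa_to_grammar(TRANSITIONS, accepting={3}, start=0)

for line in format_productions(PRODUCTIONS):
    print(line)
```

Each NFA transition contributes exactly one production, and each accepting state contributes one ε production, so the grammar has the same size as the automaton.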

RL $\subset$ CFL proof sketch. Show that not all CF languages are regular. To do this we only need to demonstrate that there exists a CFL that is not regular. Consider $L = \{ a^n b^n \mid n \geq 1 \}$. Claim: $L \in \text{CFL}$, $L \notin \text{RL}$.

RL $\subset$ CFL proof sketch.
$L \in \text{CFL}$: $S \to aSb \mid ab$
$L \notin \text{RL}$ (by contradiction): Assume $L$ is regular. In this case there exists a DFA $D = (S, \Sigma, \delta, F, s_0)$ such that $L(D) = L$. Let $k = |S|$. Consider $a^i b^i$, where $i > k$. Suppose $\delta(s_0, a^i) = s_r$. Since $i > k$, not all of the states between $s_0$ and $s_r$ are distinct. Hence, there are $v$ and $w$, $0 \leq v < w \leq k$, such that $s_v = s_w$. In other words, there is a loop. This DFA can certainly recognize $a^i b^i$ but it can also recognize $a^j b^i$, where $i \neq j$, by following the loop. Since $a^j b^i \notin L$, this contradicts $L(D) = L$. "REGULAR GRAMMARS CANNOT COUNT"
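To make the loop argument concrete, here is a small Python sketch of a toy DFA that tries to count a's with a bounded counter. Everything in it is an assumption made for illustration (the bound `CAP`, the state encoding, the function names); it is one particular finite automaton, not the general proof.

```python
CAP = 2  # hypothetical bound: this DFA can only "count" up to CAP


def step(state, symbol):
    """Transition function of a toy DFA that tries to recognize a^n b^n."""
    phase, n = state
    if phase == "a" and symbol == "a":
        return ("a", min(n + 1, CAP))  # the counter saturates at CAP
    if phase in ("a", "b") and symbol == "b" and n >= 1:
        return ("b", n - 1)            # each b cancels one counted a
    return ("dead", 0)                 # anything else is rejected


def accepts(s):
    """Run the DFA from its start state; accept when the counter is back at 0."""
    state = ("a", 0)
    for symbol in s:
        state = step(state, symbol)
    return state == ("b", 0)


def find_repeated_state(count):
    """Feed `count` a's to the DFA; return (v, w), v < w, with s_v == s_w."""
    seen = {("a", 0): 0}
    state = ("a", 0)
    for steps in range(1, count + 1):
        state = step(state, "a")
        if state in seen:
            return seen[state], steps
        seen[state] = steps
    return None


v, w = find_repeated_state(8)
print(v, w)               # the DFA is in the same state after v and after w a's
print(accepts("aabb"))    # correctly accepted
print(accepts("aaabb"))   # also accepted, although 3 != 2
```

Because the automaton is in the same state after `v` and after `w` a's, it must give the same verdict for a^v b^m and a^w b^m. Raising `CAP` only moves the point at which the repetition occurs.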

Relevance? Nested '{' and '}':
    public class Foo {
        public static void main(String[] args) {
            for (int i=0; i<args.length; i++) {
                if (args[i].length() < 3) { }
                else { }
            }
        }
    }
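Checking that braces nest properly is the same counting problem as a^n b^n. A minimal Python sketch, assuming a hypothetical helper name `braces_balanced` and ignoring braces inside strings or comments:

```python
def braces_balanced(source):
    """Return True when every '{' in source has a matching later '}'."""
    depth = 0  # unbounded counter: exactly what a finite automaton lacks
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a '}' appeared before its '{'
                return False
    return depth == 0


print(braces_balanced("class Foo { void main() { if (x) { } else { } } }"))
print(braces_balanced("class Foo { void main() { }"))
```

The counter `depth` can grow without bound, which is why this check is outside the reach of regular expressions alone.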

Context-free grammars and parsing. $O(n^3)$ algorithms to parse any CFG exist. Programming language constructs can generally be parsed in $O(n)$.

Top-down & bottom-up. A top-down parser builds a parse tree from the root to the leaves; it is easier to construct by hand. A bottom-up parser builds a parse tree from the leaves to the root; it handles a larger class of grammars, and tools (yacc/bison) build bottom-up parsers.

Our presentation: first top-down, then bottom-up. Present top-down parsing first, introducing the necessary vocabulary and data structures. Move on to bottom-up parsing second.

vocab: look-ahead. The current symbol being scanned in the input is called the lookahead symbol. [Figure: the PARSER reading a stream of tokens.]

Top-down parsing

Top-down parsing. Start from the grammar's start symbol. Build a parse tree so its yield matches the input. Predictive parsing: a simple form of recursive descent parsing.

FIRST(α). If $\alpha \in (N \cup T)^*$ then FIRST(α) is "the set of terminals that appear as the first symbols of one or more strings of terminals generated from α." [p. 64]
    Ex: If A -> a β then $\text{FIRST}(A) = \{a\}$
    Ex: If A -> a β | B then $\text{FIRST}(A) = \{a\} \cup \text{FIRST}(B)$

FIRST(α). FIRST sets are considered when there are two (or more) productions to expand $A \in N$: A -> α | β. Predictive parsing requires that $\text{FIRST}(\alpha) \cap \text{FIRST}(\beta) = \emptyset$.
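FIRST sets and the disjointness requirement can be computed with a simple fixed-point iteration. The Python sketch below is one way to do it under assumptions: the dictionary encoding of the grammar, the empty tuple standing for an ε alternative, and all function names are hypothetical, and the two sample grammars are chosen only for illustration.

```python
EPSILON = "ε"


def first_of_sequence(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST set is itself
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)  # empty sequence, or every symbol can derive ε
    return result


def compute_first(grammar):
    """grammar maps each non-terminal to a list of alternatives (tuples)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def alternatives_disjoint(grammar, first, nt):
    """True when the FIRST sets of nt's alternatives are pairwise disjoint."""
    seen = set()
    for alt in grammar[nt]:
        alt_first = first_of_sequence(alt, first)
        if seen & alt_first:
            return False
        seen |= alt_first
    return True


# Hypothetical sample grammars; () is the ε alternative.
GRAMMAR = {"optexpr": [("expr",), ()], "expr": [("id",), ("(", "expr", ")")]}
LEFT_REC = {"expr": [("expr", "+", "term"), ("term",)], "term": [("id",)]}

print(compute_first(GRAMMAR))
print(alternatives_disjoint(LEFT_REC, compute_first(LEFT_REC), "expr"))
```

For the left-recursive grammar both alternatives of `expr` start with `id`, so the check reports that the grammar is not suitable for predictive parsing as written.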

ε productions. If the lookahead symbol does not match the FIRST set, use the ε production: do not advance the lookahead symbol, but instead "discard" the non-terminal.
    optexpr -> expr | ε
"While parsing optexpr, if the lookahead symbol is not in FIRST(expr), then the ε production is used" [p. 66]
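A predictive procedure for optexpr might look like the following Python sketch. The surrounding context is assumed for illustration only: `expr -> id` is a stand-in for a real expression rule, `header -> optexpr ; optexpr` is a hypothetical caller, and `"$"` is a hypothetical end-of-input marker.

```python
FIRST_EXPR = {"id"}  # assumes the stand-in rule expr -> id


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input
        self.pos = 0
        self.trace = []  # productions chosen, in order

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        """Consume the lookahead symbol if it is the expected terminal."""
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def expr(self):
        self.match("id")  # stand-in: expr -> id

    def optexpr(self):
        # optexpr -> expr | ε
        if self.lookahead in FIRST_EXPR:
            self.trace.append("optexpr -> expr")
            self.expr()
        else:
            # ε production: leave the lookahead symbol where it is
            self.trace.append("optexpr -> ε")

    def header(self):
        # hypothetical context: header -> optexpr ; optexpr
        self.optexpr()
        self.match(";")
        self.optexpr()
        self.match("$")


p = Parser(["id", ";"])
p.header()
print(p.trace)
```

The ε branch consumes nothing, so the caller sees the same lookahead symbol and decides what to do with it.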

Left recursion. Grammars with left recursion are problematic for top-down parsers, as they lead to infinite regress.
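The regress is easy to reproduce. In this Python sketch, a naive procedure for the left-recursive rule expr -> expr + term | term calls itself before consuming any input; the function names are hypothetical, and Python's recursion limit is what stops the run.

```python
def parse_expr_naive(tokens, pos=0):
    """Naive procedure for expr -> expr + term | term.

    The first alternative starts with expr, so the procedure calls itself
    at the same input position and never makes progress.
    """
    pos = parse_expr_naive(tokens, pos)  # same position: no input consumed
    if pos < len(tokens) and tokens[pos] == "+":
        return pos + 2  # would consume "+" and a term (never reached)
    return pos


def regresses(tokens):
    """Return True if parsing overflows the call stack."""
    try:
        parse_expr_naive(tokens)
    except RecursionError:
        return True
    return False


print(regresses(["id", "+", "id"]))
```

The input plays no role in the failure: the recursion happens before the first token is examined.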

Left recursion example. Grammar:
    expr -> expr + term | term
    term -> id
FIRST sets for rule alternatives are not disjoint: FIRST(expr) = {id}, FIRST(term) = {id}.
[Figure: left-leaning parse tree in which expr expands repeatedly to expr + term, ending in term.]

Left recursion example, annotated. Grammar:
    expr -> expr + term | term    (α = "+ term", β = "term")
    term -> id
FIRST sets for rule alternatives are not disjoint: FIRST(expr) = {id}, FIRST(term) = {id}.
[Figure: the same left-leaning parse tree; its yield has the form β α α α.]

Rewriting grammar to remove left recursion. The expr rule is of the form
    A -> A α | β
Rewrite as two rules:
    A -> β R
    R -> α R | ε
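The rewrite is mechanical enough to sketch in Python. This handles only immediate left recursion in a single rule; the function name, the tuple encoding of alternatives, and the caller-supplied name for R are assumptions made for illustration.

```python
def remove_immediate_left_recursion(head, alternatives, new_name):
    """Rewrite A -> A α | β as A -> β R and R -> α R | ε.

    alternatives: list of tuples of symbols; () stands for ε.
    new_name: name to use for the new non-terminal R.
    """
    # α parts: what follows the leading A in left-recursive alternatives
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # β parts: alternatives that do not start with A
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: list(alternatives)}  # nothing to rewrite
    return {
        head: [beta + (new_name,) for beta in betas],
        new_name: [alpha + (new_name,) for alpha in alphas] + [()],
    }


rewritten = remove_immediate_left_recursion(
    "expr", [("expr", "+", "term"), ("term",)], "R"
)
print(rewritten)
```

The new rules generate the same strings, β followed by any number of α, but the recursion now sits at the right end of the rule where a top-down parser can handle it.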

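Back to example. Grammar is rewritten as
    expr -> term R
    R -> + term R | ε
[Figure: right-leaning parse tree in which expr expands to term R and each R expands to + term R, ending with ε; the yield has the form β α α α ε.]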
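With the left recursion removed, a recursive descent parser follows directly from the rules. The Python sketch below records which productions it applies; the function names, the `"$"` end marker, and the choice to return a list of production strings are all assumptions made for illustration.

```python
def parse(tokens):
    """Parse with expr -> term R, R -> + term R | ε, term -> id.

    Returns the productions applied, in order; raises SyntaxError on bad input.
    """
    tokens = list(tokens) + ["$"]  # "$" is a hypothetical end-of-input marker
    pos = 0
    applied = []

    def match(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {tokens[pos]!r}")
        pos += 1

    def term():
        applied.append("term -> id")
        match("id")

    def rest():  # the non-terminal R
        if tokens[pos] == "+":  # "+" is the only symbol in FIRST(+ term R)
            applied.append("R -> + term R")
            match("+")
            term()
            rest()
        else:
            applied.append("R -> ε")  # consume nothing

    def expr():
        applied.append("expr -> term R")
        term()
        rest()

    expr()
    match("$")
    return applied


for production in parse(["id", "+", "id"]):
    print(production)
```

Every call to `rest` either consumes a "+" or returns immediately, so the parser always makes progress and terminates.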