Automatic generation of LL(1) parsers

Informatics 2A: Lecture 12
John Longley
School of Informatics, University of Edinburgh
jrl@staffmail.ed.ac.uk
14 October, 2011

Generating parse tables

We've seen that if a grammar G happens to be LL(1) (i.e. if it admits a parse table), then efficient, deterministic, predictive parsing is possible with the help of a stack. What's more, if G is LL(1), G is automatically unambiguous.

But how do we tell whether a grammar is LL(1)? And if it is, how can we construct a parse table for it?

For very small grammars, we might be able to answer these questions by inspection. But for realistic grammars, a systematic method is needed. In this lecture, we give an algorithmic procedure for answering both questions.

The overall picture

Once per language:
  "Natural" grammar
    → (hand tweaking, e.g. disambiguation)
  LL(1) grammar
    → (table generation: mechanical)
  LL(1) parse table

Once per document:
  Document (input string) + LL(1) parse table
    → (LL(1) parsing algorithm)
  Syntax tree

Previous lecture: the LL(1) parsing algorithm, which works on a parse table and a particular input string.

This lecture: an algorithm for getting from a grammar G to a parse table. The algorithm will succeed if G is LL(1), and fail if it isn't. As in the previous lecture, we assume G has no useless nonterminals.

Next lecture: ways of getting from a grammar to an equivalent LL(1) grammar. (Not always possible, but works quite often.)

First and Follow sets

The construction of a parse table for a given grammar falls into two parts.

1. For each nonterminal X, compute two sets called First(X) and Follow(X), defined as follows:

   First(X) is the set of all terminals that can appear at the start of a phrase derived from X. [Convention: if ε can be derived from X, also include the special symbol ε in First(X).]

   Follow(X) is the set of all terminals that can appear immediately after X in some sentential form derived from the start symbol S. [Convention: if X can appear at the end of some such sentential form, also include the special symbol $ in Follow(X).]

2. Use these First and Follow sets to fill out the parse table.

We'll do the latter (easier) stage first.

First and Follow sets: an example

Look again at our grammar for well-matched bracket sequences:

S → ε | TS
T → (S)

By inspection, we can see that:

First(S)  = { (, ε }     because an S can begin with ( or be empty
First(T)  = { ( }        because a T must begin with (
Follow(S) = { ), $ }     because within a complete phrase, an S can be followed by ) or appear at the end
Follow(T) = { (, ), $ }  because a T can be followed by ( or ) or appear at the end

Later we'll give a systematic method for computing these sets.

Further convention: take First(a) = {a} for each terminal a.

Filling out the parse table

Once we've got these First and Follow sets, we can fill out the parse table as follows. For each production X → α of G in turn:

For each terminal a, if α can begin with a, insert X → α in row X, column a.
If α can be empty, then for each b ∈ Follow(X) (where b may be $), insert X → α in row X, column b.

If doing this leads to clashes (i.e. two productions fighting for the same table entry), conclude that the grammar isn't LL(1).

To explain the phrases "can begin with a" and "can be empty", suppose α = x1 ... xn, where the xi may be terminals or nonterminals.

"α can be empty" means ε ∈ First(xi) for each i.
"α can begin with a" means that for some i, ε ∈ First(x1), ..., ε ∈ First(x(i-1)), and a ∈ First(xi).

Filling out the parse table: example

S → ε | TS
T → (S)

First(S) = {(, ε}    Follow(S) = {), $}
First(T) = {(}       Follow(T) = {(, ), $}

Use this information to fill out the parse table:

(S) can begin with (, so insert T → (S) in entry for (, T.
TS can begin with (, so insert S → TS in entry for (, S.
ε can be empty, and Follow(S) = {), $}, so insert S → ε in entries for ), S and $, S.

This gives the parse table we had in the previous lecture:

       (          )        $
S      S → TS     S → ε    S → ε
T      T → (S)

First and Follow sets: preliminary stage

To complete the story, we'd like an algorithm for calculating First and Follow sets.

Easy first step: compute the set E of nonterminals that "can be ε":

1. Start by adding X to E whenever X → ε is a production of G.
2. If X → Y1 ... Ym is a production and Y1, ..., Ym are already in E, add X to E.
3. Repeat step 2 until E stabilizes.

Example: for our grammar of well-matched bracket sequences, we have E = {S}.

Calculating First sets: the details

1. Set First(a) = {a} for each a ∈ Σ. For each nonterminal X, initially set First(X) to {ε} if X ∈ E, or ∅ otherwise.
2. For each production X → x1 ... xn and each i ≤ n, if x1, ..., x(i-1) ∈ E and a ∈ First(xi), add a to First(X).
3. Repeat step 2 until all First sets stabilize.

Example:

Start with First(S) = {ε}, First(T) = ∅.
Consider T → (S) with i = 1: add ( to First(T).
Now consider S → TS with i = 1: add ( to First(S). That's all.
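The same fixed-point style handles First sets. A minimal Python sketch, again with my own encoding (not from the lecture); E is the nullable set from the previous slide.

```python
grammar = {
    'S': [(), ('T', 'S')],       # S -> ε | T S
    'T': [('(', 'S', ')')],      # T -> ( S )
}
E = {'S'}   # nullable nonterminals, as computed on the previous slide

def first_sets(grammar, E):
    # Step 1: First(X) starts as {ε} iff X is in E, else empty.
    first = {X: ({'ε'} if X in E else set()) for X in grammar}

    def first_of(sym):
        # First(a) = {a} for each terminal a
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:               # step 3: iterate until stable
        changed = False
        for X, bodies in grammar.items():
            for body in bodies:
                # Step 2: walk x1 ... xn while the prefix is nullable,
                # adding the terminals of First(xi) to First(X).
                for x in body:
                    new = first_of(x) - {'ε'}
                    if not new <= first[X]:
                        first[X] |= new
                        changed = True
                    if 'ε' not in first_of(x):
                        break    # xi not nullable: stop at this i
    return first

first = first_sets(grammar, E)
# → {'S': {'ε', '('}, 'T': {'('}}
```

The inner break enforces the side condition x1, ..., x(i-1) ∈ E: we only look at position i while everything before it can derive ε.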

Calculating Follow sets: the details

1. Initially set Follow(S) = {$} for the start symbol S, and Follow(X) = ∅ for all other nonterminals X.
2. For each production X → α, each splitting of α as βY x1 ... xn where n ≥ 1, and each i with x1, ..., x(i-1) ∈ E, add all of First(xi) (excluding ε) to Follow(Y).
3. For each production X → α and each splitting of α as βY or βY x1 ... xn with x1, ..., xn ∈ E, add all of Follow(X) to Follow(Y).
4. Repeat steps 2 and 3 until all Follow sets stabilize.

Example:

Start with Follow(S) = {$}, Follow(T) = ∅.
Apply step 2 to T → (S) with i = 1: add ) to Follow(S).
Apply step 2 to S → TS with i = 1: add ( to Follow(T).
Apply step 3 to S → TS: add ) and $ to Follow(T). That's all.
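Steps 2 and 3 can also be combined into one fixed-point loop, since both add to Follow(Y) for each occurrence of a nonterminal Y in a production body. A minimal Python sketch (my own encoding, not from the lecture), with E and the First sets taken from the earlier slides:

```python
grammar = {
    'S': [(), ('T', 'S')],       # S -> ε | T S
    'T': [('(', 'S', ')')],      # T -> ( S )
}
E = {'S'}
first = {'S': {'(', 'ε'}, 'T': {'('}}

def follow_sets(grammar, first, start='S'):
    # Step 1: Follow(S) = {$}, all other Follow sets empty.
    follow = {X: set() for X in grammar}
    follow[start].add('$')

    def first_of(sym):
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:               # step 4: iterate until stable
        changed = False
        for X, bodies in grammar.items():
            for body in bodies:
                for j, Y in enumerate(body):
                    if Y not in grammar:
                        continue          # only nonterminal occurrences matter
                    # Step 2: add First (minus ε) of the nullable-prefixed
                    # tail x1 ... xn that follows Y in this body.
                    new, tail_nullable = set(), True
                    for x in body[j + 1:]:
                        new |= first_of(x) - {'ε'}
                        if 'ε' not in first_of(x):
                            tail_nullable = False
                            break
                    # Step 3: if the whole tail can vanish (or is empty),
                    # everything in Follow(X) also follows Y.
                    if tail_nullable:
                        new |= follow[X]
                    if not new <= follow[Y]:
                        follow[Y] |= new
                        changed = True
    return follow

follow = follow_sets(grammar, first)
# → {'S': {')', '$'}, 'T': {'(', ')', '$'}}
```

On the bracket grammar this reproduces the worked example: ) enters Follow(S) from T → (S), and (, ), $ enter Follow(T) from S → TS.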

Parser generators

LL(1) is representative of a bunch of classes of CFGs that are efficiently parseable, e.g. LL(1) ⊂ LALR(1) ⊂ LR(1). These involve various tradeoffs of expressive power vs. efficiency/simplicity.

For such languages, a parser can be generated automatically from a suitable grammar. (E.g. for LL(1), we just need the parse table plus a fixed driver for the parsing algorithm.) So we don't need to write parsers ourselves: just the grammar! (E.g. one can basically define the syntax of Java in about 7 pages of context-free rules.)

This is the principle behind parser generators like yacc ("yet another compiler compiler") and java-cup.

Reading

Dragon book: Aho, Sethi and Ullman, Compilers: Principles, Techniques and Tools, Section 4.4.
Tiger book: Andrew Appel, Modern Compiler Implementation in C / Java / ML.
Turtle book: Aho and Ullman, Foundations of Computer Science.

Some relevant lecture notes and a tutorial sheet from previous years are available via the Course Schedule webpage.