
LL(1) predictive parsing

Informatics 2A: Lecture 11
Mary Cryan
School of Informatics, University of Edinburgh
mcryan@staffmail.ed.ac.uk
10 October 2018

Recap of Lecture 10

- A pushdown automaton (PDA) uses control states and a stack to recognise input strings.
- We considered PDAs whose goal is to empty the stack after having read the input string.
- The languages recognised by nondeterministic pushdown automata (NPDAs) are exactly the context-free languages.
- In contrast to the case of finite automata, deterministic pushdown automata (DPDAs) are less powerful than NPDAs. (The precise definition of DPDAs is a bit subtle and is not covered in Inf2A, though related material will be covered in Tutorial 3.)

Predictive parsing: first steps

Consider how we'd like to parse a program in the little programming language from Lecture 9.

  stmt        → if-stmt | while-stmt | begin-stmt | assg-stmt
  if-stmt     → if bool-expr then stmt else stmt
  while-stmt  → while bool-expr do stmt
  begin-stmt  → begin stmt-list end
  stmt-list   → stmt | stmt ; stmt-list
  assg-stmt   → VAR := arith-expr
  bool-expr   → arith-expr compare-op arith-expr
  compare-op  → < | > | <= | >= | == | !=

We read the lexed program token-by-token from the start.

The start symbol of the grammar is stmt. Suppose the first lexical token in the program is begin. From this information, we can tell that the first production in the syntax tree must be

  stmt → begin-stmt

We thus have to parse the program as a begin-stmt. We now see that the next production in the syntax tree has to be

  begin-stmt → begin stmt-list end

We thus have to parse the full program begin ... as begin stmt-list end. We can thus step over begin, and proceed to parse the remaining program ... as stmt-list end, etc.
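The prediction step in this walkthrough amounts to a table lookup keyed on the nonterminal being expanded and the current token. A minimal sketch in Python (the dictionary and function names here are my own, and the table covers only the begin case from the example above):

```python
# Hypothetical prediction table for the walkthrough above: given the
# nonterminal being expanded and the current token, return the chosen
# production's right-hand side. Only the 'begin' column is shown.
PREDICT = {
    ('stmt', 'begin'): ['begin-stmt'],
    ('begin-stmt', 'begin'): ['begin', 'stmt-list', 'end'],
}

def predict(nonterminal, token):
    """Look up which production to apply next."""
    return PREDICT[(nonterminal, token)]

print(predict('stmt', 'begin'))        # ['begin-stmt']
print(predict('begin-stmt', 'begin'))  # ['begin', 'stmt-list', 'end']
```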

LL(1) predictive parsing: intuition

In the example, the correct production to apply is determined from just two pieces of information:

- the current lexical token (begin);
- the nonterminal to be expanded (stmt in the first step, begin-stmt in the second).

If it is always possible to determine the next production from the above information, the grammar is said to be LL(1). When a grammar is LL(1), parsing can run efficiently and deterministically.

As it turns out, in spite of the promising start to parsing on the previous slides, the grammar for our little programming language is not LL(1). However, we shall see in Lecture 13 how the grammar can be adjusted to make it LL(1).

LL(1) grammars: another intuition

Think about a (one-state) NPDA derived from a CFG G. At each step, our NPDA can see two things:

- the current input symbol;
- the topmost stack symbol.

Roughly speaking, G is an LL(1) grammar if, just from this information, it is possible to determine which transition/production to apply next.

Here "LL(1)" means: read the input from the Left, build a Leftmost derivation, look just one symbol ahead.

Subtle point: when doing LL(1) parsing (as opposed to executing an NPDA), we use the input symbol to help us choose a production without consuming the input symbol... hence "look ahead".

Parse tables

Saying that the current input symbol and stack symbol uniquely determine the production means we can draw up a two-dimensional parse table telling us which production to apply in any situation.

Consider e.g. the following grammar for well-bracketed sequences:

  S → ɛ | T S
  T → ( S )

This has the following parse table:

        (            )        $
  S     S → T S      S → ɛ    S → ɛ
  T     T → ( S )

Columns are labelled by terminals, which are the input symbols. We include an extra column for an end-of-input marker $. Rows are labelled by nonterminals. Blank entries correspond to situations that can never arise in processing a legal input string.

Predictive parsing with parse tables

Given such a parse table, parsing can be done very efficiently using a stack. The stack (reading downwards) records the predicted sentential form for the remaining part of the input.

- Begin with just the start symbol S on the stack.
- If the current input symbol is a (possibly $) and the current stack symbol is a nonterminal X, look up the rule for a, X in the table. [Error if there is no rule.] If the rule is X → β, pop X and replace it with β (pushed right-end-first!).
- If the current input symbol is a and the current stack symbol is also a, just pop a from the stack and advance the input read position. [Error if the stack symbol is any other terminal.]
- Accept if the stack empties with $ as the input symbol. [Error if the stack empties sooner.]
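The steps above can be sketched as a short table-driven parser. This is an illustrative Python sketch (the function and table names are my own, not from the lecture), using the bracket grammar and parse table from the previous slide:

```python
# Parse table for:  S -> ɛ | T S,  T -> ( S )
# Keys are (stack nonterminal, input symbol); values are the RHS to push.
PARSE_TABLE = {
    ('S', '('): 'TS',   # S -> T S
    ('S', ')'): '',     # S -> ɛ
    ('S', '$'): '',     # S -> ɛ
    ('T', '('): '(S)',  # T -> ( S )
}
NONTERMINALS = {'S', 'T'}

def ll1_parse(s):
    """Return True iff s is a well-bracketed sequence."""
    tokens = list(s) + ['$']       # append the end-of-input marker
    pos = 0
    stack = ['S']                  # begin with the start symbol on the stack
    while stack:
        top = stack.pop()
        a = tokens[pos]
        if top in NONTERMINALS:
            rhs = PARSE_TABLE.get((top, a))
            if rhs is None:
                return False       # blank table entry: error
            stack.extend(reversed(rhs))  # push the RHS right-end-first
        elif top == a:
            pos += 1               # match: consume one input symbol
        else:
            return False           # terminal on stack doesn't match input
    # accept iff the stack empties exactly at the end marker
    return tokens[pos] == '$'
```

On the input (()) this sketch performs exactly the sequence of lookups and matches shown in the worked example that follows; on the faulty inputs ) and ( it rejects for the reasons discussed in the self-assessment questions.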

Example of predictive parsing

Let's use the parse table from the previous slide to parse the input string (()).

  Operation      Remaining input   Stack state
  (start)        (())$             S
  Lookup (, S    (())$             TS
  Lookup (, T    (())$             (S)S
  Match (        ())$              S)S
  Lookup (, S    ())$              TS)S
  Lookup (, T    ())$              (S)S)S
  Match (        ))$               S)S)S
  Lookup ), S    ))$               )S)S
  Match )        )$                S)S
  Lookup ), S    )$                )S
  Match )        $                 S
  Lookup $, S    $                 (empty stack)

(It's also easy to build a syntax tree as we go along!)

Self-assessment questions

Using the same parse table, consider the following two input strings:

  )
  (

For each of them, what will go wrong when we try to apply our parsing algorithm?

1. A blank entry in the table is encountered.
2. The input symbol (or end marker) doesn't match the expected symbol.
3. The stack empties before the end of the string is reached.

Answer: For ), we start by expanding S to ɛ. But this empties the stack, whereas we haven't consumed any input yet. For (, we get to a point where we've reached the end marker $ in the input, which doesn't match the predicted symbol ) on the stack.

Further remarks

Slogan: the parse table entry for a, X tells us which rule to apply if we're expecting an X and see an a.

Often, a will simply be the first symbol of the X-subphrase in question. But not always: maybe the X-subphrase in question is ɛ, and the a belongs to whatever follows the X. E.g. in the lookups for ), S on the previous slide, the S in question turns out to be empty.

Once we've got a parse table for a given grammar G, we can parse strings of length n in O(n) time (and O(n) space).

Our algorithm is an example of a top-down predictive parser: it works by predicting the form of the remainder of the input, and it builds syntax trees in a top-down way (i.e. starting from the root). There are other parsers (e.g. LR(1)) that work bottom-up.

LL(1) grammars: formal definition

Suppose G is a CFG containing no useless nonterminals, i.e.

- every nonterminal appears in some sentential form derived from the start symbol;
- every nonterminal can be expanded to some (possibly empty) string of terminals.

We say G is LL(1) if for each terminal a and nonterminal X, there is some production X → α with the following property:

  If b1 ... bn X γ is a sentential form appearing in a leftmost derivation of some string b1 ... bn a c1 ... cm (n, m ≥ 0), then the next sentential form appearing in the derivation is necessarily b1 ... bn α γ.

(Note that if a, X corresponds to a blank entry in the table, any production X → α will satisfy this property, because a sentential form b1 ... bn X γ can't arise.)

Non-LL(1) grammars

Roughly speaking, a grammar is by definition LL(1) if and only if there's a parse table for it. Not all CFGs have this property!

Consider e.g. a different grammar for the same language of well-bracketed sequences:

  S → ɛ | ( S ) | S S

Suppose we'd set up our initial stack with S, and we see the input symbol (. What rule should we apply?

- If the input string is (()), we should apply S → ( S ).
- If the input string is ()(), we should apply S → S S.

We can't tell without looking further ahead. Put another way: if we tried to build a parse table for this grammar, the two rules S → ( S ) and S → S S would be competing for the slot (, S. So this grammar is not LL(1).
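The clash over the slot (, S can be checked mechanically: both S → ( S ) and S → S S can derive a string beginning with (. Below is a small brute-force sketch of that check (not from the lecture; the names and the bounded search are my own, and this is only an illustration for this tiny grammar, not a general decision procedure):

```python
from collections import deque

# Productions for S in the non-LL(1) grammar: S -> ɛ | (S) | SS
PRODS = {'S': ['', '(S)', 'SS']}

def can_start_with(rhs, ch, limit=500):
    """Can some terminal string derived from rhs begin with ch?
    Bounded breadth-first search over sentential forms."""
    seen, queue = {rhs}, deque([rhs])
    while queue and limit > 0:
        limit -= 1
        form = queue.popleft()
        # index of the leftmost nonterminal, if any
        i = next((k for k, c in enumerate(form) if c in PRODS), None)
        if i is None:                    # form is all terminals
            if form.startswith(ch):
                return True
            continue
        if i > 0:                        # form starts with a terminal:
            if form[0] == ch:            # the rest can always be expanded
                return True              # to terminals in this grammar
            continue
        for alt in PRODS[form[i]]:       # expand the leftmost nonterminal
            new = form[:i] + alt + form[i + 1:]
            if new not in seen and len(new) <= 8:
                seen.add(new)
                queue.append(new)
    return False

# Both '(S)' and 'SS' can derive strings starting with '(' -- so both
# productions compete for the table slot ((, S): the grammar is not LL(1).
claimants = [alt for alt in PRODS['S'] if can_start_with(alt, '(')]
print(claimants)
```

Lecture 12's FIRST/FOLLOW sets answer the same question exactly and efficiently; this sketch just makes the conflict concrete.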

Remaining issues

It is easy to see from the definition that any LL(1) grammar is unambiguous: it never yields two syntax trees for the same string.

For computer languages, this is fine: we normally want to avoid ambiguity anyway. For natural languages, however, ambiguity is a fact of life, so LL(1) grammars are normally inappropriate.

Two outstanding questions...

- How can we tell if a grammar is LL(1), and if it is, how can we construct a parse table? (See Lecture 12.)
- If a grammar isn't LL(1), is there any hope of replacing it by an equivalent one that is? (See Lecture 13.)

Reading and prospectus

Relevant reading:

- Some lecture notes from a previous year (covering the same material but with different examples) are available via the course website. (See the Readings column of the course schedule.)
- See also Aho, Sethi and Ullman, Compilers: Principles, Techniques, and Tools, Section 4.4.