Unifying LL and LR syntax analysis of extended free grammars


Luca Breveglieri, Stefano Crespi Reghizzi, Angelo Morzenti
Politecnico di Milano
5-7 September 2011 - PRIN

Outline
- motivations and target
- classical syntax analysis
- unification methodology
- hints on the constructions
- methodology deployment

Motivations
- PoliMi has a course on Formal Languages & Compilers for the master program
- course contents:
  - regular and free languages
  - grammars and automata
  - classical syntax analysis (LL, LR and Earley)
  - hints on semantic analysis
  - practical compiler design (by Flex and Bison)

Motivations
- wish to compact and simplify the LL and LR syntax analysis methodologies (possibly also Earley, now being investigated)
- these two methods are usually explained to the master students
  - sequentially (e.g. first LL and then LR)
  - independently (almost no shared notions)
  - and each one from beginning to end
- this appears to be the case elsewhere too

Motivations
- and to include grammars with (production) rules in the Extended Backus-Naur Form (EBNF grammars) as well
- that is, grammars whose rules have regular expressions over both the terminal and nonterminal alphabets in their right part
- extended rules stem naturally from the syntax diagrams of languages

Motivations
- example: the standard Dyck grammar: S → a S b S, S → ε
- and in the EBNF form: S → ( a S b )*
- from a handbook of the C language: B S A B ( C ε) (, B ( C ε) ) S A C (would look ugly in non-extended form)

Objective
- derive the LL and LR syntax analysis methodologies in a unified framework
- starting directly from the EBNF grammar
- sharing notions and constructions (as much as possible)
- saving the teacher time and effort in explaining
- saving the student mental effort in learning
- possibly including Earley and similar methods

Classical Syntax Analysis
- from the grammar to the syntax analyser
- a syntax analyser:
  - is a Pushdown Automaton (PA)
  - recognises the grammar language
  - in addition builds the string derivation (or the string syntax tree)
  - and is deterministic
- two classical methodologies: LL and LR

Classical LL Analysis (top-down or recursive descent method)
- constructs the leftmost derivation
- the Syntax Analyser (SA) works intuitively
- the SA is powerful enough for compiler design, yet does not capture all of determinism
- the SA can be simply implemented by hand (by means of recursive syntactic procedures)
- commonly applied to EBNF grammars, but with rules that are rather awkward to teach

Example LL
Grammar rules and look-ahead sets:
  S → a S b        look-ahead: a
  S → ε            look-ahead: b, eos
  S → ( a S b )*   look-ahead: a, b, eos

Recursive procedure for the rules S → a S b and S → ε:
  procedure S
    if char = a then
      read char
      call S
      if char = b then read char else error
    else if char = b or char = eos then
      null
    else
      error
    end if
  end procedure

Recursive procedure for the extended rule S → ( a S b )*:
  procedure S
    if char = a then
      while char = a do ... end while
    else if char = b or char = eos then
      ...
    else
      error
    end if
  end procedure
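The elided loop body for the extended rule can be filled in as a small runnable recogniser. This is only a sketch under stated assumptions: the input is a string over the characters a and b, the end of string (eos) is modelled by a hypothetical sentinel "$", and the names recognizes, parse_s and ParseError are illustrative and do not come from the slides.

```python
EOS = "$"  # assumed end-of-string sentinel (the slide's "eos")


class ParseError(Exception):
    """Raised when the current character is not in the expected look-ahead set."""


def recognizes(text: str) -> bool:
    """Recursive descent recogniser for the extended rule S -> ( a S b )*."""
    chars = list(text) + [EOS]
    pos = 0  # index of the current look-ahead character

    def parse_s() -> None:
        nonlocal pos
        # Look-ahead a: enter (or repeat) the starred group a S b.
        while chars[pos] == "a":
            pos += 1              # consume a
            parse_s()             # nested S
            if chars[pos] != "b":
                raise ParseError(f"expected b at position {pos}")
            pos += 1              # consume b
        # Look-ahead b or eos: leave the loop and return to the caller.
        if chars[pos] not in ("b", EOS):
            raise ParseError(f"unexpected character at position {pos}")

    try:
        parse_s()
        # The whole input must have been consumed by the top-level call.
        return chars[pos] == EOS
    except ParseError:
        return False
```

For instance, recognizes("aabbab") holds, while recognizes("aab") and recognizes("ba") do not. The while loop mirrors the star of the extended rule, which is why no separate procedure is needed for the iteration.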

Classical LR Analysis (bottom-up or shift-reduce method)
- constructs the rightmost derivation
- the Syntax Analyser (SA) is sophisticated
- the SA is well suited for compiler design and captures all of determinism
- the SA needs to be generated automatically
- there have been attempts to apply it to EBNF grammars (by Earley, Heilbrunner, Beatty and possibly others), but the results are unsatisfactory: hard work and an ugly SA
- these methods are not found in textbooks

Example LR
[Figure: pilot graph for the grammar S → a S b S, S → ε; each macro-state lists marked rules such as S → a • S b S together with their look-ahead characters]

References for LL and LR
- S. Heilbrunner, On the definition of ELR(k) and ELL(k) grammars, Acta Informatica, vol. 11, pp. 169-176, 1979
- J. C. Beatty, On the relationship between the LL(1) and LR(1) grammars, Journal of the ACM, vol. 29, n. 4, pp. 1007-1022, 1982
- S. Heilbrunner, Tests for the LR-, LL- and LC-Regular Conditions, Journal of Computer and System Sciences, vol. 27, n. 1, pp. 1-13, 1983
- S. Crespi Reghizzi, Formal Languages and Compilation, Springer, 2009

Unification of LL and LR
- represent EBNF grammar rules as DFSAs (over both the terminal and nonterminal alphabets)
- the grammar becomes a network of recursive finite-state automata, which is basically a PA
- start directly from the so-called pilot (or driver) graph of the automaton network (the pilot is the heart of classical LR analysis)
- both the LL and LR conditions and the syntax analysers can be derived from the pilot
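To make the idea of a network of recursive automata concrete, the following sketch encodes the extended rule S → ( a S b )* as a single machine and interprets the network directly. It is an illustration only: the dictionary layout and the names MACHINES, ends and accepts are assumptions, not the authors' construction, and the naive interpreter explores all runs instead of using the pilot graph (it would not terminate on left-recursive rules).

```python
# One machine per nonterminal; arcs carry terminal or nonterminal labels.
# Machine for S -> ( a S b )*:  0 -a-> 1 -S-> 2 -b-> 0, state 0 initial and final.
MACHINES = {
    "S": {
        "initial": 0,
        "final": {0},
        "arcs": {(0, "a"): 1, (1, "S"): 2, (2, "b"): 0},
    },
}


def ends(name: str, text: str, pos: int) -> set:
    """Positions at which a run of machine `name`, started at `pos`, can stop in a final state."""
    machine = MACHINES[name]
    results = set()
    seen = set()
    todo = [(machine["initial"], pos)]
    while todo:
        state, i = todo.pop()
        if (state, i) in seen:
            continue
        seen.add((state, i))
        if state in machine["final"]:
            results.add(i)
        for (src, label), dst in machine["arcs"].items():
            if src != state:
                continue
            if label in MACHINES:
                # Nonterminal arc: run the called machine, then resume here.
                for j in ends(label, text, i):
                    todo.append((dst, j))
            elif i < len(text) and text[i] == label:
                # Terminal arc: consume one input character.
                todo.append((dst, i + 1))
    return results


def accepts(text: str) -> bool:
    """True if machine S can consume the whole text."""
    return len(text) in ends("S", text, 0)
```

The recursive call on a nonterminal arc plays the role of the pushdown stack, which is the sense in which the network behaves as a PA.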

Grammar Rules to Finite Automata
[Figure: grammar rules drawn as finite automata; sample strings of the language: ε, a, ( ), (a), ((a)), (aa), (a) ( )]

Extended Pilot Graph
[Figure: extended pilot graph]

Unified LR Analysis
- use the classical LR condition: the EBNF grammar is LR if and only if no macro-state of the extended pilot graph contains a shift-reduce or reduce-reduce conflict
- the condition is easy to check on the pilot
- the construction of the bottom-up syntax analyser is similar to the classical one
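The conflict test on a single macro-state can be sketched as follows. The representation is an assumption made for illustration: a macro-state is reduced to the set of terminals labelling its outgoing shift arcs plus a list of reduction candidates, each with a look-ahead set. The name find_conflicts and the two sample macro-states are hypothetical and are not taken from the slides' figures.

```python
from itertools import combinations


def find_conflicts(macro_state: dict) -> list:
    """List the shift-reduce and reduce-reduce conflicts of one macro-state."""
    shifts = macro_state["shift"]        # terminals on outgoing arcs
    reductions = macro_state["reduce"]   # list of (rule, look-ahead set)
    conflicts = []
    # Shift-reduce: a reduction look-ahead also labels an outgoing shift arc.
    for rule, lookahead in reductions:
        clash = lookahead & shifts
        if clash:
            conflicts.append(("shift-reduce", rule, sorted(clash)))
    # Reduce-reduce: two reductions share a look-ahead character.
    for (rule1, la1), (rule2, la2) in combinations(reductions, 2):
        clash = la1 & la2
        if clash:
            conflicts.append(("reduce-reduce", (rule1, rule2), sorted(clash)))
    return conflicts


# Initial macro-state for S -> a S b S | ε: shift on a, reduce S -> ε on eos.
clean_state = {"shift": {"a"}, "reduce": [("S -> ε", {"eos"})]}

# Hand-made macro-state that violates the condition in both ways.
bad_state = {
    "shift": {"a"},
    "reduce": [("A -> ε", {"a", "b"}), ("B -> ε", {"b"})],
}
```

A grammar would satisfy the LR condition under this sketch when find_conflicts returns an empty list for every macro-state of its pilot.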

Unified LL Analysis
- use the so-called Beatty LL condition: the EBNF grammar is LL if and only if
  - it is LR (use the classical LR condition)
  - and every macro-state of the extended pilot graph has a base that is empty or contains a single state (or a single marked rule in the classical version)
- the condition is easy to check on the pilot
- the LL look-ahead sets are visible in the pilot
- the construction of the top-down (recursive descent) syntax analyser is sufficiently easy
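The base-size part of the condition, as stated on this slide, lends itself to a very small check. The sketch below assumes a pilot given as a mapping from macro-state names to their bases, written as marked rules in the classical style; closures and look-aheads are omitted, so macro-states that differ only in look-ahead are merged. The names oversized_bases and is_ll are hypothetical, and the LR test is taken as an input rather than computed.

```python
def oversized_bases(pilot: dict) -> list:
    """Names of the macro-states whose base holds more than one marked rule."""
    return [name for name, base in pilot.items() if len(base) > 1]


def is_ll(pilot: dict, is_lr: bool) -> bool:
    """LL condition as stated on the slide: LR, and every base has at most one item."""
    return is_lr and not oversized_bases(pilot)


# Bases for S -> a S b | ε (augmented with S' -> S): every base is a single item.
PILOT_NESTED = {
    "I0": ["S' -> . S"],
    "I1": ["S -> a . S b"],
    "I2": ["S -> a S . b"],
    "I3": ["S -> a S b ."],
    "I4": ["S' -> S ."],
}

# Bases for S -> a b | a c: after shifting a, two marked rules share one base.
PILOT_COMMON_PREFIX = {
    "I0": ["S' -> . S"],
    "I1": ["S -> a . b", "S -> a . c"],
    "I2": ["S -> a b ."],
    "I3": ["S -> a c ."],
    "I4": ["S' -> S ."],
}
```

Both sample grammars are LR, so the difference comes only from the bases: is_ll(PILOT_NESTED, True) holds, whereas oversized_bases(PILOT_COMMON_PREFIX) reports I1, the macro-state in which the analyser cannot yet tell which of the two rules it is inside.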

Example LR yet not LL

Pros of the Unification
- saves teaching time, duplicated notions and tricky rules
- constructing the extended pilot graph is similar to the classical construction for non-extended grammars
- in the unified scenario this will be the only rather ingenious step for the student
- constructing the SAs (top-down or bottom-up) is automatic (the SA construction rules are skipped here for brevity)
- the extended pilot construction resembles the Berry-Sethi algorithm for obtaining a DFSA directly from a regular expression

and Cons
- there may be a little more complexity in deriving the Syntax Analyser (SA) from the pilot graph
  - LR: the shift-reduce SA must in some cases use a stack enhanced with pointers (when reducing an extended rule)
  - LL: the SA can still be implemented by recursive procedures, yet this is somewhat less immediate
- but the conceptual steps (pilot graph construction, verification of the LL and LR conditions, grammar modification if necessary for achieving determinism) ARE VERY DIRECT AND WELL INTEGRATED

Extension to Earley
- the Earley algorithm is used for analysing non-deterministic (or even ambiguous) grammars
- it was applied by Earley himself to EBNF grammars, but with a few errors and a hard notation
- it might be made easier to teach and handier to apply by representing the EBNF grammar as a network of recursive automata
- currently under investigation

Deployment Plan
- unification theory
  - developed enough for the LL and LR methodologies (draft report almost finished)
  - under investigation for the Earley algorithm
- teaching material (slides) still to be designed and written in the next two months (should be mostly based on examples)
- plan to test the unified syntax analysis methodology in the 2011-12 course Formal Languages & Compilers (Oct. 2011 - Jan. 2012)
- if the test is successful, upgrade the next edition of the Springer textbook Formal Languages & Compilers (and suppress the classical version of syntax analysis)