Grammars and Parsing, second week

Hayo Thielecke
17-18 October 2005

This is the material from the slides in a more printer-friendly layout.

Contents

1 Overview
2 Recursive methods from grammar rules
3 From grammars to methods
4 Predictive parser
5 The lookahead and match methods
6 Simplified view of parsing
7 Parsing with lookahead
8 Translation to code (only recognizing)
9 From grammars to Java classes
10 Classes for the parse tree
11 Parsing while building the parse tree
12 More general parsing
13 Parser generators
14 Parser generator notation
15 If you need to do parsing

1 Overview

So far:

- Grammars, their language and parse trees.
- Grammars → class hierarchy implementing parse trees.
- Grammars → recursive methods for processing syntax.

This week: put together the pieces. Build simple parsers that generate parse trees, and translate to another syntax.

2 Recursive methods from grammar rules

From grammars to mutually recursive methods:

- For each non-terminal A there is a static method A.
- The method body is a switch statement that chooses a rule for A.
- For each rule A → X1 ... Xn, there is a branch in the switch statement.
- There are method calls for all the non-terminals among X1, ..., Xn.

Each grammar gives us some recursive methods. For each derivation in the language, we have a sequence of method calls.

3 From grammars to methods

Grammar                                          Code
Non-terminal                                     method
Production                                       branch of a switch
Occurrence of a non-terminal on the              method call
  right-hand side of a production
Occurrence of a terminal on the                  I/O of the symbol
  right-hand side of a production

4 Predictive parser

A predictive parser can be constructed from grammar rules. The parser is allowed to look ahead in the input; based on what it sees there, it then makes predictions.

Canonical example: matching brackets. If the parser sees a [ as the next input symbol, it predicts that the input contains something in brackets.

More technically: switch on the lookahead; [ labels one case.

5 The lookahead and match methods

A predictive parser relies on two methods for accessing the input string:

- char lookahead() returns the next symbol in the input, without removing it.
- void match(char c) compares the next symbol in the input to c. If they are the same, the symbol is removed from the input. Otherwise, the parsing is stopped with an error.

6 Simplified view of parsing

In the real world, lookahead and match are calls to the lexical analyzer, and they return tokens, not characters. There are efficiency issues of buffering the input file, etc.

We ignore all that to keep the parser as simple as possible. (Only single-letter keywords.) But this setting is sufficient to demonstrate the principles.

7 Parsing with lookahead

Parsing with lookahead is easy if every rule for a given non-terminal starts with a different terminal symbol:

    S → [ S ]
    S → +

Idea: suppose you are trying to parse an S. Look at the first symbol in the input: if it is a [, use the first rule; if it is a +, use the second rule.

8 Translation to code (only recognizing)

    void parseS() {
        switch (lookahead()) {    // what is in the input?
            case '[':             // I have seen a [
                match('[');       // remove the [
                parseS();         // now parse what is inside [...]
                match(']');       // make sure there is a ]
                break;            // done in this case
            case '+':
                match('+');
                break;
            default:
                error();
        }
    }

9 From grammars to Java classes

We translate a grammar to some mutually recursive Java classes:

- For each non-terminal A there is an abstract class A.
- For each rule A → X1 ... Xn, there is a concrete subclass of A.
- It has fields for all non-terminals among X1, ..., Xn.
- The toString method of A concatenates the toString of all the Xi: if Xi is a terminal symbol, it is already a string; if Xi is a non-terminal, its toString method is called.

Thus each grammar gives us a class hierarchy.
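Sections 5 to 8 can be combined into one small program. The following is a minimal sketch rather than the lecture's own code: the class name BracketRecognizer, the in-memory String input, the '$' end marker, the accepts helper and the choice of IllegalStateException for errors are all assumptions made for illustration.

```java
// Sketch of a recognizer for the grammar  S -> [ S ]  and  S -> +
// Class name, input representation and exception type are assumptions.
class BracketRecognizer {
    private final String input; // the whole input, held in memory
    private int pos = 0;        // index of the next unread symbol

    BracketRecognizer(String input) {
        this.input = input;
    }

    // Return the next symbol without removing it; '$' stands for end of input.
    char lookahead() {
        return pos < input.length() ? input.charAt(pos) : '$';
    }

    // Remove the next symbol if it equals c; otherwise stop with an error.
    void match(char c) {
        if (lookahead() != c) {
            error();
        }
        pos++;
    }

    void error() {
        throw new IllegalStateException("syntax error at position " + pos);
    }

    // One method per non-terminal, one case per rule.
    void parseS() {
        switch (lookahead()) {
            case '[':
                match('[');
                parseS();
                match(']');
                break;
            case '+':
                match('+');
                break;
            default:
                error();
        }
    }

    // True if the complete input can be derived from S.
    static boolean accepts(String s) {
        BracketRecognizer r = new BracketRecognizer(s);
        try {
            r.parseS();
            return r.pos == s.length(); // no symbols may be left over
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("[[+]]")); // prints true
        System.out.println(accepts("[+"));    // prints false
    }
}
```

The check that no input is left over is an addition of this sketch; parseS on its own only recognizes a prefix of the input.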

10 Classes for the parse tree

    abstract class S { }

    class Bracket extends S {
        private S inBrackets;
        Bracket(S s) { inBrackets = s; }
    }

    class Plus extends S { }

11 Parsing while building the parse tree

    S parseS() {                  // return type is abstract class S
        switch (lookahead()) {
            case '[': {
                S treeS;
                match('[');
                treeS = parseS(); // remember tree inside [...]
                match(']');
                return new Bracket(treeS); // call the constructor of the class for this rule
            }
            ...
        }
    }

12 More general parsing

Parsing with lookahead is easy if every rule for a given non-terminal starts with a different terminal symbol. But what if not? The right-hand side could instead start with a non-terminal, or be an empty string.

More general methods: FIRST and FOLLOW construction, covered in Compilers and Languages.

13 Parser generators

The lookahead is tedious to compute. Parser generators do it automatically.

Some parser generators, notably yacc and its descendants, use a more complicated parsing method that does not only use lookahead (LALR, a form of bottom-up parsing).
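The class hierarchy of sections 9 and 10 and the tree-building parser of section 11 can be put together as follows. This is a sketch under assumptions: the wrapper class TreeParserDemo, the static parse helper, the '$' end marker and the use of IllegalStateException are invented here, and the classes are nested only to keep the example in one file.

```java
// Sketch: predictive parser that builds a parse tree for  S -> [ S ]  and  S -> +
// TreeParserDemo, parse() and the exception type are assumptions.
class TreeParserDemo {
    // One abstract class per non-terminal, one concrete subclass per rule.
    static abstract class S { }

    static final class Bracket extends S {   // rule S -> [ S ]
        private final S inBrackets;
        Bracket(S s) { inBrackets = s; }
        // Terminals are already strings; the non-terminal contributes its own toString.
        @Override public String toString() { return "[" + inBrackets + "]"; }
    }

    static final class Plus extends S {      // rule S -> +
        @Override public String toString() { return "+"; }
    }

    private final String input;
    private int pos = 0;

    TreeParserDemo(String input) { this.input = input; }

    // Next symbol without removing it; '$' stands for end of input.
    char lookahead() {
        return pos < input.length() ? input.charAt(pos) : '$';
    }

    // Remove the next symbol if it equals c; otherwise stop with an error.
    void match(char c) {
        if (lookahead() != c) throw error();
        pos++;
    }

    IllegalStateException error() {
        return new IllegalStateException("syntax error at position " + pos);
    }

    S parseS() {
        switch (lookahead()) {
            case '[': {
                match('[');
                S treeS = parseS();        // remember tree inside [...]
                match(']');
                return new Bracket(treeS); // constructor of the class for this rule
            }
            case '+':
                match('+');
                return new Plus();
            default:
                throw error();
        }
    }

    // Parse the complete input and return its tree, or throw on a syntax error.
    static S parse(String s) {
        TreeParserDemo p = new TreeParserDemo(s);
        S tree = p.parseS();
        if (p.pos != s.length()) throw p.error(); // leftover input
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("[[+]]")); // prints [[+]]
    }
}
```

Printing the tree reproduces the input, because toString concatenates the terminals of each rule with the strings of its subtrees.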

14 Parser generator notation

Parser generators use some ASCII syntax rather than symbols like →. ANTLR uses : instead of →. There is one rule for each non-terminal, with alternatives indicated by |. The rule is terminated with a semicolon. Yacc/bison is similar.

ANTLR can construct the parse tree. Typically, one attaches parsing actions to each production that tell the parser what to do.

15 If you need to do parsing

If you have a simple language to parse (rules start with different terminals), you can write a predictive parser by hand.

If the grammar is more complex, you could use a parser generator. You give it the grammar; you need to add parsing actions, or walk the parse tree. ANTLR uses tree grammars for that, SableCC a Visitor pattern.
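To give an idea of what walking the parse tree with a Visitor pattern looks like, here is a hand-written sketch for the bracket grammar used earlier. It is not the code SableCC generates. The names VisitorDemo, Visitor, accept, ToParens and Depth are hypothetical, and the target syntax (round parentheses around the letter x) is made up for the example.

```java
// Hand-written Visitor sketch over the parse tree of  S -> [ S ]  and  S -> +
// All names here are assumptions; this is not SableCC's generated API.
class VisitorDemo {
    // One visit method per rule of the grammar.
    interface Visitor<R> {
        R visitBracket(Bracket b);
        R visitPlus(Plus p);
    }

    static abstract class S {
        abstract <R> R accept(Visitor<R> v);
    }

    static final class Bracket extends S {
        final S inBrackets;
        Bracket(S s) { inBrackets = s; }
        @Override <R> R accept(Visitor<R> v) { return v.visitBracket(this); }
    }

    static final class Plus extends S {
        @Override <R> R accept(Visitor<R> v) { return v.visitPlus(this); }
    }

    // Translation to another syntax: ( ) instead of [ ], x instead of +.
    static final class ToParens implements Visitor<String> {
        @Override public String visitBracket(Bracket b) {
            return "(" + b.inBrackets.accept(this) + ")";
        }
        @Override public String visitPlus(Plus p) {
            return "x";
        }
    }

    // A second walk over the same tree: nesting depth of the brackets.
    static final class Depth implements Visitor<Integer> {
        @Override public Integer visitBracket(Bracket b) {
            return 1 + b.inBrackets.accept(this);
        }
        @Override public Integer visitPlus(Plus p) {
            return 0;
        }
    }

    public static void main(String[] args) {
        S tree = new Bracket(new Bracket(new Plus())); // tree for [[+]]
        System.out.println(tree.accept(new ToParens())); // prints ((x))
        System.out.println(tree.accept(new Depth()));    // prints 2
    }
}
```

The point of the pattern is that new tree walks, such as a translation or a depth count, can be added as new visitor classes without changing the parse tree classes.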