Grammars & Parsing. Lecture 12 CS 2112 Fall 2018

Similar documents
Announcements. Application of Recursion. Motivation. A Grammar. A Recursive Grammar. Grammars and Parsing

If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING. Lecture 8 CS2110 Spring 2014

GRAMMARS & PARSING. Lecture 7 CS2110 Fall 2013

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS. Regrades 10/6/15. Prelim 1. Prelim 1. Expression trees. Pointers to material

syntax tree - * * * - * * * * * 2 1 * * 2 * (2 * 1) - (1 + 0)

ASTS, GRAMMARS, PARSING, TREE TRAVERSALS. Lecture 14 CS2110 Fall 2018

syntax tree - * * * * * *

Application of recursion. Grammars and Parsing. Recursive grammar. Grammars

CSE 3302 Programming Languages Lecture 2: Syntax

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

COP 3402 Systems Software Syntax Analysis (Parser)

CPS 506 Comparative Programming Languages. Syntax Specification

CSE 401 Midterm Exam Sample Solution 2/11/15

Context-Free Grammars

Introduction to Parsing. Lecture 8

Parsing II Top-down parsing. Comp 412

Part 5 Program Analysis Principles and Techniques

Theory and Compiling COMP360

CS 314 Principles of Programming Languages

Syntax-Directed Translation. Lecture 14

Syntax and Grammars 1 / 21

A simple syntax-directed

CS164: Midterm I. Fall 2003

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Peace cannot be kept by force; it can only be achieved by understanding. Albert Einstein

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Compiling Regular Expressions COMP360

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

CSE 401 Midterm Exam 11/5/10

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Lecture 7: Deterministic Bottom-Up Parsing

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Optimizing Finite Automata

CS153: Compilers Lecture 4: Recursive Parsing

Lexical Scanning COMP360

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Non-deterministic Finite Automata (NFA)

Lecture 8: Deterministic Bottom-Up Parsing

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Context Free Grammars and Recursive Descent Parsing

CS 314 Principles of Programming Languages

Chapter 3 Parsing. time, arrow, banana, flies, like, a, an, the, fruit

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Introduction to Parsing. Lecture 5

Principles of Programming Languages COMP251: Syntax and Grammars

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Front End. Hwansoo Han

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Introduction to Parsing. Lecture 5

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Outline. Top Down Parsing. SLL(1) Parsing. Where We Are 1/24/2013

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

Syntax. In Text: Chapter 3

CMSC 330, Fall 2009, Practice Problem 3 Solutions

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Chapter 3. Parsing #1

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

LECTURE 7. Lex and Intro to Parsing

CS2112 Fall Assignment 4 Parsing and Fault Injection. Due: March 18, 2014 Overview draft due: March 14, 2014

Principles of Programming Languages COMP251: Syntax and Grammars

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Lexical Analysis. Introduction

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

CSCI312 Principles of Programming Languages

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

Syntax Analysis Check syntax and construct abstract syntax tree

Introduction to Lexing and Parsing

EECS 6083 Intro to Parsing Context Free Grammars

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Question Points Score

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

LECTURE 3. Compiler Phases

ECS 120 Lesson 7 Regular Expressions, Pt. 1

Context-free grammars (CFG s)

Properties of Regular Expressions and Finite Automata

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Transcription:

Grammars & Parsing Lecture 12 CS 2112 Fall 2018

Motivation The cat ate the rat. The cat ate the rat slowly. The small cat ate the big rat slowly. The small cat ate the big rat on the mat slowly. The small cat that sat in the hat ate the big rat on the mat slowly. The small cat that sat in the hat ate the big rat on the mat slowly, then got sick. Not all sequences of words are legal sentences The ate cat rat the How many legal sentences are there? How many legal programs are there? Are all Java programs that compile legal programs? How do we know what programs are legal? http://java.sun.com/docs/books/jls/third_edition/html/syntax.html!2

A Grammar Sentence ::= Noun Verb Noun Noun ::= boys girls bunnies Verb ::= like see Our sample grammar has these rules: A Sentence can be a Noun followed by a Verb followed by a Noun A Noun can be boys or girls or bunnies A Verb can be like or see xamples of Sentence: boys see bunnies bunnies like girls Grammar: set of rules for generating sentences in a language White space between words does not matter The words boys, girls, bunnies, like, see are called tokens or terminals The words Sentence, Noun, Verb are called syntactic classes or nonterminals This is a very boring grammar because the set of Sentences is finite (exactly 18)!3

A Recursive Grammar Sentence ::= Sentence and Sentence Sentence or Sentence Noun Verb Noun Noun ::= boys girls bunnies Verb ::= like see This grammar is more interesting than the last one because the set of Sentences is infinite xamples of Sentences in this language: boys like girls boys like girls and girls like bunnies boys like girls and girls like bunnies and girls like bunnies boys like girls and girls like bunnies and girls like bunnies and girls like bunnies... What makes this set infinite? Answer: Recursive definition of Sentence!4

Detour What if we want to add a period at the end of every sentence? Sentence ::= Sentence and Sentence. Sentence or Sentence. Noun Verb Noun. Noun ::= Does this work? No! This produces sentences like: girls like boys. and boys like bunnies.. Sentence Sentence Sentence!5

Sentences with Periods TopLevelSentence ::= Sentence. Sentence ::= Sentence and Sentence Sentence or Sentence Noun Verb Noun Noun ::= boys girls bunnies Verb ::= like see Add a new rule that adds a period only at the end of the sentence. The tokens here are the 7 words plus the period (.) This grammar is ambiguous: boys like girls and girls like boys or girls like bunnies!6

Grammar for Simple xpressions ::= integer ( + ) Simple expressions: An can be an integer. An can be ( followed by an followed by + followed by an followed by ) Set of expressions defined by this grammar is an inductively-defined set Is the language finite or infinite? Do recursive grammars always yield infinite languages? Here are some legal expressions: 2 (3 + 34) ((4+23) + 89) ((89 + 23) + (23 + (34+12))) Here are some illegal expressions: (3 3 + 4 The tokens in this grammar are (, +, ), and any integer!7

Parsing Grammars can be used in two ways A grammar defines a language (i.e., the set of properly structured sentences) A grammar can be used to parse a sentence (thus, checking if the sentence is in the language) xample: Show that ((4+23) + 89) is a valid expression by building a parse tree ( + ) To parse a sentence is to build a parse tree This is much like diagramming a sentence ( + ) 4 23 89!8

Ambiguity Grammar is ambiguous if some strings have more than one parse tree xample: arithmetic expressions without precedence: 2 + 3 * 5 n + + * * ( ) * + 2 3 5 2 3 5

Precedence Ambiguities resulting from not handling precedence can be handled by introducing extra levels of nonterminals. (expr) T T + T (term) F F * T F (factor) n ( ) 2 + 3 * 5 T + F T 2 F * T 3 F Only one parse tree! 5!10

Recursive Descent Parsing Idea: Use the grammar to design a recursive program to check if a sentence is in the language To parse an expression, for instance We look for each terminal (i.e., each token) ach nonterminal (e.g., ) can handle itself by using a recursive call The grammar tells how to write the program! A recognizer: boolean parse( ) { if (first token is an integer) return true; if (first token is ( ) { } scan past ( token; parse( ); scan past + token; parse( ); scan past ) token; return true; return false; }!11

Abstract Syntax Trees vs. Parse Trees Result of parsing: often a data structure representing the input. Parse tree has information we don t need, e.g. parentheses. Abstract syntax tree 2 + 3 * 5 new BinaryOp(TIMS, new BinaryOp(PLUS, new Num(2), new Num(3)), new Num(5)) Parse tree / concrete syntax tree T + F T 2 F * T 3 F 5!12

Java Code for Parsing public static xprnode parse(scanner scanner) { if (scanner.hasnextint()) { int data = scanner.nextint(); return new Node(data); } check(scanner, ( ); left = parse(scanner); check(scanner, + ); right = parse(scanner); check(scanner, ) ); return new BinaryOpNode(PLUS, left, right); }!13

Responding to Invalid Input Parsing does two things: checks for validity (is the input a valid sentence?) constructs the parse tree (usually called an AST or abstract syntax tree) Q: How should we respond to invalid input? A: Throw an exception with as much information for the user as possible the nature of the error approximately where in the input it occurred!14

The associativity problem Top-down parsing works well with right-recursive grammars (e.g., (expr) T T + T (term) F F * T F (factor) n ( ) Problem: leads to right-associative operators: 1 + 2 + 3 : 1 + + 2 3!15

Reassociation Trick: rewrite right-recursive rules to use Kleene star: (expr) T T + becomes T (+ T) * <--- 0 or more repetitions of + T Recursion becomes a loop: public static xpr parse() { xpr e = parset(); while (peek() is + )) { consume( + ); e = new BinaryOpNode(PLUS, e, parset()); } return e; }!16

Using a Parser to Generate Code We can modify the parser so that it generates stack code to evaluate arithmetic expressions: 2 PUSH 2 STOP (2 + 3) PUSH 2 PUSH 3 ADD STOP Goal: Modify parse to return a string containing stack code for expression it has parsed Method parse can generate code in a recursive way: For integer i, it returns string PUSH + i + \n For (1 + 2), w Recursive calls for 1 and 2 return code strings c1 and c2, respectively w Return c1 + c2 + ADD\n Top-level method appends a STOP command!17

Does Recursive Descent Always Work? No some grammars cannot be used with recursive descent A trivial example (causes infinite recursion): S ::= b Sa Can rewrite grammar S ::= b ba A ::= a aa Sometimes recursive descent is hard to use There are more powerful parsing techniques (not covered in this course) Nowadays, there are automated parser and tokenizer generators you write down the grammar, it produces the parser and tokenizer automatically Many based on LR parsing, which can handle a larger class of grammars.!18

xercises Write a grammar and recursive-descent parser for palindromes: mom dad I prefer pi race car A man, a plan, a canal: Panama murder for a jar of red rum sex at noon taxes strings of the form A n B n for some n 0: AB AABB AAAAAAABBBBBBB Java identifiers: a letter, followed by any number of letters or digits decimal integers: an optional minus sign ( ) followed by one or more digits 0-9!19