Context-Free Grammars

Context-Free Grammars Lecture 7 http://webwitch.dreamhost.com/grammar.girl/

Outline
Scanner vs. parser
Why regular expressions are not enough
Grammars (context-free grammars): grammar rules, derivations, parse trees, ambiguous grammars, useful examples
Reading: Sections 4.1 and 4.2
CS 536 Fall 2000

The Functionality of the Parser
Input: sequence of tokens from the lexer.
Output: parse tree of the program.
A parse tree is generated if the input is a legal program; if the input is an illegal program, syntax errors are issued.
Note: instead of a parse tree, some parsers directly produce an abstract syntax tree (AST) plus symbol table (as in P3), intermediate code, or object code.
In the following lectures, we'll assume that a parse tree is generated.

Comparison with Lexical Analysis
Phase    Input                  Output
Lexer    String of characters   String of tokens
Parser   String of tokens       Parse tree

Example
The program: x * y + z
Input to parser: ID TIMES ID PLUS ID, which we'll write as: id * id + id
Output of parser: the parse tree, in which id * id is grouped as the left operand of +.
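The lexer step in this example can be sketched in a few lines. This is only an illustration: the token names ID, TIMES and PLUS come from the slide, while the SKIP rule, the identifier pattern and the function name tokenize are assumptions made for the sketch.

```python
import re

# Token names ID, TIMES, PLUS follow the slide; SKIP (whitespace) and the
# identifier pattern are assumptions made for this sketch.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),
    ("TIMES", r"\*"),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Turn a string of characters into a list of token names."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":  # whitespace is dropped, not passed on
            tokens.append(match.lastgroup)
        pos = match.end()
    return tokens

print(tokenize("x * y + z"))  # ['ID', 'TIMES', 'ID', 'PLUS', 'ID']
```

The parser never sees the characters x, y and z, only the token sequence.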

Why are regular expressions not enough? TEST YOURSELF #1
Write an automaton that accepts the strings a, (a), ((a)), and (((a))).
a, (a), ((a)), (((a))), ..., (^k a )^k

Why are regular expressions not enough? TEST YOURSELF #2
What programs are generated by digit+ ( ( + | - | * | / ) digit+ )* ?
What important properties does this regular expression fail to express?
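To experiment with this question, the regular expression can be written with Python's re module. This is a sketch; the name is_flat_expression is made up for the illustration.

```python
import re

# The slide's regular expression: digit+ ( (+ | - | * | /) digit+ )*
EXPR_RE = re.compile(r"\d+([+\-*/]\d+)*")

def is_flat_expression(s):
    """True if the whole string matches the regular expression."""
    return EXPR_RE.fullmatch(s) is not None

print(is_flat_expression("1+2*3"))    # True
print(is_flat_expression("(1+2)*3"))  # False
```

The match is a plain yes or no over a flat sequence of numbers and operators. It cannot describe nested parentheses, and it gives no grouping that would tell * apart from + in precedence.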

What must the parser do?
1. Recognizer: not all strings of tokens are programs, so the parser must distinguish between valid and invalid strings of tokens.
2. Translator: the parser must expose program structure (e.g., associativity and precedence), hence it must return the parse tree.
We need:
A language for describing valid strings of tokens: context-free grammars (analogous to regular expressions in the scanner).
A method for distinguishing valid from invalid strings of tokens (and for building the parse tree): the parser (analogous to the state machine in the scanner).

We need context-free grammars (CFGs)
Example: Simple Arithmetic Expressions
In English: An integer is an arithmetic expression. If exp1 and exp2 are arithmetic expressions, then so are the following: exp1 - exp2, exp1 / exp2, ( exp1 ).
The corresponding CFG, with the token shorthand we'll use on the right:
exp → INTLITERAL           E → intlit
exp → exp MINUS exp        E → E - E
exp → exp DIVIDE exp       E → E / E
exp → LPAREN exp RPAREN    E → ( E )

Reading the CFG
The grammar has five terminal symbols: intlit, -, /, (, ). Terminals of a grammar = tokens returned by the scanner.
The grammar has one non-terminal symbol: E. Non-terminals describe valid sequences of tokens.
The grammar has four productions or rules, each of the form E → α.
Left-hand side = a single non-terminal.
Right-hand side = either a sequence of one or more terminals and/or non-terminals, or ε (an empty production); again, the book uses the symbol λ.

Example, revisited
Note: a more compact way to write the previous grammar:
E → intlit | E - E | E / E | ( E )
or
E → intlit
  | E - E
  | E / E
  | ( E )

A formal definition of CFGs
A CFG consists of:
A set of terminals T
A set of non-terminals N
A start symbol S (a non-terminal)
A set of productions X → Y1 Y2 … Yn, where X ∈ N and each Yi ∈ T ∪ N ∪ {ε}
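The four components of the definition map directly onto a small data structure. The sketch below is an illustration only: the class name CFG, its field names, and the use of E for the non-terminal are assumptions. The example instance is the four-production arithmetic grammar over intlit, -, /, (, ).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S, must be a member of N
    productions: tuple        # pairs (X, (Y1, ..., Yn)); () stands for epsilon

    def check(self):
        """Verify the side conditions of the formal definition."""
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        if self.terminals & self.nonterminals:
            raise ValueError("T and N must be disjoint")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            for sym in rhs:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r}")
        return True

# The non-terminal name E is an assumption made for this sketch.
ARITH = CFG(
    terminals=frozenset({"intlit", "-", "/", "(", ")"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions=(
        ("E", ("intlit",)),
        ("E", ("E", "-", "E")),
        ("E", ("E", "/", "E")),
        ("E", ("(", "E", ")")),
    ),
)
print(ARITH.check())  # True
```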

Notational Conventions
In these lecture notes:
Non-terminals are written upper-case.
Terminals are written lower-case.
The start symbol is the left-hand side of the first production.

The Language of a CFG
The language defined by a CFG is the set of strings that can be derived from the start symbol of the grammar.
Derivation: read productions as rules. X → Y1 … Yn means X can be replaced by Y1 … Yn.

Derivation: key idea
1. Begin with a string consisting of the start symbol S.
2. Replace any non-terminal X in the string by the right-hand side of some production X → Y1 … Yn.
3. Repeat (2) until there are no non-terminals in the string.

Derivation: an example
CFG: E → id | E + E | E * E | ( E )
Is the string id * id + id in the language defined by the grammar?
Derivation: E ⇒ E + E ⇒ E * E + E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + id
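The derivation steps can be replayed mechanically. The sketch below assumes the grammar E → E + E | E * E | ( E ) | id, with E as the name of the single non-terminal. It always rewrites the left-most non-terminal, which is one particular way of choosing "any non-terminal". The function name derive_leftmost and the production numbering are made up for the illustration.

```python
# Productions of the example grammar; the numbering is only for this sketch.
PRODUCTIONS = [
    ("E", ["E", "+", "E"]),  # 0
    ("E", ["E", "*", "E"]),  # 1
    ("E", ["(", "E", ")"]),  # 2
    ("E", ["id"]),           # 3
]
NONTERMINALS = {"E"}

def derive_leftmost(start, choices):
    """Apply the chosen productions, each to the left-most non-terminal."""
    form = [start]
    steps = [list(form)]
    for index in choices:
        lhs, rhs = PRODUCTIONS[index]
        pos = next((i for i, s in enumerate(form) if s in NONTERMINALS), None)
        if pos is None:
            raise ValueError("no non-terminal left to replace")
        if form[pos] != lhs:
            raise ValueError(f"production {index} does not apply to {form[pos]!r}")
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(list(form))
    return steps

for step in derive_leftmost("E", [0, 1, 3, 3, 3]):
    print(" ".join(step))
```

The last line printed is id * id + id, so the string is in the language.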

Terminals
Terminals are so called because there are no rules for replacing them.
Once generated, terminals are permanent.
Therefore, terminals are the tokens of the language.

The Language of a CFG (Cont.)
More formally, write
X1 … Xi … Xn ⇒ X1 … Xi−1 Y1 … Ym Xi+1 … Xn
if there is a production Xi → Y1 … Ym.

The Language of a CFG (Cont.)
Write
X1 … Xn ⇒* Y1 … Ym
if X1 … Xn ⇒ … ⇒ … ⇒ Y1 … Ym in 0 or more steps.

The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language of G is:
{ a1 … an | S ⇒* a1 … an and every ai is a terminal }

Examples
Strings of balanced parentheses: { (^i )^i | i ≥ 0 }
The grammar:
S → ( S )
S → ε
is the same as
S → ( S ) | ε
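A recognizer for this language can follow the grammar S → ( S ) | ε almost literally, with one function per non-terminal. This is a minimal sketch; the names parse_S and in_language are made up for the illustration.

```python
def parse_S(s, pos=0):
    """Match S -> ( S ) | epsilon starting at pos.

    Returns the position just after the matched S, or None on failure.
    """
    if pos < len(s) and s[pos] == "(":
        after_inner = parse_S(s, pos + 1)          # the nested S
        if after_inner is not None and after_inner < len(s) and s[after_inner] == ")":
            return after_inner + 1
        return None
    return pos  # epsilon: match nothing

def in_language(s):
    """True if the whole string is derived from S."""
    return parse_S(s) == len(s)

print(in_language("((()))"))  # True
print(in_language("(()"))     # False
```

The recursion depth plays the role of the unbounded counter that a finite automaton does not have.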

Arithmetic Example
Simple arithmetic expressions: E → E + E | E * E | ( E ) | id
Some elements of the language: id, id + id, (id), id * id, (id) * id, id * (id)
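Elements of the language can be enumerated by searching over derivations. The sketch below assumes the grammar E → E + E | E * E | ( E ) | id, with E as the non-terminal name, and lists every string of at most max_len tokens. Pruning by length is safe here only because no production of this grammar shrinks a sentential form (there is no ε production). The function name strings_up_to is made up for the illustration.

```python
from collections import deque

# The example grammar; the non-terminal name E is an assumption.
PRODUCTIONS = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]]}

def strings_up_to(start, max_len):
    """All terminal strings with at most max_len tokens derivable from start."""
    seen = {(start,)}
    queue = deque([(start,)])
    result = set()
    while queue:
        form = queue.popleft()
        # Position of the left-most non-terminal, if any.
        pos = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if pos is None:
            result.add(" ".join(form))  # only terminals left
            continue
        for rhs in PRODUCTIONS[form[pos]]:
            new = form[:pos] + tuple(rhs) + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result

for s in sorted(strings_up_to("E", 3)):
    print(s)
```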

Notes
The idea of a CFG is a big step. But:
Membership in a language is yes or no; we also need the parse tree of the input. Furthermore, we must handle errors gracefully.
We need an implementation of CFGs, i.e. the parser. We'll create the parser using a parser generator; available generators include CUP, bison, and yacc.

More Notes
The form of the grammar is important.
Many grammars generate the same language.
Parsers are sensitive to the form of the grammar.
Example: E → E + E | intlit is not suitable for an LL(1) parser (a common kind of parser). Stay tuned, you will soon understand why.

Derivations and Parse Trees
A derivation is a sequence of productions S ⇒ … ⇒ … ⇒ …
A derivation can be drawn as a tree.
The start symbol is the tree's root.
For a production X → Y1 … Yn, add children Y1 … Yn to node X.

Derivation Example
Grammar: E → E + E | E * E | ( E ) | id
String: id * id + id

Derivation Example (Cont.)
Derivation: E ⇒ E + E ⇒ E * E + E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + id
Parse tree:
          E
        / | \
       E  +  E
     / | \   |
    E  *  E  id
    |     |
    id    id
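The rule "for a production X → Y1 … Yn, add children Y1 … Yn to node X" can be turned into a small tree builder. The sketch assumes the grammar E → E + E | E * E | ( E ) | id and a left-most derivation given as a list of production numbers. The class Node, the functions tree_from_leftmost and show, and the bracket notation are all made up for the illustration.

```python
# The example grammar; the non-terminal name E and the numbering are assumptions.
PRODUCTIONS = [
    ("E", ["E", "+", "E"]),  # 0
    ("E", ["E", "*", "E"]),  # 1
    ("E", ["(", "E", ")"]),  # 2
    ("E", ["id"]),           # 3
]
NONTERMINALS = {"E"}

class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []

def tree_from_leftmost(start, choices):
    """Build the parse tree for a left-most derivation."""
    root = Node(start)
    frontier = [root]  # current leaves, left to right
    for index in choices:
        lhs, rhs = PRODUCTIONS[index]
        pos = next((i for i, n in enumerate(frontier)
                    if n.symbol in NONTERMINALS), None)
        if pos is None or frontier[pos].symbol != lhs:
            raise ValueError(f"production {index} does not apply")
        node = frontier[pos]
        node.children = [Node(sym) for sym in rhs]  # add children Y1 ... Yn
        frontier[pos:pos + 1] = node.children
    return root

def show(node):
    """Render a tree as nested brackets, e.g. [E id]."""
    if not node.children:
        return node.symbol
    return "[" + node.symbol + " " + " ".join(show(c) for c in node.children) + "]"

print(show(tree_from_leftmost("E", [0, 1, 3, 3, 3])))
# [E [E [E id] * [E id]] + [E id]]
```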

Derivation in Detail (1): id * id + id

Notes on Derivations
A parse tree has terminals at the leaves and non-terminals at the interior nodes.
An in-order traversal of the leaves is the original input.
The parse tree shows the association of operations; the input string does not.
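The claim about the leaves is easy to check on a concrete tree. In the sketch below the tree for id * id + id is written as nested tuples (non-terminal, children) with token strings as leaves. This representation and the function name leaves are assumptions made for the illustration.

```python
# Parse tree for id * id + id, grouping id * id under the left operand of +.
TREE = ("E", [
    ("E", [("E", ["id"]), "*", ("E", ["id"])]),
    "+",
    ("E", ["id"]),
])

def leaves(tree):
    """Collect the leaves of the tree from left to right."""
    if isinstance(tree, str):   # a terminal: it is a leaf
        return [tree]
    _, children = tree          # a non-terminal: visit children in order
    result = []
    for child in children:
        result.extend(leaves(child))
    return result

print(" ".join(leaves(TREE)))  # id * id + id
```

The leaves give back the input, while the nesting of the tuples records that * is applied before +.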

Left-most and Right-most Derivations
The example is a left-most derivation: at each step, replace the left-most non-terminal.
There is an equivalent notion of a right-most derivation:
E ⇒ E + E ⇒ E + id ⇒ E * E + id ⇒ E * id + id ⇒ id * id + id

Right-most Derivation in Detail (1): id * id + id

Derivations and Parse Trees
Note that right-most and left-most derivations have the same parse tree. The difference is the order in which branches are added.
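To see that one tree yields both derivations, the sketch below reads the left-most and the right-most derivation off the same parse tree. The tuple representation of the tree and the names render and derivation are assumptions made for the illustration.

```python
# Parse tree for id * id + id as nested tuples (non-terminal, children).
TREE = ("E", [
    ("E", [("E", ["id"]), "*", ("E", ["id"])]),
    "+",
    ("E", ["id"]),
])

def render(form):
    """Show a sentential form: subtrees print as their non-terminal."""
    return " ".join(t if isinstance(t, str) else t[0] for t in form)

def derivation(tree, leftmost=True):
    """List the sentential forms obtained by expanding one subtree per step."""
    form = [tree]
    steps = [render(form)]
    while True:
        pending = [i for i, t in enumerate(form) if not isinstance(t, str)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]  # replace the non-terminal by its children
        steps.append(render(form))

print(" => ".join(derivation(TREE, leftmost=True)))
print(" => ".join(derivation(TREE, leftmost=False)))
```

Both runs start at E and end at id * id + id. They differ only in the order in which the subtrees are expanded.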

Summary of Derivations
We are not just interested in whether s ∈ L(G). We need a parse tree for s (because we need to build the AST).
A derivation defines a parse tree, but one parse tree may have many derivations.
Left-most and right-most derivations are important in parser implementation.

Ambiguity
Grammar: E → E + E | E * E | ( E ) | id
String: id * id + id

Ambiguity (Cont.)
This string has two parse trees: one with + at the root, grouping the string as (id * id) + id, and one with * at the root, grouping it as id * (id + id).

TEST YOURSELF #3
Question 1: for each of the two parse trees, find the corresponding left-most derivation.
Question 2: for each of the two parse trees, find the corresponding right-most derivation.

Ambiguity (Cont.)
A grammar is ambiguous if, for some string, any of the following three equivalent conditions holds:
it has more than one parse tree
it has more than one right-most derivation
it has more than one left-most derivation
Ambiguity is BAD: it leaves the meaning of some programs ill-defined.
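Ambiguity can be observed by counting parse trees. The sketch below is specific to the grammar E → E + E | E * E | ( E ) | id, with E as the non-terminal name, and counts the trees for a token list by trying every production on every span. The function name count_parse_trees is made up for the illustration. This is not a general parsing algorithm.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of parse trees for tokens under E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of trees deriving tokens[i:j] from E."""
        total = 0
        if j - i == 1 and tokens[i] == "id":                         # E -> id
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":  # E -> ( E )
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                                 # E -> E op E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id * id + id".split()))  # 2
print(count_parse_trees("id * id".split()))       # 1
```

A count above 1 for any string means the grammar is ambiguous.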

Dealing with Ambiguity
There are several ways to handle ambiguity. The most direct method is to rewrite the grammar unambiguously:
E → E' + E | E'
E' → id * E' | id | ( E ) * E' | ( E )
This enforces precedence of * over +.
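The effect of such a rewrite can be tried out with a small hand-written parser. The sketch assumes the rewritten grammar is E → E' + E | E' and E' → id * E' | id | ( E ) * E' | ( E ). That is the common textbook form and is an assumption here, since the grammar on the slide is only partly legible. In the code E' is called T, the common prefixes of the alternatives are factored out by hand, and the tuple shape of the result is made up for the illustration.

```python
def parse(tokens):
    """Parse a token list and return a nested tuple (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def parse_E():
        # E -> T + E | T
        left = parse_T()
        if peek() == "+":
            expect("+")
            return ("+", left, parse_E())
        return left

    def parse_T():
        # T -> id * T | id | ( E ) * T | ( E )
        if peek() == "id":
            expect("id")
            left = "id"
        else:
            expect("(")
            left = parse_E()
            expect(")")
        if peek() == "*":
            expect("*")
            return ("*", left, parse_T())
        return left

    tree = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree

print(parse("id * id + id".split()))  # ('+', ('*', 'id', 'id'), 'id')
print(parse("id + id * id".split()))  # ('+', 'id', ('*', 'id', 'id'))
```

In both results the * node sits below the + node, which is the precedence the rewritten grammar is meant to enforce. Each input now has exactly one tree.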