Introduction to Parsing. Lecture 8

Similar documents
Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 5

Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing Ambiguity and Syntax Errors

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Intro To Parsing. Step By Step

Context-Free Grammars

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Syntax-Directed Translation. Lecture 14

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12

Grammars and ambiguity. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 8 1

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Lecture 8: Deterministic Bottom-Up Parsing

Syntax Analysis Check syntax and construct abstract syntax tree

Lecture 7: Deterministic Bottom-Up Parsing

A Simple Syntax-Directed Translator

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

E E+E E E (E) id. id + id E E+E. id E + E id id + E id id + id. Overview. derivations and parse trees. Grammars and ambiguity. ambiguous grammars

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Context-Free Grammars

COP 3402 Systems Software Syntax Analysis (Parser)

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Syntax Analysis Part I

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CSCI312 Principles of Programming Languages!

Compilers and computer architecture From strings to ASTs (2): context free grammars

Recursive Descent Parsers

CS 314 Principles of Programming Languages

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Context-Free Languages and Parse Trees

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Defining syntax using CFGs

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CMSC 330: Organization of Programming Languages

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

LECTURE 7. Lex and Intro to Parsing

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CMSC 330: Organization of Programming Languages

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Lexical Analysis. Lecture 2-4

syntax tree - * * * - * * * * * 2 1 * * 2 * (2 * 1) - (1 + 0)

Properties of Regular Expressions and Finite Automata

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

CMSC 330: Organization of Programming Languages

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Lexical and Syntax Analysis

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Abstract Syntax Trees & Top-Down Parsing

Some Basic Definitions. Some Basic Definitions. Some Basic Definitions. Language Processing Systems. Syntax Analysis (Parsing) Prof.

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical and Syntax Analysis. Top-Down Parsing

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Formal Languages and Compilers Lecture VI: Lexical Analysis

Optimizing Finite Automata

Chapter 2 :: Programming Language Syntax

CMSC 330: Organization of Programming Languages. Context Free Grammars

Dr. D.M. Akbar Hussain

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Introduction to Lexing and Parsing

LECTURE 3. Compiler Phases

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

LR Parsing LALR Parser Generators

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSE302: Compiler Design

Ambiguity. Lecture 8. CS 536 Spring

CMSC 330: Organization of Programming Languages. Context Free Grammars

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Lexical and Syntax Analysis

Chapter 4. Lexical and Syntax Analysis

Part 5 Program Analysis Principles and Techniques

CSE 3302 Programming Languages Lecture 2: Syntax

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

Compiler Design Concepts. Syntax Analysis

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Ambiguity and Errors Syntax-Directed Translation

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

4. Lexical and Syntax Analysis

CS 403: Scanning and Parsing

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

Transcription:

Introduction to Parsing Lecture 8 Adapted from slides by G. Necula

Outline Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations

Languages and Automata Formal languages are very important in CS specially in programming languages Regular languages The weakest formal languages widely used Many applications We will also study context-free languages

Limitations of Regular Languages Intuition: A finite automaton that runs long enough must repeat states Finite automaton can t remember # of times it has visited a particular state Finite automaton has finite memory Only enough to store in which state it is Cannot count, except up to a finite limit.g., language of balanced parentheses is not regular: { ( i ) i i 0}

The Structure of a Compiler Source Lexical analysis Tokens Today we start Parsing Optimization Interm. Language Code Gen. Machine Code

The Functionality of the Parser Input: sequence of tokens from lexer Output: abstract syntax tree of the program

xample Pyth: if x == y: z =1 else: z = 2 Parser input: IF ID == ID : ID = INT LS : ID = INT Parser output (abstract syntax tree): IF-THN-LS == = = ID ID ID INT ID INT

Why A Tree? ach stage of the compiler has two purposes: Detect and filter out some class of errors Compute some new information or translate the representation of the program to make things easier for later stages Recursive structure of tree suits recursive structure of language definition With tree, later stages can easily find the else clause, e.g., rather than having to scan through tokens to find it.

Comparison with Lexical Analysis Phase Input Output Lexer Parser Sequence of characters Sequence of tokens Sequence of tokens Syntax tree

The Role of the Parser Not all sequences of tokens are programs...... Parser must distinguish between valid and invalid sequences of tokens We need A language for describing valid sequences of tokens A method for distinguishing valid from invalid sequences of tokens

Programming Language Structure Programming languages have recursive structure Consider the language of arithmetic expressions with integers, +, *, and ( ) An expression is either: an integer an expression followed by + followed by expression an expression followed by * followed by expression a ( followed by an expression followed by ) int, int + int, ( int + int) * int are expressions

Notation for Programming Languages An alternative notation: int + * ( ) We can view these rules as rewrite rules We start with and replace occurrences of with some right-hand side * ( ) * ( + ) * (int + int) * int

Observation All arithmetic expressions can be obtained by a sequence of replacements Any sequence of replacements forms a valid arithmetic expression This means that we cannot obtain ( int ) ) by any sequence of replacements. Why? This set of rules is a context-free grammar

Context-Free Grammars A CFG consists of A set of non-terminals N By convention, written with capital letter in these notes A set of terminals T By convention, either lower case names or punctuation A start symbol S (a non-terminal) A set of productions Assuming N ε, or Y 1 Y 2... Y n where Y i N T

xamples of CFGs Simple arithmetic expressions: int + * ( ) One non-terminal: Several terminals: int, +, *, (, ) Called terminals because they are never replaced By convention the non-terminal for the first production is the start one

The Language of a CFG Read productions as replacement rules: X Y 1... Y n Means X can be replaced by Y 1... Y n X ε Means X can be erased (replaced with empty string)

Key Idea 1. Begin with a string consisting of the start symbol S 2. Replace any non-terminal X in the string by a right-hand side of some production X Y 1 Y n 3. Repeat (2) until there are only terminals in the string 4. The successive strings created in this way are called sentential forms.

The Language of a CFG (Cont.) More formally, may write X 1 X i-1 X i X i+1 X n X 1 X i-1 Y 1 Y m X i+1 X n if there is a production X i Y 1 Y m

The Language of a CFG (Cont.) Write if X 1 X n * Y 1 Y m X 1 X n Y 1 Y m in 0 or more steps

The Language of a CFG Let G be a context-free grammar with start symbol S. Then the language of G is: L(G) = { a 1 a n S * a 1 a n and every a i is a terminal }

xamples: S 0 also written as S 0 1 S 1 Generates the language { 0, 1 } What about S 1 A A 0 1 What about S 1 A A 0 1 A What about S ε ( S )

Pyth xample A fragment of Pyth: Compound while xpr: Block if xpr: Block lses lses ε else: Block elif xpr: Block lses Block Stmt_List Suite (Formal language papers use one-character nonterminals, but we don t have to!)

Notes The idea of a CFG is a big step. But: Membership in a language is yes or no we also need parse tree of the input Must handle errors gracefully Need an implementation of CFG s (e.g., bison)

More Notes Form of the grammar is important Many grammars generate the same language Tools are sensitive to the grammar Tools for regular languages (e.g., flex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice

Derivations and Parse Trees A derivation is a sequence of sentential forms resulting from the application of a sequence of productions S A derivation can be represented as a tree Start symbol is the tree s root For a production X Y 1 Y n add children Y 1,, Y n to node X

Derivation xample Grammar + * () int String int * int + int

Derivation xample (Cont.) + * + int * + int * int + int * int + int

Derivation in Detail (1)

Derivation in Detail (2) + +

Derivation in Detail (3) + * + + *

Derivation in Detail (4) + * + int * + * + int

Derivation in Detail (5) + * + int * + int * int + * + int int

Derivation in Detail (6) + * + int * + int * int + int * int + int int * int + int

Notes on Derivations A parse tree has Terminals at the leaves Non-terminals at the interior nodes A left-right traversal of the leaves is the original input The parse tree shows the association of operations, the input string does not! There may be multiple ways to match the input Derivations (and parse trees) choose one

leftmost and Right-most Derivations The example was a leftmost derivation At each step, replaced the leftmost non-terminal There is an equivalent notion of a rightmost derivation, shown here: + + int * + int * int + int int * int + int

rightmost Derivation in Detail (1)

rightmost Derivation in Detail (2) + +

rightmost Derivation in Detail (3) + + int + int

rightmost Derivation in Detail (4) + + int * + int + * int

rightmost Derivation in Detail (5) + + int * + int * int + int * + int int

rightmost Derivation in Detail (6) + + int * + int * int + int int * int + int * + int int int

Aside: Canonical Derivations Take a look at that last derivation in reverse. The active part (red) tends to move left to right. We call this a reverse rightmost or canonical derivation. Comes up in bottom-up parsing. We ll return to it in a couple of lectures.

Derivations and Parse Trees For each parse tree there is a leftmost and a rightmost derivation The difference is the order in which branches are added, not the structure of the tree.

Parse Trees and Abstract Syntax Trees The example we saw near the start: IF-THN-LS == = = ID ID ID INT ID INT was not a parse tree, but an abstract syntax tree Parse trees slavishly reflect the grammar. Abstract syntax trees more general, and abstract away from the grammar, cutting out detail that interferes with later stages.

Summary of Derivations We are not just interested in whether s L(G) We need a parse tree for s, and ultimately an abstract syntax tree. A derivation defines a parse tree But one parse tree may have many derivations leftmost and rightmost derivations are important in parser implementation