Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Similar documents
Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing Ambiguity and Syntax Errors

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Ambiguity and Errors Syntax-Directed Translation

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Introduction to Parsing. Lecture 8

Context-Free Grammars

Introduction to Parsing. Lecture 5

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Introduction to Parsing. Lecture 5

Error Handling Syntax-Directed Translation Recursive Descent Parsing. Lecture 6

Intro To Parsing. Step By Step

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Syntax-Directed Translation. Lecture 14

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Syntax Analysis. Chapter 4

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Grammars and ambiguity. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 8 1

Abstract Syntax Trees & Top-Down Parsing

Some Basic Definitions. Some Basic Definitions. Some Basic Definitions. Language Processing Systems. Syntax Analysis (Parsing) Prof.

Syntax Analysis Part I

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

E E+E E E (E) id. id + id E E+E. id E + E id id + E id id + id. Overview. derivations and parse trees. Grammars and ambiguity. ambiguous grammars

A Simple Syntax-Directed Translator

Lexical and Syntax Analysis. Top-Down Parsing

CMPT 379 Compilers. Parse trees

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Chapter 4: Syntax Analyzer

A simple syntax-directed

CSCI312 Principles of Programming Languages!

Recursive Descent Parsers

LECTURE 3. Compiler Phases

CMSC 330: Organization of Programming Languages

Lexical and Syntax Analysis

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Compilers and computer architecture From strings to ASTs (2): context free grammars

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Principles of Programming Languages COMP251: Syntax and Grammars

COP 3402 Systems Software Syntax Analysis (Parser)

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

Introduction to Lexical Analysis

CSE302: Compiler Design

Parsing CSCI-400. Principles of Programming Languages.

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Context-free grammars (CFG s)

Syntax Errors; Static Semantics

Semantic Analysis. Compiler Architecture

Building Compilers with Phoenix

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

CS /534 Compiler Construction University of Massachusetts Lowell

Introduction to Lexical Analysis

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CSE 3302 Programming Languages Lecture 2: Syntax

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

CS 314 Principles of Programming Languages

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Introduction to Lexing and Parsing

Question Bank. 10CS63:Compiler Design

Syntax Intro and Overview. Syntax

A programming language requires two major definitions A simple one pass compiler

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division

CS 406/534 Compiler Construction Putting It All Together

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CS164: Midterm I. Fall 2003

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Lexical Analysis. Lecture 2-4

CMSC 330: Organization of Programming Languages

CSCE 314 Programming Languages

Front End. Hwansoo Han

Context-Free Grammars

Ambiguity. Lecture 8. CS 536 Spring

Syntax and Grammars 1 / 21

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

CMSC 330: Organization of Programming Languages

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Parsing II Top-down parsing. Comp 412

Compiler Design Concepts. Syntax Analysis

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Lexical Analysis. Chapter 2

Syntactic Analysis. Top-Down Parsing

Transcription:

Programming Languages & Translators PARSING Baishakhi Ray Fall 2018 These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Languages and Automata Formal languages are very important in CS specially in programming languages Regular Languages Weakest formal languages that are widely used Many applications Many Languages are not regular

Automata that accepts odd numbers of 1 1 0 0 How many 1s it has accepted? - Only solution is duplicate state Automata does not have any memory 1

Intro to Parsing Regular Languages Weakest formal languages that are widely used Many applications Consider the language {( i ) i i 0} (), (( )), ((( ))) ((1 + 2) * 3) Nesting structures if.. if.. else.. else.. Regular languages cannot handle well

Intro to Parsing Input: if(x==y) 1 else 2; Parser Input (Lexical Input): KY(IF) ( ID(x) OP( == ) ) INT(1) KY(LS) INT(2) ; Parser Output IF-THN-LS == = = ID ID INT INT

Intro to Parsing Character stream Lexical Analysis Token stream Parser Syntax trees Semantic Analysis Syntax trees Code Generation Nor every strings of tokens are valid Parser must distinguish between valid and invalid token strings. We need A Language: to describe valid string A method: to distinguish valid from invalid.

Context Free Grammar A CFG consists of A set of terminal T A set of non-terminal N A start symbol S (S ε N) A set of production rules X -> Y 1..Y N X ε N Y i ε {N, T, ε} x: S -> ( S ) ε N = {S} T = { (, ), ε}

Context Free Grammar 1. Begin with a string with only the start symbol S 2. Replace a non-terminal X with in the string by the RHS of some production rule: X->Y 1..Y n 3. Repeat 2 again and again until there are no non-terminals X 1 X i X X i+1. X n -> X 1 X i Y 1..Y k X i+1. X n For the production rule X -> Y 1..Y k 0 1 n 0 n, n 0

Context Free Grammar Let G be a CFG with start symbol S. Then the language L(G) of G is: {a 1.. an i ai T S a 1 a 2 an}

Context Free Grammar There are no rules to replace terminals. Once generated, terminals are permanent Terminals ought to be tokens of programming languages Context-free grammars are a natural notation for this recursive structure

CFG: Simple Arithmetic expression à + * ( ) id Languages can be generated: id, ( id ), ( id + id ) * id,

Derivation A derivation is a sequence of production S -> -> -> A derivation can be drawn as a tree Start symbol is tree s root For a production X -> Y 1.Y n, add children Y 1.Y n to node X

Grammar -> + * () id String id * id + id Derivation -> + -> * + -> id * + -> id * id + -> id * id + id id + * id id Parse Tree

Parse Tree A parse tree has Terminals at the leaves Non-terminals at the interior nodes An in-order traversal of the leaves is the original input The parse tree shows the association of operations, the input string does not

Parse Tree Left-most derivation At each step, replace the left-most non-terminal -> + -> * + -> id * + -> id * id + -> id * id + id Right-most derivation At each step, replace the right-most non-terminal -> + -> + id -> * + id -> * id + id -> id * id + id Note that, right-most and left-most derivations have the same parse tree

Ambiguity Grammar -> + * () id String id * id + id + * id * id id id id + id

Ambiguity A grammar is ambiguous if it has more than one parse tree for a string There are more than one right-most or left-most derivation for some string Ambiguity is bad Leaves meaning for some programs ill-defined

rror Handling Purpose of the compiler is To detect non-valid programs To translate the valid ones Many kinds of possible errors (e.g., in C) rror Kind xample Detected by Lexical $... Lexer Syntax x*%... Parser Semantic int x; y = x(3);... Type Checker Correctness your program tester/user

rror Handling rror Handler should Recover errors accurately and quickly Recover from an error quickly Not slow down compilation of valid code Types of rror Handling Panic mode rror productions Automatic local or global correction

Panic Mode rror Handling Panic mode is simplest and most popular method When an error is detected Discard tokens until one with a clear role is found Continue from there Typically looks for synchronizing tokens Typically the statement of expression terminators

Panic Mode rror Handling xample: (1 + + 2 ) + 3 Panic-mode recovery: Skip ahead to the next integer and then continue Bison: use the special terminal error to describe how much input to skip -> int + ( ) error int ( error ) Normal mode rror mode

rror Productions Specify known common mistakes in the grammar xample: Write 5x instead of 5 * x Add production rule ->.. Disadvantages complicates the grammar

rror Corrections Idea: find a correct nearby program Try token insertions and deletions (goal: minimize edit distance) xhaustive search Disadvantages Hard to implement Slows down parsing of correct programs Nearby is not necessarily the intended program

rror Corrections Past Slow recompilation cycle (even once a day) Find as many errors in once cycle as possible Present Quick recompilation cycle Users tend to correct one error/cycle Complex error recovery is less compelling

Abstract Syntax Trees A parser traces the derivation of a sequence of tokens But the rest of the compiler needs a structural representation of the program Abstract Syntax Trees Like parse trees but ignore some details Abbreviated as AST

Abstract Syntax Trees Grammar -> int ( ) + String 5 + (2 + 3) After lexical analysis Int<5> + ( Int<2> + Int<3> )

Abstract Syntax Trees: 5 + ( 2 + 3) Parse Trees + Int<5> ( ) + Int<2> Int<3>

Abstract Syntax Trees: 5 + ( 2 + 3) Parse Trees + Int<5> ( ) + Int<2> Int<3> Have too much information Parentheses Single-successor nodes

Abstract Syntax Trees: 5 + ( 2 + 3) Parse Trees AST + + Int<5> ( ) Int<5> + + Int<2> Int<2> Int<3> Int<3> Have too much information Parentheses Single-successor nodes ASTs capture the nesting structure But abstracts from the concrete syntax More compact and easier to use

Disadvantages of ASTs AST has many similar forms.g., for, while, repeat...until.g., if,?:, switch xpressions in AST may be complex, nested (x * y) + (z > 5? 12 * z : z + 20) Want simpler representation for analysis...at least, for dataflow analysis 30