Today's Topics. Last time: top-down parsers (predictive parsing, backtracking, recursive descent, LL parsers, relation to S/SL)

Today's Topics
Last time: top-down parsers (predictive parsing, backtracking, recursive descent, LL parsers, relation to S/SL).
This time: constructing parsers in SL; syntax error recovery and repair.

Parsing Using SL
As we have observed, an SL program (an S/SL program with no semantic mechanisms) is a context-free parser.
The output of an SL parser is a stream of semantic tokens, that is, input tokens for the semantic analyzer.
Expressions are translated by the parser to postfix notation using SL rules, one for each level of precedence:
    1+4*(7*6+9)    becomes    1 4 7 6 * 9 + * +
Postfix is a convenient linear notation for the output parse tree: all precedences have been resolved, and the order of operations has been determined and is represented directly.
Subsequent phases (semantic analysis, code generation) need not be bothered with precedence or associativity.
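The idea of one rule per precedence level can be sketched in Python as a small recursive descent translator. This is only an illustration, not S/SL itself: the function names (to_postfix, expn, term, factor) are assumptions chosen to echo the SL rule names, and the sketch handles only integers, the four binary operators, and parentheses.

```python
import re


def to_postfix(text):
    """Translate an infix expression to postfix, one function per precedence level."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def accept():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expn():
        # Lowest precedence: + and -, left associative.
        term()
        while peek() in ("+", "-"):
            op = accept()
            term()
            out.append(op)  # operator is emitted after both operands

    def term():
        # Higher precedence: * and /.
        factor()
        while peek() in ("*", "/"):
            op = accept()
            factor()
            out.append(op)

    def factor():
        tok = peek()
        if tok == "(":
            accept()
            expn()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            accept()
        elif tok is not None and tok.isdigit():
            out.append(accept())
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    expn()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return " ".join(out)


print(to_postfix("1+4*(7*6+9)"))  # 1 4 7 6 * 9 + * +
```

Because each level emits its operator only after parsing both operands at the next tighter level, precedence and associativity are already resolved in the output stream.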

Understanding Postfix
Postfix is simply a convenient linear notation for a parse tree.
Constructing the represented parse tree consists of reading from left to right, rotating each operator above the previous two items.
Example: 1 4 7 6 * 9 + * +
[Figure: the parse tree for 1 4 7 6 * 9 + * + built step by step, one operator at a time.]
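Reading the postfix stream left to right and placing each operator above the previous two items is the usual stack-based construction. A minimal Python sketch follows, assuming binary operators only and representing tree nodes as (operator, left, right) tuples; both the function name and the tuple representation are illustrative choices.

```python
OPERATORS = {"+", "-", "*", "/"}


def postfix_to_tree(postfix):
    """Rebuild the parse tree encoded by a space-separated postfix string."""
    stack = []
    for tok in postfix.split():
        if tok in OPERATORS:
            # The operator becomes the parent of the two most recent items.
            right = stack.pop()
            left = stack.pop()
            stack.append((tok, left, right))
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


tree = postfix_to_tree("1 4 7 6 * 9 + * +")
print(tree)  # ('+', '1', ('*', '4', ('+', ('*', '7', '6'), '9')))
```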

SL Parsing - Expressions
For precedences:
1. unary - highest (most binding) (C, Java, Pascal)
2. * /
3. + - lowest (least binding)

    Expn:
        @Term
        {[
            | '+':  @Term  .sAdd
            | '-':  @Term  .sSubtract
            | *:    >
        ]};

    Term:
        @Factor
        {[
            | '*':  @Factor  .sMult
            | '/':  @Factor  .sDivide
            | *:    >
        ]};

    Factor:
        [
            | '-':          @Factor  .sMinus
            | pIdentifier:  .sIdentifier  @Subscripts
            | pInteger:     .sInteger
        ];

Example: 1+-3*2 gives 1 3 sMinus 2 sMult sAdd

SL Parsing - Expressions
For precedences:
1. * /
2. unary -
3. + -

    Expn:
        @Term
        {[
            | '+':  @Term  .sAdd
            | '-':  @Term  .sSubtract
            | *:    >
        ]};

    Term:
        [
            | '-':  @Term  .sMinus
            | *:    @SubTerm
        ];

    SubTerm:
        @Factor
        {[
            | '*':  @Factor  .sMult
            | '/':  @Factor  .sDivide
            | *:    >
        ]};

    Factor:
        [
            | pIdentifier:  .sIdentifier  @Subscripts
            | pInteger:     .sInteger
        ];

Example: 1+-3*2 gives 1 3 2 sMult sMinus sAdd
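The effect of moving unary minus between precedence levels can be checked with a small Python sketch that mirrors the two sets of rules. It is an approximation under stated assumptions: identifiers and @Subscripts are left out, integers are emitted as themselves rather than as sInteger tokens, and the function name and the unary_binds_tightest flag are hypothetical.

```python
import re

SEMANTIC = {"+": "sAdd", "-": "sSubtract", "*": "sMult", "/": "sDivide"}


def to_semantic_postfix(text, unary_binds_tightest=True):
    tokens = re.findall(r"\d+|[-+*/]", text)
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def accept():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def binary_level(operand, operators):
        # operand {[ op: operand .sOp | *: > ]}
        operand()
        while peek() in operators:
            op = accept()
            operand()
            out.append(SEMANTIC[op])

    def expn():
        binary_level(term, ("+", "-"))

    def term():
        if unary_binds_tightest:
            binary_level(factor, ("*", "/"))
        elif peek() == "-":
            # Second ordering: unary minus sits between + - and * /.
            accept()
            term()
            out.append("sMinus")
        else:
            binary_level(factor, ("*", "/"))  # plays the role of SubTerm

    def factor():
        tok = peek()
        if unary_binds_tightest and tok == "-":
            # First ordering: unary minus is handled at the Factor level.
            accept()
            factor()
            out.append("sMinus")
        elif tok is not None and tok.isdigit():
            out.append(accept())
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    expn()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return " ".join(out)


print(to_semantic_postfix("1+-3*2", unary_binds_tightest=True))
print(to_semantic_postfix("1+-3*2", unary_binds_tightest=False))
```

The first call negates 3 before multiplying; the second multiplies 3*2 first and negates the product, matching the two example outputs on the slides.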

Role of the SL Parser
The parser's job is to:
(1) check syntax to see that it is valid,
(2) resolve precedence and order of operations (associativity),
(3) recognize and explicitly relate the structure of statements, blocks and scopes.
Each kind of parser achieves these aims in different ways.
The SL parser does the first of these automatically using S/SL syntax error detection (more on this later).
It does the second by converting expressions to their parse tree in postfix form.
It does the third by recognizing structural relationships and explicitly marking them using semantic tokens.

Syntax Error Recovery
A parser must detect syntax errors, but it must also:
isolate them, so that the user knows where the mistake is, and
recover from them, so that it can continue parsing and check the rest of the program.
Parsing methods vary in their ability to do each of these well. All methods can detect syntax errors (otherwise they wouldn't be parsers).
Isolating them (pointing out where the actual fault is) is more difficult, and recovering from them without causing more (phantom) errors is even more difficult.

Syntax Error Recovery
Most parsing algorithms provide some general recovery strategy.
Basic problem: when a syntax error is detected (a mismatch between the next input token and what's expected by the grammar), resynchronize the state of the parse and the next input token with a minimum number of actions and the minimum probability of further errors.
Effective syntax error recovery and repair is a big area of research, with many complex algorithms.
Degrees of success vary, ranging from virtually useless (with many cascading phantom errors) to virtually perfect, with little or no cascading.
Success also depends on the particular error: most recovery algorithms have one or more pathological cases where they do very badly.

SL Syntax Error Recovery
The built-in syntax error recovery used by SL (by Barnard, 1981) is based on using end-of-statement markers (e.g. ';') as synchronization points for the parse.
This is a common idea in many syntactic error recovery strategies; e.g. COBOL compilers typically skip to the next keyword that starts a statement.

SL Syntax Error Recovery
In SL, a syntax error is detected when the expected input token (in either an explicit input token action or an input choice action) does not match the next input token.
The steps in recovery are:
1. Suspend input token acceptance (enter recovery state).
2. In recovery state, continue to walk the S/SL table, pretending all expected input tokens are matched but accepting none, until we expect an end-of-statement token.
3. Flush (discard) tokens from the input until an end-of-statement token.
4. Exit recovery state and continue as if nothing had happened.
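The four recovery steps can be sketched in Python under a strong simplification: instead of walking a real S/SL table with calls and choices, the sketch assumes the parser's expectations form a flat list of token kinds. The function name, the token spellings, and the error message format are all illustrative assumptions.

```python
END_OF_STATEMENT = ";"


def parse_with_recovery(expected, tokens):
    """Match tokens against a flat list of expected tokens, recovering at ';'.

    Returns (seen, errors): the token sequence the parser behaves as if it
    had seen, and the list of syntax error messages.
    """
    pos = 0
    recovering = False
    seen = []
    errors = []
    for want in expected:
        if not recovering:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
                seen.append(want)
                continue
            found = tokens[pos] if pos < len(tokens) else "<eof>"
            errors.append(f"expected {want!r}, found {found!r}")
            recovering = True  # step 1: suspend input token acceptance
        if want != END_OF_STATEMENT:
            seen.append(want)  # step 2: pretend the expected token matched
            continue
        # Step 3: flush input up to the end-of-statement token.
        while pos < len(tokens) and tokens[pos] != END_OF_STATEMENT:
            pos += 1
        if pos < len(tokens):
            pos += 1  # accept the end-of-statement token itself
        seen.append(want)
        recovering = False  # step 4: continue as if nothing had happened
    return seen, errors


decl = ["var", "id", ":", "integer", ";"]
source = ["var", "id", "=", "integer", ";", "var", "id", ":", "integer", ";"]
seen, errors = parse_with_recovery(decl * 2, source)
print(errors)  # ["expected ':', found '='"]
```

Only one error is reported, and the second declaration is checked normally because parser and input are back in step after the first ';'.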

SL Syntax Error Recovery - Example

    VarDeclaration:
        pIdentifier  .sIdentifier
        ':'
        @TypeBody
        ';' ;

    TypeBody:
        [
            | 'array':
            | 'file':
            | 'integer':
            | *:  @SimpleType
        ];

    SimpleType:
        [
            | pIdentifier:
            | *:
                @Constant
                '..'  .sRange
                @Constant
        ];

    Constant:
        pInteger  .sIntegerLiteral;

Input: var x = integer;
Recovery/repair: var x : 1..1 ;

SL Syntax Error Recovery - Input Choices
If the error occurs during an input choice (i.e. the next input token does not match any alternative, and the choice has no default), then SL treats it as a mismatch of the first alternative.
So for recovery purposes (only),

    Constant:
        [
            | pIdentifier:  .sIdentifier
            | pInteger:     .sInteger
        ];

acts like:

    Constant:
        [
            | pInteger:  .sInteger
            | *:         pIdentifier  .sIdentifier
        ];

In other words, the error is treated as a mismatch of the first alternative, and then recovery proceeds as usual.
And if we are already in recovery state, the first alternative of such a choice is always taken.
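To see how choices interact with recovery, here is a toy Python interpreter for the VarDeclaration example. It is a sketch under several assumptions: the rule encoding (tok, emit, call, choice tuples) is invented for illustration, the rules are transcribed as best they can be read from the example slide, the keyword var is taken to have been accepted already, and in recovery state a choice takes its default if it has one and otherwise its first alternative.

```python
END_OF_STATEMENT = ";"

# Hypothetical encoding: ("tok", t) expects an input token, ("emit", s) emits
# a semantic token, ("call", r) calls a rule, ("choice", [(label, body), ...])
# is an input choice whose default alternative has label None.
GRAMMAR = {
    "VarDeclaration": [("tok", "pIdentifier"), ("emit", "sIdentifier"),
                       ("tok", ":"), ("call", "TypeBody"), ("tok", ";")],
    "TypeBody": [("choice", [("array", []), ("file", []), ("integer", []),
                             (None, [("call", "SimpleType")])])],
    "SimpleType": [("choice", [("pIdentifier", []),
                               (None, [("call", "Constant"), ("tok", ".."),
                                       ("emit", "sRange"),
                                       ("call", "Constant")])])],
    "Constant": [("tok", "pInteger"), ("emit", "sIntegerLiteral")],
}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.recovering = False
        self.seen = []    # input as the parser behaves as if it had seen it
        self.output = []  # emitted semantic tokens
        self.errors = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "<eof>"

    def expect(self, want):
        if not self.recovering:
            if self.peek() == want:
                self.pos += 1
                self.seen.append(want)
                return
            self.errors += 1
            self.recovering = True
        if want == END_OF_STATEMENT:
            # Synchronize: flush input to the end-of-statement token.
            while self.peek() not in (END_OF_STATEMENT, "<eof>"):
                self.pos += 1
            if self.peek() == END_OF_STATEMENT:
                self.pos += 1
            self.recovering = False
        self.seen.append(want)  # pretended (or synchronized) match

    def choose(self, alternatives):
        default = next((b for label, b in alternatives if label is None), None)
        if not self.recovering:
            for label, body in alternatives:
                if label is not None and label == self.peek():
                    self.pos += 1
                    self.seen.append(label)
                    return body
            if default is not None:
                return default
            self.errors += 1
            self.recovering = True
        if default is not None:
            return default
        # No default: behave as if the first alternative's token mismatched.
        label, body = alternatives[0]
        self.expect(label)
        return body

    def run(self, actions):
        for kind, arg in actions:
            if kind == "tok":
                self.expect(arg)
            elif kind == "emit":
                self.output.append(arg)
            elif kind == "call":
                self.run(GRAMMAR[arg])
            else:
                self.run(self.choose(arg))


# Token kinds for "x = integer ;" (the input after the keyword var).
parser = Parser(["pIdentifier", "=", "integer", ";"])
parser.run(GRAMMAR["VarDeclaration"])
print(parser.seen)  # ['pIdentifier', ':', 'pInteger', '..', 'pInteger', ';']
```

The pretended token sequence corresponds to the repair x : 1..1 ; from the example, with a single error reported.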

Recovery Loops
SL recovery state walks the SL program looking for an expected end-of-statement token, pretending all input is matched.
When coding SL we have to be aware of the possibility of recovery; otherwise the path followed in recovery state may never reach an expected end-of-statement token, and we have an infinite loop!

    ConstantList:
        '('
        @Constant
        {[
            | ')':  >
            | *:    ','  @Constant
        ]};

If the input is (1, 3; 2) we get an infinite loop: each time around the loop, the default ('*:') is taken.
We must structure SL rules such that recovery will always terminate.

Recovery Loop Resolution
A simple rule of thumb makes it easy to be sure recovery will always terminate.
Always make the exiting alternative of an SL loop either:
(a) the default (otherwise) alternative, or
(b) the first alternative of a choice without a default.

(a)
    ConstantList:
        '('
        @Constant
        {[
            | ',':  @Constant
            | *:    >
        ]}
        ')' ;

(b)
    ConstantList:
        '('
        @Constant
        {[
            | ')':  >
            | ',':  @Constant
        ]};

If the input is (1, 3; 2), either version exits the loop as soon as it enters recovery state.
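The rule of thumb can be checked mechanically. The Python sketch below assumes that, in recovery state, a loop's choice takes its default alternative if there is one and its first alternative otherwise, and it caps the walk at a fixed number of iterations so that the non-terminating case can be observed safely. The Alt type and both function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Alt(NamedTuple):
    label: Optional[str]  # None marks the default ('*:') alternative
    exits: bool           # True if the alternative's body is '>'


def taken_in_recovery(alternatives):
    """Alternative selected while in recovery state (assumed policy)."""
    for alt in alternatives:
        if alt.label is None:
            return alt
    return alternatives[0]


def iterations_until_exit(alternatives, max_iterations=1000):
    """Loop iterations walked in recovery state, or None if the cap is hit."""
    for iteration in range(1, max_iterations + 1):
        if taken_in_recovery(alternatives).exits:
            return iteration
    return None


# Original ConstantList loop: the exit is neither the default nor unguarded.
looping = [Alt(")", exits=True), Alt(None, exits=False)]
# Fix (a): the exit is the default alternative.
fixed_a = [Alt(",", exits=False), Alt(None, exits=True)]
# Fix (b): the exit is the first alternative of a choice without a default.
fixed_b = [Alt(")", exits=True), Alt(",", exits=False)]

print(iterations_until_exit(looping))  # None
print(iterations_until_exit(fixed_a))  # 1
print(iterations_until_exit(fixed_b))  # 1
```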

What if no Semicolons?
Easier syntax error recovery was one of the original motivating factors behind the use of semicolons in programming languages.
Some languages, such as Euclid and Turing, do not have semicolons or any other end-of-statement tokens (but are still free-format, that is, end-of-lines are not required).
So how can the SL parser resynchronize after a syntax error?
Observation: programmers do not format their code in arbitrary ways. Newlines (return or enter keys) tend to be typed at the ends of expressions and statements, so we can use the actual newline characters as synchronization points.
Of course this will not always work perfectly, but quite often in Computer Science we tend to overgeneralize and worry about the worst case when it's not reasonable; in 99% of cases this works.
If there happens to be a misplaced newline in the middle of a statement and there is an error, the only penalty will be a single extra cascaded syntax error.

SL Syntax Error Recovery - No Semicolons
Insert special newline ('<nl>') tokens into the grammar at the ends of statements and other forms where semicolons would be.
Example:

    VarDeclarations:
        pIdentifier  .sIdentifier
        ':'
        @TypeBody
        <nl> ;

for input such as:

    var x: int
    var x: string

SL parser modifications to handle this:
1. Mismatches between expected newline tokens and input tokens are ignored (they are not syntax errors because newlines are not required in the language).
2. Newline tokens are used as synchronization points:
   (a) recovery state walks to the next expected newline token;
   (b) input is flushed to the next newline in the input.
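The two parser modifications can be sketched in Python with the same flat simplification as before: the parser's expectations are assumed to be a flat list of token kinds rather than a real S/SL table. The function name and token spellings are illustrative assumptions.

```python
NEWLINE = "<nl>"


def parse_newline_sync(expected, tokens):
    """Return the syntax errors found, using newlines as synchronization points."""
    pos = 0
    recovering = False
    errors = []
    for want in expected:
        have = tokens[pos] if pos < len(tokens) else "<eof>"
        if not recovering:
            if have == want:
                pos += 1
                continue
            if want == NEWLINE:
                # Modification 1: a missing newline is not a syntax error.
                continue
            errors.append(f"expected {want!r}, found {have!r}")
            recovering = True
        if want != NEWLINE:
            continue  # pretend the expected token matched
        # Modification 2(a): recovery has walked to an expected newline.
        # Modification 2(b): flush the input to its next newline.
        while pos < len(tokens) and tokens[pos] != NEWLINE:
            pos += 1
        if pos < len(tokens):
            pos += 1
        recovering = False
    return errors


decl = ["var", "id", ":", "type", NEWLINE]

# "var x = int" on one line, "var y : string" on the next, no final newline.
bad = ["var", "id", "=", "type", NEWLINE, "var", "id", ":", "type"]
print(parse_newline_sync(decl * 2, bad))  # ["expected ':', found '='"]

# Two correct declarations on a single line: no errors reported.
one_line = ["var", "id", ":", "type", "var", "id", ":", "type"]
print(parse_newline_sync(decl * 2, one_line))  # []
```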

Summary: SL Parsers
Parsers in SL output a postfix token stream which represents a parse tree in linear form.
SL parsers encode precedence and associativity using cascaded nonterminals, as in BNF.
Syntax error recovery is an important and difficult problem.
SL has a simple built-in syntax error recovery strategy based on end-of-statement markers such as the semicolon.
SL parsers must be coded to take the recovery strategy into account in order to avoid recovery loops.

Next Week
Programming language semantics.
Runtime model of execution.
Assignment #1 DUE: Wednesday, Feb 6, 4:30 pm (i.e., before next class).