Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

Similar documents
Syntax Analysis. Chapter 4

CSE302: Compiler Design

Chapter 4: Syntax Analyzer

Compiler Design Overview. Compiler Design 1

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Error Recovery during Top-Down Parsing: Acceptable-sets derived from continuation

A simple syntax-directed

Syntax Analysis Part I

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

A programming language requires two major definitions A simple one pass compiler

LECTURE NOTES ON COMPILER DESIGN P a g e 2

Introduction to Parsing Ambiguity and Syntax Errors

A Simple Syntax-Directed Translator

UNIT-4 (COMPILER DESIGN)

Introduction to Parsing Ambiguity and Syntax Errors

Syntax Analysis Top Down Parsing

Ambiguity and Errors Syntax-Directed Translation

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Today s Topics. Last Time Top-down parsers - predictive parsing, backtracking, recursive descent, LL parsers, relation to S/SL

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Error Handling Syntax-Directed Translation Recursive Descent Parsing

1 Lexical Considerations

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

CA Compiler Construction

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

3. Parsing. Oscar Nierstrasz

3.5 Practical Issues PRACTICAL ISSUES Error Recovery

Some Basic Definitions. Some Basic Definitions. Some Basic Definitions. Language Processing Systems. Syntax Analysis (Parsing) Prof.

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

COMP 3002: Compiler Construction. Pat Morin School of Computer Science

CPS 506 Comparative Programming Languages. Syntax Specification

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Syntax Errors; Static Semantics

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis


1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

LECTURE 3. Compiler Phases

IPCoreL. Phillip Duane Douglas, Jr. 11/3/2010

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Grammars and Parsing. Paul Klint. Grammars and Parsing

Stating the obvious, people and computers do not speak the same language.

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Recursive Descent Parsers

Compiler Design (40-414)

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Error Handling Syntax-Directed Translation Recursive Descent Parsing

CS 314 Principles of Programming Languages

Lexical and Syntax Analysis

MANIPAL INSTITUTE OF TECHNOLOGY (Constituent Institute of MANIPAL University) MANIPAL

2.2 Syntax Definition


Error Handling Syntax-Directed Translation Recursive Descent Parsing

Part 5 Program Analysis Principles and Techniques

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Lexical Considerations

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Decaf Language Reference

COMPILER DESIGN LECTURE NOTES

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

Syntax-Directed Translation. Lecture 14

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Error Management in Compilers and Run-time Systems

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

CS 4201 Compilers 2014/2015 Handout: Lab 1

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, Pilani Pilani Campus Instruction Division

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Consider a description of arithmetic. It includes two equations that define the structural types of digit and operator:

1. Explain the input buffer scheme for scanning the source program. How the use of sentinels can improve its performance? Describe in detail.

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

LANGUAGE PROCESSORS. Introduction to Language processor:

Related Course Objec6ves

The Structure of a Syntax-Directed Compiler

Lexical Considerations

COMPILER CONSTRUCTION Seminar 02 TDDB44

4. An interpreter is a program that

Building Compilers with Phoenix

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

ECE251 Midterm practice questions, Fall 2010

CS 406/534 Compiler Construction Parsing Part I

Experiences in Building a Compiler for an Object-Oriented Language

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Transcription:

Introduction to Syntax Analysis Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila chirila@cs.upt.ro http://www.cs.upt.ro/~chirila

Outline Syntax Analysis Syntax Rules The Role of the Parser Types of Parsers Representative Grammars Ambiguous Grammars Syntax Error Handling

Syntax Analysis parsing methods typically used in compilers basic concepts techniques suitable for hand implementation algorithms used in automated tools recovery methods from common errors

Syntax Rules each programming language has precise rules that prescribe the syntactic structure of well formed programs e.g.: a C program functions declarations statements expressions can be expressed as context-free grammars BNF rules grammars offers benefits for both language designers compiler writers

Grammar Benefits precise syntactic specification of a programming language from certain classes of grammars efficient parsers can be automatically generated the structure disclosed by a grammar is useful for translating source programs object code detecting errors allows a language to to evolve to be developed iteratively and incrementally

Topics how a parser fits into a typical compiler take a look at typical grammars for arithmetic expressions carry over to most programming constructs error handling finding that the input can not be generated by the grammar

The Role of the Parser to receive a string of tokens from the lexical analyzer to verify whether the string of token names can be generated by the grammar for a source language to report any syntax errors in an intelligible way to recover from commonly occurring errors to continue processing the remainder of the program

The Role of the Parser to construct a parse tree to pass it to the rest of the compiler for further processing to intersperse with checking and translations actions

Types of Parsers Universal can parse any grammar Cocke-Younger-Kasami parsing methods Earley s algorithm inefficient to be used in production compilers Top-down - build parse trees from top (root) to the bottom (leaves) Bottom-up start from leaves work their way up to the root Both top-down and bottom-up scan the input from left to right one symbol at a time

Types of Parsers most efficient top-down and bottom-up work for subclasses of grammars LL and LR are expressive enough to describe most of the syntactic constructs in modern programming languages by hand implemented parsers use LL grammars tool generated parsers use the larger class of LR grammars

Parser Input Stream of tokens Output Some representation of the parse tree Tasks Collecting information about various tokens into the symbol table Performing type, domain checking, Generating intermediate code

Representative Grammars constructs starting with keywords while, int are easy to parse keywords guide the choice of the grammar production that must be applied to match the input expressions are more challenging because of operators which have association rules precedence

Representative Grammars E expressions of terms separated by + signs T terms consisting of factors separated by * signs F factors which can be parenthesized expressions or identifiers E -> E + T T T -> T * F F F -> ( E ) id LR grammar suitable for bottom-up parsing can be adapted to handle additional operators levels of precedence can not be used for top-down parsing because is left recursive!!!

Representative Grammars Non-left-recursive variant Suitable for top-down parsing E -> T E E -> + T E ε T -> F T T -> * F T ε F -> ( E ) id

Ambiguous Grammar E -> E + E E * E (E) id operators + and * are treated alike the grammar permits more than one parse for the expression a+b*c

Syntax Error Handling nature of syntactic errors general strategies of error recovery parsing only correct code design and implementation greatly simplified assisting the programmer to locate and track down errors few languages with error handling in design error induced behavior is not present in language specification error handling left to compiler designer planning it from the beginning simplifies the structure of the compiler improves the handling of errors

Common Programming Error Levels lexical errors misspelling of identifiers, keywords or operators e.g. missing quotes around text intended as string syntactic errors misplaced, extra, missing tokens: semicolons, braces e.g. case without enclosing switch (Java) semantic errors type mismatches between operators and operands e.g. return statement for a Java method with result type void logical errors incorrect reasoning on the part of the programmer e.g. using in C the assignment = operator instead of the comparison == operator

Viable Prefix Property precision of parsing methods allows efficient syntactic error detection LL and LR parsing methods detect an error as soon as possible Viable Prefix Property of parsing methods is to issue an error when the token stream can not be parsed further according to the grammar for the language when they see a prefix at the input that can not be completed to for a string in the language

Error Handler Goals report the presence of errors clearly and accurately recover from errors quickly in order to detect subsequent errors add minimal overhead to the processing of correct programs However accurate detection of semantic and logical errors at compile time is in general a difficult task!!! common errors are simple ones straightforward error-handling mechanisms suffices

Error Reporting the place in the source program where the error is detected the actual error is probably around the 2-3 neighbor tokens common strategy print the offending line point to the position where the error was detected

Error Recovery Strategies when error detected -> the parser should recover no strategy universally acceptable few methods with broad applicability to quit with an informative error message at first error additional errors are uncovered if the parser restores itself to a state where processing of the input can continue with reasonable hopes that further processing is meaningful if error number increases then the compiler should stop after a given error number limit will avoid issuing an avalanche of spurious messages

Panic Mode Recovery on discovering an error the parser discards input symbols one at a time until is found one of a designated set of synchronizing tokens delimiters ; or } have a clear and unambiguous role must be selected by the compiler designer skips considerable amount of input no checking for additional errors simple guaranteed not to go on an infinite loop

Phrase-Level Recovery on discovering an error to perform local correction on the remaining input to replace the remaining input by some string that allows the parser to continue examples to replace a comma by a semicolon to delete an extraneous semicolon to insert a missing semicolon the choice of local correction is left to the compiler designer to choose replacements that do not lead to infinite loops difficulty in coping with situations in which the actual error has occurred before the detection point

Error Productions to equip the grammar with productions which generate erroneous constructs such a parser detects the anticipated errors when an error production is used during parsing the parser can generate appropriate error diagnostics

Global Correction ideally a compiler would make as few changes as possible in processing an incorrect string algorithms for choosing the minimal sequence of changes to obtain globally a least-cost correction given an incorrect input x to find a parse tree for a related string y such as the number of insertions, deletions and changes of tokens required to transform x into y is as small as possible to costly to implement in time and space only of theoretical interest a closest correct program may not have the same semantics as the intended erroneous one the least cost correction is used for evaluating error recovery techniques finding optimal replacement strings for phrase-level recovery

Bibliography Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman Compilers, Principles, Techniques and Tools, Second Edition, 2007