Syntax Analysis MIF08. Laure Gonnord

Similar documents
Lexing, Parsing. Laure Gonnord sept Master 1, ENS de Lyon

Compilation: a bit about the target architecture

Writing Evaluators MIF08. Laure Gonnord

Lab 2. Lexing and Parsing with ANTLR4

Compil M1 : Front-End

Lexical Analysis. Chapter 2

Building Compilers with Phoenix

Semantic Analysis. Lecture 9. February 7, 2018

Introduction to Lexical Analysis

Lexical Analysis. Lecture 2-4

Topic 3: Syntax Analysis I

Syntax and Grammars 1 / 21

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Building Compilers with Phoenix

Introduction to Lexical Analysis

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Introduction to Compiler Design

Introduction to Parsing. Lecture 8

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Week 2: Syntax Specification, Grammars

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Introduction to Parsing Ambiguity and Syntax Errors

Compiler phases. Non-tokens

CSE 3302 Programming Languages Lecture 2: Syntax

Programming Languages and Compilers (CS 421)

Exercise ANTLRv4. Patryk Kiepas. March 25, 2017

A simple syntax-directed

n (0 1)*1 n a*b(a*) n ((01) (10))* n You tell me n Regular expressions (equivalently, regular 10/20/ /20/16 4

CSCI312 Principles of Programming Languages!

Lexical Analysis. Lecture 3-4

Introduction to Parsing Ambiguity and Syntax Errors

Chapter 3 Lexical Analysis

Compiler course. Chapter 3 Lexical Analysis

COSE312: Compilers. Lecture 1 Overview of Compilers

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

Compilers and computer architecture From strings to ASTs (2): context free grammars

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Lexical Analysis. Finite Automata

Introduction - CAP Course

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Lexical Analysis. Finite Automata

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Cunning Plan. Informal Sketch of Lexical Analysis. Issues in Lexical Analysis. Specifying Lexers

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Chapter 4. Lexical and Syntax Analysis

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Compiler Construction D7011E

Formal Languages and Compilers Lecture VI: Lexical Analysis

Compiler Design Overview. Compiler Design 1

Lab 2. Lexing and Parsing with Flex and Bison - 2 labs

Preparing for the ACW Languages & Compilers

Parser Tools: lex and yacc-style Parsing

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

CS 11 Ocaml track: lecture 6

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

CPS 506 Comparative Programming Languages. Syntax Specification

Lexical Analysis. Introduction

LECTURE 3. Compiler Phases

A Simple Syntax-Directed Translator

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

Compiler principles, PS1

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

Lecture 14 Sections Mon, Mar 2, 2009

CS153: Compilers Lecture 4: Recursive Parsing

CMSC 350: COMPILER DESIGN

Implementation of Lexical Analysis

Context-Free Grammars

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) Programming Languages & Compilers. Major Phases of a Compiler

CSCI 2041: Advanced Language Processing

Programming Languages and Compilers (CS 421)

Programming Assignment II

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) I. Major Phases of a Compiler. Programming Languages & Compilers

Introduction to Parsing. Lecture 5

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Lexical Analysis. Lecture 3. January 10, 2018

Building lexical and syntactic analyzers. Chapter 3. Syntactic sugar causes cancer of the semicolon. A. Perlis. Chomsky Hierarchy

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

An ANTLR Grammar for Esterel

Compiler Lab. Introduction to tools Lex and Yacc

4. Lexical and Syntax Analysis

Lisp: Lab Information. Donald F. Ross

COP 3402 Systems Software Syntax Analysis (Parser)

COMPILER DESIGN LECTURE NOTES

Compiling Regular Expressions COMP360

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

UNIT -2 LEXICAL ANALYSIS

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

Parser Tools: lex and yacc-style Parsing

4. Lexical and Syntax Analysis

Lexical and Syntax Analysis

2068 (I) Attempt all questions.

Transcription:

Syntax Analysis MIF08 Laure Gonnord Laure.Gonnord@univ-lyon1.fr

Goal of this chapter Understand the syntaxic structure of a language; Separate the different steps of syntax analysis; Be able to write a syntax analysis tool for a simple language; Remember: syntax semantics. Laure Gonnord (Lyon1/FST) Syntax Analysis 2 / 29

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation); Group symbols into tokens: Words: groups of letters; Punctuation; Spaces. Laure Gonnord (Lyon1/FST) Syntax Analysis 3 / 29

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation); Group symbols into tokens: Lexical analysis Words: groups of letters; Punctuation; Spaces. Laure Gonnord (Lyon1/FST) Syntax Analysis 3 / 29

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation); Group symbols into tokens: Lexical analysis Words: groups of letters; Punctuation; Spaces. Group tokens into: Propositions; Sentences. Laure Gonnord (Lyon1/FST) Syntax Analysis 3 / 29

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation); Group symbols into tokens: Lexical analysis Words: groups of letters; Punctuation; Spaces. Group tokens into: Parsing Propositions; Sentences. Laure Gonnord (Lyon1/FST) Syntax Analysis 3 / 29

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation); Group symbols into tokens: Lexical analysis Words: groups of letters; Punctuation; Spaces. Group tokens into: Parsing Propositions; Sentences. Then proceed with word meanings: Definition of each word. ex: a dog is a hairy mammal, that barks and... Role in the phrase: verb, subject,... Laure Gonnord (Lyon1/FST) Syntax Analysis 3 / 29

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation); Group symbols into tokens: Lexical analysis Words: groups of letters; Punctuation; Spaces. Group tokens into: Parsing Propositions; Sentences. Then proceed with word meanings: Semantics Definition of each word. ex: a dog is a hairy mammal, that barks and... Role in the phrase: verb, subject,... Laure Gonnord (Lyon1/FST) Syntax Analysis 3 / 29

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation); Group symbols into tokens: Lexical analysis Words: groups of letters; Punctuation; Spaces. Group tokens into: Parsing Propositions; Sentences. Then proceed with word meanings: Semantics Definition of each word. ex: a dog is a hairy mammal, that barks and... Role in the phrase: verb, subject,... Syntax analysis=lexical analysis+parsing Laure Gonnord (Lyon1/FST) Syntax Analysis 3 / 29

Lexical Analysis aka Lexing Outline 1 Lexical Analysis aka Lexing 2 Parsing Laure Gonnord (Lyon1/FST) Syntax Analysis 4 / 29

Lexical Analysis aka Lexing What for? int y = 12 + 4*x ; = [TINT, VAR("y"), EQ, INT(12), PLUS, INT(4), FOIS, VAR("x"), PVIRG] Group characters into a list of tokens, e.g.: The word int stands for type integer; A sequence of letters stands for a variable; A sequence of digits stands for an integer;... Laure Gonnord (Lyon1/FST) Syntax Analysis 5 / 29

Lexical Analysis aka Lexing What s behind From a Regular Language, produce a Finite State Machine (see LIF15) Laure Gonnord (Lyon1/FST) Syntax Analysis 6 / 29

Lexical Analysis aka Lexing Tools: lexical analyzer constructors Lexical analyzer constructor: builds an automaton from a regular language definition; Ex: Lex (C), JFlex (Java), OCamllex, ANTLR (multi),... input: a set of regular expressions with actions (Toto.g4); output: a file(s) (Toto.java) that contains the corresponding automaton. Laure Gonnord (Lyon1/FST) Syntax Analysis 7 / 29

Lexical Analysis aka Lexing Analyzing text with the compiled lexer The input of the lexer is a text file; Execution: Checks that the input is accepted by the compiled automaton; Executes some actions during the automaton traversal. Laure Gonnord (Lyon1/FST) Syntax Analysis 8 / 29

Lexical Analysis aka Lexing Lexing tool for Java: ANTLR The official webpage : www.antlr.org (BSD license); ANTLR is both a lexer and a parser; ANTLR is multi-language (not only Java). During the labs; we will use the Python back-end (here, demo in java) Laure Gonnord (Lyon1/FST) Syntax Analysis 9 / 29

Lexical Analysis aka Lexing ANTLR lexer format and compilation.g4 grammar XX; @header { // Some init code... } @members { // Some global variables } // More optional blocks are available --->> lex rules Compilation: antlr4 Toto.g4 javac *.java grun Toto r // produces several Java files // compiles into xx.class files // Run analyzer with starting rule r Laure Gonnord (Lyon1/FST) Syntax Analysis 10 / 29

Lexical Analysis aka Lexing Lexing with ANTLR: example Lexing rules: Must start with an upper-case letter; Follow the usual extended regular-expressions syntax (same as egrep, sed,...). A simple example grammar H e l l o ; / / This r u l e i s a c t u a l l y a parsing r u l e r : HELLO ID ; / / match " h e l l o " f o l l o w e d by an i d e n t i f i e r HELLO : 'hello ' ; / / beware the s i n g l e quotes ID : [ a z ]+ ; / / match lower case i d e n t i f i e r s WS : [ \ t \ r \ n ]+ > s kip ; / / skip spaces, t a b s, newlines Laure Gonnord (Lyon1/FST) Syntax Analysis 11 / 29

Lexical Analysis aka Lexing Lexing - more than regular languages Counting in ANTLR - CountLines.g4 l e x e r grammar CountLines; / / Members can be accessed i n any r u l e @members { i n t nblines=0 ; } NEWLINE : [ \ r \ n ] { nblines++ ; System. out. p r i n t l n ( " Current l i n e s : " +nblines ) ; } ; SK : ( [ a z ]+ [ \ t ] + ) > s kip ; antlr4 Toto.g4 javac *.java grun Toto 'tokens' // produces several Java files // compiles into xx.class files // Run the lexical analyser only Laure Gonnord (Lyon1/FST) Syntax Analysis 12 / 29

Parsing Outline 1 Lexical Analysis aka Lexing 2 Parsing Semantic actions / Attributes Laure Gonnord (Lyon1/FST) Syntax Analysis 13 / 29

Parsing What s Parsing? Relate tokens by structuring them. Flat tokens [TINT, VAR("y"), EQ, INT(12), PLUS, INT(4), FOIS, VAR("x"), PVIRG] Parsing Yes/No + Structured tokens = int y + 12 * 4 x int Laure Gonnord (Lyon1/FST) Syntax Analysis 14 / 29

Parsing Analysis Phases source code lexical analysis sequence of lexems (tokens) syntactic analysis (Parsing) abstract syntax tree (AST ) semantic analysis abstract syntax (+ symbol table) Laure Gonnord (Lyon1/FST) Syntax Analysis 15 / 29

Parsing What s behind? From a Context-free Grammar, produce a Stack Automaton (see LIF15). Laure Gonnord (Lyon1/FST) Syntax Analysis 16 / 29

Parsing Tools: parser generators Parser generator: builds a stack automaton from a grammar definition; Ex: yacc(c), javacup (Java), OCamlyacc, ANTLR,... input : a set of grammar rules with actions (Toto.g4); output : a file(s) (Toto.java) that contains the corresponding stack automaton. Laure Gonnord (Lyon1/FST) Syntax Analysis 17 / 29

Parsing Lexing vs Parsing Lexing supports ( regular) languages; We want more (general) languages rely on context-free grammars; To that intent, we need a way: To declare terminal symbols (tokens); To write grammars. Use both Lexing rules and Parsing rules. Laure Gonnord (Lyon1/FST) Syntax Analysis 18 / 29

Parsing From a grammar to a parser The grammar must be context-free: S-> asb S-> eps The grammar rules are specified as Parsing rules; a and b are terminal tokens, produced by Lexing rules. On board: notion of derivation tree (see also exercise session2) Laure Gonnord (Lyon1/FST) Syntax Analysis 19 / 29

Parsing Parsing with ANTLR: example 1/2 AnBnLexer.g4 lexer grammar AnBnLexer; // Lexing rules: recognize tokens A: 'a ' ; B: 'b ' ; WS : [ \ t \ r \ n ]+ -> skip ; // skip spaces, tabs, newlines Laure Gonnord (Lyon1/FST) Syntax Analysis 20 / 29

Parsing Parsing with ANTLR: example 2/2 AnBnParser.g4 parser grammar A nbnparser; options { tokenvocab = AnBnLexer; } // extern tokens definition // Parsing rules: structure tokens together prog : s EOF ; // EOF: predefined end - of - file token s : A s B ; // nothing for empty alternative Laure Gonnord (Lyon1/FST) Syntax Analysis 21 / 29

Parsing ANTLR expressivity LL(*) At parse-time, decisions gracefully throttle up from conventional fixed k 1 lookahead to arbitrary lookahead. Further reading (PLDI 11 paper, T. Parr, K. Fisher) http://www.antlr.org/papers/ll-star-pldi11.pdf Laure Gonnord (Lyon1/FST) Syntax Analysis 22 / 29

Parsing Left recursion ANTLR permits left recursion: a: a b; But not indirect left recursion. X 1... X n There exist algorithms to eliminate indirect recursions. Laure Gonnord (Lyon1/FST) Syntax Analysis 23 / 29

Parsing Lists ANTLR permits lists: prog: statement + ; Read the documentation! https: //github.com/antlr/antlr4/blob/master/doc/index.md Laure Gonnord (Lyon1/FST) Syntax Analysis 24 / 29

Parsing Semantic actions / Attributes Outline 1 Lexical Analysis aka Lexing 2 Parsing Semantic actions / Attributes Laure Gonnord (Lyon1/FST) Syntax Analysis 25 / 29

Parsing Semantic actions / Attributes Semantic actions Semantic actions: code executed each time a grammar rule is matched. Printing as a semantic action in ANTLR s : A s B { System. out. println (" rule s"); } s : A s B { print (" rule s"); } // python Right rule : Python/Java/C++, depending on the back-end antlr4 -Dlanguage=Python2 We can do more than acceptors. Laure Gonnord (Lyon1/FST) Syntax Analysis 26 / 29

Parsing Semantic actions / Attributes Semantic actions - attributes An attribute is a set attached to non-terminals/terminals of the grammar They are usually of two types: synthetized: sons father. inherited: the converse. Laure Gonnord (Lyon1/FST) Syntax Analysis 27 / 29

Parsing Semantic actions / Attributes Semantic attributes for numerical expressions 1/2 e ::= c constant x variable e + e add e e mult... Let s come to an attribution. On board. Laure Gonnord (Lyon1/FST) Syntax Analysis 28 / 29

Parsing Semantic actions / Attributes Semantic attributes 2/2 : Implem Implementation of the former actions (java): ArithExprParser.g4 parser grammar A r i t h E x p r P a r s e r ; options {tokenvocab=arithexprlexer; } prog : expr EOF { System. out. p r i n t l n ( " R e s u l t : " +$expr. v a l ) ; } ; expr r e t u r n s [ i n t v a l ] : / / expr has an i n t e g e r a t t r i b u t e LPAR e=expr RPAR { $val=$e. v a l ; } INT { $val=$int. i n t ; } / / i m p l i c i t a t t r i b u t e f o r INT e1=expr PLUS e2=expr / / name sub p a r t s { $val=$e1. v a l +$e2. v a l ; } / / access a t t r i b u t e s e1=expr MINUS e2=expr { $val=$e1. val $e2. v a l ; } ; Laure Gonnord (Lyon1/FST) Syntax Analysis 29 / 29