Lexing, Parsing. Laure Gonnord sept Master 1, ENS de Lyon

Similar documents
Syntax Analysis MIF08. Laure Gonnord

Compilation: a bit about the target architecture

Lexical Analysis. Chapter 2

Introduction - CAP Course

Building Compilers with Phoenix

Introduction to Lexical Analysis

Compil M1 : Front-End

Lab 2. Lexing and Parsing with ANTLR4

Writing Evaluators MIF08. Laure Gonnord

Lexical Analysis. Lecture 2-4

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Lexical Analysis. Lecture 3-4

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Introduction to Lexical Analysis

CSE 3302 Programming Languages Lecture 2: Syntax

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Compiler course. Chapter 3 Lexical Analysis

Week 2: Syntax Specification, Grammars

Formal Languages and Compilers Lecture VI: Lexical Analysis

Lexical Analysis. Introduction

2068 (I) Attempt all questions.

CSCI312 Principles of Programming Languages!

Chapter 3 Lexical Analysis

Topic 3: Syntax Analysis I

Semantic Analysis. Lecture 9. February 7, 2018

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Implementation of Lexical Analysis

Lexical Analysis. Lecture 3. January 10, 2018

Chapter 4. Lexical and Syntax Analysis

CPS 506 Comparative Programming Languages. Syntax Specification

Compiler phases. Non-tokens

Compiling Regular Expressions COMP360

Zhizheng Zhang. Southeast University

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Introduction to Parsing. Lecture 8

CSE302: Compiler Design

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Compilers and computer architecture From strings to ASTs (2): context free grammars

Context-Free Grammars

Implementation of Lexical Analysis

Programming Languages and Compilers (CS 421)

n (0 1)*1 n a*b(a*) n ((01) (10))* n You tell me n Regular expressions (equivalently, regular 10/20/ /20/16 4

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

A simple syntax-directed

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Building Compilers with Phoenix

Introduction to Parsing Ambiguity and Syntax Errors

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

A Simple Syntax-Directed Translator

CMSC 330: Organization of Programming Languages

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Parsing. Lecture 5

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

4. Lexical and Syntax Analysis

Question Bank. 10CS63:Compiler Design

Lab 2. Lexing and Parsing with Flex and Bison - 2 labs

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

UNIT -2 LEXICAL ANALYSIS

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

4. Lexical and Syntax Analysis

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701)

Lexical and Syntax Analysis

Lexical Analysis. Finite Automata

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Lexical Analysis. Implementation: Finite Automata

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Lexical Analysis (ASU Ch 3, Fig 3.1)

CS 314 Principles of Programming Languages. Lecture 3

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CS308 Compiler Principles Lexical Analyzer Li Jiang

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Exercise ANTLRv4. Patryk Kiepas. March 25, 2017

QUESTIONS RELATED TO UNIT I, II And III

Syntax and Grammars 1 / 21

Compiler Construction D7011E

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

CSc 453 Compilers and Systems Software

COMPILER DESIGN LECTURE NOTES

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Lexical Analysis. Finite Automata

Lexical Analysis 1 / 52

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

CSCE 314 Programming Languages

Transcription:

Lexing, Parsing Laure Gonnord http://laure.gonnord.org/pro/teaching/capm1.html Laure.Gonnord@ens-lyon.fr Master 1, ENS de Lyon sept 2017

Analysis Phase source code lexical analysis sequence of lexems (tokens) syntactic analysis (Parsing) abstract syntax tree (AST ) semantic analysis abstract syntax (+ symbol table) Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 2 / 45

Goal of this chapter Understand the syntaxic structure of a language ; Separate the different steps of syntax analysis ; Be able to write a syntax analysis tool for a simple language ; Remember : syntax semantics. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 3 / 45

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation) ; Group symbols into tokens : Words : groups of letters ; Punctuation ; Spaces. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 4 / 45

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation) ; Group symbols into tokens : Lexical analysis Words : groups of letters ; Punctuation ; Spaces. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 4 / 45

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation) ; Group symbols into tokens : Lexical analysis Words : groups of letters ; Punctuation ; Spaces. Group tokens into : Propositions ; Sentences. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 4 / 45

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation) ; Group symbols into tokens : Lexical analysis Words : groups of letters ; Punctuation ; Spaces. Group tokens into : Parsing Propositions ; Sentences. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 4 / 45

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation) ; Group symbols into tokens : Lexical analysis Words : groups of letters ; Punctuation ; Spaces. Group tokens into : Parsing Propositions ; Sentences. Then proceed with word meanings : Definition of each word. ex : a dog is a hairy mammal, that barks and... Role in the phrase : verb, subject,... Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 4 / 45

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation) ; Group symbols into tokens : Lexical analysis Words : groups of letters ; Punctuation ; Spaces. Group tokens into : Parsing Propositions ; Sentences. Then proceed with word meanings : Definition of each word. ex : a dog is a hairy mammal, that barks and... Role in the phrase : verb, subject,... Semantics Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 4 / 45

Syntax analysis steps How do you read text? Text=a sequence of symbols (letters, spaces, punctuation) ; Group symbols into tokens : Lexical analysis Words : groups of letters ; Punctuation ; Spaces. Group tokens into : Parsing Propositions ; Sentences. Then proceed with word meanings : Definition of each word. ex : a dog is a hairy mammal, that barks and... Role in the phrase : verb, subject,... Syntax analysis=lexical analysis+parsing Semantics Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 4 / 45

Lexical Analysis 1 Lexical Analysis 2 Syntactic Analysis 3 Abstract Syntax Tree and evaluators Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 5 / 45

Lexical Analysis What for? int y = 12 + 4*x ; = [TINT, VAR("y"), EQ, INT(12), PLUS, INT(4), FOIS, VAR("x"), PVIRG] Group characters into a list of tokens, e.g. : The word int stands for type integer ; A sequence of letters stands for a variable ; A sequence of digits stands for an integer ;... Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 6 / 45

Lexical Analysis 1 Lexical Analysis 2 Syntactic Analysis Semantic actions / Attributes 3 Abstract Syntax Tree and evaluators Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 7 / 45

Lexical Analysis Principle Take a lexical description : E = ( E }{{} 1... E n ) Tokens class Construct an automaton. Example - lexical description ( lex file ) E = ((0 1) + (0 1) +.(0 1) + + ) Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 8 / 45

Lexical Analysis What s behind Regular languages, regular automata : Thompson construction non-det automaton Determinization, completion Minimisation Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 9 / 45

Lexical Analysis Construction/use of automaton Let E = ((0 1) + (0 1) +.(0 1) + }{{}}{{} # 1 Σ = {0, 1, +,., # 1, # 2, # 3 }. # 2 + }{{} # 3 ) and We define E # = ((0 1) + # 1 (0 1) +.(0.1) + # 2 + # 3 ). 0,1 # 2 0,1 # 1 0,1. 0,1 start 1 2 3 4 + # 3 5 sink Source C. Alias drawn by former M1 students Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 10 / 45

Lexical Analysis Remarks Notion of ambiguity. Compaction of transition table. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 11 / 45

Lexical Analysis 1 Lexical Analysis 2 Syntactic Analysis Semantic actions / Attributes 3 Abstract Syntax Tree and evaluators Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 12 / 45

Lexical Analysis : lexical analyzer constructors Lexical analyzer constructor : builds an automaton from a regular language definition ; Ex : Lex (C), JFlex (Java), OCamllex, ANTLR (multi),... input : a set of regular expressions with actions (Toto.g4) ; output : a file(s) (Toto.java) that contains the corresponding automaton. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 13 / 45

Lexical Analysis Analyzing text with the compiled lexer The input of the lexer is a text file ; Execution : Checks that the input is accepted by the compiled automaton ; Executes some actions during the automaton traversal. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 14 / 45

Lexical Analysis Lexing tool for Java : ANTLR The official webpage : www.antlr.org (BSD license) ; ANTLR is both a lexer and a parser ; ANTLR is multi-language (not only Java). Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 15 / 45

Lexical Analysis ANTLR lexer format and compilation.g4 grammar XX; @header { // Some init code... } @members { // Some global variables } // More optional blocks are available --->> lex rules Compilation : antlr4 Toto.g4 javac *.java grun Toto r // produces several Java files // compiles into xx.class files // Run analyzer with starting rule r Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 16 / 45

Lexical Analysis Lexing with ANTLR : example Lexing rules : Must start with an upper-case letter ; Follow the usual extended regular-expressions syntax (same as egrep, sed,...). A simple example grammar Hello; // This rule is actually a parsing rule r : HELLO ID ; // match " hello " followed by an identifier HELLO : ' hello ' ; // beware the single quotes ID : [a - z ]+ ; // match lower - case identifiers WS : [ \ t \ r \ n ]+ -> skip ; // skip spaces, tabs, newlines Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 17 / 45

Lexical Analysis Lexing - more than regular languages Counting in ANTLR - CountLines.g4 grammar CountLines; // Members can be accessed in any rule @members { int nblines =0 ; } r : ( NEWLINE ) * ; NEWLINE : [\ r \ n ] { nblines ++ ; System. out. println ( " Current lines: " + nblines ) ; } ; WS : [ \ t ]+ -> skip ; Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 18 / 45

Syntactic Analysis 1 Lexical Analysis 2 Syntactic Analysis Semantic actions / Attributes 3 Abstract Syntax Tree and evaluators Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 19 / 45

Syntactic Analysis 1 Lexical Analysis 2 Syntactic Analysis Semantic actions / Attributes 3 Abstract Syntax Tree and evaluators Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 20 / 45

Syntactic Analysis What s Parsing? Relate tokens by structuring them. Flat tokens [TINT, VAR("y"), EQ, INT(12), PLUS, INT(4), FOIS, VAR("x"), PVIRG] Parsing Accept Structured tokens = int y + 12 * 4 x int Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 21 / 45

Syntactic Analysis In this course Only write acceptors or a little bit more. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 22 / 45

Syntactic Analysis What s behind? From a Context-free Grammar, produce a Stack Automaton (already seen in L3 course?) Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 23 / 45

Syntactic Analysis Recalling grammar definitions Grammar A grammar is composed of : A finite set N of non terminal symbols A finite set Σ of terminal symbols (disjoint from N) A finite set of production rules, each rule of the form w w where w is a word on Σ N with at least one letter of N. w is a word on Σ N. A start symbol S N. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 24 / 45

Syntactic Analysis Example Example : S asb S ε is a grammar with N =... and... Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 25 / 45

Syntactic Analysis Associated Language Derivation G a grammar defines the relation : x G y iff u, v, p, qx = upv and y = uqv and (p q) P A grammar describes a language (the set of words on Σ that can be derived from the start symbol). Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 26 / 45

Syntactic Analysis Example - associated language S asb S ε The grammar defines the language {a n b n, n N} S absc S abc Ba ab Bb bb The grammar defines the language {a n b n c n, n N} Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 27 / 45

Syntactic Analysis Context-free grammars Context-free grammar A CF-grammar is a grammar where all production rules are of the form N (Σ N). Example : S S + S S S a The grammar defines a language of arithmetical expressions. Notion of derivation tree. Draw a derivation tree of a*a+a, of S+S! Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 28 / 45

Syntactic Analysis 1 Lexical Analysis 2 Syntactic Analysis Semantic actions / Attributes 3 Abstract Syntax Tree and evaluators Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 29 / 45

Syntactic Analysis Recognizing grammars Grammar type Rules Decidable Regular grammar X ay, X b YES Context-free grammar X u YES Context-sensitive grammar uxv uwv YES General grammar u v NO Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 30 / 45

Syntactic Analysis Parser construction There exists algorithms to recognize class of grammars : Predictive (descending) analysis (LL) Ascending analysis (LR) See the Dragon book. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 31 / 45

Syntactic Analysis : parser generators Parser generator : builds a stack automaton from a grammar definition ; Ex : yacc(c), javacup (Java), OCamlyacc, ANTLR,... input : a set of grammar rules with actions (Toto.g4) ; output : a file(s) (Toto.java) that contains the corresponding stack automaton. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 32 / 45

Syntactic Analysis Lexing vs Parsing Lexing supports ( regular) languages ; We want more (general) languages rely on context-free grammars ; To that intent, we need a way : To declare terminal symbols (tokens) ; To write grammars. Use both Lexing rules and Parsing rules. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 33 / 45

Syntactic Analysis From a grammar to a parser The grammar must be context-free : S-> asb S-> eps The grammar rules are specified as Parsing rules ; a and b are terminal tokens, produced by Lexing rules. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 34 / 45

Syntactic Analysis Parsing with ANTLR : example 1/2 AnBnLexer.g4 l e x e r grammar AnBnLexer; / / Lexing r u l e s : recognize tokens A: 'a' ; B: 'b' ; WS : [ \ t \ r \ n ]+ > s kip ; / / skip spaces, t a b s, newlines Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 35 / 45

Syntactic Analysis Parsing with ANTLR : example 2/2 AnBnParser.g4 parser grammar AnBnParser; options {tokenvocab=anbnlexer;} / / extern tokens d e f i n i t i o n / / Parsing r u l e s : s t r u c t u r e tokens t o g e t h e r prog : s EOF ; / / EOF: predefined end of f i l e token s : A s B ; / / nothing f o r empty a l t e r n a t i v e Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 36 / 45

Syntactic Analysis ANTLR expressivity LL(*) At parse-time, decisions gracefully throttle up from conventional fixed k 1 lookahead to arbitrary lookahead. Further reading (PLDI 11 paper, T. Parr, K. Fisher) http://www.antlr.org/papers/ll-star-pldi11.pdf Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 37 / 45

Syntactic Analysis Left recursion ANTLR permits left recursion : a: a b; But not indirect left recursion. X 1 X n There exist algorithms to eliminate indirect recursions. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 38 / 45

Syntactic Analysis Lists ANTLR permits lists : prog: statement + ; Read the documentation! https: //github.com/antlr/antlr4/blob/master/doc/index.md Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 39 / 45

Syntactic Analysis Semantic actions / Attributes 1 Lexical Analysis 2 Syntactic Analysis Semantic actions / Attributes 3 Abstract Syntax Tree and evaluators Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 40 / 45

Syntactic Analysis Semantic actions / Attributes Semantic actions Semantic actions : code executed each time a grammar rule is matched : Printing as a semantic action in ANTLR s : A s B { System. out. println (" rule s"); } // java s : A s B { print (" rule s"); } // python Right rule : Python/Java/C++, depending on the back-end antlr4 -Dlanguage=Python2 We can do more than acceptors. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 41 / 45

Syntactic Analysis Semantic actions / Attributes Semantic actions - attributes An attribute is a set attached to non-terminals/terminals of the grammar They are usually of two types : synthetized : sons father. inherited : the converse. Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 42 / 45

Syntactic Analysis Semantic actions / Attributes Computing attributes with semantic actions ArithExprParser.g4 - Warning this is java parser grammar A r i t h E x p r P a r s e r ; options {tokenvocab=arithexprlexer; } prog : expr EOF { System. out. p r i n t l n ( " R e s u l t : " +$expr. v a l ) ; } ; expr r e t u r n s [ i n t v a l ] : / / expr has an i n t e g e r a t t r i b u t e LPAR e=expr RPAR { $val=$e. v a l ; } INT { $val=$int. i n t ; } / / i m p l i c i t a t t r i b u t e f o r INT e1=expr PLUS e2=expr / / name sub p a r t s { $val=$e1. v a l +$e2. v a l ; } / / access a t t r i b u t e s e1=expr MINUS e2=expr { $val=$e1. val $e2. v a l ; } ; Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 43 / 45

Abstract Syntax Tree and evaluators 1 Lexical Analysis 2 Syntactic Analysis 3 Abstract Syntax Tree and evaluators Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 44 / 45

Abstract Syntax Tree and evaluators So Far... ANTLR has been used to : Produce acceptors for context-free languages ; Do a bit of computation on-the-fly. In a classic compiler, parsing produces an Abstract Syntax Tree. Next course! Laure Gonnord (M1/DI-ENSL) Lexing, Parsing 2017 45 / 45