Marcello Bersani Ed. 22, via Golgi 42, 3 piano 3769

Similar documents
The structure of a compiler

LEX/Flex Scanner Generator

TDDD55- Compilers and Interpreters Lesson 2

Lexical and Syntax Analysis

Chapter 3 -- Scanner (Lexical Analyzer)

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Using Lex or Flex. Prof. James L. Frankel Harvard University

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

CS4850 SummerII Lex Primer. Usage Paradigm of Lex. Lex is a tool for creating lexical analyzers. Lexical analyzers tokenize input streams.

FLEX(1) FLEX(1) FLEX(1) FLEX(1) directive. flex fast lexical analyzer generator

Flex, version 2.5. A fast scanner generator Edition 2.5, March Vern Paxson

An introduction to Flex

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

Lexical Analysis. Implementing Scanners & LEX: A Lexical Analyzer Tool

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Version 2.4 November

CS143 Handout 04 Summer 2011 June 22, 2011 flex In A Nutshell

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool

Handout 7, Lex (5/30/2001)

Edited by Himanshu Mittal. Lexical Analysis Phase

Flex, version A fast scanner generator Edition , 27 March Vern Paxson W. L. Estes John Millaway

Lexical Analysis with Flex

An Introduction to LEX and YACC. SYSC Programming Languages

Flex, version A fast scanner generator Edition , 27 March Vern Paxson W. L. Estes John Millaway

Type 3 languages. Regular grammars Finite automata. Regular expressions. Deterministic Nondeterministic. a, a, ε, E 1.E 2, E 1 E 2, E 1*, (E 1 )

Flex and lexical analysis

File I/O in Flex Scanners

CSC 467 Lecture 3: Regular Expressions

Lexical and Parser Tools

Ray Pereda Unicon Technical Report UTR-02. February 25, Abstract

PRACTICAL CLASS: Flex & Bison

Introduction to Lex & Yacc. (flex & bison)

Syntax Analysis Part VIII

Flex, version 2.5. A fast scanner generator. Edition 2.5, March Vern Paxson

Ulex: A Lexical Analyzer Generator for Unicon

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Automatic Scanning and Parsing using LEX and YACC

Gechstudentszone.wordpress.com

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Flex and lexical analysis. October 25, 2016

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Syntax Analysis Part IV

Flex, version 2.5. A fast scanner generator. Edition 2.5, March Vern Paxson

UNIT - 7 LEX AND YACC - 1

Yacc: A Syntactic Analysers Generator

Lex (Lesk & Schmidt[Lesk75]) was developed in the early to mid- 70 s.

Lab 2. Lexing and Parsing with Flex and Bison - 2 labs

CSE302: Compiler Design

TDDD55 - Compilers and Interpreters Lesson 3

Compil M1 : Front-End

Figure 2.1: Role of Lexical Analyzer

Lesson 10. CDT301 Compiler Theory, Spring 2011 Teacher: Linus Källberg

COMPILER CONSTRUCTION Seminar 01 TDDB

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Lexical Analyzer Scanner

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

I. OVERVIEW 1 II. INTRODUCTION 3 III. OPERATING PROCEDURE 5 IV. PCLEX 10 V. PCYACC 21. Table of Contents

Lexical Analyzer Scanner

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Parsing and Pattern Recognition

Scanning. COMP 520: Compiler Design (4 credits) Alexander Krolik MWF 13:30-14:30, MD 279

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

LECTURE 11. Semantic Analysis and Yacc

A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer.

Lex Source The general format of Lex source is:

Chapter 3 Lexical Analysis

Scanning. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

TDDD55- Compilers and Interpreters Lesson 3

Compiler course. Chapter 3 Lexical Analysis

Compiler Construction

Compiler Construction

Using an LALR(1) Parser Generator

Compiler Lab. Introduction to tools Lex and Yacc

cmps104a 2002q4 Assignment 2 Lexical Analyzer page 1

Program Development Tools. Lexical Analyzers. Lexical Analysis Terms. Attributes for Tokens

CMSC445 Compiler design Blaheta. Project 2: Lexer. Due: 15 February 2012

Group A Assignment 3(2)

COMPILER CONSTRUCTION Seminar 02 TDDB44

LECTURE 6 Scanning Part 2

The Structure of a Syntax-Directed Compiler

Parsers. Chapitre Language

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

CSE302: Compiler Design

Lexical and Syntax Analysis

More Examples. Lex/Flex/JLex

Applications of Context-Free Grammars (CFG)

Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/}

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Table of Contents. Chapter 1. Introducing Flex and Bison

Etienne Bernard eb/textes/minimanlexyacc-english.html

Programming Assignment II

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

POLITECNICO DI TORINO. Formal Languages and Compilers. Laboratory N 1. Laboratory N 1. Languages?

Formal Languages and Compilers

CS143 Handout 12 Summer 2011 July 1 st, 2011 Introduction to bison

Transcription:

Marcello Bersani bersani@elet.polimi.it http://home.dei.polimi.it/bersani/ Ed. 22, via Golgi 42, 3 piano 3769

Flex, Bison and the ACSE compiler suite Marcello M. Bersani LFC Politecnico di Milano

Schedule Lexical analyzer: Flex 26 November Syntactic analyzer: Bison 3, 10, 17 December ACSE 7, 14 January

Basic compiler structure source IR Frontend Backend executable translates a computer programming source language into an intermediate representation works with the internal representation to produce code in a computer output language.

Front-end Tokens source Scanner Parser IR

Lexical analysis The lexical analysis recognizes patterns in a stream of characters. A pattern represents an atomic lexical elements, named tokens. Each token can have one or more attributes ID = (A..Z a..z)(a..z a..z 0..9 _ )* my_variable10

Scanner A program that performs lexical analysis. There are programs generating scanners automatically. flex is a free scanner generator, a complete rewriting of the AT&T Unix tool lex. www.gnu.org/software/flex/ A useful book: lex & yacc, 2nd Edition by John Levine, Tony Mason & Doug Brown O Reilly

How flex works Scanner description (file.lex) FLEX yylex() routine Scanner C code (file lex.yy.c) Input stream Scanner Executable GCC Tokenized stream

Flex input definitions rules user code

Definitions name definition DIGIT [0-9] ID [a-z][a-z0-9]* %{ C code for declarations }% Remark: C code is copied verbatim into lex.yy.c

Rules pattern action [DIGIT]+ { C code for DIGIT+} \n { C code for \n} ID { C code for ID} Action: C code to be executed each time a token in the input stream matches the associated pattern. Pattern condition: a regular expression

RegEx rules x matches the character x. matches any character except newline [xyz] any character in the given set (x y z) [a-z] any character in the given range (a b z) [^A-Z] any character but those in the range {x} expansion of x s definition a literal string \x ANSI-C interpretation of \x, if any, otherwise the literal x (to escape operators) \0 the NUL character <<EOF>> the end-of-file

RegEx rules R the regular expression R RS the concatenation of R and S R S either R or S R* zero or more R s R+ one or more R s R? zero or one R s R{m,n} a number of R s ranging from m to n R{n,} n or more times R s R{n} exactly n R s (R) (parentheses to override precedence) R/S R, but only if followed by S ^R R, but only at beginning of a line R$ R, but only at end of a line <s>r R, but only in start condition s 1 <s 1,,s n >R R, in any of start conditions s 1,,s n <*>R R, in any start condition, even an exclusive one

User code User C code is copied to the generated scanner source as is. This section may contain routines called by actions, or code which calls the scanner.

Example 1 A scanner which convert every lowercase character, in a given text, to an uppercase character and every uppercase character to a lowercase character

%{ #include <unistd.h> %} %option noyywrap LETTER [a-za-z] DIGIT [0-9] When the scanner reads EOF it may check yywrap() function: true: the scanner terminates false: yywrap() should have set a new input stream yywrap() is not provided "esci" yyterminate(); {LETTER}+ { int i; for(i=0; i<yyleng; i=i+1) if ( islower( yytext[i] )) putchar( toupper( yytext[i] )); else putchar( tolower ( yytext[i] )); } yyleng: length of current token yytext: current token main() { yylex(); }

How flex reads the input stream 1. It reads (YY_INPUT) the input stream (yyin), looking for strings matching any of its patterns. 2. Once the right match is found, the substring is available by the global char* variable yytext, and its length is available in yyleng. 3. Macro YYINPUT controls how yyin is read 1. Standard one provides block-reads until \n, EOF

4. yylex() continues processing tokens (from yyin) and returns when EOF is read an action returns. 5. Each time yylex() is called it continues processing tokens from where it stopped

How the right match is determined 1. Longest matching rule: if more than one matching string is found, the rule that generates the longest one is selected. 2. First Rule: if more than one string with the same length are found, the rule listed first in the rules section is selected. 3. If no rules were found, the scanner performs the standard action: the next character in input is considered matched and it is copied to the output stream (yyout); then it goes on.

Example 2 A scanner which recognizes, in a given text, the words composed by only lowercase letters and. It emits an message when it detects lowercase words and returns success if, at least one, is found.

%{ #include <unistd.h> int something_good=0; %} %option noyywrap LETTER [a-za-z] DIGIT [0-9] SPACE [" "] main() { printf("something is good %d\n", yylex()); } %{ printf("ok, now we try to check if some words are written in lowercase!\n"); %} "esci" { if (something_good) return 1; else yyterminate(); } {LETTER}+ { int i; int ok=1; for(i=0; i<yyleng && ok; i=i+1){ if ( isupper(yytext[i])) ok = 0; } if (ok){ printf("good! %s\n", yytext); something_good=1; } else printf("error!: upper case %c\n", yytext[i-1]); } {SPACE} ; {DIGIT} ;

Special directives ECHO copies yytext to the output. REJECT chooses the next best matching rule; a ab abc abcd ECHO;REJECT;. Input: abcd output: abcdabcaba

Special directives yyless(n) sends back to the input stream all but the first n characters of the matched string. abc [a-z]+ ECHO; yyless(2); ECHO; Input: abc Output: abcc

Special directives yymore() the next matched text is appended to yytext yytext is not rewritten) a b ECHO;yymore(); ECHO; Input: ab Output: aab

Special directives input() consumes the next character in input. unput(c) sends character c back to the input stream; c is the next character scanned WARNING: calls to unput(c) trash the contents of yytext; therefore contents of yytext must be copied before calling unput(c), if required.

Starting conditions Mechanism for conditionally activating rules Inclusive %s C1 %x C2 Exclusive <C1>R1 {do_something();} <C2>R2 ; <*>R3 {always_do_something();} R4 { } Inclusive %s C : Rules tagged by <C> or with no conditions are active. Exclusive %x C: only rules tagged by <C> are active INITIAL: state where only rules with no start conditions are active BEGIN(C): starting condition C begins

Example %x comment %option noyywrap <INITIAL>[^/]* ECHO; <INITIAL>"/"+[^*/]* ECHO; <INITIAL>"/*" BEGIN(comment); <comment>[^*]* <comment>"*"+[^/]* <comment>"*"+"/" BEGIN(INITIAL); int main(int argc, char* argv[]){ argv++; argc--; yyin=fopen(argv[0],"r"); yylex(); } ab a b c/ ab a b c /* / /* a */ * * a */ ab a b c/ /* * a */

Example %x comment %option noyywrap <INITIAL>[^/]* ECHO; <INITIAL>"/"+[^*/]* ECHO; <INITIAL>"/*" BEGIN(comment); <comment>[^*]* <comment>"*"+[^*/]* <comment>"*"+"/" BEGIN(INITIAL); ab a b c /* ab a b c a */ ab a b c int main(int argc, char* argv[]){ argv++; argc--; yyin=fopen(argv[0],"r"); yylex(); }

Starting conditions Start conditions are stored as integer values The initial start condition is INITIAL (0) The current start condition is stored in YY_START variable Starting condition scopes may be nested void yy_push_state (int n) void yy_pop_state () int yy_top_state () %option stack required

Multiple input buffers The scanner requires reading from several input streams YY_BUFFER_STATE yy_create_buffer(file* file, int size) void yy_switch_to_buffer(yy_buffer_state new_b) void yy_delete_buffer(yy_buffer_state buffer) void yypush_buffer_state(yy_buffer_state buffer) void yypop_buffer_state() YY_CURRENT_BUFFER is the handle (YY_BUFFER_STATE) for the current buffer.

Interaction Flex/Bison parser.y parser.tab.c scanner.lex YYSTYPE yylval; R {yylval = ; return NUMBER} parser.tab.h # define YYTOKENTYPE enum yytokentype { NUMBER = 258 }; typedef YYSTYPE ; extern YYSTYPE yylval;