LEX/Flex Scanner Generator

Similar documents
Marcello Bersani Ed. 22, via Golgi 42, 3 piano 3769

Lexical Analysis. Implementing Scanners & LEX: A Lexical Analyzer Tool

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool

The structure of a compiler

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Edited by Himanshu Mittal. Lexical Analysis Phase

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Compiler Design 1. Yacc/Bison. Goutam Biswas. Lect 8

Using Lex or Flex. Prof. James L. Frankel Harvard University

Lexical and Syntax Analysis

CS4850 SummerII Lex Primer. Usage Paradigm of Lex. Lex is a tool for creating lexical analyzers. Lexical analyzers tokenize input streams.

PRACTICAL CLASS: Flex & Bison

CS143 Handout 04 Summer 2011 June 22, 2011 flex In A Nutshell

Introduction to Lex & Yacc. (flex & bison)

TDDD55- Compilers and Interpreters Lesson 2

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

FLEX(1) FLEX(1) FLEX(1) FLEX(1) directive. flex fast lexical analyzer generator

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

Flex, version 2.5. A fast scanner generator Edition 2.5, March Vern Paxson

Figure 2.1: Role of Lexical Analyzer

LECTURE 11. Semantic Analysis and Yacc

CSC 467 Lecture 3: Regular Expressions

Chapter 3 -- Scanner (Lexical Analyzer)

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

File I/O in Flex Scanners

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CSE302: Compiler Design

Chapter 3 Lexical Analysis

Parsing and Pattern Recognition

Compiler course. Chapter 3 Lexical Analysis

I. OVERVIEW 1 II. INTRODUCTION 3 III. OPERATING PROCEDURE 5 IV. PCLEX 10 V. PCYACC 21. Table of Contents

Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/}

An introduction to Flex

Compiler Construction

Compiler Construction

Lexical Analysis with Flex

Syntax Analysis Part IV

Gechstudentszone.wordpress.com

Handout 7, Lex (5/30/2001)

Lexical and Parser Tools

Compiler Lab. Introduction to tools Lex and Yacc

Version 2.4 November

Flex, version 2.5. A fast scanner generator. Edition 2.5, March Vern Paxson

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Flex, version A fast scanner generator Edition , 27 March Vern Paxson W. L. Estes John Millaway

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

Flex, version A fast scanner generator Edition , 27 March Vern Paxson W. L. Estes John Millaway

UNIT - 7 LEX AND YACC - 1

Compil M1 : Front-End

The Language for Specifying Lexical Analyzer

Flex, version 2.5. A fast scanner generator. Edition 2.5, March Vern Paxson

An Introduction to LEX and YACC. SYSC Programming Languages

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Type 3 languages. Regular grammars Finite automata. Regular expressions. Deterministic Nondeterministic. a, a, ε, E 1.E 2, E 1 E 2, E 1*, (E 1 )

Lex (Lesk & Schmidt[Lesk75]) was developed in the early to mid- 70 s.

Preparing for the ACW Languages & Compilers

Lexical Analyzer Scanner

Lexical Analyzer Scanner

Lexical Analysis - Flex

Applications of Context-Free Grammars (CFG)

CMSC445 Compiler design Blaheta. Project 2: Lexer. Due: 15 February 2012

Lesson 10. CDT301 Compiler Theory, Spring 2011 Teacher: Linus Källberg

CS143 Handout 12 Summer 2011 July 1 st, 2011 Introduction to bison

Flex and lexical analysis

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Automatic Scanning and Parsing using LEX and YACC

Syntax Analysis Part VIII

Program Development Tools. Lexical Analyzers. Lexical Analysis Terms. Attributes for Tokens

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Scanning. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1.

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

C mini reference. 5 Binary numbers 12

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Lexical and Syntax Analysis

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Scanning. COMP 520: Compiler Design (4 credits) Alexander Krolik MWF 13:30-14:30, MD 279

Principles of Programming Languages

CSE302: Compiler Design

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

Ray Pereda Unicon Technical Report UTR-02. February 25, Abstract

Better error handling using Flex and Bison

COMPILER CONSTRUCTION Seminar 01 TDDB

Flex and lexical analysis. October 25, 2016

Lex Source The general format of Lex source is:

Generating a Lexical Analyzer Program

IBM. UNIX System Services Programming Tools. z/os. Version 2 Release 3 SA

DOID: A Lexical Analyzer for Understanding Mid-Level Compilation Processes

2 Input and Output The input of your program is any file with text. The output of your program will be a description of the strings that the program r

C: How to Program. Week /Mar/05

Appendix. Grammar. A.1 Introduction. A.2 Keywords. There is no worse danger for a teacher than to teach words instead of things.

LECTURE 6 Scanning Part 2

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Chapter 2 - Introduction to C Programming

Yacc: A Syntactic Analysers Generator

COP4020 Programming Assignment 1 CALC Interpreter/Translator Due March 4, 2015

Lexical Analysis and jflex

YOLOP Language Reference Manual

cmps104a 2002q4 Assignment 2 Lexical Analyzer page 1

Transcription:

Compiler Design 1 LEX/Flex Scanner Generator

Compiler Design 2 flex - Fast Lexical Analyzer Generator We can use flex a to automatically generate the lexical analyzer/scanner for the lexical atoms of a language. a The original version was known as lex.

Compiler Design 3 Input The input to the flex program (known as flex compiler) is a set of patterns or specification of the tokens of the source language. Actions corresponding to different matches can also be specified.

Compiler Design 4 Input The pattern for every token class is specified as an extended regular expression and the corresponding action is a piece of C code. The specification can be given from a file (by default it is stdin).

Compiler Design 5 Output The flex software compiles the regular expression specification to a DFA and implements it as a C program with the action codes properly embedded in it.

Compiler Design 6 Output The output of flex is a C file lex.yy.c with the function int yylex(void). This function, when called, returns a token. Attribute information of a token is available through the global variable yylval.

Compiler Design 7 flex Specification The flex specification is a collection of regular expressions and the corresponding actions as C code. It has three parts: Definitions %% Rules %% User code

Compiler Design 8 flex Specification The definition has two parts: The portion within the pair of special parentheses %{ and %} is copied verbatim in lex.yy.c. Similarly the user code is also copied. The other part of the definition contains regular name definitions, start conditions and other options.

Compiler Design 9 flex Specification The rules are specified as <pattern> <action> pairs. The patterns are extended regular expressions corresponding to different tokens. The corresponding actions are given as C code. An action is taken at the end of the match.

Compiler Design 10 An Example: lex.l /* A scanner for a toy language:./lex_yacc/lex.l */ %{ // Copied verbatim #include <string.h> #include <stdlib.h> #include "y.tab.h" // tokens are defined int line_count = 0; yyltype yylval; %}

Compiler Design 11 %option noyywrap %x CMNT DELIM ([ \t]) WHITESPACES ({DELIM}+) POSI_DECI ([1-9]) DECI (0 {POSI_DECI}) DECIMAL (0 {POSI_DECI}{DECI}*) LOWER ([a-z]) LETTER ({LOWER} [:upper:] _) IDENTIFIER ({LETTER}({LETTER} {DECI})*) %%

Compiler Design 12 "/*" {BEGIN CMNT;} <CMNT>. {;} <CMNT>\n {++line_count;} <CMNT>"*/" {BEGIN INITIAL;} \n { ++line_count; return (int) \n ; } "\"".*"\"" { yylval.string=(char *)malloc((yyleng+1)*(sizeof(char))) strncpy(yylval.string, yytext+1, yyleng-2); yylval.string[yyleng-2]= \0 ; return STRNG; }

Compiler Design 13 {WHITESPACES} { ; } if {return IF;} else {return ELSE; } while { return WHILE; } for { return FOR; } int { return INT; } \( { return (int) ( ; } \) { return (int) ) ; } \{ { return (int) { ; } \} { return (int) } ; } ; { return (int) ; ; }, { return (int), ; } = { return (int) = ; } "<" { return (int) < ; }

Compiler Design 14 "+" { } "-" { } "*" { } "/" { yylval.integer = (int) + ; return BIN_OP; yylval.integer = (int) - ; return BIN_OP; yylval.integer = (int) * ; return BIN_OP; yylval.integer = (int) / ;

Compiler Design 15 return BIN_OP; } {IDENTIFIER} { yylval.string=(char *)malloc((yyleng+1)*(sizeof(char)) strncpy(yylval.string, yytext, yyleng + 1); return ID; } {DECIMAL} { yylval.integer = atoi(yytext); return NUM; }. { ; } %% /* The function yywrap() int yywrap(){return 1;} */

Compiler Design 16 y.tab.h #ifndef _Y_TAB_H #define _Y_TAB_H #define IF 300 #define ELSE 301 #define WHILE 302 #define FOR 303 #define INT 304 #define ID 306 #define NUM 307 #define STRNG 308 #define BIN_OP 310

Compiler Design 17 int yylex(void); typedef union { char *string; int integer; } yyltype; #endif

Compiler Design 18 mylex.c++ #include <iostream> using namespace std; #include <stdlib.h> #include "y.tab.h" extern yyltype yylval; int main() // mylex.c { int s; while((s=yylex())) switch(s) { case \n : cout << "\n";

Compiler Design 19 break; case ( : cout << "<(> "; break; case ) : cout << "<)> "; break; case { : cout << "<{> "; break; case } : cout << "<}> "; break; case ; : cout << "<;> "; break; case, : cout << "<,> "; break; case = : cout << "<=> ";

Compiler Design 20 break; case < : cout << "<<> "; break; case BIN_OP : cout << "<BIN_OP," << (char) yylval. break; case IF : cout << "<if> "; break; case ELSE : cout << "<else> "; break; case WHILE : cout << "<while> "; break; case FOR : cout << "<for> "; break; case INT : cout << "<int> ";

Compiler Design 21 } break; case ID : cout << "<ID," << yylval.string << "> " free (yylval.string); break; case NUM : cout << "<NUM," << yylval.integer << "> break; case STRNG : cout << "<STRNG," << yylval.string << free (yylval.string) ; break; default: ; } return 0;

Compiler Design 22 input /* Scanner input */ int main() { int n, fact, i; scanf("%d", &n); fact=1; for(i=1; i<=n; ++i) fact = fact*i; printf("%d! = %d\n", n, fact); return 0; } /* End */

Compiler Design 23 How to Compile? $ flex lex.l lex.yy.c $ c++ -Wall -c lex.yy.c lex.yy.o $ c++ -Wall -c mylex.c mylex.o $ c++ mylex.o lex.yy.o a.out $./a.out < input > out $

Compiler Design 24 A Simple Makefile objfiles = mylex.o lex.yy.o a.out : $(objfiles) c++ $(objfiles) mylex.o : mylex.c++ y.tab.h c++ -c -Wall mylex.c++ lex.yy.o : lex.yy.c c++ -c lex.yy.c lex.yy.c : lex.l y.tab.h

Compiler Design 25 flex lex.l clean : rm a.out lex.yy.c $(objfiles)

Compiler Design 26 Output <int> <ID, main> <(> <)> <{> <int> <ID, n> <,> <ID, fact> <,> <ID, i> <;> <ID, scanf> <(> <STRNG, %d> <,> <ID, n> <)> <;> <ID, fact> <=> <NUM, 1> <;> <for> <(> <ID, i> <=> <NUM, 1> <;> <ID, i> <<> <=> <ID, <ID, printf> <(> <STRNG, %d! = %d\n> <,> <ID, n> <,> <ID <ID, return> <NUM, 0> <;> <}>

Compiler Design 27 Note There is a problem with the rule for comment. It will give wrong result for the following comment (input4) /* There is a string "within */ this i comment" and it will not work */ Try with the rule in lex4.l

Compiler Design 28 Note When the function yylex() is called, it reads data from the input file (by default it is stdin) and returns a token. At the end-of-file (EOF) the function returns zero (0).

Compiler Design 29 Input/Output File Names The name of the input file to the scanner can be specified through the global file pointer (FILE *) yyin i.e. yyin = fopen("flex.spec", "r");. Similarly, the output file of the scanner a can also be supplied through the file pointer yyout i.e. yyout = fopen("lex.out", "w");. a Note that it will affect the output of ECHO and not change the output of printf() etc.

Compiler Design 30 Pattern and Action A pattern must start from the first column of the line and the action must start from the same line. Action corresponding to a pattern may be empty. The special parenthesis %{ and %} of the definition part should also start from the first column.

Compiler Design 31 Matching Rules If more than one pattern matches, then the longest match is taken. If both matches are of same length, then the earlier rule will take precedence. In our example if, else, while, for, int are specified above the identifier. If we change the order, the keywords will not be identified separately.

Compiler Design 32 Matching Rules In our example <=, is treated as two symbols < and =. But if there is a rule for <=, a single token can be generated.

Compiler Design 33 Lexeme If there is a match with the input text, the matched lexeme is pointed by the char pointer yytext (global) and its length is available in the variable yyleng. The yytext can be defined to be an array by using %array in the definition section. The array size will be determined by YYLMAX.

Compiler Design 34 ECHO The command ECHO copies the text of yytext to the output stream of the scanner a. The default rule is to echo any unmatched character to the output stream. a By default it is stdout and it can be changed by the file pointer yyout.

Compiler Design 35 yymore(), yyless(int) yymore(): the scanner appends the current lexeme to the next lexeme in the yytext. yyless(n): the scanner puts back all but the first n characters in the input buffer. Next time they will be rescanned. Both yytext and yyleng (n now) are properly modified.

Compiler Design 36 input(), unput(c) input(): reads the next character from the input stream. unput(c): puts the character c back in the input stream a. a flex manual: An important potential problem when using unput() is that if you are using %pointer (the default), a call to unput() destroys the contents of yytext, starting with its rightmost character and devouring one character to the left with each call. If you need the value of yytext preserved after a call to unput() (as in the above example), you must either first copy it elsewhere, or build your scanner using %array instead.

Compiler Design 37 Start Condition There is a mechanism to put a conditional guard for a rule. Let the condition for a rule be <C>. It is written as <C>P {action} The scanner will try to match the input with the pattern P, only if its start condition is C. In the example we have three rules guarded by the start condition <CMNT> used to remove the comment of a C program without any action.

Compiler Design 38 Start Condition A start condition is activated by a BEGIN action. In our example the start condition begins (BEGIN CMNT) after the scanner sees the starting of a comment (/*).

Compiler Design 39 Start Condition The state of the scanner when no other start condition is active is called INITIAL. All rules without any start condition or with the start condition <INITIAL> are active at this state. The command BEGIN(0) or BEGIN(INITIAL) brings the scanner from any other state to this state. In our example the scanner comes back to INITIAL after consuming the comment.

Compiler Design 40 Start Condition Start conditions can be defined in the definition part of the specification using %s or %x. The start condition specified by %x is called an exclusive start condition. When such a start is active, only the rules guarded by it are activated. On the other hand %s specifies a an inclusive start condition. In our example, only three rules are active after the scanner sees the beginning of a comment (/*).

Compiler Design 41 Start Condition The scope of start conditions may be nested and the start conditions can be stacked. Three routines are of interest: void yy_push_state(int new_state), void yy_pop_state() and int yy_top_state().

Compiler Design 42 yywrap() Once the scanner receives the EOF indication from the YY_INPUT macro (zero), the scanner calls the function yywrap(). If there is no other input file, the function returns true and the function yylex() returns zero (0). But if there is another input file, yywrap() opens it, sets the file pointer yyin to it and returns false. The scanner starts consuming input from the new file.

Compiler Design 43 %option noyywrap This option in the definition makes scanner behave as if yywrap() has returned true. No yywrap() function is required.

Compiler Design 44 Prototype of Scanner Function The name and the parameters of the scanner function can be changed by defining:yy_declmacro. Asanexamplewemayhave #define YY_DECL int scanner(vp) yylval *vp;

Compiler Design 45 Architecture of Flex Scanner The flex compiler includes a fixed program in its scanner. This program simulates the DFA. It reads the input and simulates the state transition using the state transition table constructed from the flex specification. Other parts of the scanner are as follows:

Compiler Design 46 Architecture of Flex Scanner The transition table, start state and the final states - this comes from the construction of the DFA. Declarations and functions given in the definition and user code of the specification file - these are copied verbatim.

Compiler Design 47 Architecture of Flex Scanner Actions specified in the rules corresponding to different patterns are put in such a way that the simulator can initiate them when a pattern is matched (in the corresponding final state).