Lexical and Syntax Analysis

Lexical and Syntax Analysis (of Programming Languages)
Flex, a Lexical Analyser Generator

Flex: a fast lexical analyser generator
Flex takes the specification of a lexical analyser and generates a C function called yylex(). The specification is a list of pattern-action pairs: when a pattern (a regular expression) matches, its action (a C statement) is executed.

Input to Flex
The structure of a Flex (.lex) file is as follows.

    /* Declarations */
    %%
    /* Rules (pattern-action pairs) */
    %%
    /* C Code (including main function) */

The three sections are separated by lines containing only %%. Any text enclosed in /* and */ is treated as a comment.

What is a rule?
A rule is a pattern-action pair, written

    pattern    action

The pattern is (like) a regular expression. The action is a C statement, or a block of C statements in the form { }.

Example 1 (tomandjerry.lex)
Replace all tom's with jerry's and vice-versa.

    /* No declarations */
    %%
    tom      printf("jerry");
    jerry    printf("tom");
    %%
    /* No main function */

Output of Flex
Flex generates a C function

    int yylex() { ... }

When yylex() is called:
  - a pattern that matches a prefix of the input is chosen, and the matching prefix is consumed;
  - the action corresponding to the chosen pattern is executed;
  - if no pattern is chosen, a single character is consumed and echoed to the output;
  - this repeats until all input is consumed or an action executes a return statement.
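
The loop described above can be imitated by hand for Example 1. The following is only a sketch in plain C, not what Flex actually generates (Flex builds a table-driven automaton); the function name swap_tom_and_jerry and the use of an in-memory string instead of an input stream are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hand-written stand-in for the scanner loop of Example 1 (not Flex output).
 * Reads the NUL-terminated string `in` and writes the transformed text to
 * `out`, which must have room for 2 * strlen(in) + 1 bytes, since each
 * "tom" (3 bytes) may grow into "jerry" (5 bytes). */
void swap_tom_and_jerry(const char *in, char *out)
{
    while (*in != '\0') {
        if (strncmp(in, "tom", 3) == 0) {           /* pattern: tom         */
            out += sprintf(out, "jerry");           /* action               */
            in += 3;                                /* consume the prefix   */
        } else if (strncmp(in, "jerry", 5) == 0) {  /* pattern: jerry       */
            out += sprintf(out, "tom");
            in += 5;
        } else {
            *out++ = *in++;   /* no pattern matched: echo one character */
        }
    }
    *out = '\0';
}
```

Calling swap_tom_and_jerry on "jerry should be scared of tom." fills the output buffer with "tom should be scared of jerry.". Like the Flex rules, the sketch has no notion of word boundaries, so "atom" becomes "ajerry".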

Example 1, revisited (tomandjerry.lex)
Replace all tom's with jerry's and vice-versa.

    /* No declarations */
    %%
    tom      printf("jerry");
    jerry    printf("tom");
    %%
    int main(void) { yylex(); return 0; }

Running Example 1
At a command prompt '>':

    > flex -o tomandjerry.c tomandjerry.lex
    > gcc -o tomandjerry tomandjerry.c -lfl
    > tomandjerry

Important! The -lfl flag links the Flex library, which supplies a default yywrap().

    Input:   jerry should be scared of tom.
    Output:  tom should be scared of jerry.

Maximal munch!
Many patterns may match a prefix of the input. Which one does Flex choose? The one that matches the longest string. If different patterns match strings of the same length, then the first pattern in the file is preferred.
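
The selection rule can be sketched in plain C. This is an illustration under assumptions, not Flex's implementation: the two hand-written matchers stand in for the hypothetical rules `if` and `[a-z]+` (in that file order), and the names Matcher, match_if, match_ident and choose_rule are invented for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A matcher returns the length of the longest prefix of `s` it accepts,
 * or 0 if it accepts none. */
typedef size_t (*Matcher)(const char *s);

/* Stand-in for the pattern:  if  */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Stand-in for the pattern:  [a-z]+  (assumes the default "C" locale) */
static size_t match_ident(const char *s)
{
    size_t n = 0;
    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in file order: index 0 is listed first. */
static const Matcher rules[] = { match_if, match_ident };

/* Returns the index of the chosen rule, or -1 if no rule matches, and
 * stores the length of the chosen match in *len_out. */
int choose_rule(const char *input, size_t *len_out)
{
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t len = rules[i](input);
        if (len > best_len) {   /* strictly longer only: earlier rule wins ties */
            best = (int)i;
            best_len = len;
        }
    }
    *len_out = best_len;
    return best;
}
```

On the input "iffy" the second rule wins because it matches four characters rather than two. On "if(x)" both rules match two characters, so the rule listed first is chosen.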

What does yy mean?
The string yy is a prefix used by Flex to avoid name clashes with user-defined C code. You can change the prefix by, for example, writing

    %option prefix="tomandjerry_"

in the declarations section.

Example 1, revisited (tomandjerry.lex)
Replace all tom's with jerry's and vice-versa.

    %option prefix="tomandjerry_"
    %%
    tom      printf("jerry");
    jerry    printf("tom");
    %%
    int main(void) { tomandjerry_lex(); return 0; }

What is a pattern? (Base cases)

    Pattern            Meaning
    x                  Match the character x.
    .                  Match any character except a newline character (\n).
    [xyz]              Match either an x, y or z.
    [ad-f]             Match an a, d, e, or f.
    [^A-Z]             Match any character not in the range A to Z.
    [a-z]{-}[aeiuo]    Lower case consonants.
    <<EOF>>            Matches end-of-file.

What is a pattern? (Inductive cases)
If p, p1, p2 are patterns then:

    Pattern    Meaning
    p1p2       Match a p1 followed by a p2.
    p1|p2      Match a p1 or a p2.
    p*         Match zero or more p's.
    p+         Match one or more p's.
    p?         Match zero or one p's.
    p{2,4}     At least 2 p's and at most 4.
    p{4}       Exactly 4 p's.
    (p)        Match a p, used to override precedence.
    ^p         Match a p at beginning of a line.
    p$         Match a p at end of a line.

Escaping
Reserved symbols include:

    . $ ^ [ ] - ? * + | ( ) / { } < >

All other symbols are treated literally. Reserved symbols can be matched by enclosing them in double quotes. For example:

    Pattern      Meaning
    "*"xy"+"     Match * then x then y then +.
    "+"*         Match zero or more + symbols.
    \"           Match a double-quote symbol.

What is a declaration?
A declaration may be:
  - a C declaration, enclosed in %{ and %}, visible to the action part of a rule;
  - a regular definition of the form

        name    pattern

    introducing a new pattern {name} equivalent to pattern.

Example 2
Count the characters and lines in the input.

    %{
    int chars = 0;
    int lines = 0;
    %}
    %%
    .     { chars++; }
    \n    { lines++; chars++; }
    %%
    int main(void) { yylex(); printf("%i %i\n", chars, lines); return 0; }

Example 3
Count the words in the input.

    %{
    int words = 0;
    %}
    SPACE    [ \t\r\n]
    WORD     [^ \t\r\n]+
    %%
    {SPACE}    /* Ignore */
    {WORD}     { words++; }
    %%
    int main(void) { yylex(); printf("%i\n", words); return 0; }

yytext and yyleng
The string matching a pattern is available to the action of a rule via the yytext variable, and its length via yyleng.

    char* yytext;    (global variable)
    int yyleng;      (global variable)

Warning: the memory pointed to by yytext is destroyed upon completion of the action.
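
Because of this warning, an action that needs the matched text later has to copy it first. A minimal sketch of such a copy in plain C follows; copy_lexeme is a hypothetical helper name, and in a Flex action it would presumably be called as copy_lexeme(yytext, yyleng).

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated, NUL-terminated copy of the first `leng` bytes
 * of `text`, or NULL if allocation fails. The caller must free() the result.
 * `leng` is an int to mirror the declaration of yyleng shown above. */
char *copy_lexeme(const char *text, int leng)
{
    char *copy = malloc((size_t)leng + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, (size_t)leng);
    copy[leng] = '\0';
    return copy;
}
```

The copy stays valid after the scanner reuses its internal buffer, at the cost of one allocation per saved token.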

Example 4 (inc.lex)
Increment every number in the input.

    DIGIT    [0-9]
    %%
    {DIGIT}+    {
                  int i = atoi(yytext);
                  printf("%i", i+1);
                }
    %%
    int main(void) { yylex(); return 0; }

Tokenising using Flex
The idea is that yylex() returns the next token. This is achieved by using a return statement in the action part of a rule. Some tokens have a component value, e.g. NUM, which by convention is returned via the global variable yylval.

    int yylval;    (global variable)

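Example 5 (nums.lex)
Print every number in the input as a NUM token.

    %{
    typedef enum { END, NUM } Token;
    int yylval;    /* Flex does not declare yylval itself */
    %}
    %%
    [^0-9]      /* Ignore */
    [0-9]+      { yylval = atoi(yytext); return NUM; }
    <<EOF>>     { return END; }
    %%
    int main(void) {
      while (yylex() != END) printf("NUM(%i)\n", yylval);
      return 0;
    }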

The type of yylval
By convention yylval is of type int (Flex itself does not declare it), but the user can give it another type. For example:

    union { int number; char* string; } yylval;

Now yylval is of union type and can hold either a number or a string.
NOTE: When interfacing Flex and Bison, the type of yylval is defined in the Bison file using the %union declaration.
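
A union does not record which member was last written, so the code consuming tokens has to use the token kind to decide which member of yylval to read. The sketch below illustrates this in plain C; the STR token and the describe_token helper are hypothetical additions, not part of the slides.

```c
#include <stdio.h>

/* END and NUM follow Example 5; STR is a hypothetical token added here
 * so that the string member has a use. */
typedef enum { END, NUM, STR } Token;

/* The union-typed yylval from the slide. */
union { int number; char *string; } yylval;

/* Formats the token into buf, reading only the member of yylval that the
 * token kind says is valid. Returns the value returned by snprintf. */
int describe_token(Token t, char *buf, size_t size)
{
    switch (t) {
    case NUM: return snprintf(buf, size, "NUM(%i)", yylval.number);
    case STR: return snprintf(buf, size, "STR(%s)", yylval.string);
    default:  return snprintf(buf, size, "END");
    }
}
```

A scanner action would set the matching member before returning, for example yylval.number = atoi(yytext) before return NUM.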

Start conditions and states
If p is a pattern then so is <s>p, where s is a state. Such a pattern is only active when the scanner is in state s. Initially, the scanner is in state INITIAL. The scanner moves to a state s upon execution of a BEGIN(s) statement.

Inclusive states
An inclusive state S can be declared as follows.

    %s S

When the scanner is in state S, any rule with start condition S or with no start condition is active.

Exclusive states
An exclusive state S can be declared as follows.

    %x S

When the scanner is in state S, only rules with the start condition S are active.

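Example 6 (capitalise.lex)
Capitalise the first letter after each full stop.

    %{
    #include <ctype.h>    /* toupper */
    %}
    /* NS: New Sentence */
    %s NS
    %%
    "."             { putchar('.'); BEGIN(NS); }
    <NS>[a-zA-Z]    { putchar(toupper(*yytext)); BEGIN(INITIAL); }
    %%
    int main(void) { yylex(); return 0; }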
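
The effect of the inclusive state NS in Example 6 can be traced with a hand-written state machine. This is a sketch in plain C under assumptions, not the code Flex generates: the function name capitalise and the string-to-string interface are invented for illustration, and the default "C" locale is assumed so that isalpha corresponds to [a-zA-Z].

```c
#include <ctype.h>

/* The two scanner states of Example 6. */
enum State { INITIAL, NS };

/* Copies `in` to `out`, upper-casing the first letter that follows each
 * full stop. `out` must have room for strlen(in) + 1 bytes. */
void capitalise(const char *in, char *out)
{
    enum State state = INITIAL;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (state == NS && isalpha(c)) {
            /* rule <NS>[a-zA-Z]: capitalise and leave the NS state */
            *out++ = (char)toupper(c);
            state = INITIAL;
        } else {
            /* rule ".": active in both states because NS is inclusive */
            if (c == '.')
                state = NS;
            /* either the "." action or the default echo copies c unchanged */
            *out++ = (char)c;
        }
    }
    *out = '\0';
}
```

Given "hello. world. ok" the function produces "hello. World. Ok". The space after each full stop matches no rule, so it is echoed and the scanner stays in NS until a letter arrives. The very first sentence is left alone because the scanner starts in INITIAL.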

Other aspects of Flex
In an action, executing yyless(n) keeps the first n characters of the current match and puts the rest back into the input stream.
The global variables

    FILE* yyin;     (Default: stdin)
    FILE* yyout;    (Default: stdout)

define the input and output streams used by Flex.

Exercise 1
Consider the following output of the Unix command ls -l.

    -rw-r--r-- 1 mfn staff 12 2011-03-06 17:43 file1.txt
    -rw-r--r-- 1 mfn staff 13 2011-03-06 17:43 file2.txt
    -rw-r--r-- 1 mfn staff 9 2011-03-06 17:43 file3.txt

Write a Flex specification that takes the output of ls -l and produces the sum of the sizes of all the listed files.

Exercise 2
Recall the source language of the expression simplifier.

    v = [a-z]+
    n = [0-9]+
    e → v | n | e + e | e * e | ( e )

Write a lexical analyser for the expression language.

Variants of Flex
There are Flex variants available for many languages:

    Language    Tool
    C++         Flex++
    Java        JLex
    Haskell     Alex
    Python      PLY
    Pascal      TP Lex
    *           ANTLR

Summary
Flex converts a list of pattern-action pairs into a C function called yylex(). Patterns are similar to regular expressions. The idea is that yylex() identifies and returns the next token in the input. This gives a declarative (high-level) way to define lexical analysers.