
CS4850 Summer II 2006 -- Lexical Analysis and Lex (cont'd)

Lex Primer
Lex is a tool for creating lexical analyzers. Lexical analyzers tokenize input streams. Tokens are the terminals of a language. Regular expressions define tokens.

Usage Paradigm of Lex
(diagram in the original slides)

To Use Lex and Yacc Together

    Lex source           Yacc source
    (Lexical Rules)      (Grammar Rules)
         |                    |
        Lex                 Yacc
         |                    |
      lex.yy.c             y.tab.c

    Input --> yylex() <--[call]------ yyparse() --> Parsed Input
                      ---[return token]-->

Lex Internals Mechanism
Lex converts regular expressions into DFAs. The DFAs are implemented as table-driven state machines.

Running Lex
To run Lex on a source file, use the command:

    lex source.l

This produces the file lex.yy.c, which is the C source for the lexical analyzer. To compile it, use:

    cc -o prog -O lex.yy.c -ll
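As a concrete illustration of the commands above, here is a minimal source.l (a sketch; the patterns and messages are illustrative, not from the course notes) that lex will turn into a working scanner:

```lex
%%
[0-9]+       printf("NUMBER: %s\n", yytext);
[a-zA-Z]+    printf("WORD: %s\n", yytext);
.|\n         ;
%%
```

Building it with lex source.l and then cc -o prog -O lex.yy.c -ll gives a program that, run as ./prog < inputfile, prints one line for each number or word it finds and silently discards everything else.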

Versions and Reference Books
- AT&T lex, GNU flex, and a Win32 version
- lex & yacc, 2/e, by John R. Levine, Tony Mason & Doug Brown, O'Reilly
- Mastering Regular Expressions, by Jeffrey E. F. Friedl, O'Reilly

General Format of Lex Source
The input specification file is in 3 parts:
- Declarations: definitions
- Rules: token descriptions and actions
- Auxiliary Procedures: user-written code
The three parts are separated by %%.
Tip: the first part defines patterns, the third part defines actions, and the second part puts them together to express "if we see some pattern, then we do some action."
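The three-part layout can be sketched in one small specification (a hypothetical example; the word pattern and variable names are illustrative):

```lex
%{  /* Part 1: declarations -- copied verbatim into the generated C */
#include <stdio.h>
int words = 0;
%}
word    [a-zA-Z]+
%%
{word}  words++;        /* Part 2: rules -- pattern, then action */
.|\n    ;
%%
/* Part 3: auxiliary procedures -- user-written C code */
int main(void)
{
    yylex();
    printf("%d words\n", words);
    return 0;
}
```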

Regular Policy of Non-translated Source
Remember that Lex is turning the rules into a program. Any source not intercepted by Lex is copied into the generated program:
- any line which is not part of a Lex rule or action and which begins with a blank or tab;
- anything included between lines containing only %{ and %};
- anything after the second %% delimiter.

Position of Copied Source

    source input                                  position in generated code
    prior to the first %%                         external to any function in the generated code
    after the first %% and prior to the second    the appropriate place for declarations in the
                                                  function generated by Lex which contains the actions
    after the second %%                           after the Lex-generated output

Regular Policy of Translated Source
- various variables or tables whose names are prefixed by yy: yyleng, yysvec[], yywork[]
- various functions whose names are prefixed by yy: yyless(), yymore(), yywrap(), yylex()
- various definitions whose names are capitalized: BEGIN, INITIAL
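The three copying positions can all be seen in one small specification (a sketch; the names are illustrative):

```lex
%{
/* prior to the first %%, inside %{ %}: copied external to any function */
int linecount = 1;
%}
%%
    /* an indented line at the top of the rules section is copied
       among the declarations inside the Lex-generated yylex() */
\n      linecount++;
.       ;
%%
/* after the second %%: copied after the Lex-generated output */
int main(void) { yylex(); return 0; }
```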

Default Rules and Actions
The first and second parts must exist, but may be empty; the third part and the second %% are optional. If the third part does not contain a main(), -ll will link a default main() which calls yylex() and then exits. Unmatched patterns will perform a default action, which consists of copying the input to the output.

Default Input and Output
If you don't write your own main() to deal with the input and output of yylex(), the default input of the default main() is stdin and the default output is stdout. stdin is usually keyboard input and stdout is usually screen output:

    cs20: % ./a.out < inputfile > outputfile

Some Simple Lex Source Examples
A minimum Lex program:

    %%

It only copies the input to the output unchanged.

A trivial program to delete the three spacing characters (blank, tab, and newline):

    %%
    [ \t\n]   ;

Another trivial example:

    %%
    [ \t]+$   ;

It deletes from the input all blanks or tabs at the ends of lines.

Token Definitions (Extended Regular Expressions)

Elementary Operations
- single characters, except the operator characters \ . $ ^ [ ] - ? * + | ( ) / { } % < >
- concatenation (put characters together)
- alternation: (a|b|c)
  [ab] == a|b
  [a-k] == a|b|c|...|i|j|k
  [a-z0-9] == any letter or digit
  [^a] == any character but a

Elementary Operations (cont.)
- . matches any character except the newline
- * -- Kleene closure (zero or more)
- + -- positive closure (one or more)
Examples:
- [0-9]+"."[0-9]+  (note: without the quotes, . could be any character)
- [ \t]+ is whitespace (except CR); there is a blank space character before the \t

Special Characters
- . -- matches any single character (except newline)
- " " and \ -- quote the enclosed or following part as literal text
- \t -- tab
- \n -- newline
- \b -- backspace
- \" -- double quote
- \\ -- \
- ? -- the preceding element is optional:
  ab? == a|ab
  (ab)? == ab|ε
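A few rules exercising these operators together (a sketch; the printed labels are illustrative):

```lex
%%
-?[0-9]+                printf("INT\n");    /* ? makes the sign optional */
[0-9]+"."[0-9]+         printf("REAL\n");   /* quotes make . a literal dot */
[A-Za-z][A-Za-z0-9]*    printf("IDENT\n");  /* classes plus closure */
[ \t\n]+                ;                   /* discard whitespace */
```

Note that "3.14" is reported as one REAL rather than an INT followed by junk, because Lex always takes the longest match.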

Special Characters (cont.)
- ^ -- at the beginning of the line (unless it is inside of a [ ])
- $ -- at the end of the line, same as /\n
- [^ ] -- anything except the listed characters: \"[^\"]*\" is a double-quoted string
- {n,m} -- n through m occurrences: a{1,3} is a or aa or aaa
- {varname} -- translation of varname from the definitions section
- / -- matches only if followed by the part to the right of the /: 0/1 matches the 0 of 01 but not of 02 or 03
- ( ) -- grouping

Definitions

    NAME            REG_EXPR
    digs            [0-9]+
    integer         {digs}
    plainreal       {digs}"."{digs}
    scientificreal  {digs}"."{digs}[eE][+-]?{digs}
    real            {plainreal}|{scientificreal}

NAME must be a valid C identifier; {NAME} is replaced by the prior REG_EXPR.

The definitions section can also contain variables and other declarations used by the code generated by Lex. These usually go at the start of the section, marked by %{ at the beginning and %} at the end, or on lines which begin with a blank or tab; #includes usually go here. It is usually convenient to maintain a line counter so that error messages can be keyed to the lines in which the errors are found:

    %{
    int linecount = 1;
    %}
    %%
    [\n]   { linecount++; }

Transition Rules
The general form of a rule is:

    REGEXP <one or more blanks> { program statement; program statement; }

A null statement ; will ignore the matched input. Four special actions:
- ECHO;
- REJECT;
- BEGIN
- |

An unmatched token uses a default action, which is to ECHO it from the input to the output. The action | indicates that the action for this rule is the action for the next rule.

Tokens and Actions
Example:

    {real}      return FLOAT;
    begin       return BEGIN;
    {newline}   linecount++;
    {integer}   { printf("I found an integer\n");
                  return INTEGER;
                }

Ambiguous Source Rules
If 2 rules match the same pattern, Lex will use the first rule. Lex always chooses the longest matching substring for its tokens. To override the choice, use the action REJECT, e.g.:

    she   { s++; REJECT; }
    he    { h++; REJECT; }
    .|\n  ;
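The she/he rules above, completed into a full specification (a sketch in the style of the original Lex paper; the report format is illustrative):

```lex
%{
int s = 0, h = 0;   /* counters for "she" and "he" */
%}
%%
she   { s++; REJECT; }
he    { h++; REJECT; }
.|\n  ;
%%
int main(void)
{
    yylex();
    printf("she: %d  he: %d\n", s, h);
    return 0;
}
```

Because of REJECT, each occurrence of she also contributes to the he count, since he occurs inside she; without REJECT, the longest-match rule would count only she.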

Multiple States
Lex allows the user to explicitly declare multiple states in the definitions section:

    %start COMMENT TERMINATE

The default state is INITIAL (or 0). Transition rules can be classified into different states and are matched depending on the current state; BEGIN is used to change state.

Lex Special Variables
Identifiers used by Lex and Yacc begin with yy:
- yytext -- a string containing the lexeme
- yyleng -- the length of the lexeme
- yyin -- the input stream pointer

Example:

    {integer}   { printf("I found an integer\n");
                  sscanf(yytext, "%d", &yylval);
                  return INTEGER;
                }

C++ comments (// ...) can be deleted with:

    //.*   ;

Lex Library Function Calls
- yylex() -- the default main() contains return yylex();
- yywrap() -- called by the lexical analyzer at the end of the input file; the default yywrap() always returns 1
- yyless(n) -- only the first n characters in yytext are retained; the rest are returned to the input
- yymore() -- the next input expression recognized is tacked on to the end of this input
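A sketch of start states used to strip C comments from the input (assuming the classic %start/BEGIN syntax; the state name is illustrative):

```lex
%start COMMENT
%%
<INITIAL>"/*"    BEGIN COMMENT;   /* enter comment state */
<COMMENT>"*/"    BEGIN INITIAL;   /* leave comment state */
<COMMENT>.|\n    ;                /* discard comment text */
<INITIAL>.|\n    ECHO;            /* copy everything else */
```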

User-Written Code
The actions associated with any given token are normally specified using statements in C. But occasionally the actions are complicated enough that it is better to describe them with a function call and define the function elsewhere. Definitions of this sort go in the last section of the Lex input.

Example 1
A program that reports the distribution of word lengths in its input:

    int lengs[100];
    %%
    [a-z]+   lengs[yyleng]++;
    .        |
    \n       ;
    %%
    yywrap()
    {
        int i;
        printf("length  #words\n");
        for (i = 0; i < 100; i++)
            if (lengs[i] > 0)
                printf("%5d%10d\n", i, lengs[i]);
        return(1);
    }

Example 2
(slide not reproduced in this transcription)

Using Yacc with Lex
Yacc will call yylex() to get each token from the input, so each Lex rule should end with:

    return(token);

where the appropriate token value is returned. An easy way to combine the two is to place the line:

    #include "lex.yy.c"

in the last section of the Yacc input.
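On the Lex side, a specification written for use with Yacc might look like this (a sketch; INTEGER and PLUS are hypothetical token names assumed to come from the grammar, e.g. via the y.tab.h that yacc -d generates):

```lex
%{
#include "y.tab.h"    /* token codes INTEGER, PLUS, ... defined by yacc */
extern int yylval;    /* value passed alongside the token */
%}
digs    [0-9]+
%%
{digs}   { sscanf(yytext, "%d", &yylval); return(INTEGER); }
"+"      return(PLUS);
[ \t\n]  ;
%%
```

Every rule either returns a token code to yyparse() or, like the whitespace rule, consumes input silently so the parser never sees it.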