Using Lex or Flex. Prof. James L. Frankel Harvard University

Similar documents
EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool

CSC 467 Lecture 3: Regular Expressions

Edited by Himanshu Mittal. Lexical Analysis Phase

Lexical and Syntax Analysis

Flex and lexical analysis

Flex and lexical analysis. October 25, 2016

CS4850 SummerII Lex Primer. Usage Paradigm of Lex. Lex is a tool for creating lexical analyzers. Lexical analyzers tokenize input streams.

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process

CS143 Handout 04 Summer 2011 June 22, 2011 flex In A Nutshell

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

An introduction to Flex

The structure of a compiler

Lexical Analysis. Implementing Scanners & LEX: A Lexical Analyzer Tool

An Introduction to LEX and YACC. SYSC Programming Languages

Ray Pereda Unicon Technical Report UTR-02. February 25, Abstract

LEX/Flex Scanner Generator

Marcello Bersani Ed. 22, via Golgi 42, 3 piano 3769

CSE302: Compiler Design

Compil M1 : Front-End

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

File I/O in Flex Scanners

Lexical and Parser Tools

Handout 7, Lex (5/30/2001)

Preparing for the ACW Languages & Compilers

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

Lex (Lesk & Schmidt[Lesk75]) was developed in the early to mid- 70 s.

FSASIM: A Simulator for Finite-State Automata

CSCI Compiler Design

Introduction to Lex & Yacc. (flex & bison)

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Scanning. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

Department : Computer Science & Engineering

PRACTICAL CLASS: Flex & Bison

Type 3 languages. Regular grammars Finite automata. Regular expressions. Deterministic Nondeterministic. a, a, ε, E 1.E 2, E 1 E 2, E 1*, (E 1 )

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

LECTURE 6 Scanning Part 2

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

TDDD55- Compilers and Interpreters Lesson 2

A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer.

Chapter 3 -- Scanner (Lexical Analyzer)

Yacc: A Syntactic Analysers Generator

2 Input and Output The input of your program is any file with text. The output of your program will be a description of the strings that the program r

JFlex. Lecture 16 Section 3.5, JFlex Manual. Robb T. Koether. Hampden-Sydney College. Mon, Feb 23, 2015

Ulex: A Lexical Analyzer Generator for Unicon

L L G E N. Generator of syntax analyzier (parser)

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Gechstudentszone.wordpress.com

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

Etienne Bernard eb/textes/minimanlexyacc-english.html

Generating a Lexical Analyzer Program

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

LECTURE 11. Semantic Analysis and Yacc

sed Stream Editor Checks for address match, one line at a time, and performs instruction if address matched

Scanning. COMP 520: Compiler Design (4 credits) Alexander Krolik MWF 13:30-14:30, MD 279

Lesson 10. CDT301 Compiler Theory, Spring 2011 Teacher: Linus Källberg

Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/}

Programming Assignment I Due Thursday, October 9, 2008 at 11:59pm

IBM. UNIX System Services Programming Tools. z/os. Version 2 Release 3 SA

Understanding Regular Expressions, Special Characters, and Patterns

1 CS580W-01 Quiz 1 Solution

Program Development Tools. Lexical Analyzers. Lexical Analysis Terms. Attributes for Tokens

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

Programming Assignment II

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

JFlex Regular Expressions

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

UNIT - 7 LEX AND YACC - 1

Syntax Analysis Part IV

Lecture 05 I/O statements Printf, Scanf Simple statements, Compound statements

Parsing How parser works?

Lexical and Syntax Analysis

I. OVERVIEW 1 II. INTRODUCTION 3 III. OPERATING PROCEDURE 5 IV. PCLEX 10 V. PCYACC 21. Table of Contents

LECTURE 7. Lex and Intro to Parsing

Language Reference Manual

Programming Assignment I Due Thursday, October 7, 2010 at 11:59pm

Features of C. Portable Procedural / Modular Structured Language Statically typed Middle level language

CS143 Handout 12 Summer 2011 July 1 st, 2011 Introduction to bison

CS143 Handout 05 Summer 2011 June 22, 2011 Programming Project 1: Lexical Analysis

Compiler Construction Assignment 2 Fall 2009

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

Principles of Programming Languages

cmps104a 2002q4 Assignment 2 Lexical Analyzer page 1

Parsing and Pattern Recognition

Dr. Sarah Abraham University of Texas at Austin Computer Science Department. Regular Expressions. Elements of Graphics CS324e Spring 2017

More Examples. Lex/Flex/JLex

TABLE OF CONTENTS PREFACE...5 I. INTRODUCTION Typographic Conventions Examples...7. II. Overview of PCLEX History...

C How to Program, 6/e by Pearson Education, Inc. All Rights Reserved.

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Structure of Programming Languages Lecture 3

CSE302: Compiler Design

CS 2210 Programming Project (Part I)

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Characters Lesson Outline

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1.

Lex and YACC primer/howto

Transcription:

Using Lex or Flex Prof. James L. Frankel Harvard University Version of 1:07 PM 26-Sep-2016 Copyright 2016, 2015 James L. Frankel. All rights reserved.

Lex Regular Expressions (1 of 4) Special characters are: \ (back slash) " (double quote). (period) ^ (caret or up arrow) $ (dollar sign) [ (open bracket) ] (close bracket) * (asterisk) + (plus sign)? (question mark) { (open brace) } (close brace) (vertical bar) / (slash) - (dash or hyphen) ( (open parenthesis) ) (close parenthesis)

Lex Regular Expressions (2 of 4) c matches the single non-operator char c \c matches the character c "s" matches the string s. matches any character except newline ^ matches beginning of line $ matches end of line [s] matches any one character in s [^s] matches any one character not in s

Lex Regular Expressions (3 of 4) r* matches zero or more strings matching r r+ matches one or more strings matching r r? matches zero or one strings matching r r{m, n} matches between m and n occurrences of r r 1 r 2 matches r 1 followed by r 2 r 1 r 2 matches either r 1 or r 2 (r) matches r r 1 /r 2 matches r 1 when followed by r 2 {name} matches the regex defined by name

Lex Regular Expressions (4 of 4) Within square brackets, referred to as a character class, all operators are ignored except for backslash, hyphen (dash), and caret Within a character class, backslash will introduce an escape code Within a character class, ranges of characters are allowed by using hyphen a-za-z Within a character class, if caret is the first character in the class, it indicates matching to any character that is not listed in the square brackets In any other position in the class, caret is a normal character in the character class

File Format Extension is.lex Content consists of three sections, as follows: <definitions> %% <rules> %% <user functions>

Definitions Section Anything in the <definitions> sections that is delimited by a line with "%{" to a line with "%}" is copied directly to the output C file This allows user functions to be declared here so that they are declared prior to being called from a rule Each line in the <definitions> section (other than those between "%{" and "%}") has the format: <name> <regex>

Rules Section (1 of 2) The rules section consists of a sequence of rules Each rule has a regular expression pattern that starts in column one followed by whitespace (space, tab, or newline) and optionally followed by either a C statement or a sequence of C statements enclosed in braces If there is no C statement, then the input is consumed, but no action is taken with that input and the lexer will look for a new token When used in a rule, a name enclosed within braces has its associated <regex> substituted This does not happen when the name within braces is quoted There is a default rule which matches any character and copies it to the output

Rules Section (2 of 2) The C code should return the kind of token (referred to as the token type) An optional value of the token may be placed in yylval By default, the type of yylval is int The type of yylval can be changed by using a #define with the preprocessor symbol YYSTYPE If present, this #define should appear at the beginning of the %{ part of the definitions section

Rules Details If two or more regular expression patterns match a string from the input, the rule which matches the longest input string is chosen If two or more regular expression patterns match a string from the input and the input strings are of the same length, then the first rule in the <rules> section is chosen Remember to include a rule for an action on whitespace

Lex Invocation and Return Value Call yylex() to invoke the generated lexer Lex scans for tokens from yyin yyin defaults to stdin Lex continues to scan for tokens until it executes a return statement in a matching rule in the Rules Section or until it reaches end-of-file On end-of-file, flex returns 0 Note: this end-of-file behavior is specific to flex

User Functions Section Any support functions to be used in the rules section should appear in the User Functions section These functions should be declared in the declaration section

Compiling a Lex file lex lexer-standalone.lex or flex lexer-standalone.lex gcc lex.yy.c -c -c means to create an object file, but do not link object file will have the extension ".o" gcc -pedantic -Wall lex.yy.o lexer.c -lfl -o lexer -pedantic means to issue all warnings demanded by Standard C -Wall means to issue many warnings that some users consider questionable -lfl means to link with the flex libraries (on some systems, -ll may be needed to link with lex libraries) -o is used to specify the name of the executable file

Files produced lex reads from stdin or from a specified file and produces a lexer named lex.yy.c lex.yy.c is source code in the C Programming Language that needs to be compiled The user must specify a main program In our example, the main program is in the file named lexer.c This is where yylex is called yylval must be defined in this file

Input and Output By default, input to lex comes from stdin and output goes to stdout The input and output files may be changed FILE *yyin in the input file FILE *yyout is the output file An optional function int yywrap(void) is called when input is exhausted It should return 1 if lexical analysis is done It should return 0 if more actions are required This allows yyin to be set to a subsequent file and then lex processing to continue with that file

Special Symbols yytext the matched string as a null terminated string yyleng the length of the matched string yylex() the name of the generated lexer function yylval the value of the token matched yyin the input file yyout the output file yywrap() function called on end of input