Flex and lexical analysis

Similar documents
Flex and lexical analysis. October 25, 2016

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Program Development Tools. Lexical Analyzers. Lexical Analysis Terms. Attributes for Tokens

Using Lex or Flex. Prof. James L. Frankel Harvard University

An Introduction to LEX and YACC. SYSC Programming Languages

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool

The structure of a compiler

CS143 Handout 04 Summer 2011 June 22, 2011 flex In A Nutshell

An introduction to Flex

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Edited by Himanshu Mittal. Lexical Analysis Phase

CSC 467 Lecture 3: Regular Expressions

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process

Chapter 3 -- Scanner (Lexical Analyzer)

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Figure 2.1: Role of Lexical Analyzer

Lexical and Parser Tools

TDDD55- Compilers and Interpreters Lesson 2

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

Introduction to Lex & Yacc. (flex & bison)

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

Gechstudentszone.wordpress.com

Preparing for the ACW Languages & Compilers

Chapter 3 Lexical Analysis

PRACTICAL CLASS: Flex & Bison

Lexical Analysis. Implementing Scanners & LEX: A Lexical Analyzer Tool

Compiler course. Chapter 3 Lexical Analysis

Type 3 languages. Regular grammars Finite automata. Regular expressions. Deterministic Nondeterministic. a, a, ε, E 1.E 2, E 1 E 2, E 1*, (E 1 )

Marcello Bersani Ed. 22, via Golgi 42, 3 piano 3769

UNIT - 7 LEX AND YACC - 1

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Compiler Construction

CSE302: Compiler Design

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Compiler Construction

Structure of Programming Languages Lecture 3

CS4850 SummerII Lex Primer. Usage Paradigm of Lex. Lex is a tool for creating lexical analyzers. Lexical analyzers tokenize input streams.

Lexical Analysis - Flex

LECTURE 6 Scanning Part 2

Lexical and Syntax Analysis

LECTURE 7. Lex and Intro to Parsing

Parsing and Pattern Recognition

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

COMPILER CONSTRUCTION Seminar 01 TDDB

Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/}

Ray Pereda Unicon Technical Report UTR-02. February 25, Abstract

EECS483 D1: Project 1 Overview

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1

JFlex Regular Expressions

Part 2 Lexical analysis. Lexical analysis 44

CMSC445 Compiler design Blaheta. Project 2: Lexer. Due: 15 February 2012

Handout 7, Lex (5/30/2001)

Compil M1 : Front-End

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1.

Compiler Lab. Introduction to tools Lex and Yacc

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

L L G E N. Generator of syntax analyzier (parser)

LECTURE 11. Semantic Analysis and Yacc

UNIT -2 LEXICAL ANALYSIS

A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer.

CS 403: Scanning and Parsing

Scanning. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

In this session we will cover the following sub-topics: 1.Identifiers 2.Variables 3.Keywords 4.Statements 5.Comments 6.Whitespaces 7.Syntax 8.

Lexical and Syntax Analysis

LAB MANUAL OF COMPILER DESIGN. Department of Electronics & Computer Engg. Dronacharya College Of Engineering Khentawas, Gurgaon

Group A Assignment 3(2)

CS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco

Syntax Analysis Part VIII

Simple Lexical Analyzer

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Monday, August 26, 13. Scanners

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Wednesday, September 3, 14. Scanners

(F)lex & Bison/Yacc. Language Tools for C/C++ CS 550 Programming Languages. Alexander Gutierrez

Lexical Analysis (ASU Ch 3, Fig 3.1)

Lexical Analysis with Flex

CSc 453 Lexical Analysis (Scanning)

Syntax Analysis Part IV

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance.

LEX/Flex Scanner Generator

Lex (Lesk & Schmidt[Lesk75]) was developed in the early to mid- 70 s.

Compiler Construction Assignment 2 Fall 2009

Lexical Analysis and jflex

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Advances in Compilers

FSASIM: A Simulator for Finite-State Automata

Hello, World! in C. Johann Myrkraverk Oskarsson October 23, The Quintessential Example Program 1. I Printing Text 2. II The Main Function 3

2. Numbers In, Numbers Out

2. λ is a regular expression and denotes the set {λ} 4. If r and s are regular expressions denoting the languages R and S, respectively

Computer Science & Information Technology (CS) Rank under AIR 100. Examination Oriented Theory, Practice Set Key concepts, Analysis & Summary

Automatic Scanning and Parsing using LEX and YACC

Ulex: A Lexical Analyzer Generator for Unicon

Transcription:

Flex and lexical analysis From the area of compilers, we get a host of tools to convert text files into programs. The first part of that process is often called lexical analysis, particularly for such languages as C. A good tool for creating lexical analyzers is flex. It takes a specification file and creates an analyzer, usually called lex.yy.c.

Lexical analysis terms A token is a group of characters having collective meaning. A lexeme is an actual character sequence forming a specific instance of a token, such as num. A pattern is a rule expressed as a regular expression and describing how a particular token can be formed. For example, [A-Za-z][A-Za-z_0-9]* is a rule. Characters between tokens are called whitespace; these include spaces, tabs, newlines, and formfeeds. Many people also count comments as whitespace, though since some tools such as lint/splint look at comments, this conflation is not perfect.

Attributes for tokens Tokens can have attributes that can be passed back to the calling function. Constants could have the value of the constant, for instance. Identifiers might have a pointer to a location where information is kept about the identifier.

Some general approaches to lexical analysis Use a lexical analyzer generator tool, such as flex. Write a one-off lexical analyzer in a traditional programming language. Write a one-off lexical analyzer in assembly language.

Flex - our lexical analyzer generator Is linked with its library (libfl.a) using -lfl as a compile-time option (or is now sometimes/often found in libc). Can be called as yylex(). It is easy to interface with bison/yacc. *l file lex lex.yy.c lex.yy.c and gcc lexical analyzer other files input stream lexical analyzer actions taken when rules applied

Flex specifications Lex source: { definitions } %% { rules } %% { user subroutines }

Definitions Declarations of ordinary C variables and constants. flex definitions

Rules The form of rules are: regularexpression The actions are C/C++ code. action

Flex regular expressions s string s literally \c character c literally, where c would normally be a lex operat [s] ^ [^s] [s-t] character class indicates beginning of line characters not in character class range of characters s? s occurs zero or one time

Flex regular expressions, continued. any character except newline s* zero or more occurrences of s s+ one or more occurrences of s r s (s) r or s grouping $ end of line s/r s{m,n} s iff followed by r (not recommended) (r is *NOT* consumed) m through n occurences of s

Examples of regular expressions in flex a* zero or more a s.* zero or more of any character except newline.+ one or more characters [a-z] [a-za-z] [^a-za-z] a.b rs tu a(b c)d ^start END$ a lowercase letter any alphabetic letter any non-alphabetic character a followed by any character followed by b rs or tu abd or acd beginning of line with then the literal characters start the characters END followed by an end-of-line.

Flex actions Actions are C source fragments. If it is compound, or takes more than one line, enclose with braces ( { } ). Example rules: [a-z]+ [A-Z][a-z]* printf("found word\n"); { printf("found capitalized word:\n"); printf(" %s \n",yytext); }

Flex definitions The form is simply name definition The name is just a word beginning with a letter (or an underscore, but I don t recommend those for general use) followed by zero or more letters, underscore, or dash. The definition actually goes from the first non-whitespace character to the end of line. You can refer to it via {name}, which will expand to (definition). (cite: this is largely from man flex.) For example: DIGIT [0-9] Now if you have a rule that looks like {DIGIT}*\.{DIGIT}+ that is the same as writing ([0-9])*\.([0-9])+

An example Flex program /* either indent or use %{ %} */ %{ int num_lines = 0; int num_chars = 0; %} %% \n ++num_lines; ++num_chars;. ++num_chars; %% int main(int argc, char **argv) { yylex(); printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars ); }

Another example program digits [0-9] ltr [a-za-z] alphanum [a-za-z0-9] %% (- \+)*{digits}+ printf("found number: %s \n", yytext); {ltr}(_ {alphanum})* printf("found identifer: %s \n", yytext);. printf("found character: {%s}\n", yytext);. { /* absorb others */ } %% int main(int argc, char **argv) { yylex(); }