Flex and lexical analysis. October 25, 2016

Flex and lexical analysis
From the area of compilers, we get a host of tools to convert text files into programs. The first part of that process is often called lexical analysis, particularly for such languages as C. A good tool for creating lexical analyzers is flex, based on the older lex program. Both take a specification file and create an analyzer, usually called lex.yy.c. Flex is reasonably compatible with the UTF-8 encoding for Unicode.

Terminology
A token is a minimal group of characters having collective meaning. A lexeme is an actual character sequence forming a specific instance of a token, such as book. A pattern is a rule expressed as a regular expression (or even an extended regex) describing how a particular token can be formed. For example, a common convention for variable name tokens is [A-Za-z][A-Za-z_0-9]*. Characters between tokens are generally called whitespace; these include spaces, tabs, newlines, and formfeeds.

Attributes for tokens
One common characteristic of standard lexical analysis (but certainly not universal) is the assignment of types; another is passing attributes to the caller. Upon recognizing a numerical constant, for instance, the scanner might pass back the type (e.g., integer), the lexeme string, and the value as a C int. Upon recognizing an identifier, the lexer might pass back the type (e.g., identifier), the lexeme string, and a pointer to information about the identifier.

General approaches to lexical analysis
- Use a tool, like flex or re2c.
- Write a one-off analyzer in your favorite programming language. (Most common strategy these days.) For example, you can use libc's strtok() function.
- Write a one-off analyzer in assembly. (Usually done for bootstrapping purposes, though lexical analysis in assembly against a mmap(2) can be an exceedingly fast technique.)

Flex
While flex's support routines might be found in some C libraries, you might also have to link explicitly with -lfl. The lexer function is called yylex(), and it is quite easy to interface with bison/yacc.
*.l file --> flex --> lex.yy.c
lex.yy.c --> C compiler --> lexical analyzer
input stream --> lexical analyzer --> actions taken when rules match

Using Flex
Flex source structure:
{ definitions }
%%
{ rules }
%%
{ user subroutines }

Definitions
- Declarations of ordinary C variables and whatnot.
- flex definitions

Rules
Each rule has the form
regularexpression action
The actions are C code.

Flex's regular expressions
s          literal string s
\c         character c literally
[s]        character class
^          beginning of line
[^s]       characters not in character class
s?         s occurs zero or one time
.          any character except newline
s*         zero or more occurrences of s
s+         one or more occurrences of s
r|s        r or s
(s)        grouping
$          end of line
s{m,n}     m through n occurrences of s

Examples
a*           zero or more a's
.*           zero or more of any char except newline
.+           one or more characters
[a-z]        a lower-case letter
[a-zA-Z]     any letter
[^a-zA-Z]    not a letter

Examples
a.b          a followed by any char then followed by b
rs|tu        rs or tu
a(b|c)d      abd or acd
^start       "start" at the beginning of line
END$         the characters END followed by end-of-line

Flex actions
Actions are just C code. If an action is compound, or requires more than a single line, enclose it in curly braces. Examples:
[a-z]+       printf("found word\n");
[A-Z][a-z]*  {
                 printf("found capitalized word:\n");
                 printf(" %s \n", yytext);
             }

Flex definitions
The form is simply
name definition
The name is just a word beginning with a letter (or underscore, but I don't recommend those) followed by zero or more letters, digits, underscores, or dashes. The definition actually runs from the first non-whitespace character after the name to the end of the line. You can refer to it via {name}, which will expand to your definition.

For example:
DIGIT [0-9]
{DIGIT}*\.{DIGIT}+
is equivalent to
([0-9])*\.([0-9])+

Flex example
%{
int num_lines = 0;
int num_chars = 0;
%}
%%
\n      { ++num_lines; ++num_chars; }
.       { ++num_chars; }
%%
int main(int argc, char **argv)
{
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
}

Another example
digits      [0-9]
ltr         [a-zA-Z]
alphanum    [a-zA-Z0-9]
%%
(-|\+)*{digits}+        printf("found number: %s \n", yytext);
{ltr}(_|{alphanum})*    printf("found identifier: %s \n", yytext);
\.                      printf("found character: {%s}\n", yytext);
.                       { /* ignore others */ }
%%
int main(int argc, char **argv)
{
    yylex();
}