Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Similar documents
EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Using Lex or Flex. Prof. James L. Frankel Harvard University

CSC 467 Lecture 3: Regular Expressions

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

File I/O in Flex Scanners

Edited by Himanshu Mittal. Lexical Analysis Phase

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

LEX/Flex Scanner Generator

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

An introduction to Flex

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Languages and Compilers

The structure of a compiler

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Handout 7, Lex (5/30/2001)

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool

CS143 Handout 04 Summer 2011 June 22, 2011 flex In A Nutshell

Ray Pereda Unicon Technical Report UTR-02. February 25, Abstract

Flex and lexical analysis. October 25, 2016

CS4850 SummerII Lex Primer. Usage Paradigm of Lex. Lex is a tool for creating lexical analyzers. Lexical analyzers tokenize input streams.

JFlex Regular Expressions

Marcello Bersani Ed. 22, via Golgi 42, 3 piano 3769

A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer.

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

LECTURE 6 Scanning Part 2

Lexical Analysis. Implementing Scanners & LEX: A Lexical Analyzer Tool

Gechstudentszone.wordpress.com

CSE302: Compiler Design

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

TDDD55- Compilers and Interpreters Lesson 2

Lex (Lesk & Schmidt[Lesk75]) was developed in the early to mid- 70 s.

Lexical and Parser Tools

FSASIM: A Simulator for Finite-State Automata

Typescript on LLVM Language Reference Manual

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual

I. OVERVIEW 1 II. INTRODUCTION 3 III. OPERATING PROCEDURE 5 IV. PCLEX 10 V. PCYACC 21. Table of Contents

LECTURE 7. Lex and Intro to Parsing

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Structure of Programming Languages Lecture 3

Chapter 3 -- Scanner (Lexical Analyzer)

Ulex: A Lexical Analyzer Generator for Unicon

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

UNIT - 7 LEX AND YACC - 1

Figure 2.1: Role of Lexical Analyzer

ML 4 A Lexer for OCaml s Type System

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

Flex and lexical analysis

Department : Computer Science & Engineering

Compil M1 : Front-End

COMPILER CONSTRUCTION Seminar 02 TDDB44

Decaf Language Reference Manual

Lexical and Syntax Analysis

CS 297 Report. By Yan Yao

The Language for Specifying Lexical Analyzer

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

COP4020 Programming Assignment 1 CALC Interpreter/Translator Due March 4, 2015

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Lexical Analysis and jflex

\n is used in a string to indicate the newline character. An expression produces data. The simplest expression

VENTURE. Section 1. Lexical Elements. 1.1 Identifiers. 1.2 Keywords. 1.3 Literals

TED Language Reference Manual

CSc 10200! Introduction to Computing. Lecture 2-3 Edgardo Molina Fall 2013 City College of New York

Yacc: A Syntactic Analysers Generator

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

CS143 Handout 05 Summer 2011 June 22, 2011 Programming Project 1: Lexical Analysis

PRACTICAL CLASS: Flex & Bison

Full file at C How to Program, 6/e Multiple Choice Test Bank

Java+- Language Reference Manual

TDDD55 - Compilers and Interpreters Lesson 3

Understanding Regular Expressions, Special Characters, and Patterns

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

X Language Definition

Fundamentals of Programming Session 4

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

UNIVERSITY OF COLUMBIA ADL++ Architecture Description Language. Alan Khara 5/14/2014

BoredGames Language Reference Manual A Language for Board Games. Brandon Kessler (bpk2107) and Kristen Wise (kew2132)

Simple Lexical Analyzer

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3

Parser Design. Neil Mitchell. June 25, 2004

Parsing and Pattern Recognition

Dr. Sarah Abraham University of Texas at Austin Computer Science Department. Regular Expressions. Elements of Graphics CS324e Spring 2017

JME Language Reference Manual

IBM. UNIX System Services Programming Tools. z/os. Version 2 Release 3 SA

L L G E N. Generator of syntax analyzier (parser)

Chapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.

An Introduction to LEX and YACC. SYSC Programming Languages

CPSC 434 Lecture 3, Page 1

MP 3 A Lexer for MiniJava

Parser Tools: lex and yacc-style Parsing

12/22/11. Java How to Program, 9/e. Help you get started with Eclipse and NetBeans integrated development environments.

Program Development Tools. Lexical Analyzers. Lexical Analysis Terms. Attributes for Tokens

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

Transcription:

Lecture Outline COMP-421 Compiler Design! Lexical Analyzer Lex! Lex Examples Presented by Dr Ioanna Dionysiou Figures and part of the lecture notes taken from A compact guide to lex&yacc, epaperpress.com Copyright (c) 2012 Ioanna Dionysiou 2 What is Lex?! Lex is a tool for building lexical analyzers or lexers! A lexer takes an arbitrary input stream and tokenizes it Divides it into lexical tokens Tokenized output can be furthered processed, usually by yacc It can be the end product Lex Specification! Lex specification Create a set of patterns which lex matches against the input When one of the patterns matches the lex program invokes C code you provide this code that does something with the matched text Lex itself does not produce an executable program it translates the lex specification into a file containing a C routine called yylex() your program calls yylex() to run the lexer using C compiler, you compile the file that lex produced Copyright (c) 2012 Ioanna Dionysiou 3 Copyright (c) 2012 Ioanna Dionysiou 4

Lex in Compilation Sequence Lex and Yacc Copyright (c) 2012 Ioanna Dionysiou 5 Copyright (c) 2012 Ioanna Dionysiou 6 Lex Pattern Matching Lex Pattern Matching! Lex is using a rich regular expression language Any regular expression can be expressed as a FSA Lex is using regular expressions for pattern matching There are limitations though Lex only has states and transitions between states Lex cannot be used to recognize nested structures such as parentheses Nested structures are handled by incorporating a stack Yacc augments the DFA with a stack and can process those easily Copyright (c) 2012 Ioanna Dionysiou 7 Copyright (c) 2012 Ioanna Dionysiou 8

Pattern Matching Primitives Pattern Matching Primitives Pattern Matches. Any single character except newline * Zero or more occurrences of the preceding expression + One or more occurrences of the preceding expression? Zero or one occurrence of the preceding expression Literal within the quotation marks, C escape sequences are recognized (\n, \t, ) [ ] Character class (matches any character within brackets). If the first character is ^ it changes the meaning to match any character except the ones within the brackets A dash - indicates a character range A dash - or bracket as the first character lets you include dashes and brackets in character classes C escape sequences with \ are recognized Pattern ^ Matches Beginning of line as the first character (also used for negation within square brackets \ Used to escape sequences Either the preceding expression or the following / Matches the preceding expression but only if followed by the following regular expression () Sequences of characters Copyright (c) 2012 Ioanna Dionysiou 9 Copyright (c) 2012 Ioanna Dionysiou 10 Pattern Matching Examples Pattern Matching Examples if [\n\t ].* #.* \/\/.* Copyright (c) 2012 Ioanna Dionysiou 11 Copyright (c) 2012 Ioanna Dionysiou 12

Pattern Matching Rule! If two patterns match the same string, the longest match wins! In case both matches are the same length, then the first pattern listed is used Lex Specification Sections! A lex specification consists of three sections: Definition section Rules section User subroutines section The parts are separated by lines consisting of %% The first two parts are required, although a part may be empty The third part and the preceding %% may be omitted Copyright (c) 2012 Ioanna Dionysiou 13 Copyright (c) 2012 Ioanna Dionysiou 14! It can include literal block definitions Definition Section Definition Section - Literal Block! The literal block in the definition section is C code bracketed by the lines %{ and %} %{ C code declarations %}! C code is copied verbatim to the generated C source file near the beginning, before the beginning of yylex() It usually contains declarations of variables and functions used by code in the rules section, #include lines for header files Copyright (c) 2012 Ioanna Dionysiou 15 Copyright (c) 2012 Ioanna Dionysiou 16

Definition Section - Definitions! Definitions (or substitutions) allow you to name a regular expression (or part of it ) and refer to it by name in the rules section It is useful to break up complex expressions! Definition Syntax NAME expression where name can contain letters, digits, underscores, and must not start with a digit. Some implementations allow hyphen as well e.g. DIGIT [0-9] Rules Section! It contains pattern lines and C code A line that starts with a whitespace or material enclosed in %{ and %} is C code A line that starts with anything else is a pattern line Copyright (c) 2012 Ioanna Dionysiou 17 Copyright (c) 2012 Ioanna Dionysiou 18 C code lines! They are copied verbatim to the generated C file! Lines at the beginning of the section are placed near the beginning of the generated yylex() function Should be declarations of variables used by code associated with the patterns initialization code for the scanner C code lines anywhere else are copied to an unspecified place in the generated file should contain only comments Pattern lines! They contain a pattern followed by some whitespace and C code to execute when the input matches the pattern If C code is more than one statement or spans multiple lines, it must be enclosed in braces {} Copyright (c) 2012 Ioanna Dionysiou 19 Copyright (c) 2012 Ioanna Dionysiou 20

Pattern lines! It may include references to definitions with the name in braces {DIGIT}+ Pattern Matching while Scanning input! Lexer runs matches the input against the patterns in the rules section. Every time it finds a match (token) it executes the C code associated with that pattern If a pattern is followed by (instead of C code), the pattern uses the same C code as the next pattern in the file When an input character matches no pattern, the lexer acts as though it matched a pattern whose code is ECHO ECHO is a macro that writes code matched by the pattern» writes a copy of the token to the output Copyright (c) 2012 Ioanna Dionysiou 21 Copyright (c) 2012 Ioanna Dionysiou 22 User Subroutines Section Lex Predefined Variables! The contents of this section are copied verbatim by lex to the C file This section typically includes routines called from the rules If you redefine input(), unput(), output() or yywrap(), then the new versions might be here Copyright (c) 2012 Ioanna Dionysiou 23 Copyright (c) 2012 Ioanna Dionysiou 24

Lex Example Lex Example Empty definition section Two patterns specified in rules section Defaults yyin is stdin yyout is stdout You may change these to read from and write to a file respectively Copyright (c) 2012 Ioanna Dionysiou 25 Copyright (c) 2012 Ioanna Dionysiou 26 Lex Example Lex example 2 yywrap() is called when lex encounters an end of file. It returns either 1 (done) or 0 (more processing is required) Definition Section : 1 declaration Rules Section: 1 pattern Subroutine Section : main routine Copyright (c) 2012 Ioanna Dionysiou 27 Copyright (c) 2012 Ioanna Dionysiou 28

Lex Example 3 Lex Example 4 Definition Section : 2 definitions, 1 declaration Rules Section: 1 pattern Subroutine Section : main routine Definition Section : 3 declarations Rules Section: 2 patterns Subroutine Section : main routine Copyright (c) 2012 Ioanna Dionysiou 29 Copyright (c) 2012 Ioanna Dionysiou 30