Buffering Techniques: Buffer Pairs and Sentinels


Week 3: Lexical Analysis

Topics: tasks of lexical analysis; why lexical analysis and parsing are separated; tokens, patterns and lexemes.

Complex tokens such as identifiers and numerals are described using regular-expression notation.

Attributes for Tokens: an attribute provides the actual value for a lexeme (for example, a pointer to an entry in the symbol table). A symbol table may contain word symbols, names, and numerals.

Buffering Techniques: Buffer Pairs and Sentinels

Buffer Pairs: a buffer consists of two N-character halves. Each read system call brings N characters into one half. Two pointers are maintained: forward and lexeme_beginning. Drawback: a token longer than N characters cannot be recognized.

Example buffer contents: i = i + 1 ; e n d EOF

    if forward at end of first half then
        reload second half;
        forward := forward + 1
    else if forward at end of second half then
        reload first half;
        move forward to beginning of first half
    else
        forward := forward + 1

Sentinels: placing an EOF sentinel at the end of each half reduces the two comparisons per character to one.

Example buffer contents with sentinels: i = i + 1 ; EOF e n d EOF

    forward := forward + 1
    if *forward = EOF then
        if forward at end of first half then
            reload second half;
            forward := forward + 1
        else if forward at end of second half then
            reload first half;
            move forward to beginning of first half
        else
            terminate lexical analysis
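The sentinel scheme above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the class name BufferPair, the method next_char, the half size n = 4, and the choice of "\0" as the EOF sentinel are all assumptions made for the example, and io.StringIO stands in for the source file. The lexeme_beginning pointer is left out to keep the sketch short.

```python
import io

EOF = "\0"  # assumed sentinel; must never occur in the source text


class BufferPair:
    """Two n-character halves, each followed by one sentinel slot."""

    def __init__(self, stream, n=4):
        self.stream = stream
        self.n = n
        # Layout: [0, n) first half, n sentinel,
        #         [n+1, 2n+1) second half, 2n+1 sentinel.
        self.buf = [EOF] * (2 * n + 2)
        self._reload(0)
        self.forward = -1  # advanced before every read

    def _reload(self, start):
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            # A short read leaves EOF right after the last real character.
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        """Advance forward; return the character, or None at end of input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:  # the only test in the common case
            if self.forward == self.n:  # end of first half
                self._reload(self.n + 1)
                self.forward += 1
            elif self.forward == 2 * self.n + 1:  # end of second half
                self._reload(0)
                self.forward = 0
            else:
                return None  # EOF inside a half: real end of input
            if self.buf[self.forward] == EOF:
                return None  # input ended exactly on a half boundary
        return self.buf[self.forward]


def read_all(text, n=4):
    """Helper for the example: drain a BufferPair into a string."""
    bp = BufferPair(io.StringIO(text), n)
    chars = []
    ch = bp.next_char()
    while ch is not None:
        chars.append(ch)
        ch = bp.next_char()
    return "".join(chars)
```

For ordinary characters only the single comparison against EOF is executed; the two position tests run only when a sentinel is actually hit, which is the saving the notes describe.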

Token Specification

Strings and Languages

A language is any set of strings over some fixed alphabet. The empty set and the set containing only the empty string are both languages.

Operations on Languages: union, concatenation, and closure.

Regular Expressions

Regular expressions over an alphabet sigma are defined as follows:
1) epsilon is a regular expression that denotes {epsilon}.
2) If a is a symbol in the alphabet, then a is a regular expression that denotes {a}.
3) Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then
   a. (r)|(s) is a regular expression denoting L(r) U L(s).
   b. (r)(s) is a regular expression denoting L(r)L(s).
   c. (r)* is a regular expression denoting (L(r))*.
   d. (r) is a regular expression denoting L(r).

A language denoted by a regular expression is said to be a regular set.

Regular Definitions

A regular definition over sigma is a sequence of definitions of the form

    d1 -> r1
    d2 -> r2
    ...
    dn -> rn

where each di is a distinct name and each ri is a regular expression over the symbols in sigma U {d1, d2, ..., di-1}.

Non-regular Sets

Some languages cannot be described by any regular expression.
1) Balanced or nested constructs: for example, the set of all strings of balanced parentheses cannot be described by a regular expression. It can, however, be specified by a context-free grammar.
2) Repeating strings: {wcw | w is a string of a's and b's} cannot be described by any regular expression, nor can it be specified by a context-free grammar.

Recognition of Tokens

Consider the Pascal- grammar. The terminals program, ;, etc., generate sets of strings given by the following regular definitions. (Add lines 41 and 42 to recognize names and numbers, where digit = {0, 1, 2, ..., 9} and letter = {A, B, ..., Z, a, b, ..., z}.)

    Numeral -> digit+

    Name          -> letter (letter | digit)*
    program       -> program
    ;             -> ;
    const         -> const
    TypeName      -> Name
    VariableName  -> Name
    ProcedureName -> Name
    FieldName     -> Name
    ConstantName  -> Name

We also assume lexemes are separated by white space, consisting of non-null sequences of blanks, tabs and newlines. The lexical analyzer recognizes white space by comparing a string against the regular definition ws:

    delim -> blank | tab | newline
    ws    -> delim+

Our goal is to isolate the lexeme for the next token in the input buffer and produce as output a pair consisting of the appropriate token and attribute value, using the translation table:

    Regular Expression   Token       Attribute Value
    ws                   -           -
    Numeral              Numeral     pointer to table entry
    Name                 Name        pointer to table entry
    program              program     PROGRAM
    ;                    Semicolon   SEMICOLON
    const                const       CONST
    TypeName             Name        pointer to table entry
    VariableName         Name        pointer to table entry
    ProcedureName        Name        pointer to table entry
    FieldName            Name        pointer to table entry
    ConstantName         Name        pointer to table entry

PROGRAM, SEMICOLON and CONST are predefined constants; the pointer to a table entry may simply be the index of that entry in the table.

Transition Diagrams: a transition diagram has a start state, accepting states (drawn as double circles), and an asterisk on states where the forward pointer must be retracted. Each edge is labeled with a character. If failure occurs in all transition diagrams, a lexical error has been detected and an error-recovery routine is invoked. For performance, look for frequently occurring tokens (e.g., white space) before less frequently occurring ones.
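As an illustration of the translation table above, here is a small Python sketch that turns input text into (token, attribute) pairs. It covers only ws, Numeral, Name, program, ; and const. The names TOKEN_RE, WORD_SYMBOLS and tokenize are hypothetical, the symbol table is assumed to be a plain list whose index serves as the "pointer to table entry", and checking word symbols through a small dictionary is a choice made for this sketch, not something the notes prescribe.

```python
import re

# Word symbols and their predefined constants (attribute values).
WORD_SYMBOLS = {"program": "PROGRAM", "const": "CONST"}

# One named group per regular definition handled by this sketch.
TOKEN_RE = re.compile(r"""
    (?P<ws>[ \t\n]+)                   # ws -> delim+
  | (?P<Numeral>[0-9]+)                # Numeral -> digit+
  | (?P<Name>[A-Za-z][A-Za-z0-9]*)     # Name -> letter (letter | digit)*
  | (?P<Semicolon>;)
""", re.VERBOSE)


def tokenize(text, table):
    """Yield (token, attribute) pairs; table is a list used as symbol table."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            # No pattern matches here: a lexical error.
            raise ValueError(f"lexical error at position {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue  # white space produces no token
        if kind == "Name" and lexeme in WORD_SYMBOLS:
            yield lexeme, WORD_SYMBOLS[lexeme]
        elif kind == "Semicolon":
            yield "Semicolon", "SEMICOLON"
        else:
            # Names and numerals: the attribute is the table index.
            if lexeme not in table:
                table.append(lexeme)
            yield kind, table.index(lexeme)
```

Running list(tokenize("const n 42 ;", table)) with an empty list as table yields the pairs for const, a Name, a Numeral and Semicolon in that order, and leaves the two lexemes n and 42 in the table.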

[Figure: transition diagram with start state 0 and states 1 to 4. Edge labels: "[", "{", "]", delimiter, other. Actions: return(bar, BAR), and return(leftsquare, LSQUARE) on the starred (retracting) state.]

A technique for separating word symbols and names is to initialize the symbol table with the word symbols.

Implementing a Transition Diagram:

    procedure
    begin
        c := next input character
        if c = '[' then
            c := next input character
            switch (c) {
                case ']':            return(bar, BAR);
                case c in delimiter: while c in delimiter do c := next input character
                case '{':            while c != '}' do c := next input character
                default:             retract(); return(leftsquare, LSQUARE);
            };
        end if;
    end

    procedure nexttoken()
        skipseparator();
        switch (ch) {
            case letter: scanword();
            case digit:  scannumeral();
            case '+':    scanplus();
            case EOF:    return(eof);
        }

    procedure skipseparator()
        while ch is a delimiter or ch = '{' do
            switch (ch) {
                case space:
                case newline: lineno++;
                case tab:
                case '{':     comment();
            };

    procedure comment()
        while ch != '}' do
            if ch = '{' then comment();
            else if ch = newline then lineno++;
        if ch = '}' then consume it
        else error("comment");

Searching (symbol table lookup):
    Linear search
    Letter indexing
    Hashing
        Direct chaining
        Linear probing
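The scanning procedures above can be sketched in Python roughly as follows. This is an assumption-laden illustration, not the notes' own code: the class Scanner and its methods peek, advance, comment, skip_separator and next_token are hypothetical names, only the "[" and "[]" tokens from the transition diagram are handled, and one-character lookahead through peek takes the place of retract(). Comments in braces may nest, and newlines are counted in lineno as in skipseparator and comment.

```python
DELIMITERS = " \t\n"


class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.lineno = 1

    def peek(self):
        """Return the current character without consuming it (None at end)."""
        return self.text[self.pos] if self.pos < len(self.text) else None

    def advance(self):
        """Consume one character, counting newlines."""
        ch = self.peek()
        if ch == "\n":
            self.lineno += 1
        if ch is not None:
            self.pos += 1
        return ch

    def comment(self):
        """Skip a { ... } comment; nested comments are handled recursively."""
        self.advance()  # consume "{"
        while self.peek() is not None and self.peek() != "}":
            if self.peek() == "{":
                self.comment()
            else:
                self.advance()
        if self.peek() == "}":
            self.advance()
        else:
            raise SyntaxError("unterminated comment")

    def skip_separator(self):
        """Skip delimiters and comments between lexemes."""
        while self.peek() is not None and (
            self.peek() in DELIMITERS or self.peek() == "{"
        ):
            if self.peek() == "{":
                self.comment()
            else:
                self.advance()

    def next_token(self):
        self.skip_separator()
        ch = self.peek()
        if ch is None:
            return ("eof", None)
        if ch == "[":
            self.advance()
            self.skip_separator()  # delimiters and comments after "["
            if self.peek() == "]":
                self.advance()
                return ("bar", "BAR")
            # Any other character is left unread, so no retract is needed.
            return ("leftsquare", "LSQUARE")
        raise SyntaxError(f"unexpected character {ch!r} on line {self.lineno}")
```

A fuller scanner would dispatch on letters and digits in next_token, in the way nexttoken calls scanword and scannumeral.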