Front End: Lexical Analysis. The Structure of a Compiler

Similar documents
Lexical Analyzer Scanner

Lexical Analyzer Scanner

Zhizheng Zhang. Southeast University

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Formal Languages and Compilers Lecture VI: Lexical Analysis

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Compiler course. Chapter 3 Lexical Analysis

Chapter 3 Lexical Analysis

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

Lexical Analysis. Implementation: Finite Automata

Lexical Analysis. Chapter 2

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Lexical Analysis. Prof. James L. Frankel Harvard University

CS308 Compiler Principles Lexical Analyzer Li Jiang

Figure 2.1: Role of Lexical Analyzer

UNIT -2 LEXICAL ANALYSIS

Implementation of Lexical Analysis

Lexical Analysis. Lecture 2-4

6 NFA and Regular Expressions

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Lexical Analysis (ASU Ch 3, Fig 3.1)

Implementation of Lexical Analysis

Lexical Analysis. Lecture 3-4

Introduction to Lexical Analysis

Lexical Analysis. Lecture 3. January 10, 2018

Implementation of Lexical Analysis

CSE302: Compiler Design

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

2. Lexical Analysis! Prof. O. Nierstrasz!

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Compiler Construction

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

Theory of Computations Spring 2016 Practice Final Exam Solutions

UNIT II LEXICAL ANALYSIS

CT32 COMPUTER NETWORKS DEC 2015

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Compiler Construction

Implementation of Lexical Analysis. Lecture 4

CS415 Compilers. Lexical Analysis

Lexical Analysis. Finite Automata

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

UNIT I- LEXICAL ANALYSIS. 1.Interpreter: It is one of the translators that translate high level language to low level language.

Midterm I (Solutions) CS164, Spring 2002

LANGUAGE TRANSLATORS

Lexical Analysis. Finite Automata

CSc 453 Lexical Analysis (Scanning)

Compiler Construction

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Automata Theory TEST 1 Answers Max points: 156 Grade basis: 150 Median grade: 81%

CSE450. Translation of Programming Languages. Automata, Simple Language Design Principles

Lexical Analysis 1 / 52

CS2 Language Processing note 3

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science

Lexical Analysis/Scanning

Lexical Analysis. Finite Automata. (Part 2 of 2)

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

CS 403: Scanning and Parsing

Skyup's Media. PART-B 2) Construct a Mealy machine which is equivalent to the Moore machine given in table.

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

[Lexical Analysis] Bikash Balami

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CSE302: Compiler Design

Kinder, Gentler Nation

Lexical Analysis. Introduction

Dixita Kagathara Page 1

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Buffering Techniques: Buffer Pairs and Sentinels

CS 314 Principles of Programming Languages. Lecture 3

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Regular Languages and Regular Expressions

Compiler Design Aug 1996

Lexical Analysis - 2

2068 (I) Attempt all questions.

G Compiler Construction Lecture 4: Lexical Analysis. Mohamed Zahran (aka Z)

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Implementation of Lexical Analysis

Lexical and Syntax Analysis

CS 310: State Transition Diagrams

Cunning Plan. Informal Sketch of Lexical Analysis. Issues in Lexical Analysis. Specifying Lexers

Languages and Compilers

Part 5 Program Analysis Principles and Techniques

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

CS 314 Principles of Programming Languages

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Transcription:

Front End: Lexical Analysis The Structure of a Compiler

Constructing a Lexical Analyser By hand: Identify lexemes in input and return tokens Automatically: Lexical-Analyser generator We will learn about Lex First we need to introduce: Regular expressions Non-deterministic automata Deterministic automata

Scanning and Parsing

Important Notions Token: pair consisting of (token-name, opt-value) Pattern: form of the lexemes for a token Lexemes: sequence of characters matching the pattern for a token Example printf("total = %d/n", score); printf and score are lexemes for token id that matches pattern in Table 3.2

Classes of tokens

Languages An alphabet is any finite set of symbols A string over an alphabet is a finite sequence of symbols from that alphabet A language is any countable set of strings over a fixed alphabet. s denotes the length of a string ε is the string of length 0 Exponentiation: s 0 = ε s i = s i 1 s

Operations on Strings Parts of a string: example string necessary prefix : deleting zero or more tailing characters; eg: nece suffix : deleting zero or more leading characters; eg: ssary substring : deleting prefix and suffix; eg: ssa subsequence : deleting zero or more not necessarily contiguous symbols; eg: ncsay proper prefix, suffix, substring or subsequence: one that cannot equal to the original string;

Operations on Languages We define: L 0 = {ε} L i = L i 1 L.

Regular Expressions Definition A regular expression is defined inductively as follows: Basis ε is a regular expression denoting the language L(ε) = {ε} If a Σ then a is a regular expression denoting L(a) = {a} Induction: if r and s are regular expression with languages L(r) and L(s) (r) (s), (r)(s), (r), (r) are r.e. denoting L(r) L(s), L(r)L(s), (L(r)) and L(r).

Examples and Properties Let Σ = {a, b}. a b denotes the language {a, b} What are the laguages denoted by (a b)(a b), a, (a b), a a b.

Non-regular sets Balanced or nested construct Example: if cond 1 then if cond 2 then else else Can be recognized by context free grammars. Matching strings: {wcw}, where w is a string of a s and b s and c is a legal symbol. Cannot be recognized even using context free grammars. Remark: anything that needs to memorize non-constant amount of information happened in the past cannot be recognized by regular expressions.

Recognition of Tokens Problem: Find prefixes of the input string that match the patterns. Example Consider the following grammar

Example (ctdn.)

Example (ctdn.)

Language Recognisers Definition A Finite Automata is a transition graph that recognises whether an input string belongs to a given regular language or not. Nondeterministic Finite Automata (NFA) Deterministic Finite Automata (DFA) Both recognise the same languages, i.e. the regular languages.

NFA A NFA consists of: A finite set of states S A alphabet Σ of input symbols A transition function move : S Σ {ε} P(S) A initial state s 0 S A set of accepting states F S.

An Example of NFA

Executing a NFA An NFA accepts an input string x if and only if there is some path in the transition graph initiating from the starting state to some accepting state such that the edge labels along the path spell out x. (a b)*abb input string: aabb a start a b b 0 1 2 3 b a b 0 {0,1} {0} 1 {2} 2 {3} 0 a 0 a 1 b 2 b 3 Accept! 0 a 0 a 0 b 0 b 0 this is not an accepting path

DFA A DFA is a special case of NFA where there are no ε-transitions For each s S and a Σ there is exactly one transition from s labelled a. How to execute a DFA?

An Example of DFA

How to implement a NFA? Recall: A NFA for language (a b) abb is Because of the non-deterministic choices, simulating a NFA is not as straightforward as for a DFA. Convert NFA in DFA!

Algorithm for Subset Construction Given NFA N constructs DFA D by simulating in parallel all the moves N can make on a input string.

Example

Example (ctdn.)