Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Similar documents
CS415 Compilers. Lexical Analysis

Lexical Analysis. Introduction

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Intermediate Representations

Figure 2.1: Role of Lexical Analyzer

Front End. Hwansoo Han

Lecture 3: CUDA Programming & Compiler Front End

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Structure of Programming Languages Lecture 3

ECS 120 Lesson 7 Regular Expressions, Pt. 1

CSc 453 Lexical Analysis (Scanning)

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Lexical Analysis. Lecture 3-4

Intermediate Representations

Syntax Analysis, VII One more LR(1) example, plus some more stuff. Comp 412 COMP 412 FALL Chapter 3 in EaC2e. target code.

Parsing II Top-down parsing. Comp 412

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Implementation of Lexical Analysis

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Lexical Analysis. Chapter 2

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Implementation of Lexical Analysis

Regular Expressions. Regular Expressions. Regular Languages. Specifying Languages. Regular Expressions. Kleene Star Operation

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Local Optimization: Value Numbering The Desert Island Optimization. Comp 412 COMP 412 FALL Chapter 8 in EaC2e. target code

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Lexical Analysis, V Implemen'ng Scanners. Comp 412 COMP 412 FALL Sec0on 2.5 in EaC2e. target code. source code Front End OpMmizer Back End

CSE 105 THEORY OF COMPUTATION

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Part 5 Program Analysis Principles and Techniques

Syntax Analysis, III Comp 412

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Compiling Regular Expressions COMP360

2. Lexical Analysis! Prof. O. Nierstrasz!

Lexical Analysis. Lecture 2-4

Formal Languages and Compilers Lecture VI: Lexical Analysis

Syntax Analysis, III Comp 412

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Introduction to Lexical Analysis

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Instruction Selection: Preliminaries. Comp 412

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Lexical Analysis. Lecture 3. January 10, 2018

Lexical Analysis. Implementation: Finite Automata

Compiler Construction LECTURE # 3

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Lexical Analysis. Finite Automata

CS 406/534 Compiler Construction Putting It All Together

Compiler Construction

Dr. D.M. Akbar Hussain

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS606- compiler instruction Solved MCQS From Midterm Papers

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

Compiler Construction

CS308 Compiler Principles Lexical Analyzer Li Jiang

Chapter 2 - Programming Language Syntax. September 20, 2017

Lexical Analysis. Prof. James L. Frankel Harvard University

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Cunning Plan. Informal Sketch of Lexical Analysis. Issues in Lexical Analysis. Specifying Lexers

A simple syntax-directed

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

Lexical Analysis. Finite Automata

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

Building a Parser Part III

The Compiler s Front End (viewed from a lab 1 persepc2ve) Comp 412 COMP 412 FALL Chapter 1 & 2 in EaC2e. target code

UNIT II LEXICAL ANALYSIS

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Theory and Compiling COMP360

Lecture 4: Syntax Specification

SEM / YEAR : VI / III CS2352 PRINCIPLES OF COMPLIERS DESIGN UNIT I - LEXICAL ANALYSIS PART - A

2068 (I) Attempt all questions.

Computing Inside The Parser Syntax-Directed Translation. Comp 412

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Lexical Scanning COMP360

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412

The Processor Memory Hierarchy

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Implementation of Lexical Analysis

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions


CST-402(T): Language Processors

Lexical Analysis 1 / 52

Languages and Compilers

Implementation of Lexical Analysis. Lecture 4

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Transcription:

Lexical Analysis Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. Traditional Three-part Compiler 1 1

The Front End Source Code Front End IR Optimizer (Middle End) IR Back End Machine code Errors The purpose of the front end is to deal with the input language Perform a membership test: code Î source language? Is the program well-formed (semantically)? Build an IR version of the code for the rest of the compiler The front end deals with form (syntax) & meaning (semantics) 2 The Front End stream of characters Scanner microsyntax stream of tokens Parser syntax IR + annotations Errors Why separate the scanner and the parser? Scanner classifies words Parser constructs grammatical derivations Parsing is harder and slower Scanner is only pass that touches every character of the input. Separation simplifies the implementation Scanners are simple Scanner leads to a faster, smaller parser token is a pair <part of speech, lexeme > 3 2

Lexical analygator. [http://www.craftinginterpreters.com/scanning.html] 4 Recognizing the word new Comp 412, Fall 2010 5 3

Recognizing new, not, while 6 Finite Automata (S, Σ, δ, s 0, S A ) S: finite set of states Σ: alphabet δ: transition function s 0 : start state S A : set of accepting states 7 4

More complex words Comp 412, Fall 2010 8 Represent the transition function as a table The recognizer code can be used for other cases as well. E.g: Just change the table. 9 5

The next question Finite automata are good and useful, but not concise. We need a concise notation that can be transformed into FA s. Regular Expressions 10 Set Operations (review) Operation Union of L and M written L M Concatenation of L and M written LM Kleene closure of L written L * Positive closure of L written L + Definition L M = { s s L or s M } LM = { st s L and t M } L* = 0 i L i L+ = 1 i L i These definitions should be well known 11 6

Regular Expressions Regular Expression (over alphabet S) e is a RE denoting the set {e} If a is in S, then a is a RE denoting {a} If x and y are REs denoting L(x) and L(y) then x y is an RE denoting L(x) È L(y) xy is an RE denoting L(x)L(y) x * is an RE denoting L(x)* Precedence is closure, then concatenation, then alternation 12 Examples of Regular Expressions Identifiers: Letter (a b c z A B C Z) Digit (0 1 2 9) Identifier Letter ( Letter Digit ) * shorthand for Numbers: (a b c z A B C Z) ((a b c z A B C Z) (0 1 2 9)) * Integer (+ - e) (0 (1 2 3 9)(Digit * ) ) Decimal Integer. Digit * Real ( Integer Decimal ) E (+ - e) Digit * Complex ( Real, Real ) Numbers can get much more complicated! Using symbolic names does not imply recursion underlining indicates a letter in the input stream 13 7

Regular Expressions So what s the point? We use regular expressions to specify the mapping of words to parts of speech for the lexical analyzer Using results from automata theory and theory of algorithms, we can automate construction of recognizers from REs Þ We study REs and associated theory to automate scanner construction! Þ Fortunately, the automatic techiques lead to fast scanners used in text editors, URL filtering software, 14 Example Consider the problem of recognizing ILOC register names Register r (0 1 2 9) (0 1 2 9) * Allows registers of arbitrary number Requires at least one digit RE corresponds to a recognizer (or DFA) r (0 1 2 9) S 0 S 1 S 2 (0 1 2 9) Recognizer for Register accepting state Transitions on other inputs go to an error state, s e 15 8

Example RE for C/Java-style single-line comments //(^\n)*\n RE for C/Java-style multi-line comments / (^ )* / / (^ +^/)* / (better) More states implies a larger table. The larger table might have mattered when computers had 128 KB or 640 KB of RAM. Today, when a cell phone has 16 megabytes and a laptop has gigabytes, the concern seems outdated. Examples All strings of 1s and 0s ending in a 1 ( 0 1 ) * 1 All strings over lowercase letters where the vowels (a,e,i,o, & u) occur exactly once, in ascending order Let Cons be (b c d f g h j k l m n p q r s t v w x y z) Cons * a Cons * e Cons * i Cons * o Cons * u Cons * All strings of 1s and 0s that do not contain three 0s in a row: ( 1 * ( e 01 001 ) 1 * ) * ( e 0 00 ) 17 9

Next Step RE NFA (Thompson s construction) Build an NFA for each term Combine them with e-moves NFA DFA (Subset construction) Build the simulation DFA Minimal DFA Hopcroft s algorithm The Cycle of Constructions RE NFA DFA DFA RE All pairs, all paths problem Union together paths from s 0 to a final state minimal DFA In another course 18 What About Hand-Coded Scanners? Many (most?) modern compilers use hand-coded scanners Starting from a DFA simplifies design & understanding Avoiding straight-jacket of a tool allows flexibility Computing the value of an integer In LEX or FLEX, many folks use sscanf() & touch chars many times Can use old assembly trick and compute value as it appears We preferred this approach in our Cool scanner. Combine similar states Handling keywords Instead of having explicit RE for each keyword, first recognize them as ordinary identifiers, then look up in a hash table. A good example of perfect hashing, because of fixed set of keys. Clang and GCC s front ends are also hand-written. 19 10