Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Similar documents
CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

CSc 453 Lexical Analysis (Scanning)

Writing a Lexical Analyzer in Haskell (part II)

CS 314 Principles of Programming Languages

Compiler phases. Non-tokens

Lexical Analysis. Introduction

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Lexical Analysis. Implementation: Finite Automata

Implementation of Lexical Analysis

Lexical Analysis 1 / 52

Monday, August 26, 13. Scanners

CS415 Compilers. Lexical Analysis

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Wednesday, September 3, 14. Scanners

Lexical Analysis. Lecture 2-4

CS 314 Principles of Programming Languages. Lecture 3

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Chapter 3 Lexical Analysis

Finite Automata and Scanners

Implementation of Lexical Analysis

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Figure 2.1: Role of Lexical Analyzer

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Lexical Analysis. Chapter 2

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Assignment 1 (Lexical Analyzer)

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Compiler course. Chapter 3 Lexical Analysis

Dr. D.M. Akbar Hussain

Assignment 1 (Lexical Analyzer)

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

Alternation. Kleene Closure. Definition of Regular Expressions

Structure of a Compiler: Scanner reads a source, character by character, extracting lexemes that are then represented by tokens.

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Compiler Construction LECTURE # 3

Implementation of Lexical Analysis

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Lexical Analysis. Lecture 3-4

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CSCI312 Principles of Programming Languages!

Lexical Analyzer Scanner

Lexical Analyzer Scanner

Lexical Error Recovery

Structure of Programming Languages Lecture 3

Lexical Analysis - 2

CS164: Midterm I. Fall 2003

6 NFA and Regular Expressions

Compilers CS S-01 Compiler Basics & Lexical Analysis

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Lexical Analysis. Lecture 3. January 10, 2018

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

2. Syntax and Type Analysis

Lexical Error Recovery

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Syntax and Type Analysis

CMSC 350: COMPILER DESIGN

CS 441G Fall 2018 Exam 1 Matching: LETTER

LECTURE 6 Scanning Part 2

CS308 Compiler Principles Lexical Analyzer Li Jiang

Administrativia. Extra credit for bugs in project assignments. Building a Scanner. CS164, Fall Recall: The Structure of a Compiler

Week 2: Syntax Specification, Grammars

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5

KEY. A 1. The action of a grammar when a derivation can be found for a sentence. Y 2. program written in a High Level Language

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Front End: Lexical Analysis. The Structure of a Compiler

Compilers CS S-01 Compiler Basics & Lexical Analysis

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

G52LAC Languages and Computation Lecture 6

WARNING for Autumn 2004:

Introduction to Lexical Analysis

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS 310: State Transition Diagrams

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Chapter 2 :: Programming Language Syntax

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Compiler Construction

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CS S-01 Compiler Basics & Lexical Analysis 1

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

CSE 401/M501 Compilers

Zhizheng Zhang. Southeast University

Compiler Construction

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CSE302: Compiler Design

Formal Languages and Compilers Lecture VI: Lexical Analysis

Programming Languages 2nd edition Tucker and Noonan"

CSE 105 THEORY OF COMPUTATION

Midterm I (Solutions) CS164, Spring 2002

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

A Scanner should create a token stream from the source code. Here are 4 ways to do this:

Transcription:

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday 1

Finite-state machines CS 536

Last time! A compiler is a recognizer of language S (Source) a translator from S to T (Target) a program in language H (Host) For example, gcc: S is C, T is x86, H is C 3

Last time! Why do we need a compiler? Processors can execute only binaries (machine-code/assembly programs) Writing assembly programs will make you want to reconsider your life choices Write programs in a nice(ish) high-level language like Java; compile to binaries

Last time! front end back end 5

The scanner! Translates sequence of chars into sequence of tokens Each time scanner is called it should: find longest sequence of chars corresponding to a token return that token 6

Special linkage between scanner and parser in most compilers! Source Program lexical analyzer (scanner) syntax analyzer (parser) Sequence of characters Sequence of tokens syntax analyzer (parser) next token, please lexical analyzer (scanner) a < = p source code Conceptual organization

Scanner generator! Generates a scanner!!! Needs one regular expression for each token Needs regular expressions for things to ignore comments, whitespace, etc. To understand how it works, we need FSMs finite state machines 8

FSMs: Finite State Machines! Aka finite automata Input: string (seq of chars) Output: accept / reject i.e., input is legal in language 9

FSMs! Represent regular languages Good enough for tokens in PLs 10

Example 1! single line comments with // 11

Example 2! What language does this accept? Can you find an equivalent, but smaller, FSM? 12

How an FSM works! curr_state = start_state! let in_ch = current input char! repeat! if there is edge out of curr_state with label in_ch into next_state! cur_state = next_state! in_ch = next char of input! o/w stuck // error condition! until stuck or input string is consumed! string is accepted iff entire string is consumed and final_states.contains(cur_state)! 13

FSMs, formally! finite set of states the alphabet (characters) start state final states transition function 14

FSMs, formally! FSM accepts string The language of FSM M is the set of all words it accepts, denoted 15

FSM example, formally! a b c s0 s1! s1 s0! anything else, machine is stuck 16

Coding an FSM! curr_state = start_state! done = false! while (!done)! ch = nextchar()! next = transition[curr_state][ch]! if (next == error ch == EOF)! done = true! else! curr_state = next! return final_states.contains(curr_state) &&! next!=error! 17

FSM types: DFA & NFA! Deterministic no state has > 1 outgoing edge with same label Nondeterministic states may have multiple outgoing edges with same label edges may be labelled with special symbol ɛ (empty string) ɛ-transitions can happen without reading input 18

NFA example! Equivalent DFA 19

Why NFA?! Much more compact What does this accept? An equivalent DFA needs 2^5 states 20

Extra example! Hex literals must start with 0x or 0X followed by at least one hex digit (0-9,a-f,A-F) can optionally have long specifier (l,l) at the end 21

Extra example! A C/C++ identifier is a sequence of one or more letters, digits, or underscores. It cannot start with a digit. 22

Extra Example - Part 1! A C/C++ identifier is a sequence of one or more letters, digits, or underscores. It cannot start with a digit. digit, letter, _ 1 _, letter 2 23

Extra example! A C/C++ identifier is a sequence of one or more letters, digits, or underscores. It cannot start with a digit. What if you wanted to add the restriction that it can't end with an underscore? 24

Extra Example - Part 2! What if you wanted to add the restriction that it can't end with an underscore? digit, letter, _ 1 _, letter 2 digit, letter 3 letter 25

Recap! The scanner reads stream of characters and finds tokens Tokens are defined using regular expressions, which are finite-state machines Finite-state machines can be non-deterministic Next time: understand connection between deterministic and non-deterministic FSMs 26

automatatutor.com Course ID: 176CS536S Password:2GXEJLMR