Non-deterministic Finite Automata (NFA)

Similar documents
Lecture 4: Syntax Specification

Languages and Compilers

Week 2: Syntax Specification, Grammars

CS 314 Principles of Programming Languages. Lecture 3

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSE 3302 Programming Languages Lecture 2: Syntax

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Lexical Analysis. Introduction

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Introduction to Lexical Analysis

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CS 314 Principles of Programming Languages

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

CSE 401 Midterm Exam 11/5/10

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

LANGUAGE PROCESSORS. Presented By: Prof. S.J. Soni, SPCE Visnagar.

Formal Languages and Compilers Lecture VI: Lexical Analysis

Introduction to Lexical Analysis

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Part 5 Program Analysis Principles and Techniques

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Lexical Analysis. Chapter 2

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

Lexical Analysis. Lecture 3. January 10, 2018

Lexical Analysis. Lecture 2-4

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

CS415 Compilers. Lexical Analysis

Formal Languages. Formal Languages

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Related Course Objec6ves

Wednesday, August 31, Parsers

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Lexical Analysis. Lecture 3-4

Building Compilers with Phoenix

Optimizing Finite Automata

CMSC 350: COMPILER DESIGN

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Lexical Analysis - 2

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Ambiguous Grammars and Compactification

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

Compiler Construction

Dr. D.M. Akbar Hussain

Midterm I - Solution CS164, Spring 2014

Lexical Analysis 1 / 52

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

1. Lexical Analysis Phase

2. Lexical Analysis! Prof. O. Nierstrasz!

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

CSCI312 Principles of Programming Languages!

Converting a DFA to a Regular Expression JP

CMSC 330: Organization of Programming Languages

Implementation of Lexical Analysis

Dr. D.M. Akbar Hussain

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Question Bank. 10CS63:Compiler Design

CS 403: Scanning and Parsing

CMPT 755 Compilers. Anoop Sarkar.

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Theory and Compiling COMP360

Context-Free Grammars

Lexical Analyzer Scanner

CMSC 330: Organization of Programming Languages

Lexical Analyzer Scanner

CSE 431S Scanning. Washington University Spring 2013

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Chapter Seven: Regular Expressions

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Principles of Programming Languages COMP251: Syntax and Grammars

Zhizheng Zhang. Southeast University

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

ECS 120 Lesson 7 Regular Expressions, Pt. 1

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Structure of Programming Languages Lecture 3

If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING. Lecture 8 CS2110 Spring 2014

Compiler Construction

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSE 105 THEORY OF COMPUTATION

Implementation of Lexical Analysis

Grammars & Parsing. Lecture 12 CS 2112 Fall 2018

CSE302: Compiler Design

COP 3402 Systems Software Syntax Analysis (Parser)

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

KEY. A 1. The action of a grammar when a derivation can be found for a sentence. Y 2. program written in a High Level Language

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Transcription:

Non-deterministic Finite Automata (NFA) CAN have transitions on the same input to different states Can include a ε or λ transition (i.e. move to new state without reading input) Often easier to design than equivalent DFA, but more complex to evaluate input Can always create a DFA from an NFA (and vice versa) Algorithms exist to handle the conversion CS 230 - Spring 2018 4-1

NFA Picture Bit harder to evaluate if any given input string is part of the language, for example input 00 is accepted since it can end in an accepting state, but depending on the path you follow it could end in any state.. CS 230 - Spring 2018 4-2

Regular Expression Sequence of regular and meta characters that describes a regular language. Alphabet: (Eg. {a-z}, {0,1}, {a-z0-9}, etc.) empty set: Ø empty string: ε literal character: a in CS 230 - Spring 2018 4-3

Operations Alternation R S = R U S (union of R and S) Concatenation RS = { αβ : α in R and β in S} Kleene star R* = smallest superset of R containing ε and closed under concatenation CS 230 - Spring 2018 4-4

Examples a* = { ε, a, aa, aaa,... } b a* = { b, ε, a, aa, aaa,... } (0 1)* = finite binary, plus empty string (h c)at = { hat, cat } (a b)(c d) = { ac, ad, bc, bd } while = { while } CS 230 - Spring 2018 4-5

Syntax / Extensions square brackets (with ranges) match one of the given letters [a-z] matches all lowercase characters plus sign: like star but excluding ε [0-9]+ matches integer numbers [^] - Matches a single symbol NOT inside the brackets. [^0-9] matches any non numeric character dot matches any single letter.at matches hat, cat, fat, mat, bat, 7at, Aat, etc. CS 230 - Spring 2018 4-6

Example Q: What is the regex to search for all occurrences of the following name in a text with different possible spellings Georg Friedrich Händel: Händel Haendel Handel Hendel "[Hh](ae a e ä)ndel" CS 230 - Spring 2018 4-7

Syntax / Extensions? - matches the previous symbol or pattern 0 or 1 times. escape character \ To use syntactical symbols as constants Eg. [0-9]+\.[0-9]+ matches positive fractional numbers Q: Write a regex that matches real numbers in the decimal number format (Eg. -81.998) or in the scientific notation, so a decimal number followed by E, then a natural number (-5.9E-2) CS 230 - Spring 2018 4-8

Cases: x.y, x.yez, x.ye-z, -x.y, -x.yez, x.ye-z -?[0-9]+\.[0-9]+(E-?[0-9]+)? CS 230 - Spring 2018 4-9

Scanner: Resolving Conflicts Splitting the input stream: How to tokenize if17? One token, variable name if17 Variable name if1" followed by number 7" Keyword if followed by number 17 keyword if followed by number 1 then 7 Var name i followed by var name f17 etc. To solve conflicting rules, the rule that is written first takes priority. (Eg. is if a variable or keyword) To solve problems where an input could be broken up multiple times by the same rule we take the longest match (17 is 17 not 1 and 7) CS 230 - Spring 2018 4-10

Scanner If the input does not match any of our rules then it is not part of our language, in the context of compiling: The code will not compile The user is alerted with a compilation error message. After the scanner's execution we are left with a sequence of valid tokens. Can be thought of as having a sequence of correctly spelled English words. 4-11

Parsing After scanning, step 2 of compilation is Parsing In Parsing, we analyze the sequence of tokens provided by the scanner, and verify that they follow the rules of our language (In English does the series of correctly spelled words follow the rules of grammar?) So we need some way to state the rules of our grammar. We need a way to specify what our language is! CS 230 - Spring 2018 4-12

Language Specification Simple for humans Unlike human language, must be unambiguous not dependent on context. Easy to build parsing tools We want a formal language defined by what we call a Context Free Grammar (CFG) CS 230 - Spring 2018 4-13

Example Simple Sentence (R1) <sentence> <subj phrase> <verb phrase> (R2) <subj phrase> <noun phrase> (R3) <verb phrase> <verb> <noun phrase> (R4) <noun phrase> <article> <noun> (R5) <verb> has (R6) <article> a (R7) <article> the (R8) <noun> man (R9) <noun> dog CS 230 - Spring 2018 4-14

CFG Specification Components Terminal Symbols (Tokens): actual symbols that appear in our language (word) Non-terminal Symbols: Abstract components of our language. Think of these as variable names These do not literally appear in input One nonterminal is chosen as our language s Start symbol. All valid sentences of our language start with it We donate non-terminals with < > Production Rule: Rules of expansion of a nonterminal symbols into zero or more terminals and non-terminals More than one rule per non-terminal possible <NT> -> <Expansion1> <Expansion2> CS 230 - Spring 2018 4-15

Derivation Deriving Input strings that are part of the language defined by our context free grammar is done by: application of rules to generate valid input string beginning with start symbol repeatedly replace one variable by one rule continue until there are no more variables resulting sequence of terminals is a syntactically correct input string formal definition of language: collection of all valid sequences that can be derived from start symbol CS 230 - Spring 2018 4-16