Formal Languages

Topics
Regular expressions
Finite state automata
  Deterministic
  Non-deterministic
Review of BNF
Introduction to grammars
Regular grammars
Formal Languages, CS34 Fall2 BGRyder

Formal Languages
A way to describe the difficulty of computational problems, formulated as language recognition problems.
A mechanism to aid the description of programming language (PL) constructs:
  Regular expressions, for PL tokens (e.g., keywords)
  Finite state automata (FSAs)

Regular Expressions
A formalism for describing simple PL constructs: reserved words, identifiers, numbers.
The simplest sort of structure.
Recognized by a finite state automaton.
Defined recursively.

Formally
A PL is a set of strings (called sentences) over some finite alphabet of symbols, called terminals. The set is not necessarily finite.
Rules describe how to combine the terminals into well-formed sentences in the PL.
PLs are categorized by the complexity of these rules.
BNF is used to describe context-free languages; most PLs fall in this category.

Regular Expressions

PL construct          RE notation            Language
an empty RE                                  { }
symbol a              a                      {a}
null symbol                                  {}
R, S regular exprs    R | S                  L_R ∪ L_S
a, b terminals        a | b (alternation)    {a, b}
R, S regular exprs    RS                     L_R L_S
a, b terminals        ab (concatenation)     {ab}
R regular expr        R*                     {ε} ∪ L_R ∪ L_R L_R ∪ L_R L_R L_R ∪ ...
a                     a*                     {ε, a, aa, aaa, ...}
R regular expr        R+                     L_R ∪ L_R L_R ∪ L_R L_R L_R ∪ ...
a                     a+                     {a, aa, aaa, ...}

Note: aε = εa = a
Precedence, high to low: {* +}, concatenation, | (all are left-associative operators).
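The table above defines each RE operator by the language it denotes. A minimal Python sketch of those set definitions follows; the function names are illustrative assumptions, and because R* denotes an infinite set, `star` is truncated to a chosen number of repetitions.

```python
from itertools import product

def union(l_r, l_s):
    """Language of R | S: every string in either language."""
    return l_r | l_s

def concat(l_r, l_s):
    """Language of RS: every string of L_R followed by every string of L_S."""
    return {r + s for r, s in product(l_r, l_s)}

def star(l_r, max_reps):
    """Language of R*, truncated to at most max_reps repetitions (assumption: a bound is needed)."""
    result = {""}          # zero repetitions gives the empty string
    layer = {""}
    for _ in range(max_reps):
        layer = concat(layer, l_r)
        result |= layer
    return result

def plus(l_r, max_reps):
    """Language of R+, one to max_reps repetitions."""
    return concat(l_r, star(l_r, max_reps - 1))
```

For instance, `star({"a"}, 3)` yields the empty string together with a, aa, and aaa, while `plus({"a"}, 3)` omits the empty string.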

RE Examples
2            {,2}
* 2          {2,,,,, }
2 *          {, 2, 22, 222, }
2 * +        {,,,,,2,22, }
( 2) *       {,,2,2,,2,22, }
( ) *        Binary numbers that end in

REs for PLs
Let letter stand for a | b | c | ... | z and digit stand for 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9.
letter (letter | digit)* is identifier
digit+ is integer constant
digit* . digit+ is real number
Which identifiers are described by letter (letter | digit)*?
ABC C B% X
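The three token classes above translate almost directly into Python's `re` syntax. The sketch below assumes lowercase ASCII letters and decimal digits; the name `classify` and the sample strings in the note are illustrative and do not come from the slides.

```python
import re

LETTER = "[a-z]"
DIGIT = "[0-9]"

# letter (letter | digit)*
IDENTIFIER = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")
# digit+
INTEGER = re.compile(f"{DIGIT}+")
# digit* . digit+  (the dot is a literal terminal, so it is escaped)
REAL = re.compile(rf"{DIGIT}*\.{DIGIT}+")

def classify(text):
    """Return the names of all token classes whose RE describes the whole string."""
    classes = {"identifier": IDENTIFIER, "integer": INTEGER, "real": REAL}
    return [name for name, pattern in classes.items() if pattern.fullmatch(text)]
```

Under these assumptions, `classify("x1")` reports an identifier, `classify(".5")` a real number, and `classify("4.")` nothing, because digit* . digit+ requires at least one digit after the point.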

Examples
Which of the following are legal real numbers described by digit* . digit+?
.5.5 2 4. 6.3.2
Simple PL constructs can be defined as regular expressions.
Can you define a number in scientific notation as an RE?

Finite State Automaton (FSA)
A recognizer of the language generated by a regular expression.
Described by <set of states, labelled transitions, initial state, final state(s)>
( ) S <{S,S,S2}, S ---> S, S, {S,S2}> S ---> S2 S S2

FSA
An FSA accepts (recognizes) an input string iff there is a path from its initial state to a final state such that the labels on the path are the terminals in that string.
Empty transitions signify illegal moves; one can think of the FSA as going to a sink error state.
Transition table (inputs, states): S S S S S2 S --- --- S2 S2 --- ---

Examples
Binary numbers containing a pair of adjacent s: ( ) * ( ) *
FSA: S S2 S S3, S S S2 S S S2 S2 S S3 S3 S3 S3
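The acceptance rule and the transition-table view can be sketched in Python as follows. The table, the state names S0, S1, S2, and the choice of language (binary strings containing two adjacent 1s) are assumptions made for illustration, not the automaton drawn on the slide. A missing table entry plays the role of an empty transition, that is, a move to the sink error state.

```python
def accepts(table, start, finals, text):
    """Follow one labelled transition per input symbol; reject on an empty transition."""
    state = start
    for symbol in text:
        state = table.get((state, symbol))
        if state is None:          # empty table entry: illegal move (sink error state)
            return False
    return state in finals

# Hypothetical deterministic FSA for binary strings that contain "11".
TABLE = {
    ("S0", "0"): "S0", ("S0", "1"): "S1",
    ("S1", "0"): "S0", ("S1", "1"): "S2",
    ("S2", "0"): "S2", ("S2", "1"): "S2",
}
START, FINALS = "S0", {"S2"}
```

With this table, `accepts(TABLE, START, FINALS, "0110")` is true, while an input containing a symbol with no table entry, such as "12", is rejected as soon as the empty transition is reached.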

An Equivalent FSA
[Diagram: FSA2, with states A, B, C]
FSA and FSA2 recognize the same set of strings of terminals, the same language! Therefore FSAs are NOT UNIQUE.

Example
Exponent in scientific notation: E (+ -) +
[Diagram: FSA with states S, S, S2, S3 and transitions labelled E and +,-]

Example
Binary numbers which begin and end with a, ( ) *
All binary numbers containing at least one, where all their s precede all their s: + + *
[Diagrams: FSAs with states S, S2, S, S and S2]

Jobs for REs/FSAs
Recognition: Is this string in the language described (recognized) by this RE (FSA)?
Description: Given an RE (FSA), what language does it generate (recognize)?
Codification: Given a language, find an RE and FSA corresponding to it.

Recognition Example
Given *, which of these strings are described by it? ,,,,
Which of these strings ,,,, is recognized by the following FSA?
[Diagram: FSA with states S, S]

Description Example
What language is generated by () + () +?
What language is recognized by this FSA? What if we added the transition?
[Diagram: FSA with states S, S, S2, S3, S4]

Example Codification
Complex constants are parenthesized pairs of two integers.
Let digit stand for (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9). Then the RE is ( digit+ , digit+ )
[Diagram: FSA with states S, S, S2, S3, S4, S5 and transitions labelled ( , digit , and ) ]

Nondeterministic FSAs
Allow more than one transition with the same label.
Allow ε transitions.
e.g., DFSA: ( ) * ( ) * NFSA: ( ) * ( ) *
[Diagrams: DFSA with states A, B, C; NFSA with states X, Y, Z]

NFSAs
Recognize a sentence in a language by progressing from the initial state to a final state.
Think of following many threads of computation at the same time; one must lead to a final state for recognition to occur.
The class of languages recognizable by NFSAs is the SAME as the class of languages recognizable by DFSAs.
There are algorithms to build an NFSA directly from an RE.

Example
+ + : all binary numbers containing at least one, in which all s precede all s
[Diagrams: DFSA with states S, S, S2, S3; NFSA with states S, S, S2, S3]
Why doesn't this NFSA work?
[Diagram: NFSA with states S2, S2]
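The "many threads of computation" view can be sketched by tracking the set of states the NFSA could be in after each symbol. The automaton below is a hypothetical one for a+ b+ over the letters a and b (all a's precede all b's, at least one of each); it is not the slide's binary example, and it has no ε transitions.

```python
def nfsa_accepts(delta, start, finals, text):
    """Advance every live thread on each symbol; accept if any thread ends in a final state."""
    current = {start}
    for symbol in text:
        current = {nxt
                   for state in current
                   for nxt in delta.get((state, symbol), set())}
    return bool(current & finals)

# Hypothetical NFSA: two transitions share the label a out of S0, and two share b out of S1.
DELTA = {
    ("S0", "a"): {"S0", "S1"},
    ("S1", "b"): {"S1", "S2"},
}
START, FINALS = "S0", {"S2"}
```

A thread that has no transition for the next symbol simply dies; the string is rejected only when no thread ends in a final state.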

RE to NFSA Construction
A standardized translation of REs into corresponding NFSAs.
The resulting NFSA can then be translated into a corresponding DFSA which recognizes the same language!
All of this can be automated.

RE to NFSA
For a in the alphabet, construct: [Diagram: one transition labelled a]
For ε, construct: [Diagram]
For REs s and t, for s | t construct: [Diagram combining N(s) and N(t)]

RE to NFSA
For REs s and t, for st construct: [Diagram: N(s) followed by N(t)]
For RE s, for s* construct: [Diagram built around N(s)]

Example
Build the NFSA for complex numbers using this RE: ( digit+ , digit+ )
For digit+, note that the machine is the same as the Kleene * machine except for the bottom transition.
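One common way to realize the construction rules above is a Thompson-style builder that returns a start and a final state for each sub-expression and glues fragments together with ε transitions. The sketch below is an assumption-laden illustration, not necessarily the exact machines drawn on the slides: REs are given as nested tuples, and ε is represented by the label `None`.

```python
from itertools import count

def build(node, edges, fresh):
    """Add the NFSA fragment for an RE node to edges; return its (start, final) states."""
    kind = node[0]
    if kind == "sym":                      # single alphabet symbol a
        s, f = next(fresh), next(fresh)
        edges.append((s, node[1], f))
        return s, f
    if kind == "cat":                      # st: N(s) followed by N(t)
        s1, f1 = build(node[1], edges, fresh)
        s2, f2 = build(node[2], edges, fresh)
        edges.append((f1, None, s2))
        return s1, f2
    if kind == "alt":                      # s | t: branch into N(s) or N(t)
        s, f = next(fresh), next(fresh)
        for child in node[1:]:
            cs, cf = build(child, edges, fresh)
            edges.append((s, None, cs))
            edges.append((cf, None, f))
        return s, f
    if kind == "star":                     # s*: bypass edge plus loop-back edge
        s, f = next(fresh), next(fresh)
        cs, cf = build(node[1], edges, fresh)
        edges += [(s, None, cs), (cf, None, f), (s, None, f), (cf, None, cs)]
        return s, f
    raise ValueError(kind)

def closure(states, edges):
    """All states reachable from the given states using only ε transitions."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for src, label, dst in edges:
            if src == state and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def matches(regex, text):
    """Build the NFSA for regex and simulate it on text."""
    edges = []
    start, final = build(regex, edges, count())
    current = closure({start}, edges)
    for symbol in text:
        moved = {dst for src, label, dst in edges if src in current and label == symbol}
        current = closure(moved, edges)
    return final in current

# (c | d)* dd written as a tuple tree
C_OR_D_STAR_DD = ("cat",
                  ("cat", ("star", ("alt", ("sym", "c"), ("sym", "d"))), ("sym", "d")),
                  ("sym", "d"))
```

Dropping the bypass edge `(s, None, f)` from the star case gives a machine for s+, which matches the remark that the + machine differs from the Kleene * machine by one transition.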

Example
[Diagram: NFSA fragments for digit+ , digit+]

Example
[Diagram: NFSA for ( digit+ , digit+ )]
Q: How can we make this NFSA efficient by converting it into a DFSA?

NFSA to DFSA
RE: (c | d)* dd
NFSA: S0 goes to itself on c and on d; S0 goes to S1 on d; S1 goes to S2 on d.
Idea: look for sets of states with the same transitions. Let one state in the DFSA represent a set of states in the NFSA.
{S0} on c to {S0}
{S0} on d to {S0,S1}
{S0,S1} on c to {S0}
{S0,S1} on d to {S0,S1,S2}
{S0,S1,S2} on c to {S0}
{S0,S1,S2} on d to {S0,S1,S2}
DFSA states: {S0}, {S0,S1}, {S0,S1,S2}

Backus Naur Form (BNF)
Metasymbols: < > ::= |
Terminal symbols of the PL (e.g., keywords, operators)
Nonterminal symbols
<while_stmt> ::= while <expr> do <stmt>
<identifier> ::= <letter> | <identifier> <digit> | <identifier> <letter>
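Returning to the NFSA-to-DFSA idea, the "one DFSA state per set of NFSA states" step can be sketched as a worklist algorithm. The state names S0, S1, S2 and the dictionary encoding are assumptions; the sketch also assumes an NFSA without ε transitions, which holds for the (c | d)* dd example.

```python
def subset_construction(delta, start, alphabet):
    """Return a DFSA transition map whose states are frozensets of NFSA states."""
    start_set = frozenset({start})
    dfsa = {}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        if current in dfsa:                # already expanded
            continue
        dfsa[current] = {}
        for symbol in alphabet:
            # Union of the NFSA moves of every member state on this symbol.
            target = frozenset(nxt
                               for state in current
                               for nxt in delta.get((state, symbol), ()))
            dfsa[current][symbol] = target
            if target not in dfsa:
                worklist.append(target)
    return dfsa

# NFSA for (c | d)* dd
NFSA = {
    ("S0", "c"): {"S0"},
    ("S0", "d"): {"S0", "S1"},
    ("S1", "d"): {"S2"},
}
DFSA = subset_construction(NFSA, "S0", "cd")
```

For this NFSA the result has three DFSA states, and a DFSA state is final whenever its set contains the NFSA's final state S2.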

EBNF
Nonterminals begin with capital letters or are shown in a different font.
{ } means repeat the enclosed 0 or more times.
[ ] means the enclosed is optional.
( ) is used for grouping, usually with the alternation symbol |.
If { }, [ ], or ( ) are terminals in the PL being defined, then when they are used as terminals they must be underlined: underlined { } are terminals, plain { } are metasymbols.

EBNF Examples
Identifier ::= Letter { LetterorDigit }
LetterorDigit ::= Letter | Digit
Expr ::= [ Expr - ] Subexpr
IfStmt ::= if LogicExpr then Stmt [ else Stmt ]
CompoundStmt ::= begin Stmt { ; Stmt } end
WhileStmt ::= while ( LogicExpr ) Stmt { ; Stmt }
ArrayElement ::= Identifier [ Identifier ]
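EBNF metasymbols map naturally onto control flow in a hand-written recognizer: { } becomes a loop and [ ] becomes a conditional. The sketch below recognizes two of the rules above. It assumes that Stmt is a single identifier token (the slides do not define Stmt) and approximates Letter and Digit with Python's `str.isalpha` and `str.isalnum`, which also accept non-ASCII characters.

```python
def parse_identifier(text, pos=0):
    """Identifier ::= Letter { LetterorDigit }; return the end position, or None on failure."""
    if pos >= len(text) or not text[pos].isalpha():
        return None
    pos += 1
    while pos < len(text) and text[pos].isalnum():   # { LetterorDigit }: 0 or more times
        pos += 1
    return pos

def parse_compound(tokens):
    """CompoundStmt ::= begin Stmt { ; Stmt } end, with Stmt assumed to be one identifier token."""
    pos = 0

    def expect(pred):
        nonlocal pos
        if pos < len(tokens) and pred(tokens[pos]):
            pos += 1
            return True
        return False

    def is_stmt(tok):
        return tok not in ("begin", "end", ";") and parse_identifier(tok) == len(tok)

    if not expect(lambda t: t == "begin") or not expect(is_stmt):
        return False
    while expect(lambda t: t == ";"):                # { ; Stmt }: 0 or more times
        if not expect(is_stmt):
            return False
    return expect(lambda t: t == "end") and pos == len(tokens)
```

An optional part such as `[ else Stmt ]` would follow the same pattern with a single `if expect(...)` in place of the `while`.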

Grammar
<set of terminals, set of nonterminals, productions (rules), special symbol>
Terminals are alphabet symbols.
Nonterminals represent PL constructs (e.g., Stmt).
Productions are rules for forming syntactically correct constructs.
The special symbol tells where to start applying the rules.

Example
<letter> ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>
<assign-stmt> ::= <identifier> =
//terminals;
//nonterminals are {<letter>, <digit>, <assign-stmt>, <identifier>}
//special symbol is <assign-stmt>
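A grammar in this four-part form can be written down as plain data and used to enumerate the sentences it generates. The sketch below uses the identifier productions with a deliberately reduced alphabet (letters a and b, digits 0 and 1) so the output stays small; that reduction, the dictionary encoding, and the function name `sentences` are all assumptions for illustration.

```python
from collections import deque

# Productions: nonterminal -> list of right-hand sides (each a list of symbols).
GRAMMAR = {
    "<letter>": [["a"], ["b"]],
    "<digit>": [["0"], ["1"]],
    "<identifier>": [["<letter>"],
                     ["<identifier>", "<letter>"],
                     ["<identifier>", "<digit>"]],
}

def sentences(grammar, start, max_len):
    """All terminal strings of at most max_len symbols derivable from start."""
    found, seen = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Expand the leftmost nonterminal, if there is one.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            found.add("".join(form))       # only terminals left: a sentence
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            # No production here shrinks a form, so over-long forms can be pruned.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found
```

Starting from `<identifier>` with `max_len=2`, the enumeration produces strings such as a, ab, and a1, but never a string that begins with a digit.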

Form of rules
Regular PLs
Each rhs has length <= 2 symbols:
  a terminal or nonterminal
  a nonterminal followed by a terminal
All PLs describable by REs can be written as regular grammars.
e.g., 2 * + N::= X Y X ::= X 2 Y ::= Y