LING/C SC/PSYC 438/538. Lecture 20 Sandiway Fong


Today's Topics
  SWI-Prolog installed?
  We will start to write grammars today
  Quick Homework 8

SWI Prolog Cheatsheet
At the prompt ?-
  1. halt.
  2. listing. listing(name).
  3. [filename]. loads filename.pl
  4. trace.
  5. notrace.
  6. debug.
  7. nodebug.
  8. spy(name).
  9. pwd.
  10. working_directory(_,Y). switch directories to Y
Anytime: ^C (then a(bort) or h(elp) for other options)
Notation: \+ negation; , conjunction; ; disjunction; :- if
Facts: predicate(arguments).
Rules: p(args) :- q(args), .., r(args).
Data structures:
  list: [a,..b]
  empty list: []
  head/tail: [h|List]
  Atom: name, number
  Term: functor(arguments)
  arguments: comma-separated terms/atoms

Recursion and modes of usage
  Define a list relation len (of arity 2).
  Depending on which arguments are left unbound (the mode of usage), a query can end in an infinite loop!

Recursion and modes of usage
  length/2 is a built-in predicate; it has better behavior than our simple definition (len/2).
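The slide's own clauses for len/2 are not in the transcript. The sketch below assumes the usual two-clause recursive definition, which may differ from what was shown in class, and is meant only to illustrate how the mode of usage affects termination.

```prolog
% len(?List, ?N): N is the number of elements in List.
% Assumed definition; not copied from the slide.
len([], 0).
len([_|Tail], N) :-
    len(Tail, M),      % first measure the tail
    N is M + 1.        % then add one for the head
```

With the list given, `?- len([a,b,c], N).` answers N = 3. With only the length given, `?- len(L, 2).` finds a two-element list first, but asking for another answer with ; makes Prolog try ever longer lists without end. The built-in `?- length(L, 2).` returns the single answer and stops.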

Regular Languages
Three formalisms, same expressive power:
  1. Regular expressions
  2. Finite State Automata
  3. Regular Grammars (we'll look at this case using Prolog)

Chomsky Hierarchy
Division of grammar into subclasses partitioned by generative power/capacity. (Natural languages?)
  Type-0: General rewrite rules
    Turing-complete, powerful enough to encode any computer program
    can simulate a Turing machine
    anything that's computable can be simulated using a Turing machine
  Type-1: Context-sensitive rules
    weaker, but still very powerful
    e.g. a^n b^n c^n
  Type-2: Context-free rules
    weaker still
    e.g. a^n b^n
    Pushdown Automata (PDA)
  Type-3: Regular grammar rules
    very restricted
    Regular Expressions, e.g. a+b+
    Finite State Automata (FSA)
[Figure: Turing machine with a tape, a read/write head, and a finite state transducer]

Chomsky Hierarchy
[Figure: nested classes, innermost to outermost: Type-3 (FSA, Regular Expressions, Regular Grammars), Type-2, Type-1, Type-0 (= DCG)]

Prolog Grammar Rule System
  known as Definite Clause Grammars (DCG)
  based on type-2 restrictions (context-free grammars), but with extensions (powerful enough to encode the hierarchy all the way up to type-0)
  Prolog was originally designed (1970s) to also support natural language processing
  we'll start with the bottom of the hierarchy, i.e. the least powerful: regular grammars (type-3)

Definite Clause Grammars (DCG)
Background: a typical formal grammar contains 4 things, <N,T,P,S>:
  a set of non-terminal symbols (N): these symbols will be expanded or rewritten by the rules
  a set of terminal symbols (T): these symbols cannot be expanded
  production rules (P) of the form LHS → RHS
    In regular and CF grammars, LHS must be a single non-terminal symbol
    RHS: a sequence of terminal and non-terminal symbols, possibly with restrictions, e.g. for regular grammars
  a designated start symbol (S): a non-terminal to start the derivation
Language: the set of terminal strings generated by <N,T,P,S>, e.g. through a top-down derivation

Definite Clause Grammars (DCG)
Background: a typical formal grammar contains 4 things, <N,T,P,S>:
  a set of non-terminal symbols (N)
  a set of terminal symbols (T)
  production rules (P) of the form LHS → RHS
  a designated start symbol (S)
Example grammar (regular):
  S → aB
  B → aB
  B → bC
  B → b
  C → bC
  C → b
Notes:
  Start symbol: S
  Non-terminals: {S,B,C} (uppercase letters)
  Terminals: {a,b} (lowercase letters)

Definite Clause Grammars (DCG)
Example:
  Formal grammar    DCG format
  S → aB            s --> [a],b.
  B → aB            b --> [a],b.
  B → bC            b --> [b],c.
  B → b             b --> [b].
  C → bC            c --> [b],c.
  C → b             c --> [b].
Notes:
  Start symbol: S
  Non-terminals: {S,B,C} (uppercase letters)
  Terminals: {a,b} (lowercase letters)
DCG format:
  both terminals and non-terminal symbols begin with lowercase letters
  variables begin with an uppercase letter (or underscore)
  --> is the rewrite symbol
  terminals are enclosed in square brackets (list notation)
  nonterminals don't have square brackets surrounding them
  the comma (,) represents the concatenation symbol
  a period (.) is required at the end of every DCG rule

Regular Grammars
Regular or Chomsky hierarchy type-3 grammars are a class of formal grammars with a restricted RHS.
  LHS → RHS: LHS rewrites/expands to RHS
  all rules contain at most a single non-terminal, and (possibly) a single terminal, on the right hand side
Canonical Forms:
  x --> y, [t].    x --> [t].    (left recursive)
or
  x --> [t], y.    x --> [t].    (right recursive)
where x and y are non-terminal symbols and t (enclosed in square brackets) represents a terminal symbol.
Terminology: also called left/right linear.
Note: can't mix these two forms (and still have a regular grammar)! That is, can't have both left and right recursive rules in the same grammar.

Definite Clause Grammars (DCG)
What language does our regular grammar generate? One or more a's followed by one or more b's.
  1. s --> [a],b.
  2. b --> [a],b.
  3. b --> [b],c.
  4. b --> [b].
  5. c --> [b],c.
  6. c --> [b].
By writing the grammar in Prolog, we have a ready-made recognizer program: no need to write a separate grammar rule interpreter (in this case).
Example queries:
  ?- s([a,a,b,b,b],[]).
  Yes
  ?- s([a,b,a],[]).
  No
Note: the query uses the start symbol s with two arguments: (1) the sequence (as a list) to be recognized and (2) the empty list [].
Prolog lists: in square brackets, separated by commas, e.g. [a] [a,b,c]
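As a loadable sketch, the six rules above can be saved to a file and consulted as they stand. The wrapper recognize/1 is a hypothetical convenience name, not from the slides; it uses the standard phrase/2 predicate, which supplies the empty remainder list for us.

```prolog
% Regular grammar: one or more a's followed by one or more b's.
s --> [a], b.
b --> [a], b.
b --> [b], c.
b --> [b].
c --> [b], c.
c --> [b].

% recognize(+Tokens): hypothetical helper, true if Tokens is in the language.
% phrase(s, Tokens) behaves like the query s(Tokens, []).
recognize(Tokens) :-
    phrase(s, Tokens).
```

After loading, `?- s([a,a,b,b,b],[]).` and `?- recognize([a,a,b,b,b]).` both succeed, while `?- recognize([a,b,a]).` fails.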

Prolog lists revisited
Perl lists:
  @list = ("a", "b", "c");
  @list = qw(a b c);
  @list = ();
Python lists:
  list = ["a", "b", "c"]
  list = []
Prolog lists (List is a variable):
  List = [a, b, c]
  List = [a|[b|[c|[]]]]    (a = head, tail = [b|[c|[]]])
  List = []
Mixed notation:
  [a|[b,c]]
  [a,b|[c]]
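The equivalence of these notations can be checked directly in Prolog. In the sketch below, head_tail/3 and same_notation/0 are hypothetical names introduced only for illustration.

```prolog
% head_tail(?List, ?Head, ?Tail): split a non-empty list with the | notation.
head_tail([Head|Tail], Head, Tail).

% same_notation: succeeds because every term below is the same list [a,b,c].
same_notation :-
    [a,b,c] == [a|[b|[c|[]]]],   % fully expanded head/tail form
    [a,b,c] == [a|[b,c]],        % mixed notation
    [a,b,c] == [a,b|[c]].        % mixed notation
```

For example, `?- head_tail([a,b,c], H, T).` gives H = a and T = [b,c].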

Regular Grammars: Tree representation
Example:
  ?- s([a,a,b],[]).
  true
Our regular grammar:
  1. s --> [a],b.
  2. b --> [a],b.
  3. b --> [b],c.
  4. b --> [b].
  5. c --> [b],c.
  6. c --> [b].
Derivation:
  s
  → [a], b         (rule 1)
  → [a],[a],b      (rule 2)
  → [a],[a],[b]    (rule 4)
  all terminals, so we stop
There's a choice of rules for nonterminal b: Prolog tries the first matching rule. Through backtracking, it can try the other choices.
Using trace, we can observe the progress of the derivation.

Regular Grammars: Tree representation
Example:
  ?- s([a,a,b,b,b],[]).
  1. s --> [a],b.
  2. b --> [a],b.
  3. b --> [b],c.
  4. b --> [b].
  5. c --> [b],c.
  6. c --> [b].
Derivation:
  s
  → [a], b                 (rule 1)
  → [a],[a],b              (rule 2)
  → [a],[a],[b],c          (rule 3)
  → [a],[a],[b],[b],c      (rule 5)
  → [a],[a],[b],[b],[b]    (rule 6)
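Besides trace, one way to see which rules a derivation used is to give each nonterminal an extra argument that collects rule numbers. This is an illustrative extension of the grammar, not something shown on the slides; DCG nonterminals with arguments are standard Prolog.

```prolog
% Same grammar, with an extra argument recording the rule numbers used,
% in top-down, left-to-right order.
s([1|Rs]) --> [a], b(Rs).
b([2|Rs]) --> [a], b(Rs).
b([3|Rs]) --> [b], c(Rs).
b([4])    --> [b].
c([5|Rs]) --> [b], c(Rs).
c([6])    --> [b].
```

The query `?- phrase(s(D), [a,a,b,b,b]).` binds D = [1,2,3,5,6], matching the derivation above, and `?- phrase(s(D), [a,a,b]).` binds D = [1,2,4].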

Prolog Derivations
Prolog's computation rule:
  Try the first matching rule in the database (remember the others for backtracking)
  Backtrack if the matching rule leads to failure (or if asked for more solutions): undo and try the next matching rule
For grammars:
  Top-down, left-to-right derivations
  left-to-right = expand leftmost nonterminal first
  Leftmost expansion done recursively = depth-first

Prolog Derivations
For a top-down derivation, logically, we have:
  Choice: about which rule to use for nonterminals b and c
  No choice: about which nonterminal to expand next
  1. s --> [a],b.
  2. b --> [a],b.
  3. b --> [b],c.
  4. b --> [b].
  5. c --> [b],c.
  6. c --> [b].
Bottom-up derivation for [a,a,b,b]:
  1. [a],[a],[b],[b]
  2. [a],[a],[b],c    (rule 6)
  3. [a],[a],b        (rule 3)
  4. [a],b            (rule 2)
  5. s                (rule 1)
Prolog doesn't give you bottom-up derivations for free: you'd have to program it up separately.
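To give a feel for what "programming it up separately" could involve, here is a naive bottom-up recognizer for this one grammar. It is a sketch under stated assumptions, not the course's method: the predicate names rule/3, reduce/3 and bottom_up/2 are hypothetical, nonterminals are written as quoted uppercase atoms to keep them apart from the terminals a and b, and the search is blind (it tries every possible reduction and relies on backtracking).

```prolog
:- use_module(library(lists)).   % append/3

% rule(Number, LHS, RHS): the six grammar rules as plain facts.
rule(1, 'S', [a, 'B']).
rule(2, 'B', [a, 'B']).
rule(3, 'B', [b, 'C']).
rule(4, 'B', [b]).
rule(5, 'C', [b, 'C']).
rule(6, 'C', [b]).

% reduce(+Form, -N, -NewForm): replace one occurrence of the RHS of
% rule N inside Form by its LHS.
reduce(Form, N, NewForm) :-
    rule(N, LHS, RHS),
    append(Pre, Rest, Form),
    append(RHS, Post, Rest),
    append(Pre, [LHS|Post], NewForm).

% bottom_up(+Form, -Rules): Form reduces to the start symbol 'S'
% using the rules numbered in Rules, in order of application.
bottom_up(['S'], []).
bottom_up(Form, [N|Ns]) :-
    reduce(Form, N, NewForm),
    bottom_up(NewForm, Ns).
```

The query `?- bottom_up([a,a,b,b], Rules).` yields Rules = [6,3,2,1], the same sequence as the bottom-up derivation listed above. Every reduction either shortens the form or turns a terminal into a nonterminal, so the search always terminates.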

SWI Prolog
Grammar rules are translated into Prolog rules when the program is loaded.
This solves the mystery of why we have to type two arguments with the nonterminal at the command prompt.
Recall list notation: [1|[2,3,4]] = [1,2,3,4]
  Grammar rule          Translated Prolog rule
  1. s --> [a],b.       s([a|A], B) :- b(A, B).
  2. b --> [a],b.       b([a|A], B) :- b(A, B).
  3. b --> [b],c.       b([b|A], B) :- c(A, B).
  4. b --> [b].         b([b|A], A).
  5. c --> [b],c.       c([b|A], B) :- c(A, B).
  6. c --> [b].         c([b|A], A).
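The translated rules can also be typed in by hand as ordinary Prolog clauses, which makes the role of the two arguments easier to see: the first is the input list, the second is whatever is left over after the nonterminal has consumed its part. A minimal sketch (the exact clauses a given Prolog version generates may differ in form):

```prolog
% Hand-written equivalents of the six grammar rules.
% First argument: input list. Second argument: the remainder.
s([a|A], B) :- b(A, B).
b([a|A], B) :- b(A, B).
b([b|A], B) :- c(A, B).
b([b|A], A).
c([b|A], B) :- c(A, B).
c([b|A], A).
```

Here `?- s([a,a,b],[]).` succeeds as before, and `?- s([a,b,x], Rest).` answers Rest = [x]: the grammar recognized [a,b] and left [x] unconsumed. Passing [] as the second argument therefore insists that the whole list be used up.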

Regex, FSA and a Regular Grammar
Textbook Exercise: find a RE for the language exemplified below.
Examples (* denotes string not in the language):
  *ab
  *ba
  bab
  λ (empty string)
  bb
  *baba
  babab

Regex, FSA and a Regular Grammar
Draw a FSA and convert it to a RE:
[FSA diagram: start state 1, then states 2, 3, 4; arc labels b, b, a, b, ε, b]
[Powerpoint Animation] b* b ab+ = b* b(ab+)+
Ungraded exercise: could use the technique shown in a previous class: eliminate one state at a time.

Regex, FSA and a Regular Grammar
Regular Grammar in Prolog. Let's begin with something like:
  s --> [b], b.
  b --> [b].
  b --> [b], b.
  (grammar generates bb+)
Add rule:
  b --> [a], b.
  (grammar generates b+(a|b)*b)
Classroom Exercise: How would you fix this grammar so it can handle the specified language?

Regex, FSA and a Regular Grammar
Regular Grammar in Prolog notation:
  s --> [].
  s --> [b], b.
  s --> [b], s.
  b --> [a], c.
  c --> [b].
  c --> [b], b.
  c --> [b], c.
where s = "start state", b = "seen a b", c = "expect a b".

Regex, FSA and a Regular Grammar
Compare the FSA with our Regular Grammar (RG):
  s --> [].
  s --> [b], b.
  s --> [b], s.
  b --> [a], c.
  c --> [b].
  c --> [b], b.
  c --> [b], c.
where s = "start state", b = "seen a b", c = "expect a b".
[FSA diagram: start state 1, then states 2, 3, 4; arc labels b, b, a, b, ε, b]
There is a straightforward correspondence between right recursive RGs and FSA.
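The grammar can be checked against the textbook examples by loading it and running each string through it. In the sketch below, in_language/1 is a hypothetical helper that splits an atom into single-character tokens with the standard atom_chars/2 before calling phrase/2.

```prolog
% The regular grammar from the slide.
s --> [].
s --> [b], b.
s --> [b], s.
b --> [a], c.
c --> [b].
c --> [b], b.
c --> [b], c.

% in_language(+Atom): hypothetical helper, e.g. in_language(bab).
% The empty string is written as the empty atom ''.
in_language(Atom) :-
    atom_chars(Atom, Tokens),
    phrase(s, Tokens).
```

The queries `?- in_language(bab).`, `?- in_language('').`, `?- in_language(bb).` and `?- in_language(babab).` succeed, while `?- in_language(ab).`, `?- in_language(ba).` and `?- in_language(baba).` fail, in agreement with the starred examples.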

Regex, FSA and a Regular Grammar
Informally, we can convert a RG to a FSA by treating non-terminals as states and introducing (new) states for rules of the form x --> [a].
  1. s --> [].
  2. s --> [b], b.
  3. s --> [b], s.
  4. b --> [a], c.
  5. c --> [b].
  6. c --> [b], b.
  7. c --> [b], c.
[Powerpoint animation: FSA built up in order of the RG rules, with start state s, states b and c, and a new end state e]
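The informal conversion can be written down as a table of arcs plus a small nondeterministic simulator. This is a sketch, assuming the new end state is called e as in the animation; the predicate names arc/3, final/1 and accept/2 are hypothetical. Rules of the form x --> [t], y become arcs from x to y, rule 5 (c --> [b]) becomes an arc into the new state e, and rule 1 (s --> []) makes s a final state.

```prolog
% arc(From, Symbol, To): one arc per grammar rule that consumes a terminal.
arc(s, b, b).    % rule 2
arc(s, b, s).    % rule 3
arc(b, a, c).    % rule 4
arc(c, b, e).    % rule 5, into the new end state e
arc(c, b, b).    % rule 6
arc(c, b, c).    % rule 7

% final(State): accepting states.
final(s).        % rule 1: s --> [].
final(e).

% accept(+State, +Tokens): Tokens leads from State to some final state.
accept(State, []) :-
    final(State).
accept(State, [Symbol|Rest]) :-
    arc(State, Symbol, Next),
    accept(Next, Rest).
```

Starting from s, `?- accept(s, [b,a,b]).` succeeds and `?- accept(s, [b,a,b,a]).` fails. Prolog's backtracking handles the nondeterminism (several arcs labeled b leave states s and c).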

Quick Homework 8
Question 1: write a Prolog regular grammar for the following language:
  Σ = {0, 1}
  L = strings from Σ* with an odd number of 0's and an even number of 1's
  assume 0 is an even number
Hint: start with a FSA.
How many grammar rules do you have?
Submit your program and sample runs.
Examples (* denotes string not in the language):
  *[] (empty string)
  0
  000
  00000
  *1
  *10
  *01
  *11
  011
  101
  110

Last Time: enumerating the strings of a language
Example: Σ={0,1}, enumerate Σ*
  sigma(0).
  sigma(1).
  sigmas([]).
  sigmas([X|L]) :- sigmas(L), sigma(X).
Run (hit ; for more answers):
  ?- sigmas(L).
  L = [] ;
  L = [0] ;
  L = [1] ;
  L = [0, 0] ;
  L = [1, 0] ;
  L = [0, 1] ;
  L = [1, 1] ;
  L = [0, 0, 0] ;
  L = [1, 0, 0] ;
  L = [0, 1, 0] ;
  L = [1, 1, 0] ;
  L = [0, 0, 1] ;
  L = [1, 0, 1] ;
  L = [0, 1, 1] ;
  L = [1, 1, 1] ;
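To collect a finite part of this enumeration without typing ; repeatedly, the list length can be fixed before sigmas/1 is called. The sketch below is one possible approach; strings_up_to/2 is a hypothetical helper built from the standard predicates findall/3, between/3 and length/2.

```prolog
sigma(0).
sigma(1).

sigmas([]).
sigmas([X|L]) :- sigmas(L), sigma(X).

% strings_up_to(+Max, -Strings): all strings over {0,1} of length 0..Max,
% shortest first. Fixing the length first keeps each sigmas/1 call finite.
strings_up_to(Max, Strings) :-
    findall(L,
            ( between(0, Max, N),
              length(L, N),
              sigmas(L) ),
            Strings).
```

For example, `?- strings_up_to(2, S).` gives S = [[], [0], [1], [0,0], [1,0], [0,1], [1,1]], the first seven answers of the run above.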

Quick Homework 8: Extra Credit
Question 2: note that the order of the rules matters.
What happens when you run the following Prolog query on your grammar?
  ?- s(String,[]).
  (assume here s is the start symbol and String is an unbound variable)
Can you reorder your grammar rules so that s(String,[]) returns strings in the language?
Show your grammar and run(s).

Quick Homework 8 Due tomorrow night