Speech Recognition CSCI-GA Fall Homework Assignment #1: Automata Operations. Instructor: Eugene Weinstein. Due Date: October 17th


Speech Recognition CSCI-GA Fall 2013
Homework Assignment #1: Automata Operations
Instructor: Eugene Weinstein
Due Date: October 17th

Note: It is advised, but not required, to use the OpenFST library to complete this assignment. In order to receive full credit, you must give the full sequence of shell commands and the code of any script/program used to complete each problem. Note that not every automata operation required to complete this assignment has been covered in class, and you may need to do some research on your own in the relevant documentation.

A. Regular expressions and automata.

Give epsilon-free, deterministic, and minimal finite-state acceptors accepting only the strings matching the following regular expressions: a (a ba)* ba* ( a b) *(baab)* ( a b) * baab

Producing deterministic and minimal versions of the acceptor (for the first regular expression):

    $ cat > syms.txt
    a 1
    b 2
    $ cat > 1a1.txt
    0 1 a
    1 1 a
    1 2 b
    2 1 a
    1 3 b
    3 4 a
    4 4 a
    4
    $ fstcompile --acceptor --isymbols=syms.txt 1a1.txt > 1a1.fst
    $ cat 1a1.fst | fstdeterminize | fstminimize > 1a1-opt.fst
    $ fstdraw --isymbols=syms.txt --acceptor --portrait 1a1-opt.fst | dot -Tpng -o 1a1-opt.png

B. Weighted acceptors and transducers

1. Give a weighted finite-state acceptor in the tropical semiring producing strings of length four over the alphabet Σ = {a, b}, where a is three times as likely to be produced as b.

The arc weights are negative log probabilities: -ln(0.25) ≈ 1.3863; -ln(0.75) ≈ 0.2877.

2. Give a weighted finite-state transducer T which can be used to count the number of a and b symbols accepted by a (possibly weighted) finite-state acceptor A. State the semiring over which your method gives the desired result, and give the precise sequence of automata operations required (hint: the composition A ∘ T will come in handy here).

The counting transducer T is over the log semiring. Note that T(x) = 1 for all x ∈ {a, b}*. Let C = Π_o(A ∘ T), where Π_o is the projection onto the output labels. Then e^(-C(x)) is the count of occurrences of the string x in A.
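The relation between the log-semiring weight C(x) and the count e^(-C(x)) can be checked by hand on a small example. The following is a minimal sketch, not part of the original solution: it skips the composition with T and simply treats every occurrence of a symbol as one path of weight 0 (the log-semiring One), summing those paths with the log-semiring plus. The names log_plus and count_weights and the four sample strings are hypothetical.

```python
import math


def log_plus(x, y):
    """Log-semiring sum of two weights: -log(exp(-x) + exp(-y))."""
    if x == math.inf:  # math.inf is the log-semiring Zero
        return y
    if y == math.inf:
        return x
    return min(x, y) - math.log1p(math.exp(-abs(x - y)))


def count_weights(strings):
    """Accumulate one weight-0 path per symbol occurrence, per symbol."""
    weights = {}
    for s in strings:
        for sym in s:
            weights[sym] = log_plus(weights.get(sym, math.inf), 0.0)
    return weights


# Hypothetical sample of length-four strings over {a, b}.
strings = ["aaab", "abaa", "aaaa", "baab"]
C = count_weights(strings)

# Exponentiating the negated weight recovers the count of each symbol.
counts = {sym: round(math.exp(-w)) for sym, w in C.items()}
print(counts)
```

Each weight in C is the negative log of the number of occurrences, which is why exponentiating the negated weight gives back an integer count.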

3. Randomly generate 100 strings according to the constraints of part 1: the alphabet is Σ = {a, b}, the string is of length four, and a is more likely than b. You may do this using the OpenFST tools, or using a script of your own.

    # In order to generate the strings, we will sample from the acceptor of part 1 above.
    $ cat | fstcompile --acceptor --isymbols=syms.txt > 1b.fst
    0 1 a 0.2877
    0 1 b 1.3863
    1 2 a 0.2877
    1 2 b 1.3863
    2 3 a 0.2877
    2 3 b 1.3863
    3 4 a 0.2877
    3 4 b 1.3863
    4
    *ctrl-d*
    # Generate 4-letter strings and put them all into strings.txt, one per line
    $ for i in {1..100}; do fstrandgen --weighted --select=log_prob 1b.fst | fstprint --isymbols=syms.txt | awk '{print $3}' | tr '\n' ' '; echo; sleep 2; done > strings.txt

4. Apply the transducer T you created in part 2 to the strings from part 3. Does the count of a and b symbols match the distribution of symbols given in part 1?

    # Make an FST archive (far) of the generated strings:
    $ farcompilestrings --symbols=syms.txt strings.txt > strings.far
    $ farextract strings.far
    $ echo | fstcompile > union.fst   # Empty transducer
    # Union together all string transducers produced
    $ for file in `ls strings.txt-*`; do fstunion union.fst $file > foo.fst; mv foo.fst union.fst; done
    # Convert to log semiring and sort on output label for composition
    $ fstprint union.fst | fstcompile --arc_type=log | fstarcsort --sort_type=olabel > union-log.fst
    $ cat | fstcompile --arc_type=log --isymbols=syms.txt --osymbols=syms.txt > count.fst
    0 0 a <eps>
    0 0 b <eps>
    0 1 a a
    0 1 b b
    1 1 a <eps>
    1 1 b <eps>
    1
    *ctrl-d*
    $ fstcompose union-log.fst count.fst | fstproject --project_output | fstrmepsilon | fstdeterminize | fstminimize | fstprint --acceptor --isymbols=syms.txt
    # output has one arc per symbol, with log-semiring weights w_a and w_b:
    0 1 a w_a
    0 1 b w_b

The final answer is obtained as e^(-w_a), e^(-w_b). The counts add up to 400 and match our expected distribution.

C. Camel casing with automata.

1. Download the list of the 100 most common English words.

2. From this list, build a camel-casing transducer T, i.e., one which, when composed with an acceptor A accepting lowercase strings consisting of words in the list, can be used to produce a string acceptor A_c, where the strings are camel-cased. For example, if the input acceptor accepts the strings gowithme and dosomework, the output acceptor should accept the strings GoWithMe and DoSomeWork. Do not assume that the words are space separated.

    $ head words.txt
    the
    be
    to
    of
    and
    a
    in
    that
    have
    I
    # See appendix below for code listing of words.py
    $ cat words.txt | ./words.py | fstcompile --isymbols=syms.txt --osymbols=syms.txt | fstclosure > camelcase.fst

The desired camel-cased acceptor is found as A_c = Π_o(A ∘ T), where Π_o is projection onto the output labels.

3. Repeat the process for just the first five words in the list to construct transducer T. Show this transducer.
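The effect of A_c = Π_o(A ∘ T) on a single input string can be mimicked without OpenFst. The following is a minimal sketch under stated assumptions: the function name camel_case_outputs is hypothetical, the word list is a small illustrative subset rather than the 100-word list, and plain recursion over word boundaries stands in for composition with the closure of the word transducer.

```python
def camel_case_outputs(text, words):
    """Return every camel-cased reading of `text` as a concatenation of words."""
    lowered = {w.lower() for w in words}
    results = []

    def extend(pos, parts):
        # Reaching the end of the input corresponds to reaching a final state.
        if pos == len(text):
            results.append("".join(parts))
            return
        # Try every word that matches the input starting at `pos`.
        for end in range(pos + 1, len(text) + 1):
            piece = text[pos:end]
            if piece in lowered:
                extend(end, parts + [piece[0].upper() + piece[1:]])

    extend(0, [])
    return results


# Illustrative subset only; the assignment uses the full 100-word list.
WORDS = ["go", "with", "me", "do", "some", "work"]

print(camel_case_outputs("gowithme", WORDS))
print(camel_case_outputs("dosomework", WORDS))
print(camel_case_outputs("gohome", WORDS))
```

An input that cannot be split entirely into listed words, such as gohome here, yields an empty result, which mirrors an empty composition.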

4. Demonstrate that your transducer T works by showing the input acceptor A and the resulting transducer A ∘ T for both of the examples given in sub-question 2 above. Be sure to also verify that no outputs are produced if the input does not consist solely of words in the list of 100 words.

    $ cat | fstcompile --isymbols=syms.txt --acceptor > dosomework.fst
    0 1 d
    1 2 o
    2 3 s
    3 4 o
    4 5 m
    5 6 e
    6 7 w
    7 8 o
    8 9 r
    9 10 k
    10
    *ctrl-d*
    $ fstcompose dosomework.fst camelcase.fst | fstrmepsilon > dosomework-camel.fst
    $ cat dosomework-camel.fst | fstdraw --isymbols=syms.txt --osymbols=syms.txt --portrait | dot -Tpng -odosomework-camel.png

5. How many states and transitions does your transducer T have? Is it epsilon-free and/or deterministic? If not, what ideas do you have for determinizing it (you do not have to implement them)?

    $ fstinfo camelcase.fst
    # of states                340
    # of arcs                  439
    input deterministic        n
    output deterministic       n
    input/output epsilons      y
    input epsilons             y
    output epsilons            y

No, the transducer is not deterministic, and it has epsilons. The issue with determinizing such a transducer is that it is not functional, i.e., it does not map each input sequence to a unique output sequence. It is possible to determinize such a transducer by introducing special symbols at the end of each output string corresponding to more than one input path. With such symbols, the transducer becomes functional and can then be determinized. The special symbols can then be replaced with epsilons. See the discussion about p-subsequential transducers in Mehryar Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23:2, 1997.

Appendix: words.py

    #!/usr/bin/python
    import sys

    state_num = 1    # next unused state id; state 0 is the shared start state
    last_state = 0
    for line in sys.stdin:
        for i, c in enumerate(line.rstrip()):
            input_sym = c.lower()
            if i == 0:
                # Each word starts at state 0 and outputs its first letter uppercased.
                last_state = 0
                output_sym = c.upper()
            else:
                output_sym = input_sym
            print("%d %d %s %s" % (last_state, state_num, input_sym, output_sym))
            last_state = state_num
            state_num += 1
        # The state reached after the last letter of the word is final.
        print("%d" % last_state)
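The non-functionality mentioned in the answer to question 5 can be seen on a small example: whenever an input string splits into listed words in more than one way, each split capitalizes different positions and so gives a different output. The sketch below counts the splits with a simple dynamic program. It is illustrative only: count_segmentations is a hypothetical helper, and the word list is a small sample in which the presence of "into" alongside "in" and "to" is an assumption.

```python
def count_segmentations(text, words):
    """Count the ways `text` splits into a sequence of words from `words`."""
    lowered = {w.lower() for w in words}
    # ways[k] is the number of segmentations of the first k characters.
    ways = [0] * (len(text) + 1)
    ways[0] = 1
    for end in range(1, len(text) + 1):
        for start in range(end):
            if ways[start] and text[start:end] in lowered:
                ways[end] += ways[start]
    return ways[len(text)]


# Illustrative sample; "into" is assumed to be in the list next to "in" and "to".
WORDS = ["in", "to", "into", "the", "be"]

for text in ("into", "tobe", "bein"):
    print(text, count_segmentations(text, WORDS))
```

A count above one, as for into (readable as Into or InTo), means the input has several outputs, so the transducer cannot be determinized directly.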

OpenFst: a General and Efficient Weighted Finite-State Transducer Library. Part I. Library Design and Use

OpenFst: a General and Efficient Weighted Finite-State Transducer Library. Part I. Library Design and Use OpenFst: a General and Efficient Weighted Finite-State Transducer Library Part I. Library Design and Use Outline. Definitions Semirings Weighted Automata and Transducers 2. Library Overview FST Construction

More information

Table of Contents OpenFst Library...1 OpenFst Authors...2 OpenFst Background Material...3 OpenFst Quick Tour...4 ArcSort...14 Closure...

Table of Contents OpenFst Library...1 OpenFst Authors...2 OpenFst Background Material...3 OpenFst Quick Tour...4 ArcSort...14 Closure... OpenFst Library OpenFst Library Table of Contents OpenFst Library...1 OpenFst Authors...2 Principal Contacts:...2 Contributors:...2 OpenFst Background Material...3 OpenFst Quick Tour...4 Finding and Using

More information

WFST: Weighted Finite State Transducer. September 12, 2017 Prof. Marie Meteer

WFST: Weighted Finite State Transducer. September 12, 2017 Prof. Marie Meteer + WFST: Weighted Finite State Transducer September 12, 217 Prof. Marie Meteer + FSAs: A recurring structure in speech 2 Phonetic HMM Viterbi trellis Language model Pronunciation modeling + The language

More information

Report for each of the weighted automata obtained ˆ the number of states; ˆ the number of ɛ-transitions;

Report for each of the weighted automata obtained ˆ the number of states; ˆ the number of ɛ-transitions; Mehryar Mohri Speech Recognition Courant Institute of Mathematical Sciences Homework assignment 3 (Solution) Part 2, 3 written by David Alvarez 1. For this question, it is recommended that you use the

More information

Weighted Finite-State Transducers in Computational Biology

Weighted Finite-State Transducers in Computational Biology Weighted Finite-State Transducers in Computational Biology Mehryar Mohri Courant Institute of Mathematical Sciences mohri@cims.nyu.edu Joint work with Corinna Cortes (Google Research). This Tutorial Weighted

More information

CSCI 2132 Software Development. Lecture 7: Wildcards and Regular Expressions

CSCI 2132 Software Development. Lecture 7: Wildcards and Regular Expressions CSCI 2132 Software Development Lecture 7: Wildcards and Regular Expressions Instructor: Vlado Keselj Faculty of Computer Science Dalhousie University 20-Sep-2017 (7) CSCI 2132 1 Previous Lecture Pipes

More information

Finite-State Transducers in Language and Speech Processing

Finite-State Transducers in Language and Speech Processing Finite-State Transducers in Language and Speech Processing Mehryar Mohri AT&T Labs-Research Finite-state machines have been used in various domains of natural language processing. We consider here the

More information

A General Weighted Grammar Library

A General Weighted Grammar Library A General Weighted Grammar Library Cyril Allauzen, Mehryar Mohri, and Brian Roark AT&T Labs Research, Shannon Laboratory 80 Park Avenue, Florham Park, NJ 0792-097 {allauzen, mohri, roark}@research.att.com

More information

Speech Recognition Lecture 12: Lattice Algorithms. Cyril Allauzen Google, NYU Courant Institute Slide Credit: Mehryar Mohri

Speech Recognition Lecture 12: Lattice Algorithms. Cyril Allauzen Google, NYU Courant Institute Slide Credit: Mehryar Mohri Speech Recognition Lecture 12: Lattice Algorithms. Cyril Allauzen Google, NYU Courant Institute allauzen@cs.nyu.edu Slide Credit: Mehryar Mohri This Lecture Speech recognition evaluation N-best strings

More information

Lexicographic Semirings for Exact Automata Encoding of Sequence Models

Lexicographic Semirings for Exact Automata Encoding of Sequence Models Lexicographic Semirings for Exact Automata Encoding of Sequence Models Brian Roark, Richard Sproat, and Izhak Shafran {roark,rws,zak}@cslu.ogi.edu Abstract In this paper we introduce a novel use of the

More information

Weighted Finite State Transducers in Automatic Speech Recognition

Weighted Finite State Transducers in Automatic Speech Recognition Weighted Finite State Transducers in Automatic Speech Recognition ZRE lecture 10.04.2013 Mirko Hannemann Slides provided with permission, Daniel Povey some slides from T. Schultz, M. Mohri and M. Riley

More information

CSE302: Compiler Design

CSE302: Compiler Design CSE302: Compiler Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 01, 2007 Outline Recap

More information

A General Weighted Grammar Library

A General Weighted Grammar Library A General Weighted Grammar Library Cyril Allauzen, Mehryar Mohri 2, and Brian Roark 3 AT&T Labs Research 80 Park Avenue, Florham Park, NJ 07932-097 allauzen@research.att.com 2 Department of Computer Science

More information

Lexical Analysis. Prof. James L. Frankel Harvard University

Lexical Analysis. Prof. James L. Frankel Harvard University Lexical Analysis Prof. James L. Frankel Harvard University Version of 5:37 PM 30-Jan-2018 Copyright 2018, 2016, 2015 James L. Frankel. All rights reserved. Regular Expression Notation We will develop a

More information

Weighted Finite State Transducers in Automatic Speech Recognition

Weighted Finite State Transducers in Automatic Speech Recognition Weighted Finite State Transducers in Automatic Speech Recognition ZRE lecture 15.04.2015 Mirko Hannemann Slides provided with permission, Daniel Povey some slides from T. Schultz, M. Mohri, M. Riley and

More information

Ling/CSE 472: Introduction to Computational Linguistics. 4/6/15: Morphology & FST 2

Ling/CSE 472: Introduction to Computational Linguistics. 4/6/15: Morphology & FST 2 Ling/CSE 472: Introduction to Computational Linguistics 4/6/15: Morphology & FST 2 Overview Review: FSAs & FSTs XFST xfst demo Examples of FSTs for spelling change rules Reading questions Review: FSAs

More information

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1} CSE 5 Homework 2 Due: Monday October 6, 27 Instructions Upload a single file to Gradescope for each group. should be on each page of the submission. All group members names and PIDs Your assignments in

More information

Homework #1: CMPT-825 Reading: fsmtools/fsm/ Anoop Sarkar

Homework #1: CMPT-825 Reading:   fsmtools/fsm/ Anoop Sarkar Homework #: CMPT-825 Reading: http://www.research.att.com/ fsmtools/fsm/ Anoop Sarkar anoop@cs.sfu.ca () Machine (Back) Transliteration Languages have different sound inventories. When translating from

More information

Course Project 2 Regular Expressions

Course Project 2 Regular Expressions Course Project 2 Regular Expressions CSE 30151 Spring 2017 Version of February 16, 2017 In this project, you ll write a regular expression matcher similar to grep, called mere (for match and echo using

More information

Wildcards and Regular Expressions

Wildcards and Regular Expressions CSCI 2132: Software Development Wildcards and Regular Expressions Norbert Zeh Faculty of Computer Science Dalhousie University Winter 2019 Searching Problem: Find all files whose names match a certain

More information

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed HKN CS 374 Midterm 1 Review Tim Klem Noah Mathes Mahir Morshed Midterm topics It s all about recognizing sets of strings! 1. String Induction 2. Regular languages a. DFA b. NFA c. Regular expressions 3.

More information

Hierarchical Phrase-Based Translation with Weighted Finite State Transducers

Hierarchical Phrase-Based Translation with Weighted Finite State Transducers Hierarchical Phrase-Based Translation with Weighted Finite State Transducers Gonzalo Iglesias Adrià de Gispert Eduardo R. Banga William Byrne University of Vigo. Dept. of Signal Processing and Communications.

More information

Formal languages and computation models

Formal languages and computation models Formal languages and computation models Guy Perrier Bibliography John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman - Introduction to Automata Theory, Languages, and Computation - Addison Wesley, 2006.

More information

CS/ECE 374 Fall Homework 1. Due Tuesday, September 6, 2016 at 8pm

CS/ECE 374 Fall Homework 1. Due Tuesday, September 6, 2016 at 8pm CSECE 374 Fall 2016 Homework 1 Due Tuesday, September 6, 2016 at 8pm Starting with this homework, groups of up to three people can submit joint solutions. Each problem should be submitted by exactly one

More information

Ping-pong decoding Combining forward and backward search

Ping-pong decoding Combining forward and backward search Combining forward and backward search Research Internship 09/ - /0/0 Mirko Hannemann Microsoft Research, Speech Technology (Redmond) Supervisor: Daniel Povey /0/0 Mirko Hannemann / Beam Search Search Errors

More information

5/20/2007. Touring Essential Programs

5/20/2007. Touring Essential Programs Touring Essential Programs Employing fundamental utilities. Managing input and output. Using special characters in the command-line. Managing user environment. Surveying elements of a functioning system.

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Lecture 4 (Modified by Professor Vijay Ganesh) Tips on Building Large Systems KISS (Keep It Simple, Stupid!) Don t optimize prematurely Design systems that can be tested

More information

University of Windsor : System Programming Winter Midterm 01-1h20mn. Instructor: Dr. A. Habed

University of Windsor : System Programming Winter Midterm 01-1h20mn. Instructor: Dr. A. Habed University of Windsor 0360-256: System Programming Winter 2007 - Midterm 01-1h20mn. Instructor: Dr. A. Habed Solution Last name: First name: Student #: NONE NONE NONE Read this first Make sure your paper

More information

Creating LRs with FSTs Part II Compiling automata and transducers

Creating LRs with FSTs Part II Compiling automata and transducers Creating LRs with FSTs Part II Compiling automata and transducers Mans Hulden (University of Helsinki) Iñaki Alegria (University of The Basque Country) Recap: finite automata one or more as : {a,aa,...}:

More information

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions CSE45 Translation of Programming Languages Lecture 2: Automata and Regular Expressions Finite Automata Regular Expression = Specification Finite Automata = Implementation A finite automaton consists of:

More information

Unleashing the Shell Hands-On UNIX System Administration DeCal Week 6 28 February 2011

Unleashing the Shell Hands-On UNIX System Administration DeCal Week 6 28 February 2011 Unleashing the Shell Hands-On UNIX System Administration DeCal Week 6 28 February 2011 Last time Compiling software and the three-step procedure (./configure && make && make install). Dependency hell and

More information

Non-deterministic Finite Automata (NFA)

Non-deterministic Finite Automata (NFA) Non-deterministic Finite Automata (NFA) CAN have transitions on the same input to different states Can include a ε or λ transition (i.e. move to new state without reading input) Often easier to design

More information

CSE Theory of Computing Fall 2017 Project 2-Finite Automata

CSE Theory of Computing Fall 2017 Project 2-Finite Automata CSE 30151 Theory of Computing Fall 2017 Project 2-Finite Automata Version 1: Sept. 27, 2017 1 Overview The goal of this project is to have each student understand at a deep level the functioning of a finite

More information

Assignment 4 CSE 517: Natural Language Processing

Assignment 4 CSE 517: Natural Language Processing Assignment 4 CSE 517: Natural Language Processing University of Washington Winter 2016 Due: March 2, 2016, 1:30 pm 1 HMMs and PCFGs Here s the definition of a PCFG given in class on 2/17: A finite set

More information

Regular Expressions & Automata

Regular Expressions & Automata Regular Expressions & Automata CMSC 132 Department of Computer Science University of Maryland, College Park Regular expressions Notation Patterns Java support Automata Languages Finite State Machines Turing

More information

You will likely want to log into your assigned x-node from yesterday. Part 1: Shake-and-bake language generation

You will likely want to log into your assigned x-node from yesterday. Part 1: Shake-and-bake language generation FSM Tutorial I assume that you re using bash as your shell; if not, then type bash before you start (you can use csh-derivatives if you want, but your mileage may vary). You will likely want to log into

More information

CSE Theory of Computing Spring 2018 Project 2-Finite Automata

CSE Theory of Computing Spring 2018 Project 2-Finite Automata CSE 30151 Theory of Computing Spring 2018 Project 2-Finite Automata Version 2 Contents 1 Overview 2 1.1 Updates................................................ 2 2 Valid Options 2 2.1 Project Options............................................

More information

Midterm 1 1 /8 2 /9 3 /9 4 /12 5 /10. Faculty of Computer Science. Term: Fall 2018 (Sep4-Dec4) Student ID Information. Grade Table Question Score

Midterm 1 1 /8 2 /9 3 /9 4 /12 5 /10. Faculty of Computer Science. Term: Fall 2018 (Sep4-Dec4) Student ID Information. Grade Table Question Score Faculty of Computer Science Page 1 of 8 Midterm 1 Term: Fall 2018 (Sep4-Dec4) Student ID Information Last name: First name: Student ID #: CS.Dal.Ca userid: Course ID: CSCI 2132 Course Title: Instructor:

More information

Front End: Lexical Analysis. The Structure of a Compiler

Front End: Lexical Analysis. The Structure of a Compiler Front End: Lexical Analysis The Structure of a Compiler Constructing a Lexical Analyser By hand: Identify lexemes in input and return tokens Automatically: Lexical-Analyser generator We will learn about

More information

Learning with Weighted Transducers

Learning with Weighted Transducers Learning with Weighted Transducers Corinna CORTES a and Mehryar MOHRI b,1 a Google Research, 76 Ninth Avenue, New York, NY 10011 b Courant Institute of Mathematical Sciences and Google Research, 251 Mercer

More information

Weighted Finite-State Transducers in Computational Biology

Weighted Finite-State Transducers in Computational Biology Weighted Finite-State Transducers in Computational Biology Mehryar Mohri Courant Institute of Mathematical Sciences mohri@cims.nyu.edu Joint work with Corinna Cortes (Google Research). 1 This Tutorial

More information

Lexical Analysis. Introduction

Lexical Analysis. Introduction Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies

More information

Lexical Analysis. Implementation: Finite Automata

Lexical Analysis. Implementation: Finite Automata Lexical Analysis Implementation: Finite Automata Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs)

More information

Applications of Lexicographic Semirings to Problems in Speech and Language Processing

Applications of Lexicographic Semirings to Problems in Speech and Language Processing Applications of Lexicographic Semirings to Problems in Speech and Language Processing Richard Sproat Google, Inc. Izhak Shafran Oregon Health & Science University Mahsa Yarmohammadi Oregon Health & Science

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

CS356: Discussion #1 Development Environment. Marco Paolieri

CS356: Discussion #1 Development Environment. Marco Paolieri CS356: Discussion #1 Development Environment Marco Paolieri (paolieri@usc.edu) Contact Information Marco Paolieri PhD at the University of Florence, Italy (2015) Postdoc at USC since 2016 Email: paolieri@usc.edu

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

Lecture 3 Regular Expressions and Automata

Lecture 3 Regular Expressions and Automata Lecture 3 Regular Expressions and Automata CS 6320 Fall 2018 @ Dan I. Moldovan, Human Language Technology Research Institute, The University of Texas at Dallas 78 Outline Regular Expressions Finite State

More information

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages Regular Languages MACM 3 Formal Languages and Automata Anoop Sarkar http://www.cs.sfu.ca/~anoop The set of regular languages: each element is a regular language Each regular language is an example of a

More information

Lab 2: Training monophone models

Lab 2: Training monophone models v. 1.1 Lab 2: Training monophone models University of Edinburgh January 29, 2018 Last time we begun to get familiar with some of Kaldi s tools and set up a data directory for TIMIT. This time we will train

More information

Theory of Computations Spring 2016 Practice Final Exam Solutions

Theory of Computations Spring 2016 Practice Final Exam Solutions 1 of 8 Theory of Computations Spring 2016 Practice Final Exam Solutions Name: Directions: Answer the questions as well as you can. Partial credit will be given, so show your work where appropriate. Try

More information

CS 314 Principles of Programming Languages. Lecture 3

CS 314 Principles of Programming Languages. Lecture 3 CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information

More information

CSE Theory of Computing Spring 2018 Project 2-Finite Automata

CSE Theory of Computing Spring 2018 Project 2-Finite Automata CSE 30151 Theory of Computing Spring 2018 Project 2-Finite Automata Version 1 Contents 1 Overview 2 2 Valid Options 2 2.1 Project Options.................................. 2 2.2 Platform Options.................................

More information

Compiler Construction LECTURE # 3

Compiler Construction LECTURE # 3 Compiler Construction LECTURE # 3 The Course Course Code: CS-4141 Course Title: Compiler Construction Instructor: JAWAD AHMAD Email Address: jawadahmad@uoslahore.edu.pk Web Address: http://csandituoslahore.weebly.com/cc.html

More information

CMSC 132: Object-Oriented Programming II

CMSC 132: Object-Oriented Programming II CMSC 132: Object-Oriented Programming II Regular Expressions & Automata Department of Computer Science University of Maryland, College Park 1 Regular expressions Notation Patterns Java support Automata

More information

Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition

Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition Discriminative Training of Decoding Graphs for Large Vocabulary Continuous Speech Recognition by Hong-Kwang Jeff Kuo, Brian Kingsbury (IBM Research) and Geoffry Zweig (Microsoft Research) ICASSP 2007 Presented

More information

Basic Linux (Bash) Commands

Basic Linux (Bash) Commands Basic Linux (Bash) Commands Hint: Run commands in the emacs shell (emacs -nw, then M-x shell) instead of the terminal. It eases searching for and revising commands and navigating and copying-and-pasting

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Written ssignments W assigned today Implementation of Lexical nalysis Lecture 4 Due in one week :59pm Electronic hand-in Prof. iken CS 43 Lecture 4 Prof. iken CS 43 Lecture 4 2 Tips on uilding Large Systems

More information

Introduction: Language Description:

Introduction: Language Description: SAKÉ S halva Kohen: sak2232 ( Language Guru ) A runavha Chanda: ac3806 ( Manager ) K ai-zhan Lee: kl2792 ( System Architect ) E mma Etherington: ele2116 ( Tester ) Introduction: Behind all models of computation

More information

Structure of Programming Languages Lecture 3

Structure of Programming Languages Lecture 3 Structure of Programming Languages Lecture 3 CSCI 6636 4536 Spring 2017 CSCI 6636 4536 Lecture 3... 1/25 Spring 2017 1 / 25 Outline 1 Finite Languages Deterministic Finite State Machines Lexical Analysis

More information

CSCI 340: Computational Models. Turing Machines. Department of Computer Science

CSCI 340: Computational Models. Turing Machines. Department of Computer Science CSCI 340: Computational Models Turing Machines Chapter 19 Department of Computer Science The Turing Machine Regular Expressions Acceptor: FA, TG Nondeterminism equal? Yes Closed Under: L 1 + L 2 L 1 L

More information

FSASIM: A Simulator for Finite-State Automata

FSASIM: A Simulator for Finite-State Automata FSASIM: A Simulator for Finite-State Automata P. N. Hilfinger Chapter 1: Overview 1 1 Overview The fsasim program reads in a description of a finite-state recognizer (either deterministic or non-deterministic),

More information

The Replace Operator. Lauri Karttunen Rank Xerox Research Centre 6, chemin de Maupertuis F Meylan, France lauri, fr

The Replace Operator. Lauri Karttunen Rank Xerox Research Centre 6, chemin de Maupertuis F Meylan, France lauri, fr The Replace Operator Lauri Karttunen Rank Xerox Research Centre 6, chemin de Maupertuis F-38240 Meylan, France lauri, karttunen@xerox, fr Abstract This paper introduces to the calculus of regular expressions

More information

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42 String Matching Pedro Ribeiro DCC/FCUP 2016/2017 Pedro Ribeiro (DCC/FCUP) String Matching 2016/2017 1 / 42 On this lecture The String Matching Problem Naive Algorithm Deterministic Finite Automata Knuth-Morris-Pratt

More information

A Flexible XML-based Regular Compiler for Creation and Conversion of Linguistic Resources

A Flexible XML-based Regular Compiler for Creation and Conversion of Linguistic Resources A Flexible XML-based Regular Compiler for Creation and Conversion of Linguistic Resources Jakub Piskorski,, Oliver Scherf, Feiyu Xu DFKI German Research Center for Artificial Intelligence Stuhlsatzenhausweg

More information

Lab - 8 Awk Programming

Lab - 8 Awk Programming Lab - 8 Awk Programming AWK is another interpreted programming language which has powerful text processing capabilities. It can solve complex text processing tasks with a few lines of code. Listed below

More information
