CMPE 4003 Formal Languages & Automata Theory Project Due May 15, 2010.

Similar documents
CSE 105 THEORY OF COMPUTATION

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

Regular Languages and Regular Expressions

Lexical Analysis. Chapter 2

NFAs and Myhill-Nerode. CS154 Chris Pollett Feb. 22, 2006.

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Lexical Analysis. Lecture 2-4

Implementation of Lexical Analysis

Chapter Seven: Regular Expressions

Lexical Analysis. Implementation: Finite Automata

Converting a DFA to a Regular Expression JP

Lexical Analysis. Lecture 3-4

CSE 105 THEORY OF COMPUTATION

Implementation of Lexical Analysis


CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Implementation of Lexical Analysis

ECS 120 Lesson 7 Regular Expressions, Pt. 1

CS2 Practical 2 CS2Ah

Automating Construction of Lexers

Introduction to Lexical Analysis

lec3:nondeterministic finite state automata

Lexical Analysis. Prof. James L. Frankel Harvard University

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Ambiguous Grammars and Compactification

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Implementation of Lexical Analysis. Lecture 4

CS 202, Fall 2017 Homework #4 Balanced Search Trees and Hashing Due Date: December 18, 2017

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

2. Lexical Analysis! Prof. O. Nierstrasz!

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

CS 310: State Transition Diagrams

6 NFA and Regular Expressions

Closure Properties of CFLs; Introducing TMs. CS154 Chris Pollett Apr 9, 2007.

Compiler Construction

14.1 Encoding for different models of computation

Midterm I - Solution CS164, Spring 2014

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

Behaviour Diagrams UML

Lexical Analysis - 2

Compiler course. Chapter 3 Lexical Analysis

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Course Project 2 Regular Expressions

Lexical Analysis. Introduction

CSE100 Principles of Programming with C++

Dr. D.M. Akbar Hussain

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

Chapter 3 Lexical Analysis

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Definition: A context-free grammar (CFG) is a 4- tuple. variables = nonterminals, terminals, rules = productions,,

CMSC330 Spring 2015 Midterm #1 9:30am/11:00am/12:30pm

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

CT32 COMPUTER NETWORKS DEC 2015

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

G52LAC Languages and Computation Lecture 6

6.1 Skip List, Binary Search Tree

CSE 105 THEORY OF COMPUTATION

Lexical Analysis. Lecture 3. January 10, 2018

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

UNIT -2 LEXICAL ANALYSIS

Compiler Construction

CS/ECE 374 Fall Homework 1. Due Tuesday, September 6, 2016 at 8pm

Homework 1 Due Tuesday, January 30, 2018 at 8pm

CS 211 Programming Practicum Fall 2018

CMSC330 Spring 2015 Midterm #1 9:30am/11:00am/12:30pm

FAdo: Interactive Tools for Learning Formal Computational Models

CSCE 5400 ASSIGNMENT #1 SOLUTION

Formal Languages and Automata

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Kinder, Gentler Nation

Actually talking about Turing machines this time

CS103 Handout 14 Winter February 8, 2013 Problem Set 5

CSE450. Translation of Programming Languages. Automata, Simple Language Design Principles

Chapter 3: Lexing and Parsing

Definition 2.8: A CFG is in Chomsky normal form if every rule. only appear on the left-hand side, we allow the rule S ǫ.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

CS164: Midterm I. Fall 2003

Formal Languages and Compilers Lecture VI: Lexical Analysis

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

T.E. (Computer Engineering) (Semester I) Examination, 2013 THEORY OF COMPUTATION (2008 Course)

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

A Typed Lambda Calculus for Input Sanitation

UNION-FREE DECOMPOSITION OF REGULAR LANGUAGES

CMPSCI 250: Introduction to Computation. Lecture 20: Deterministic and Nondeterministic Finite Automata David Mix Barrington 16 April 2013

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES

Transcription:

CMPE 4003 Formal Languages & Automata Theory Project Due May 15, 2010. Regular Expression Engine (This document contains 5 pages. Read carefully all parts of this document.) Introduction In this project, you will implement a Regular Expression Engine. The engine will be used for searching patterns (which match a given regular expression) in a given text file. Your program will take two inputs: a regular expression and a text file. As output, your program will print those lines of the text file which contain strings that match the given regular expression. Definition of a Regular Expression A definition of regular expressions is given in your text book (Definition 1.52, p. 64). Additionally, we will assume the following: The alphabet, Σ, will consists of all printable ASCII characters except the metacharacters. Metacharacters are: (, ),, *, and &. We will omit the symbol o in writing regular expressions. That is, instead of R 1 o R 2, we will simply write R 1 R 2. The textbook uses the symbol to denote the union operator. Since this symbol is a nonkeyboard character, we will use the symbol to denote the union operator. For simplicity you may assume that regular expression do not contain the symbols ε and. Infix to Postfix It is easier to process a regular expression if it is in postfix form. For example the postfix equivalent of the regular expression (a b)*xz* is given by the expression ab *x&z*&. Note that the concatenation symbol is explicitly shown as the symbol & in this postfix expression. This makes it easier to process the postfix expression. So, before processing the regular expression, you need to convert it into postfix form. A C++ and Java code which makes this conversion will be provided. Project Parts The project will consist of two parts: Part I: Construction of an NFA from a given regular expression. Part II: Processing the given text file with the NFA. In part I, you should first convert the given (infix) regular expression to postfix form. Then, process this postfix expression and construct the NFA step by step. An example start-up algorithm (Algorithm 1) for processing the postfix expression and for building the final NFA is given in this document. For constructing the individual NFAs, use the construction methods which we have seen in class (Lemma 1.55 in textbook). 1

After constructing the final NFA, you should process the given text file. Process the text file line by line. That is, begin with the first line of text and check whether the NFA accepts it or not. If it accepts then print that line, if it does not accept then print nothing. Proceed in this fashion until the end of the text. An example algorithm (Algorithm 2) for simulating the processing of an NFA for a given string is provided in this document. For the first part you are given a class design with the accompanying code in both C++ and Java, so it could be easier for you to start with it. It is up to you to decide whether to use that code or not, you may also use it partially. What is required is that your code should be able to build NFA that recognizes given regex after translating it to postfix form and then be able to process a file and output the matching lines as expected. Bonus (30 points): Write the code for converting the NFA into a DFA (using the construction in Theorem 1.39). Give the user an option so that s/he can choose whether to process the text file with a DFA or an NFA. If the user chooses DFA then process the text file with the DFA. If the user chooses NFA then process the text file with the NFA. (Note that if the user opts for an NFA then you don t need to construct the DFA). Compare the running times of both cases (DFA or NFA) and report your observations. How can you explain the difference in the running times? Project Groups You can work in groups. There can be at most 2 people in a single group. You are responsible for forming your group until 22.04.11 and register yourself in the list maintained by the TA. Submission and Grading: (Please read very carefully!) 1. You must provide source code in Java or C/C++ and make sure that your code can be compiled and run from Linux OS command line using javac or gcc/g++. If you use multiple languages, we will appreciate if your project can simply be compiled e.g. using makefile. Make a tarball and send it to cmpe4003@gmail.com before deadline. 2. Comment and indent your code clearly. 3. You must prepare a clear detailed report of your work including your reasoning (design and implementation details) and at least 5 test runs on large files with regular expressions of different complexities. Present this report both as hardcopy and also soft copy in the tarball. 4. You must do your own work. You may discuss high level ideas with classmates, but you must write them ALONE in an isolated environment! If you get help, you must acknowledge that help in your writing. IT S NOT ALLOWED TO USE SOMEBODY ELSE s CODE PARTLY or FULLY THIS MAY BE A FRIEND, WEB PAGE OR A BOOK. WE HAVE ZERO TOLERANCE WHEN IT COMES TO CHEATING, UNIVERSITY S RULES and REGULATIONS APPLY. 5. With appointment, we will ask you to present your work for 5-10 minutes. Questions will be asked to group members separately. If one of the group members cannot explain what they have done clearly, it is considered cheating. Therefore, do only what you can. 2

Algorithm 1 for constructing an NFA from a given postfix regular expression: while (not end of postfix expression) { c = next character in postfix expression; if (c == & ) { nfa2 = pop(); nfa1 = pop(); push(nfa that accepts the concatenation of L(nfa1) followed by L(nfa2)); else if (c == ) { nfa2 = pop(); nfa1 = pop(); push(nfa that accepts L(nfa1) L(nfa2)); else if (c == * ) { nfa = pop(); push(nfa that accepts L(nfa)*); else { push(nfa that accepts a single character c); 3

Algorithm 2 for simulating the processing of an NFA, M = (Q, Σ, δ, S, F), for a given input string. T = ; P = ; while (true) { P = P S; // compute the epsilon closure of P. do { T = P; for (each δ(q, ε) where q P ) P = P δ( q, ε); while (T P ); // loop again if P is modified if (T F ) // if it is the end of input string then break if (input empty) c = next character in the string // compute next state of the NFA T = ; for (each δ(q, c) where q P ) T = T δ( q, c); if (T F ) P = T; accept iff (T F ); 4

Sample Test run USAGE: java Main (infix regexp) (file) $ java Main tam (b*az) test.txt Regex in Postfix form: ta&m&b*a&z& Following NFA was built: startstate = 15 acceptstates = [13, 16] allstates = [1, 2, 3,...] edges: (5, 6, 'm') (12, 13, ε) (10, 11, ε) (9, 10, ε) (1, 2, 't') (11, 12, 'a') (7, 8, 'b') (8, 10, ε)... NFA simulation results : ACCEPTED LINE (0) : tamtamlar caliyor. ACCEPTED LINE (4) : tatatamam. ACCEPTED LINE (5) : mxazabbaaaabcaz ACCEPTED LINE (6) : iki cambaz bir ipte oynamaz NFA simulation took: 61.925702 ms 5