Nested Words. A model of linear and hierarchical structure in data. Examples of such data

Similar documents
Colored Nested Words

The Generalized Star-Height Problem

Pushdown Automata. A PDA is an FA together with a stack.

Streaming Tree Transducers

where is the transition function, Pushdown Automata is the set of accept states. is the start state, and is the set of states, is the input alphabet,

OpenNWA: A Nested-Word-Automaton Library

Formal languages and computation models

Lecture 4: Syntax Specification

A Characterization of the Chomsky Hierarchy by String Turing Machines

CS6160 Theory of Computation Problem Set 2 Department of Computer Science, University of Virginia

Course Project 2 Regular Expressions

CS402 - Theory of Automata Glossary By

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

1. Draw the state graphs for the finite automata which accept sets of strings composed of zeros and ones which:

Rational subsets in HNN-extensions

CSE 105 THEORY OF COMPUTATION

From Theorem 8.5, page 223, we have that the intersection of a context-free language with a regular language is context-free. Therefore, the language

Final Course Review. Reading: Chapters 1-9

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

Chapter 14: Pushdown Automata

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Reflection in the Chomsky Hierarchy

Introduction to Lexing and Parsing

Ambiguous Grammars and Compactification

PS3 - Comments. Describe precisely the language accepted by this nondeterministic PDA.

Lexical and Syntax Analysis. Bottom-Up Parsing

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

Lexical Analysis. Introduction

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

CSE 105 THEORY OF COMPUTATION

Ling/CSE 472: Introduction to Computational Linguistics. 4/6/15: Morphology & FST 2

Theory of Computation

THEORY OF COMPUTATION

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Lexical Analysis. Prof. James L. Frankel Harvard University

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

Lecture 12 ADTs and Stacks

Languages and Finite Automata

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Zhizheng Zhang. Southeast University

PA3 Design Specification

CIT3130: Theory of Computation. Regular languages

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Regular Expression Derivatives in Python

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

Formal Languages and Automata Theory, SS Project (due Week 14)

We can create PDAs with multiple stacks. At each step we look at the current state, the current input symbol, and the top of each stack.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Definition: A context-free grammar (CFG) is a 4- tuple. variables = nonterminals, terminals, rules = productions,,

Definition 2.8: A CFG is in Chomsky normal form if every rule. only appear on the left-hand side, we allow the rule S ǫ.

Lexical Analysis - 2

The Streaming Complexity of Validating XML Documents

CS402 - Theory of Automata FAQs By

Limited Automata and Unary Languages

Multiple Choice Questions

LR Parsing. The first L means the input string is processed from left to right.

Converting a DFA to a Regular Expression JP

CMSC 350: COMPILER DESIGN

A Note on the Succinctness of Descriptions of Deterministic Languages

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010

Compiler Construction

Conversion of Fuzzy Regular Expressions to Fuzzy Automata using the Follow Automata

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

A Survey on Graph Automata and Tree-Width

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

ONE-STACK AUTOMATA AS ACCEPTORS OF CONTEXT-FREE LANGUAGES *

PDA s. and Formal Languages. Automata Theory CS 573. Outline of equivalence of PDA s and CFG s. (see Theorem 5.3)

T.E. (Computer Engineering) (Semester I) Examination, 2013 THEORY OF COMPUTATION (2008 Course)

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

Prediction of Infinite Words with Automata

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

Decision Properties of RLs & Automaton Minimization

Introduction to Lexical Analysis

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

Regular Expressions for Data Words

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

A Typed Lambda Calculus for Input Sanitation

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Restricting Grammars with Tree Automata

COMP 330 Autumn 2018 McGill University

Deterministic automata for extended regular expressions

STACKS. A stack is defined in terms of its behavior. The common operations associated with a stack are as follows:

Dr. D.M. Akbar Hussain

Learn Smart and Grow with world

Decision Properties for Context-free Languages


Lexical Analysis. Chapter 2

Lexical Analysis. Finite Automata

[Ch 6] Set Theory. 1. Basic Concepts and Definitions. 400 lecture note #4. 1) Basics

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

Building Compilers with Phoenix

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

UNIT I PART A PART B

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Transcription:

Nested Words A model of linear and hierarchical structure in data. s o m e w o r d Examples of such data Executions of procedural programs Marked-up languages such as XML Annotated linguistic data

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Example: Execution of a Structured Program as a Nested Word 3 procedures

Nested Words Automata (NWA) Special kinds of pushdown automata whose push/pop actions are directed by the hierarchical structure in the input nested word. On every open-tag, the automaton pushes On every close-tag, the automaton pops On data with no hierarchical connections, no stack operation is made. A language of nested words is said to be regular if it can be recognized by a NWA

Properties of Regular Languages of Nested Words Automata Deterministic nested word automata are as expressive as their non-deterministic counterparts The class is closed under: union intersection complementation concatenation Kleene-* prefixes reversal homomorphism The following problems are decidable: emptiness membership inclusion equivalence

Weakness of Nested Words But what if an exceptions is thrown?

Weakness of Nested Words But what if an exceptions is thrown? r 3

Weakness of Nested Words But what if an exceptions is thrown? p 5 p 6 r 3

Weakness of Nested Words But what if an exceptions is thrown? p 5 r 3

Weakness of Nested Words But what if an exceptions is thrown? p 5 r 3 r 3 p 5 p 6 p 5

Other Examples Python program

Colored Nested Words - Encoding Explicitly, using a graph with several types of edges (linear, mono-chromatic and bi-chromatic) and defining the constraints they should satisfy. Implicitly, using colored/types parenthesis. (p 0 (p 1 p 1 p 3 (q 0 q 1 q 2 (r 0 r 1 r 2 r 3 ) p 5 p 6 ) [p 0 (p 1 p 1 p 3 [q 0 q 1 q 2 [r 0 r 1 r 2 r 3 ) p 5 p 6 ]

Regularity How can we define regularity of a language of colored nested words? Intuitively, we d like the language obtained by inserting the missing closing tags to be a well-colored regular language of nested words. (p 0 (p 1 p 1 p 3 (q 0 q 1 q 2 (r 0 r 1 r 2 r 3 ) p 5 p 6 ) (p 0 (p 1 p 1 p 3 (q 0 q 1 q 2 (r 0 r 1 r 2 -) -) r 3 ) p 5 p 6 )

Regularity Suppose we have such a mapping from CNW to NW f(w) = w such that w has all the missing closing tags Definition: Let L be a language of CNW. Then L is regular if { f(w) w L } is a regular lang. of NW

How can we process regular CNW? w Transducer Implementing f f(w) NWA But how can we implement this transducer? It is not regular, and not regular NW, because it needs to pop more than one letter when reading a rescuing closing tag!

Colored Nested Words Automata The pushdown automaton marks the pushed letters with the respective color A read of an x-colored close-tag pops from the stack all symbols of y-colored symbols up to the first x-colored symbol (as well as this letter itself) R(n) Q(n) try P(n) r 3

Colored Nested Words Automata The pushdown automaton marks the pushed letters with the respective color A read of an x-colored close-tag pops from the stack all symbols of y-colored symbols up to the first x-colored symbol (as well as this letter itself) R(n) Q(n) try P(n) p 5 r 3

Blind vs. Sighted CNA Can the automaton view all the letters in the stack above the matching x-colored letter? Or can it just see that x-colored letter? We call the first version Sighted CNA R(n) Q(n) try P(n)

Blind vs. Sighted CNA Can the automaton view all the letters in the stack above the matching x-colored letter? Or can it just see that x-colored letter? We call the first version Sighted CNA And the second version Blind CNA R(n) Q(n) try P(n) Theorem 1 Sighted CNA and Blind CNA have the same expressive power

Blind vs. Sighted CNA Theorem 2 Sighted CNA and Blind CNA accept all the regular languages of colored nested words

Other Examples broken html

CNA html example Pop trans Push trans Internal ε-trans

CNA - Examples Accepts word where #a s between () is even

CNA - Examples Accepts word where #a s between () is even and #a s between [] is odd

CNA - Examples Accepts word where #a s between () is even and #a s between [] is odd a ( may be unmatched if encapsulated by []

Properties of Regular Languages of Colored Nested Words Automata Deterministic nested word automata are as expressive as their non-deterministic counterparts The class is closed under: union intersection complementation concatenation Kleene-* prefixes reversal homomorphism The following problems are decidable: emptiness membership inclusion equivalence

Grammar Characterization There exists a grammar characterization for NWs It classifies variables to two sets One that disallows pending calls One that disallows pending returns There exists a grammar characterization for CNWs It classifies variables C +2 sets One per each color ))... ) (..) ((.)()()() ) ( ().( One that disallows pending calls One that disallows pending returns )) ) (..) ((.) ( () ( ) ( ().(