Counting multiplicity over infinite alphabets

Amal Dev Manuel and R. Ramanujam, The Institute of Mathematical Sciences, Chennai, India. {amal,jam}@imsc.res.in

Summary Motivation for infinite data. We need good data models, amenable to decidable verification. Crucial decision: operations and predicates on data. Our proposal: count data value occurrences (subject to constraints). Decidable automaton model. Interesting connections to logics.

Data in verification Semi-structured data: documents viewed as ranked / unranked trees with labels from finite domain. Software verification: Control structures: Procedure calls, dynamic process creation. Data structures: integers, lists, pointers. Communication channels: unbounded buffers. Parameters: Number of processes, communication delays.

A resourceful tale Two processes, with actions $\{r_i, s_i, t_i\}$ for request, start and terminate. Local property: in any computation, the $i$-projection is of the form $(r_i s_i t_i)^*$. Global property: between any $s_i$ and the subsequent $t_i$, there is no $s_j$ or $t_j$, where $j \neq i$. What happens when the number of processes is either unknown, or changes during computation?

Model checking infinite state systems An active research area. A typical approach: Describe system states by finite objects (strings). Describe possible transitions by rewriting rules. Devise algorithms for checking reachability. Model checking of linear time properties possible in many cases. Missing: reasoning about data across states (as above). Missing: A generic framework for branching time properties.

Decidability issues Parametrized verification: property refers to process actions indexed by process ID: requires an infinite alphabet. Apt and Kozen 1986: Parametrized verification is undecidable. Decidability obtained using network invariants, regular model checking. Emerson, Namjoshi 2005: indexed processes in CTL : decidability obtained by showing that the properties studied have constant cutoffs, using symmetry arguments.

A uniform framework Similar considerations in dealing with semistructured data. Enhance finitely labelled structures by data. One or more relations per node. Parameters: Underlying structure. Amount and structure of data at each node. Operations and predicates on data. Expressiveness of specification language.

Regular languages Regular languages over finite alphabets: a robust notion. There does not seem to be a canonical notion of regular data languages. But we can mimic the regular languages framework. Some automata models have been studied.

Example languages Some standard examples. No two a positions have the same data value. There exist two a positions having the same data value. For every a position, there exists a b position with the same data value. A process has to consume one given resource before requesting another. Every process requesting a resource is eventually granted. Only one process has the resource at any time.

Need for a theory We look for a decent theory of regular-like word and tree languages over infinite alphabets. Decent = decidable emptiness problem, with manageable complexity. Better, equivalent logical / algebraic characterizations. Only equality comparisons on the infinite alphabet.

Data languages $(\Sigma \times D)$-labelled words, where $\Sigma$ is finite and $D$ is infinite. Data word language: $L \subseteq (\Sigma \times D)^*$. Data trees: the same notion, over unranked ordered trees. Since we have only equality tests on values, positions in data words are partitioned into classes; similarly nodes in trees are equated.
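The partition of positions into classes can be made concrete with a small sketch. It assumes a data word is represented as a Python list of (letter, value) pairs; the function names `classes` and `class_strings` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def classes(word):
    """Partition the positions of a data word by data value.

    `word` is a list of (letter, value) pairs (an assumed representation).
    Returns a list of classes, each a list of positions in increasing
    order; classes are ordered by their first position.
    """
    by_value = defaultdict(list)
    for pos, (_letter, value) in enumerate(word):
        by_value[value].append(pos)
    return list(by_value.values())


def class_strings(word):
    """Return, for each class, the string of letters read along that class."""
    return ["".join(word[p][0] for p in cls) for cls in classes(word)]


if __name__ == "__main__":
    w = [("a", 7), ("b", 3), ("a", 3), ("b", 7), ("a", 7)]
    print(classes(w))
    print(class_strings(w))
```

Only equality of values is used, so renaming the data values consistently would leave both results unchanged.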

Reasoning Books that have been re-edited: $\exists y.\,(x.\mathrm{isbn} = y.\mathrm{isbn} \land x.\mathrm{year} \neq y.\mathrm{year})$. Unary keys: attribute $A$ has distinct values: $\forall x, y.\,(x.A = y.A \implies x = y)$. Navigation: from node $x$ we can access nodes $y_1, y_2$ via paths of type $p_1, p_2 \in R$ such that $y_1.B = y_2.B$.

Register automata $k$-register automata ($k$-RA): upon reading $(a, v) \in (\Sigma \times D)$, one can check in which register the value $v$ occurs, and can store $v$ into a register. Let $L \subseteq (\Sigma \times D)^*$. Define: $\mathrm{Proj}(L) = \{a_1 \ldots a_n \mid (a_1, v_1) \ldots (a_n, v_n) \in L\}$. If $L$ is recognized by a $k$-RA $M$, then $\mathrm{Proj}(L)$ is regular. From $M$ one can construct a word automaton $M'$ of size $|M| \cdot 2^{O(k^2)}$. Proof idea: consider the matrix $\{=, \neq\}^{k \times k}$ for keeping track of equal registers; guess (in)equalities on the fly.
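To illustrate the register mechanism, here is a minimal sketch of a nondeterministic one-register automaton simulator. The transition format (a test `"any"`, `"eq"` or `"neq"` against the register, followed by an action `"store"` or `"keep"`) is a simplification assumed for this sketch and is not the formal definition from the slides. The sample automaton accepts data words in which two a positions share a data value, and `proj` computes the string projection of a single word.

```python
def proj(word):
    """String projection of one data word: drop the data values."""
    return "".join(letter for letter, _ in word)


def run_register_automaton(transitions, initial, finals, word):
    """Simulate a nondeterministic one-register automaton.

    transitions: tuples (src, letter, test, action, dst) in the assumed
    simplified format. The register starts empty (None), so data values
    are assumed never to be None.
    """
    configs = {(initial, None)}
    for letter, value in word:
        nxt = set()
        for state, reg in configs:
            for src, a, test, action, dst in transitions:
                if src != state or a != letter:
                    continue
                if test == "eq" and reg != value:
                    continue
                if test == "neq" and reg == value:
                    continue
                nxt.add((dst, value if action == "store" else reg))
        configs = nxt
    return any(state in finals for state, _ in configs)


# Hypothetical automaton over {a, b}: guess an a position, store its value,
# and later find another a position carrying the same value.
TWO_EQUAL_A = [
    ("q0", "a", "any", "keep", "q0"), ("q0", "b", "any", "keep", "q0"),
    ("q0", "a", "any", "store", "q1"),
    ("q1", "a", "any", "keep", "q1"), ("q1", "b", "any", "keep", "q1"),
    ("q1", "a", "eq", "keep", "qf"),
    ("qf", "a", "any", "keep", "qf"), ("qf", "b", "any", "keep", "qf"),
]

if __name__ == "__main__":
    w = [("a", 1), ("b", 1), ("a", 2), ("a", 1)]
    print(proj(w), run_register_automaton(TWO_EQUAL_A, "q0", {"qf"}, w))
```

The simulator tracks the set of reachable (state, register) pairs, which stays finite because the register can only hold values that occur in the input word.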

Example All data values occurring with letter a are distinct. [Figure: state diagram of a register automaton with states $q_0$, $q_1$, $q_f$.]

Results Non-emptiness for register automata is decidable. There are subtle differences between register automata models. In some, data values can occur in more than one register; in some they cannot. In the former, the problem is PSpace-complete; in the latter it is NP-complete. The model is not very expressive: local properties, like "every projection is of the form $(r_i s_i t_i)^*$", cannot be expressed.

Pebble automata Upon reading $(a, v) \in (\Sigma \times D)$, the automaton can check which pebbles are under the head and which pebbles mark positions with value $v$; it can lift the highest pebble, with the head reverting to the previous pebble, and place a new pebble.

Example There are at least two positions with a having the same data value. [Figure: state diagram of a pebble automaton with states $q_0$, $q$, $q_1$, $q_f$.]

Data logics $\mathrm{FO}(+1, <, \oplus 1, \sim)$: atomic predicates $P_a(x)$, for $a \in \Sigma$; $+1$ for successor position; $<$ for the order on positions; $\sim$ for the same-value relation; $\oplus 1$ for class successor. $\mathrm{EMSO}(+1, <, \oplus 1, \sim)$, similarly. Models: data words.

Examples Consider $\mathrm{FO}(+1, <, \sim)$. Every position labelled with a has a distinct value: $\forall x, y.\,(P_a(x) \land P_a(y) \land x \sim y) \implies (x = y)$. Complement of the language above: words containing two positions labelled with a having the same data value: $\exists x, y.\,(P_a(x) \land P_a(y) \land x \neq y \land x \sim y)$. Inclusion dependency: every position labelled with a has a value which appears under a position labelled with b: $\forall x.\,\exists y.\,(P_a(x) \implies (P_b(y) \land x \sim y))$.
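The three example formulas can be evaluated on a concrete data word by brute force over positions. The sketch below assumes a data word is a list of (letter, value) pairs; the helper names `P`, `sim` and the three checker functions are hypothetical names for this illustration.

```python
from itertools import product


def P(w, letter, x):
    """Atomic predicate P_letter(x): position x carries the given letter."""
    return w[x][0] == letter


def sim(w, x, y):
    """Same-value relation: positions x and y carry the same data value."""
    return w[x][1] == w[y][1]


def all_a_distinct(w):
    """Every position labelled a has a distinct value."""
    n = range(len(w))
    return all(
        not (P(w, "a", x) and P(w, "a", y) and sim(w, x, y)) or x == y
        for x, y in product(n, n)
    )


def some_a_repeated(w):
    """Two different positions labelled a share a value."""
    n = range(len(w))
    return any(
        P(w, "a", x) and P(w, "a", y) and x != y and sim(w, x, y)
        for x, y in product(n, n)
    )


def inclusion_a_in_b(w):
    """Every value under a also appears under some b."""
    n = range(len(w))
    return all(
        not P(w, "a", x) or any(P(w, "b", y) and sim(w, x, y) for y in n)
        for x in n
    )


if __name__ == "__main__":
    w = [("a", 1), ("b", 1), ("a", 2), ("b", 2)]
    print(all_a_distinct(w), some_a_repeated(w), inclusion_a_in_b(w))
```

On any data word the first two checkers return opposite answers, matching the fact that the second formula defines the complement of the first.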

Examples Sequences over $\{0, 1\}$ with the same subsequence of 0-values and 1-values: All 0s have distinct values; similarly for 1s. There is a bijection between 0-values and 1-values. For every pair of 0-positions $x < y$ and every 1-position $z$ with $x \sim z$, there exists a 1-position $z'$ such that $z < z'$ and $y \sim z'$. Needs 3 variables, accepted by 2-PA. Earlier examples, plus: $\forall x, y, z.\,(P_0(x) \land P_0(y) \land x < y \land P_1(z) \land x \sim z) \implies (\exists x.\,(P_1(x) \land x \sim y \land z < x))$.
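Read directly, the property says that the sequence of values under 0 has no repetitions and coincides with the sequence of values under 1. A minimal sketch of that check, assuming the letters are the strings "0" and "1" and the word is a list of (letter, value) pairs (the function name is hypothetical):

```python
def same_value_subsequences(w):
    """Check that 0-values are pairwise distinct and occur in the same
    order as the 1-values (which are then pairwise distinct as well)."""
    zeros = [d for a, d in w if a == "0"]
    ones = [d for a, d in w if a == "1"]
    return len(set(zeros)) == len(zeros) and zeros == ones


if __name__ == "__main__":
    print(same_value_subsequences([("0", 5), ("1", 5), ("0", 8), ("1", 8)]))
    print(same_value_subsequences([("0", 5), ("0", 8), ("1", 8), ("1", 5)]))
```

The direct check is linear in the length of the word, whereas the logical formulation needs three variables to compare the order of two 0-positions with the order of the matching 1-positions.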

Undecidability $\mathrm{FO}^3(S, \sim)$ is undecidable (and therefore so is $\mathrm{FO}^3(<, \sim)$, since $S$ is definable from $<$ when we can use 3 variables). PCP reduction: given an instance $I$ over alphabet $\Sigma$, let $\Sigma'$ consist of two disjoint copies of $\Sigma$. Regular EqualSequences

Expressive power $\mathrm{FO}(+1, <, \sim)$ is incomparable with register automata. $\mathrm{FO}(+1, <, \sim)$ is strictly included in pebble automata. The two-variable fragment is a decidable fragment, but almost all natural extensions are undecidable. E.g. $\mathrm{FO}^2(+1, <, \sim, \prec)$, with $\prec$ a linear order on data values.

Two variable FO Why consider a two variable logic at all? More hope for decidability. Rich structure over words. Core XPath without attributes = FO(+1, <) over trees [Gottlob et al 02, Marx 05]. Core XPath with one attribute: $\mathrm{FO}(+1, <, \sim)$.

Two variable logics $\mathrm{FO}^2$ over graphs has the finite model property [Mortimer 75]; is NEXPTIME complete [Graedel, Otto 99]. Over words, $\mathrm{FO}^2$ is equivalent to: unary LTL and $\Sigma_2 \cap \Pi_2$ [Etessami, Vardi, Wilke 02]; the variety DA [Therien, Wilke 98]; 2-way partially ordered DFA [Schwentick, Therien, Vollmer 01]; and is NEXPTIME complete.

Decidability $\mathrm{FO}^2(+1, <, \sim)$ is decidable [Bojanczyk, Muscholl, Schwentick, Segoufin, David 06]. For each formula $\varphi$ construct a data automaton that accepts $L(\varphi)$. For each data automaton accepting $L$, construct a multicounter automaton that recognizes $\mathrm{str}(L)$. 2-EXPTIME reduction.

Data automata Data automaton $(A, B)$: $A$, the base automaton, is a nondeterministic letter-to-letter transducer. $B$, the class automaton, is an NFA. $A$ outputs a word $x$ over a finite alphabet. $B$ checks, for each $\sim$-class, that the subword of $x$ corresponding to the class is accepted.
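The two-stage acceptance condition can be sketched as follows. As a simplifying assumption, the base transducer here is deterministic and given as a dictionary, so the sketch does not search over nondeterministic runs of the base automaton; the class automaton is simulated as an NFA. All names and the tuple layouts are hypothetical. The sample automaton uses an identity transducer and a class automaton that allows at most one a per class, which corresponds to the property that all data values under a are distinct.

```python
from collections import defaultdict


def run_base(delta, initial, finals, letters):
    """Run a deterministic letter-to-letter transducer (assumed format:
    delta maps (state, letter) to (state, output)). Returns the output
    word, or None if the run gets stuck or ends in a non-final state."""
    state, out = initial, []
    for a in letters:
        if (state, a) not in delta:
            return None
        state, b = delta[(state, a)]
        out.append(b)
    return out if state in finals else None


def nfa_accepts(delta, initial, finals, string):
    """Standard subset simulation of an NFA without epsilon moves."""
    states = {initial}
    for b in string:
        states = {t for s in states for t in delta.get((s, b), ())}
    return bool(states & finals)


def data_automaton_accepts(base, cls, word):
    """Accept iff the base run succeeds and, for every class of positions
    with the same data value, the class automaton accepts the subword of
    the base output along that class."""
    out = run_base(*base, [a for a, _ in word])
    if out is None:
        return False
    per_class = defaultdict(list)
    for b, (_, d) in zip(out, word):
        per_class[d].append(b)
    return all(nfa_accepts(*cls, s) for s in per_class.values())


# Identity transducer over {a, b}; class automaton: at most one a per class.
BASE = ({("p", "a"): ("p", "a"), ("p", "b"): ("p", "b")}, "p", {"p"})
CLASS = (
    {("q0", "b"): {"q0"}, ("q0", "a"): {"q1"}, ("q1", "b"): {"q1"}},
    "q0",
    {"q0", "q1"},
)

if __name__ == "__main__":
    print(data_automaton_accepts(BASE, CLASS, [("a", 1), ("b", 1), ("a", 2)]))
    print(data_automaton_accepts(BASE, CLASS, [("a", 1), ("b", 2), ("a", 1)]))
```

Note that the base automaton never sees data values, and the class automaton never sees positions outside its class; the data word only determines how the output is split into classes.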

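Example Every data value occurring under a is distinct. [Figure: two state diagrams. The transducer: state $q_0$, edge labels $\Sigma$, $\Sigma$. The finite automaton: states $q_0$, $q_1$, edge labels $b$, $b$, $a$.]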

FO and data automata Above, we saw that $\mathrm{FO}^2$ definable data languages are recognizable by data automata. But the converse does not hold. Consider the property: each class is of even length. This is not FO-definable. A prefix of second order existential quantifiers helps. Still not good enough; describing an accepting run needs a comparison of successive positions in the same class. $\mathrm{EMSO}^2(+1, <, \sim, \oplus 1) = \mathrm{DA}$.

FO to DA Scott normal form: every formula is equivalent to $\forall x \forall y.\,\chi \land \bigwedge_i \forall x \exists y.\,\chi_i$, where the $\chi_i$ and $\chi$ are quantifier free, but over an extended signature with unary predicates. Hence equivalent to $\exists R_1 \ldots \exists R_m$ followed by a Scott formula. Careful rewriting to ensure that innermost conjuncts are all of the form base type or $x \sim y$, $x \not\sim y$, $x < y$ etc. Then construct data automata for each case, and use closure under intersection, union and renaming.

Multicounter automata Finite automata + positive counters. Equivalent to Petri nets. No test for zero (except at the end). Acceptance by final state with all counters = 0. Emptiness decidable [Mayr, Kosaraju 84]. Not known to be elementary.

DA to multicounter automata Show that $\mathrm{Proj}(L)$ can be obtained as $\mathrm{Shuf}(L') \cap R$. When $\mathrm{Proj}(L) = \{a^n b^n \mid n \geq 0\}$, each class contains one a and one b to its right; i.e. $\mathrm{Proj}(L) = \mathrm{Shuf}(\{ab\}) \cap a^* b^*$. Marked shuffle of $n$ words: use $n$ colours. When $L$ is regular, $\mathrm{Shuf}(L)$ is recognized by a multicounter automaton [Gischer 81].

Counter automata Counter mechanisms in the context of unbounded data. Each event type (occurrence of a data value, occurrence of a letter-value pair, etc.) needs its own counter. Hence we need unboundedly many counters. A restraint on counter operations: monotone counters. Can be incremented, reset or compared against constants.

The proposal An automaton model for counting multiplicity of data values. The automaton includes a bag of infinitely many monotone counters, one for each possible data value. When it encounters a letter - data pair, say (a, d), the multiplicity of d is checked against a given constraint, and accordingly updated, the transition causing a change of state, as well as possible updates for other data as well. A bag is like a hash table, with elements of D as keys, and counters as hash values. Transitions depend only on hash values (subject to constraints) and not keys.

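The model A constraint is a pair $c = (op, e)$, where $op \in \{<, =, \neq, >\}$ and $e \in \mathbb{N}$. Define a bag to be a map $h : D \to \mathbb{N}$. $\mathrm{Inst} = \{\uparrow^{+}, \downarrow\}$ is the set of instructions (increment and reset). A class counting automaton is a tuple $\mathrm{CCA} = (Q, \Delta, I, F)$, where $\Delta \subseteq (Q \times \Sigma \times C \times \mathrm{Inst} \times U \times Q)$, $C$ is a finite set of constraints, and $U$ is a finite subset of $\mathbb{N}$.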

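Behaviour A configuration is a pair $(q, h)$, where $q \in Q$ and $h \in \mathcal{B}$, the set of all bags. The initial configuration of $A$ is given by $(q_0, h_0)$, where $h_0(d) = 0$ for all $d \in D$ and $q_0 \in I$. Given a data word $w = (a_1, d_1) \ldots (a_n, d_n)$, a run of $A$ on $w$ is a sequence $\gamma = (q_0, h_0)(q_1, h_1) \ldots (q_n, h_n)$ such that $q_0 \in I$ and for all $i$, $0 \leq i < n$, there exists a transition $t_i = (q, a, c, \iota, m, q')$ such that $q = q_i$, $q' = q_{i+1}$, $a = a_{i+1}$ and: (1) $h_i(d_{i+1}) \models c$; (2) writing $d = d_{i+1}$, $h_{i+1} = h_i \oplus (d, m')$ with $m' = h_i(d) + m$ if $\iota = \uparrow^{+}$ (increment), and $h_{i+1} = h_i \oplus (d, m)$ if $\iota = \downarrow$ (reset). Here $h \oplus (d, k)$ denotes the bag that maps $d$ to $k$ and agrees with $h$ everywhere else.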

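Example All data values under a are distinct. [Figure: state diagram of a CCA with states $q_0$, $q_1$ and transition labels $(a,\ x = 0,\ [+1])$; $(b,\ x \geq 0,\ [0])$; $(\Sigma,\ x \geq 0,\ [0])$; $(a,\ x = 1,\ [0])$.]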
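The run semantics can be sketched as a small simulator. It assumes a data word is a list of (letter, value) pairs and encodes a transition as a tuple (source, letter, (op, e), instruction, number, target) with instruction `"inc"` or `"reset"`; these encodings are hypothetical. The sample automaton is one reading of the example above and is an assumption of this sketch: the loops labelled a, x = 0, [+1] and b, x ≥ 0, [0] sit on q0, the edge labelled a, x = 1, [0] goes from q0 to q1, the Σ loop sits on q1, and q0 is the only final state. The operator `>=` is included as a convenience for the label x ≥ 0.

```python
import operator

# Comparison operators for constraints; ">=" is a convenience (assumption).
OPS = {"<": operator.lt, "=": operator.eq, "!=": operator.ne,
       ">": operator.gt, ">=": operator.ge}


def cca_accepts(transitions, initials, finals, word):
    """Simulate a class counting automaton on a data word.

    A bag is stored as a frozenset of (value, count) pairs with count > 0;
    values not listed have count 0, matching the initial bag.
    """
    configs = {(q, frozenset()) for q in initials}
    for letter, d in word:
        nxt = set()
        for state, frozen in configs:
            bag = dict(frozen)
            count = bag.get(d, 0)
            for src, a, (op, e), instr, m, dst in transitions:
                if src != state or a != letter or not OPS[op](count, e):
                    continue
                new_bag = dict(bag)
                # "inc" adds m to the counter of d; "reset" sets it to m.
                new_bag[d] = count + m if instr == "inc" else m
                if new_bag[d] == 0:
                    del new_bag[d]
                nxt.add((dst, frozenset(new_bag.items())))
        configs = nxt
    return any(q in finals for q, _ in configs)


# Assumed reading of the example automaton over the alphabet {a, b}.
DISTINCT_A = [
    ("q0", "a", ("=", 0), "inc", 1, "q0"),
    ("q0", "b", (">=", 0), "inc", 0, "q0"),
    ("q0", "a", ("=", 1), "inc", 0, "q1"),
    ("q1", "a", (">=", 0), "inc", 0, "q1"),
    ("q1", "b", (">=", 0), "inc", 0, "q1"),
]

if __name__ == "__main__":
    print(cca_accepts(DISTINCT_A, {"q0"}, {"q0"}, [("a", 1), ("b", 1), ("a", 2)]))
    print(cca_accepts(DISTINCT_A, {"q0"}, {"q0"}, [("a", 1), ("b", 2), ("a", 1)]))
```

Transitions inspect only the counter of the data value being read, never the value itself, which reflects the remark that behaviour depends on hash values and not on keys.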

Examples The language $L_{fd(a)}$ = "Data values under a are all distinct" is recognizable. The language "There exists a data value appearing at least twice under a" is recognizable. The language "All data values under a occur at most $n$ times" is recognizable. The language "There exists a data value appearing under a occurring more than $n$ times" is recognizable. The language $L_{a,=n}$ = "All data values under a occur exactly $n$ times" is not recognizable.

Decidability Theorem The emptiness problem of class counting automata is decidable. By reduction to the covering problem for Petri nets. The decision procedure runs in Expspace, and thus we have elementary decidability. The problem is complete for Expspace, by an easy reduction the other way as well. CCA are closed under union and intersection, but not under complementation.

Extensions The model admits many possible extensions. Instead of working with one bag of counters, the automaton can use several bags of counters, much as multiple registers are used in the register automaton. We can check for the presence of any counter (in each bag) satisfying a given constraint and update it. The language of constraints can be strengthened: any syntax that can specify semilinear sets will do. Extensions like two-way movement and alternation lead to undecidability.

A comparison No two a positions have the same data value: PA, DA, CCA, $\mathrm{FO}^2$, but not RA. There exist two a positions having the same data value: all formalisms. For every a position, there exists a b position with the same data value: PA, DA, CCA, $\mathrm{FO}^2$, but not RA. A process has to consume one given resource before requesting another: PA, DA, CCA, $\mathrm{FO}^2$, but not RA. Every process requesting a resource is eventually granted: PA, DA, CCA, $\mathrm{FO}^2$, but not RA. Between two successive accesses to the resource by the same user, some other process has to access it: PA, DA, RA, but not $\mathrm{FO}^2$ or CCA.

Comparison Non-emptiness decidable for RA, CCA, DA and $\mathrm{FO}^2$, but not PA. Inclusion decidable only for $\mathrm{FO}^2$. Membership efficient for RA, CCA, PA and $\mathrm{FO}^2$, but not DA. PA and $\mathrm{FO}^2$ are closed under complementation; CCA, RA and DA are not.

Automata vs Logics Mostly incomparability results; better behaviour for PA than RA. No FO / MSO characterization for RA. 2-way APA = MSO; 2-way strong DPA = FO. Emptiness undecidable for weak 1-way PA.

Many questions Data words have many potential applications. Applications to verification of parametrized systems? This approach is orthogonal to reachability based approaches. Ability to talk about data is very limited (no arithmetic). Find models with better complexities. Study the tradeoff between more expressive data access and complexity / decidability. Clear need for decidable automata models and logics over data words and data trees. A challenging topic with many potential applications in databases and system verification.