Institut for Matematik og Datalogi 22. januar 2017 Syddansk Universitet, Odense. The Prefix Table


As a building block in the algorithm for finding the BMShift table, the following values will prove useful. For a string X of length m (and first character indexed by zero), we define the table Pref(i) of longest prefixes as follows:

    Pref(i) = lcp(X, X[i..(m-1)])    for 0 ≤ i ≤ m

That is, Pref(i) is the length of the longest common prefix of X[i..(m-1)] and X. This is illustrated below (shaded area means matching characters).

The goal of this note is to give an algorithm for finding Pref(i) for all 0 ≤ i ≤ m in time O(m). The overall idea of the algorithm is to find Pref(i) for increasing i, exploiting already found information when looking at the next i. To capture the already found information, it will be useful to consider the ending point in X of the prefix of X[j..(m-1)] corresponding to Pref(j). This ending point is j + Pref(j), as can be seen in the following figure.
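To make the definition concrete before turning to the linear-time algorithm, here is a sketch of a straightforward quadratic computation of the table, directly from the definition (the Python code and the name pref_naive are ours, not part of the original note):

```python
def pref_naive(X):
    # Pref[i] = length of the longest common prefix of X and X[i..m-1],
    # computed directly from the definition in O(m^2) worst-case time.
    m = len(X)
    pref = [0] * (m + 1)   # indices 0..m; Pref(m) = 0 for the empty suffix
    for i in range(m + 1):
        k = 0
        while i + k < m and X[k] == X[i + k]:
            k += 1
        pref[i] = k
    return pref

print(pref_naive("abaabab"))   # [7, 0, 1, 3, 0, 2, 0, 0]
```

Note that Pref(0) = m always, since X is trivially a prefix of itself.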

Central to the algorithm is the largest such ending point found so far. We make the following definitions to describe this value and a j which generated it:

    g(i) = max { j + Pref(j) : 0 < j < i }       (1)

    f(i) = argmax { j + Pref(j) : 0 < j < i }    (2)

Thus, g is the largest such ending point found so far (excluding the trivial knowledge that Pref(0) = m), and f is a j generating it. For increasing i, g is clearly non-decreasing. Each iteration of the algorithm increments i, which may make g grow by some value t_i ≥ 0. If the iteration can be performed in O(1 + t_i) time, the total time will be O(Σ_{i=0}^{m} (1 + t_i)) = O(m + Σ_{i=0}^{m} t_i) = O(m), since Σ_{i=0}^{m} t_i is the final value of g, which cannot exceed m.

We now describe the details of the algorithm carrying out that plan. For an iteration generating Pref(i), we assume for now that g > i. This situation is illustrated below (note that f < i always holds by the definition of f).

Note for later use that the two characters pointed out in the figure above must differ (else Pref(f) would be larger). To find Pref(i), we must compare X[i..(m-1)] and X, starting from the left. By the matching in the figure above, we may as well compare X[(i-f)..(m-1)] and X. We distinguish three cases:

    Case I:   Pref(i-f) > g-i
    Case II:  Pref(i-f) < g-i
    Case III: Pref(i-f) = g-i

Note that by the definition of f, we have 1 ≤ f ≤ i-1, hence we know 1 ≤ i-f ≤ i-1. Hence, Pref(i-f) has already been found at the current time, and (assuming correct values for f and g are maintained by the algorithm) the algorithm can therefore distinguish between the cases.

Case I is illustrated below. Since the two characters pointed out differ, we can deduce from the figure that Pref(i) = g-i, and that g and f do not change.

Case II is illustrated below.

Since the two characters pointed out differ (or Pref(i-f) would have been larger), we can deduce from the figure that Pref(i) = Pref(i-f), and that g and f do not change.

Case III is illustrated below. In this case, we can only deduce that Pref(i) ≥ g-i, and we must find how much longer Pref(i) is by comparing characters in X[g..(m-1)] and X[(g-i)..(m-1)] left to right. If Pref(i) ends up exceeding g-i by t_i, we have made 1 + t_i comparisons. The value of g has been increased by t_i, and i is the new value for f.

Here is code implementing the above algorithmic idea:

    Pref[0] = m
    g = 0
    f = 1
    FOR i = 1 TO m
        IF i < g AND Pref[i-f] != g-i
            Pref[i] = min{Pref[i-f], g-i}
        ELSE
            g = max{g, i}
            f = i
            WHILE g < m AND X[g] == X[g-f]
                g++
            Pref[i] = g-f

We claim that this algorithm maintains the following invariant at the start of each iteration of the FOR-loop: for 0 ≤ j < i, Pref[j] contains the correct value Pref(j), and g and f obey (1) and (2), respectively. More precisely, this invariant holds at the start of the second iteration and onwards (which is in line with (1) and (2) only being meaningful for i ≥ 2).
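The pseudocode above translates almost line for line into any language with short-circuiting AND. The following Python sketch (ours, not part of the original note) is one such translation:

```python
def pref(X):
    # Linear-time computation of the Pref table, following the
    # pseudocode above. Python's `and` short-circuits, so X[g] is
    # never evaluated when g == m.
    m = len(X)
    pref = [0] * (m + 1)
    pref[0] = m
    g, f = 0, 1          # f = 1 only makes the IF test well-defined early on
    for i in range(1, m + 1):
        if i < g and pref[i - f] != g - i:    # Cases I and II
            pref[i] = min(pref[i - f], g - i)
        else:                                 # Case III, or i >= g
            g = max(g, i)
            f = i
            while g < m and X[g] == X[g - f]:
                g += 1
            pref[i] = g - f
    return pref

print(pref("abaabab"))   # [7, 0, 1, 3, 0, 2, 0, 0]
```

For example, on X = "abaabab" the table is [7, 0, 1, 3, 0, 2, 0, 0]: Pref(0) = m = 7, and e.g. Pref(3) = 3 since X[3..6] = "abab" shares the prefix "aba" with X.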

We now prove the claim by induction on the FOR-loop. The base case is a little longer than usual: the first line clearly sets Pref[0] correctly to the entire string length. At the start of the first iteration, we have i = 1 and g = 0, hence the ELSE case is taken in this first iteration (note that the only use of the initialization f = 1 is to make all elements of the IF test well-defined). After the first two lines of the ELSE case, we have g = f = 1. Hence, the WHILE-loop is a simple left-to-right scan of X[1..(m-1)] against X[0..(m-1)], which correctly sets Pref[1] at the exit of that loop. Also at that exit, g and f obey (1) and (2) for i = 2 (there is only one j in the maximizations in (1) and (2), namely j = 1). Hence, the invariant is true at the start of the second iteration of the FOR-loop. This proves the basis of the induction.

For the induction step, we assume the invariant is true at the beginning of one iteration, and must prove it true at the beginning of the next iteration. If the IF branch is taken, we are in Case I or II above. The analysis for these cases shows that Pref[i] is set correctly, and that g and f do not need to change for (1) and (2) to continue to hold. Hence, the invariant is true at the start of the next iteration. If the ELSE branch is taken, we separate into two cases:

    Case A: i < g and Pref(i-f) = g-i
    Case B: i ≥ g

Case A is exactly Case III above. By the analysis of Case III, it is correct to set f to i. The line with max does not change the value of g. The WHILE-loop then performs the left-to-right scan from the analysis of Case III, and thus at its exit leaves g and Pref[i] correctly set. Hence, the invariant is true at the start of the next iteration. In Case B, i ≥ g means that we know that i will be the correct value for f for the new round, since i + Pref(i) will be at least as large as the current value of g (this is true even if i = g and it turns out that Pref(i) = 0).
The code starts by raising g to i, as this is the minimal new value for g, and sets f to i (correct, as argued above). As both g and f equal i at the start of the WHILE-loop, g-f is zero. Thus, the WHILE-loop performs a left-to-right scan of X[i..(m-1)] against X[0..(m-1)], which is the straightforward way to find Pref(i). If the WHILE-loop runs for t_i rounds, we have t_i = Pref(i). Since g-f = t_i, Pref[i] is correctly set. Also, g is left at its correct value. Hence, the invariant is seen to be true at the start of the next iteration.

This proves the induction step, and hence the invariant. From the first part of the invariant, Pref[i] is correct for all i when the algorithm terminates. Hence, the algorithm is correct.

For the time analysis, all iterations of the WHILE-loop increase g. Nowhere in the code can g decrease (and it may even increase outside the WHILE-loop, in Case B). As g starts out zero and cannot be larger than m (this is true for g in (1) by construction, as all values maximized over are at most m, and our invariant shows that g in the code fulfills (1)), at most m iterations of the WHILE-loop can take place in total in the algorithm. The rest of the work of the algorithm is clearly O(m). In conclusion, the algorithm runs in O(m) time.

We note that the WHILE condition contains a reference X[g] with g possibly equal to m (for instance in the last iteration of the FOR-loop, with i equal to m), which is a reference to a non-existing character past the end of the string. If the programming language has short-circuiting evaluation of AND (such as many languages of C ancestry, including Java), this is not a problem, since the reference will not be evaluated when g < m fails. Else, the FOR-loop should only run until m-1, Pref[m] should be set to its value zero separately, and the WHILE test should be split so that X[g] is only read after g < m has been checked (g can reach m also for i < m, e.g. when all of X consists of one repeated character). If the programming language has short-circuiting evaluation of AND, the third line of the code, initializing f, can be removed (it is only there for the first execution of the IF test).
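The adjustments just described for languages without short-circuiting AND can be sketched as follows (again in Python, purely for illustration; the function name is ours). The loop stops at m-1, Pref[m] is set separately, and the WHILE test is split into two steps:

```python
def pref_no_shortcircuit(X):
    # Variant that never reads past the end of X, even in a language
    # whose AND evaluates both operands: the FOR-loop stops at m-1,
    # Pref[m] = 0 is set separately, and the WHILE test is split so
    # that X[g] is only read after g < m has been checked.
    m = len(X)
    pref = [0] * (m + 1)
    pref[0] = m
    g, f = 0, 1
    for i in range(1, m):
        if i < g and pref[i - f] != g - i:
            pref[i] = min(pref[i - f], g - i)
        else:
            g = max(g, i)
            f = i
            while g < m:
                if X[g] != X[g - f]:
                    break
                g += 1
            pref[i] = g - f
    pref[m] = 0   # the empty suffix shares no characters with X
    return pref

print(pref_no_shortcircuit("abaabab"))   # [7, 0, 1, 3, 0, 2, 0, 0]
```

This variant produces the same table as the original code for every input.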