Exact vs approximate search. Text Algorithms (6EAP) Similarity. Problem. Edit distance (Levenshtein distance) 10/11/10. Similarity measures


Text Algorithms (6EAP): Similarity measures
Jaak Vilo, 2010 fall. MTAT.03.190 Text Algorithms.

Exact vs approximate search

In exact search we searched for a string or a set of strings in a long text. Plenty of applications require approximate search instead. Approximate matching means finding those regions of a long text that are similar to the query string, e.g. finding the substrings of S whose edit distance to the query string is less than k.

Problem

Given P and S, find all approximate occurrences of P in S.

Similarity

How can we measure the similarity of two strings? When are two things almost the same?

Edit distance (Levenshtein distance)

The smallest number of edit operations needed to convert one string into the other, e.g. INDUSTRY into INTEREST.

Definition. The edit distance $D(A,B)$ between strings $A$ and $B$ is the minimal number of edit operations needed to change $A$ into $B$. The allowed edit operations are deletion of a single letter, insertion of a letter, and replacement of one letter with another.

Let $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$.

E1: Deletion, $a_i \to \varepsilon$
E2: Insertion, $\varepsilon \to b_j$
E3: Substitution, $a_i \to b_j$ (if $a_i \neq b_j$)

Other possible variants:

E4: Transposition, $a_i a_{i+1} \to b_j b_{j+1}$ where $a_i = b_{j+1}$ and $a_{i+1} = b_j$ (e.g. lecture -> letcure)

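How can we calculate this?

How can we calculate this efficiently? Write the two strings as $\alpha a$ and $\beta b$, where $a$ and $b$ are their last letters. Then

$$D(\alpha a, \beta b) = \min \begin{cases} D(\alpha, \beta) & \text{if } a = b \\ D(\alpha, \beta) + 1 & \text{if } a \neq b \\ D(\alpha a, \beta) + 1 \\ D(\alpha, \beta b) + 1 \end{cases}$$

For strings $S$ and $T$ with $|S| = n$ and $|T| = m$ this reads

$$D(S,T) = \min \begin{cases} D(S[1..n-1], T[1..m-1]) + (S[n] = T[m] \;?\; 0 : 1) \\ D(S[1..n], T[1..m-1]) + 1 \\ D(S[1..n-1], T[1..m]) + 1 \end{cases}$$

Define $d(i,j) = D(S[1..i], T[1..j])$. Then

$$d(i,j) = \min \begin{cases} d(i-1, j-1) + (S[i] = T[j] \;?\; 0 : 1) \\ d(i, j-1) + 1 \\ d(i-1, j) + 1 \end{cases}$$

Recursion?

Compare with Fibonacci numbers: F(0) = 1, F(1) = 1, F(n) = F(n-1) + F(n-2), giving 1, 1, 2, 3, 5, 8, ...

    sub fib(int x)
        if (x < 3) return 1;
        else return fib(x-1) + fib(x-2);

[Figure: DP matrix with string x along index i and string y along index j; cell d(i,j) is computed from its neighbours d(i-1,j-1), d(i-1,j) and d(i,j-1).]

Algorithm. Edit distance D(A,B) using dynamic programming (DP)

Input: A = a_1 a_2 ... a_m, B = b_1 b_2 ... b_n
Output: value d[m,n] in matrix (d[i,j]), 0 <= i <= m, 0 <= j <= n

    for i = 0 to m do d[i,0] = i
    for j = 0 to n do d[0,j] = j
    for j = 1 to n do
        for i = 1 to m do
            d[i,j] = min( d[i-1,j-1] + (if a_i == b_j then 0 else 1),
                          d[i-1,j] + 1,
                          d[i,j-1] + 1 )
    return d[m,n]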
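The dynamic programming algorithm above can be sketched in Python as follows. This is a minimal illustration, not code from the lecture; the function name `edit_distance` and the 0-based string indexing are choices made here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the full (m+1) x (n+1) DP matrix."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all i letters of a[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all j letters of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + cost,  # match or substitution
                d[i - 1][j] + 1,         # deletion of a[i-1]
                d[i][j - 1] + 1,         # insertion of b[j-1]
            )
    return d[m][n]


print(edit_distance("kitten", "sitting"))   # 3
print(edit_distance("lecture", "letcure"))  # 2 (two substitutions)
```

Without the transposition operation E4, the swapped pair in lecture/letcure costs two substitutions.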

Dynamic Programming

[Figure: DP matrix with string x along rows i = 0..m and string y along columns j = 0..n; cell d(i,j) depends on d(i-1,j-1), d(i-1,j) and d(i,j-1).]

Edit distance is a metric

It can be shown that $D(A,B)$ is a metric:

$D(A,B) \geq 0$, and $D(A,B) = 0$ iff $A = B$
$D(A,B) = D(B,A)$
$D(A,C) \leq D(A,B) + D(B,C)$

Alignment

indust-r-y-
in---terest

Path of edit operations

The optimal solution can be calculated afterwards, which is quite typical in dynamic programming. Memorize the sets pred[i,j], recording from where d[i,j] was reached.
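Recovering one alignment from the filled matrix can be sketched as below. Instead of storing the pred sets explicitly, this illustration re-checks which neighbour produced each value while walking back from d[m][n]; the function name `align` and the use of "-" as the gap symbol are assumptions made here.

```python
def align(a: str, b: str):
    """Return (distance, top_row, bottom_row) for one optimal alignment."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Walk back from (m, n), choosing any predecessor that explains d[i][j].
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1):
            top.append(a[i - 1]); bottom.append(b[j - 1])   # match / substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("-")        # deletion
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])        # insertion
            j -= 1
    return d[m][n], "".join(reversed(top)), "".join(reversed(bottom))


dist, top, bottom = align("industry", "interest")
print(dist)
print(top)
print(bottom)
```

The number of columns in which the two rows differ equals the returned distance. Other optimal alignments may exist, since ties are broken in a fixed order.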

Three possible minimizing paths

Add into pred[i,j]:

(i-1, j-1) if d[i,j] = d[i-1,j-1] + (if a_i == b_j then 0 else 1)
(i-1, j) if d[i,j] = d[i-1,j] + 1
(i, j-1) if d[i,j] = d[i,j-1] + 1

The path (in reverse order): ε -> c_6, b_5 -> b_5, c_4 -> c_4, a_3 -> a_3, a_2 -> b_2, b_1 -> a_1.

Multiple paths possible

All paths are correct. There can be many (how many?) paths.

What are the other questions in edit distance calculations?

Space complexity
Time complexity
Other ways to look at the algorithm(s)
Applications
More complex notions of similarity

Space can be reduced

Calculation of D(A,B) in space Θ(m)

Input: A = a_1 a_2 ... a_m, B = b_1 b_2 ... b_n (choose m <= n)
Output: d[m,n] = D(A,B)

    for i = 0 to m do C[i] = i
    for j = 1 to n do
        c = C[0]; C[0] = j
        for i = 1 to m do
            d = min( c + (if a_i == b_j then 0 else 1), C[i-1] + 1, C[i] + 1 )
            c = C[i]    // memorize new diagonal value
            C[i] = d
    write C[m]

Time complexity is Θ(mn), since C[0..m] is filled n times.
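The single-column variant can be sketched in Python as follows. The names `col` and `diag` correspond to C and c in the pseudocode; swapping the arguments so that the shorter string indexes the column relies on the symmetry D(A,B) = D(B,A).

```python
def edit_distance_linear_space(a: str, b: str) -> int:
    """Edit distance keeping only one column of the DP matrix."""
    if len(a) > len(b):
        a, b = b, a                      # keep the column as short as possible
    m = len(a)
    col = list(range(m + 1))             # column j = 0: d[i][0] = i
    for j, bj in enumerate(b, start=1):
        diag = col[0]                    # d[0][j-1]
        col[0] = j                       # d[0][j]
        for i in range(1, m + 1):
            new = min(
                diag + (0 if a[i - 1] == bj else 1),  # from d[i-1][j-1]
                col[i - 1] + 1,                       # from d[i-1][j] (already updated)
                col[i] + 1,                           # from d[i][j-1] (not yet updated)
            )
            diag = col[i]                # remember d[i][j-1] for the next row
            col[i] = new
    return col[m]


print(edit_distance_linear_space("kitten", "sitting"))  # 3
```

Only the distance is available afterwards; recovering a path needs either the full matrix or additional bookkeeping.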

Shortest path in the graph

http://en.wikipedia.org/wiki/shortest_path_problem

[Figure: graph with all nodes at distance 1 from the source highlighted.]

Edit distance = shortest path in the graph.

Observations?

The shortest path is close to the diagonal, if a short-distance path exists. Values along any diagonal can only increase (by at most 1).

Diagonal

Diagonal nr. 2: $d_{02}, d_{13}, d_{24}, d_{35}, d_{46}$.
Diagonal $k$, $-m \leq k \leq n$, contains only those $d_{ij}$ where $j - i = k$.

Diagonal lemma

Property of any diagonal: along any specific diagonal, the values of the matrix $(d_{ij})$ either increase by 1 or stay the same.

Lemma: For each $d_{ij}$, $1 \leq i \leq m$, $1 \leq j \leq n$, holds: $d_{ij} = d_{i-1,j-1}$ or $d_{ij} = d_{i-1,j-1} + 1$ (notice that $d_{ij}$ and $d_{i-1,j-1}$ are on the same diagonal).

Proof: Since $d_{ij}$ is an integer, it suffices to show:

1. $d_{ij} \leq d_{i-1,j-1} + 1$
2. $d_{ij} \geq d_{i-1,j-1}$

Claim 1 holds directly from the definition of edit distance, since the minimization includes the term $d_{i-1,j-1} + 1$ (or $d_{i-1,j-1}$).

Claim 2 is shown by induction on $i + j$. The basis is trivial when $i = 0$ or $j = 0$ (if we agree that $d_{-1,j-1} = d_{0j}$). For the induction step there are 3 possibilities:

1. On minimization, $d_{ij}$ is calculated from entry $d_{i-1,j-1}$; hence $d_{ij} \geq d_{i-1,j-1}$.
2. On minimization, $d_{ij}$ is calculated from entry $d_{i-1,j}$; hence $d_{ij} = d_{i-1,j} + 1 \geq d_{i-2,j-1} + 1 \geq d_{i-1,j-1}$.
3. On minimization, $d_{ij}$ is calculated from entry $d_{i,j-1}$; analogous to 2.

Hence $d_{i-1,j-1} \leq d_{ij}$.

Transform the matrix into $(f_{kp})$

For each diagonal, show only the position (row index) where the value is increased by 1. Also, one can restrict the matrix $(d_{ij})$ to only the part where $d_{ij} \leq d_{mn}$, since only those $d_{ij}$ can be on the shortest path.

We'll use the matrix $(f_{kp})$ that represents the diagonals of $(d_{ij})$: $f_{kp}$ is the largest row index $i$ of $d_{ij}$ such that on diagonal $k$ the value $p$ reaches row $i$ ($d_{ij} = p$ and $j - i = k$).

Initialization: $f_{0,-1} = -1$, and otherwise $f_{kp} = -\infty$ when $p \leq |k| - 1$.
Result: $d_{mn} = p$ such that $f_{n-m,p} = m$.

Calculating matrix $(f_{kp})$ by columns

Assume that column $p-1$ of $(f_{kp})$ has been calculated, and we want to calculate $f_{kp}$ (the region where $d_{ij} = p$). On diagonal $k$ the values $p$ reach at least the row

$$t = \max( f_{k,p-1} + 1,\; f_{k-1,p-1},\; f_{k+1,p-1} + 1 )$$

if the diagonal $k$ reaches so far. The three terms come from the same diagonal ($f_{k,p-1} + 1$), the diagonal below ($f_{k-1,p-1}$) and the diagonal above ($f_{k+1,p-1} + 1$). If additionally $a_{t+1} = b_{t+1+k}$ on row $t+1$ of the same diagonal, then $d_{ij}$ cannot increase, and the value $p$ reaches row $t+1$. Repeat the previous step until $a_i \neq b_j$ on diagonal $k$.

Algorithm A(): calculate $f_{kp}$

    A(k,p)
    1. t = max( f[k,p-1] + 1, f[k-1,p-1], f[k+1,p-1] + 1 )
    2. while a_{t+1} == b_{t+1+k} do t = t + 1
    3. f[k,p] = if t > m or t + k > n then undefined else t

[Figure: worked example of A(k,p) on a sample matrix, computing f(0,2) = 5 and f(1,2).]
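A compact Python sketch of the diagonal idea follows, with algorithm A inlined and driven column by column for p = 0, 1, 2, ... until diagonal n - m reaches row m. It deviates from the pseudocode in two labelled ways, both assumptions made here: `cur[k]` stores the furthest row on diagonal k reachable with cost at most p, and a value that would leave the matrix is clamped to the last row of the diagonal instead of being marked undefined. The dictionaries `prev` and `cur` hold columns p-1 and p of (f_kp).

```python
def edit_distance_diagonal(a: str, b: str) -> int:
    """Edit distance via furthest-reaching rows on diagonals (k = j - i)."""
    m, n = len(a), len(b)
    target = n - m                       # diagonal that contains d[m][n]
    neg_inf = float("-inf")
    prev = {}                            # column p-1: diagonal k -> row
    p = -1
    while True:
        p += 1
        cur = {}
        # value p can only occur on diagonals -p..p, restricted to -m..n
        for k in range(max(-p, -m), min(p, n) + 1):
            if p == 0:
                t = 0                    # convention f(0,-1) = -1, so t = -1 + 1
            else:
                t = max(
                    prev.get(k, neg_inf) + 1,      # substitution, same diagonal
                    prev.get(k - 1, neg_inf),      # insertion, from diagonal below
                    prev.get(k + 1, neg_inf) + 1,  # deletion, from diagonal above
                )
            t = min(t, m, n - k)         # assumption: clamp to the matrix
            # slide along the diagonal while letters match (0-based indices)
            while t < m and t + k < n and a[t] == b[t + k]:
                t += 1
            cur[k] = t
        if cur.get(target, neg_inf) == m:
            return p
        prev = cur


print(edit_distance_diagonal("kitten", "sitting"))  # 3
```

The work done is proportional to the number of (k, p) pairs visited plus the sliding steps, so small distances finish quickly; only the previous column is kept.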

Algorithm: Diagonal method by columns

    p = -1
    while f[n-m,p] != m
        p = p + 1
        for k = -p to p do    // f[k,p] = A(k,p)
            t = max( f[k,p-1] + 1, f[k-1,p-1], f[k+1,p-1] + 1 )
            while a_{t+1} = b_{t+1+k} do t = t + 1
            f[k,p] = if t > m or t + k > n then undefined else t

The value $p$ can only occur on diagonals $-p \leq k \leq p$.

The method can be improved, since $k$ is often such that $f_{kp}$ is undefined. We can restrict the values of $k$:

$-m \leq k \leq n$ (diagonal numbers).
Let $m \leq n$ and let $d_{ij}$ be on diagonal $k$.
If $-m \leq k \leq 0$ then $|k| \leq d_{ij} \leq m$.
If $1 \leq k \leq n$ then $k \leq d_{ij} \leq k + m$.
Hence $-p \leq k \leq p$ if $p \leq m$, and $p - m \leq k \leq p$ if $p \geq m$.

Some notes

In applications, small D(A,B) are the most interesting. The algorithm can be modified by providing a maximum distance t; it then runs in O(tm): the smaller the t, the faster the algorithm.
Space can be reduced by keeping only the previous column.
How to output the shortest path? Relatively simple, in time O(s); outputs a single path.

Extensions to basic edit distance

New operations
Variable costs
Transposition (ab -> ba)

E4: Transposition, $a_i a_{i+1} \to b_j b_{j+1}$ s.t. $a_i = b_{j+1}$ and $a_{i+1} = b_j$ (e.g. lecture -> letcure)

$$d(i,j) = \min \begin{cases} d(i-1, j-1) + (S[i] = T[j] \;?\; 0 : 1) \\ d(i, j-1) + 1 \\ d(i-1, j) + 1 \\ d(i-2, j-2) + (\text{if } S[i-1..i] = T[j]\,T[j-1] \text{ then } 1 \text{ else } \infty) \end{cases}$$
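The recurrence with the transposition term can be sketched as below. Note the hedge: this is the restricted variant (often called optimal string alignment), in which a transposed pair is not edited again afterwards. The function name is an assumption made here.

```python
def edit_distance_transposition(a: str, b: str) -> int:
    """Edit distance with E1-E3 plus adjacent transposition (restricted form)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)
            # fourth term: the last two letters of a[:i] are the last two of b[:j], swapped
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                best = min(best, d[i - 2][j - 2] + 1)
            d[i][j] = best
    return d[m][n]


print(edit_distance_transposition("lecture", "letcure"))  # 1
```

The extra term looks two rows and two columns back, which is why a space-reduced version needs the two previous columns rather than one.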

The edit distance algorithm can be changed easily. Space efficiency can also be achieved, now using the 2 last columns, hence still O(m). The diagonal method can be modified, since the diagonal lemma holds. The algorithms can be modified in a relatively straightforward manner.

Longest common subsequences

Definition. String $C = c_1 c_2 \ldots c_r$ is a subsequence (Estonian: alamjada) of $A = a_1 a_2 \ldots a_m$ if $C$ can be obtained by removing zero or more characters from $A$.

String $C = c_1 c_2 \ldots c_r$ is the longest common subsequence, LCS (Estonian: pikim ühine alamjada), of $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$ if $C$ is the longest string that is a subsequence of both $A$ and $B$. We write $C = LCS(A,B)$.

The length $|C|$ of $LCS(A,B)$ can be used as a similarity measure for $A$ and $B$.

$LCS(A,B)$ can be calculated similarly to edit distance. Let $D'(A,B)$ be the edit distance where the only allowed operations are insertion and deletion (no replacement).

Theorem

a) $|LCS(A,B)| = (|A| + |B| - D'(A,B))/2$

b) Take two sets of operations realizing $D'(A,B)$ with the optimal number of changes:
1. deletions from A: $a_{i_1} \to \varepsilon, a_{i_2} \to \varepsilon, \ldots, a_{i_p} \to \varepsilon$, and
2. insertions into B: $\varepsilon \to b_{j_1}, \varepsilon \to b_{j_2}, \ldots, \varepsilon \to b_{j_r}$.
Then $LCS(A,B) = C$ can be constructed such that $C$ is $A$ after the deletions of 1., and $C$ is $B$ after deleting all letters inserted in 2. (the insertions in 2. are, in reverse, deletions from $B$).

Proof b) By construction, $C$ is uniquely defined and is a subsequence of $A$ as well as of $B$. If $C$ were not the longest, there would be a $C'$ with $|C| < |C'|$ s.t. $C' = LCS(A,B)$. But then $D'(A,B) \leq |A| - |C'| + |B| - |C'| < |A| - |C| + |B| - |C| = D'(A,B)$, which is a contradiction. Hence $C = LCS(A,B)$.

Proof a) According to b), $|LCS(A,B)| = |A| - p$ and $|LCS(A,B)| = |B| - r$, so $2\,|LCS(A,B)| = |A| + |B| - (p + r) = |A| + |B| - D'(A,B)$.

Example. LCS(england, inglismaa) = ngla. D'(england, inglismaa) = 8, and |ngla| = 4 = (7 + 9 - 8)/2.

The diagonal lemma holds, but the increase always occurs by two.

Time complexity is O(mn); with the diagonal method O(min(s,m) · s), where s = D'(A,B), m = |A|, n = |B|.

There exist other algorithms for LCS (e.g. Hunt-Szymanski).

The Unix command diff compares files row by row and searches for the deviations from the LCS of the two files.
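The relation between D' and the LCS can be checked with a short sketch. The helper names `indel_table` and `lcs` are assumptions made here; the table is the edit distance DP with substitution removed, and the LCS is read off by walking back through the matches.

```python
def indel_table(a: str, b: str):
    """DP matrix for D'(a, b): only insertions and deletions are allowed."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = min(d[i - 1][j], d[i][j - 1]) + 1   # deletion or insertion
            if a[i - 1] == b[j - 1]:
                best = min(best, d[i - 1][j - 1])      # free match
            d[i][j] = best
    return d


def lcs(a: str, b: str) -> str:
    """One longest common subsequence, recovered from the D' matrix."""
    d = indel_table(a, b)
    i, j = len(a), len(b)
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            out.append(a[i - 1])       # matched letter belongs to the LCS
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1                     # letter deleted from a
        else:
            j -= 1                     # letter inserted from b
    return "".join(reversed(out))


a, b = "england", "inglismaa"
d_prime = indel_table(a, b)[len(a)][len(b)]
print(d_prime, lcs(a, b), (len(a) + len(b) - d_prime) // 2)  # 8 ngla 4
```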

Generalized edit distance

Use more operations E1...En, and provide a different cost for each.

Definition. Let $x, y \in \Sigma^*$. Then every $x \to y$ is an edit operation. The edit operation replaces $x$ by $y$: if $A = uxv$, then after the operation $A = uyv$.

We denote by $w(x \to y)$ the cost or weight of the operation. The cost may depend on $x$ and/or $y$, but we assume $w(x \to y) \geq 0$.

If operations can only be applied in parallel, i.e. a part that has already been changed cannot be modified again, then we can use dynamic programming. Otherwise the problem is algorithmically unsolvable, since the question whether A can be transformed into B using the operations of a given set G is unsolvable.

The diagonal method may not be applicable in general. But since each diversion from the diagonal slightly increases the cost, one can stay within a narrow region around the diagonal.

Applications of generalized edit distance

Historic documents, names
Human language and dialects
Transliteration rules from one alphabet to another, e.g. Tõugu => Tyugu (via Russian)
...
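For the parallel case, one way to write the dynamic programming recurrence is sketched below. This formulation is an assumption made here rather than something stated on the slides: it supposes that the operation set $G$ is finite, that it contains the zero-cost identity operations $a \to a$ for every letter $a$, that no operation has both sides empty, and that $d(i,j)$ denotes the cheapest way to turn $A[1..i]$ into $B[1..j]$.

```latex
d(0,0) = 0, \qquad
d(i,j) = \min \bigl\{\, d(i - |x|,\; j - |y|) + w(x \to y) \;:\;
    (x \to y) \in G,\;
    x \text{ is a suffix of } A[1..i],\;
    y \text{ is a suffix of } B[1..j] \,\bigr\}
```

With $G$ consisting of the identities plus unit-cost single-letter deletions, insertions and substitutions, this reduces to the basic edit distance recurrence.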

Examples

näituseks - näiteks
Ahwrika - Aafrika
weikese - väikese
materjaali - materjali

tuseks -> teks
a -> aa, hw -> f
w -> v, e -> ä
aa -> a

How?

Apply Aho-Corasick to match all possible edit operations.
Use the minimum over all such operations and costs.

Smarter search (Estonian: kavalam otsimine)

Dush, dušš, dushsh?
Gorbatšov, Gorbatshov, Горбачов, Gorbachev
režiim, rezhiim, riim

Implementation: Reina Käärik

Possible problems/tasks

Manually create sensible lists of operations, for English, Russian, etc., and for old language.
Improve the speed of the algorithm (testing).
Train for automatic extraction of edit operations and their respective costs from examples of matching words.
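A small sketch of generalized edit distance with string-to-string operations follows. Several things in it are assumptions made here, not taken from the slides: the rule weights of 0.5, the unit cost for ordinary single-letter edits, lowercasing the input, and the use of plain suffix checks in place of Aho-Corasick matching (simpler, but slower for large rule sets).

```python
def generalized_distance(a: str, b: str, rules: dict, unit: float = 1.0) -> float:
    """Cheapest way to rewrite a into b with unit edits plus weighted rules x -> y.

    rules maps (x, y) to a non-negative weight; each rule rewrites a block x of a
    into a block y of b, and rewritten blocks are never edited again.
    """
    m, n = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0 and j > 0:
                step = 0.0 if a[i - 1] == b[j - 1] else unit
                best = d[i - 1][j - 1] + step            # copy or substitute
            if i > 0:
                best = min(best, d[i - 1][j] + unit)     # delete a[i-1]
            if j > 0:
                best = min(best, d[i][j - 1] + unit)     # insert b[j-1]
            for (x, y), w in rules.items():
                # rule applies if x ends a[:i] and y ends b[:j]
                if a.endswith(x, 0, i) and b.endswith(y, 0, j):
                    best = min(best, d[i - len(x)][j - len(y)] + w)
            d[i][j] = best
    return d[m][n]


# Hypothetical weights for the spelling rules listed above.
RULES = {("a", "aa"): 0.5, ("hw", "f"): 0.5, ("w", "v"): 0.5,
         ("e", "ä"): 0.5, ("aa", "a"): 0.5, ("tuseks", "teks"): 0.5}

print(generalized_distance("ahwrika", "aafrika", RULES))       # 1.0
print(generalized_distance("materjaali", "materjali", RULES))  # 0.5
```

With an empty rule set the function returns the ordinary edit distance as a float.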


More information

DESIGN AND ANALYSIS OF ALGORITHMS (DAA 2017)

DESIGN AND ANALYSIS OF ALGORITHMS (DAA 2017) DESIGN AND ANALYSIS OF ALGORITHMS (DAA 2017) Veli Mäkinen Design and Analysis of Algorithms 2017 week 4 11.8.2017 1 Dynamic Programming Week 4 2 Design and Analysis of Algorithms 2017 week 4 11.8.2017

More information

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42 String Matching Pedro Ribeiro DCC/FCUP 2016/2017 Pedro Ribeiro (DCC/FCUP) String Matching 2016/2017 1 / 42 On this lecture The String Matching Problem Naive Algorithm Deterministic Finite Automata Knuth-Morris-Pratt

More information

Implementation of Relational Operations

Implementation of Relational Operations Implementation of Relational Operations Module 4, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset of rows

More information

4 Dynamic Programming

4 Dynamic Programming 4 Dynamic Programming Dynamic Programming is a form of recursion. In Computer Science, you have probably heard the tradeoff between Time and Space. There is a trade off between the space complexity and

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 16 Dynamic Programming Least Common Subsequence Saving space Adam Smith Least Common Subsequence A.k.a. sequence alignment edit distance Longest Common Subsequence

More information

Elementary Recursive Function Theory

Elementary Recursive Function Theory Chapter 6 Elementary Recursive Function Theory 6.1 Acceptable Indexings In a previous Section, we have exhibited a specific indexing of the partial recursive functions by encoding the RAM programs. Using

More information

PDA s. and Formal Languages. Automata Theory CS 573. Outline of equivalence of PDA s and CFG s. (see Theorem 5.3)

PDA s. and Formal Languages. Automata Theory CS 573. Outline of equivalence of PDA s and CFG s. (see Theorem 5.3) CS 573 Automata Theory and Formal Languages Professor Leslie Lander Lecture # 20 November 13, 2000 Greibach Normal Form (GNF) Sheila Greibach s normal form (GNF) for a CFG is one where EVERY production

More information

Section 2.4 Sequences and Summations

Section 2.4 Sequences and Summations Section 2.4 Sequences and Summations Definition: A sequence is a function from a subset of the natural numbers (usually of the form {0, 1, 2,... } to a set S. Note: the sets and {0, 1, 2, 3,..., k} {1,

More information

Dynamic Programming. Design and Analysis of Algorithms. Entwurf und Analyse von Algorithmen. Irene Parada. Design and Analysis of Algorithms

Dynamic Programming. Design and Analysis of Algorithms. Entwurf und Analyse von Algorithmen. Irene Parada. Design and Analysis of Algorithms Entwurf und Analyse von Algorithmen Dynamic Programming Overview Introduction Example 1 When and how to apply this method Example 2 Final remarks Introduction: when recursion is inefficient Example: Calculation

More information

GraphBLAS Mathematics - Provisional Release 1.0 -

GraphBLAS Mathematics - Provisional Release 1.0 - GraphBLAS Mathematics - Provisional Release 1.0 - Jeremy Kepner Generated on April 26, 2017 Contents 1 Introduction: Graphs as Matrices........................... 1 1.1 Adjacency Matrix: Undirected Graphs,

More information

Languages and Strings. Chapter 2

Languages and Strings. Chapter 2 Languages and Strings Chapter 2 Let's Look at Some Problems int alpha, beta; alpha = 3; beta = (2 + 5) / 10; (1) Lexical analysis: Scan the program and break it up into variable names, numbers, etc. (2)

More information

QED Q: Why is it called the triangle inequality? A: Analogue with euclidean distance in the plane: picture Defn: Minimum Distance of a code C:

QED Q: Why is it called the triangle inequality? A: Analogue with euclidean distance in the plane: picture Defn: Minimum Distance of a code C: Lecture 3: Lecture notes posted online each week. Recall Defn Hamming distance: for words x = x 1... x n, y = y 1... y n of the same length over the same alphabet, d(x, y) = {1 i n : x i y i } i.e., d(x,

More information

CpSc 421 Final Solutions

CpSc 421 Final Solutions CpSc 421 Final Solutions Do any eight of the ten problems below. If you attempt more than eight problems, please indicate which ones to grade (otherwise we will make a random choice). This allows you to

More information

CMSC351 - Fall 2014, Homework #4

CMSC351 - Fall 2014, Homework #4 CMSC351 - Fall 2014, Homework #4 Due: November 14th at the start of class PRINT Name: Grades depend on neatness and clarity. Write your answers with enough detail about your approach and concepts used,

More information

17 dicembre Luca Bortolussi SUFFIX TREES. From exact to approximate string matching.

17 dicembre Luca Bortolussi SUFFIX TREES. From exact to approximate string matching. 17 dicembre 2003 Luca Bortolussi SUFFIX TREES From exact to approximate string matching. An introduction to string matching String matching is an important branch of algorithmica, and it has applications

More information

Computer Science 236 Fall Nov. 11, 2010

Computer Science 236 Fall Nov. 11, 2010 Computer Science 26 Fall Nov 11, 2010 St George Campus University of Toronto Assignment Due Date: 2nd December, 2010 1 (10 marks) Assume that you are given a file of arbitrary length that contains student

More information

Two Dimensional Dictionary Matching

Two Dimensional Dictionary Matching Two Dimensional Dictionary Matching Amihood Amir Martin Farach Georgia Tech DIMACS September 10, 1992 Abstract Most traditional pattern matching algorithms solve the problem of finding all occurrences

More information

VERTEX MAPS FOR TREES: ALGEBRA AND PERIODS OF PERIODIC ORBITS. Chris Bernhardt

VERTEX MAPS FOR TREES: ALGEBRA AND PERIODS OF PERIODIC ORBITS. Chris Bernhardt VERTEX MAPS FOR TREES: ALGEBRA AND PERIODS OF PERIODIC ORBITS CHRIS BERNHARDT Abstract. Let T be a tree with n vertices. Let f : T T be continuous and suppose that the n vertices form a periodic orbit

More information

More Dynamic Programming

More Dynamic Programming CS 374: Algorithms & Models of Computation, Fall 2015 More Dynamic Programming Lecture 12 October 8, 2015 Chandra & Manoj (UIUC) CS374 1 Fall 2015 1 / 43 What is the running time of the following? Consider

More information

External Memory. Philip Bille

External Memory. Philip Bille External Memory Philip Bille Outline Computationals models Modern computers (word) RAM I/O Cache-oblivious Shortest path in implicit grid graphs RAM algorithm I/O algorithms Cache-oblivious algorithm Computational

More information

IN101: Algorithmic techniques Vladimir-Alexandru Paun ENSTA ParisTech

IN101: Algorithmic techniques Vladimir-Alexandru Paun ENSTA ParisTech IN101: Algorithmic techniques Vladimir-Alexandru Paun ENSTA ParisTech License CC BY-NC-SA 2.0 http://creativecommons.org/licenses/by-nc-sa/2.0/fr/ Outline Previously on IN101 Python s anatomy Functions,

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Group Members: 1. Geng Xue (A0095628R) 2. Cai Jingli (A0095623B) 3. Xing Zhe (A0095644W) 4. Zhu Xiaolu (A0109657W) 5. Wang Zixiao (A0095670X) 6. Jiao Qing (A0095637R) 7. Zhang

More information

Umans Complexity Theory Lectures

Umans Complexity Theory Lectures Introduction Umans Complexity Theory Lectures Lecture 5: Boolean Circuits & NP: - Uniformity and Advice, - NC hierarchy Power from an unexpected source? we know P EXP, which implies no polytime algorithm

More information

Lectures 12 and 13 Dynamic programming: weighted interval scheduling

Lectures 12 and 13 Dynamic programming: weighted interval scheduling Lectures 12 and 13 Dynamic programming: weighted interval scheduling COMP 523: Advanced Algorithmic Techniques Lecturer: Dariusz Kowalski Lectures 12-13: Dynamic Programming 1 Overview Last week: Graph

More information

Parallel Longest Increasing Subsequences in Scalable Time and Memory

Parallel Longest Increasing Subsequences in Scalable Time and Memory Parallel Longest Increasing Subsequences in Scalable Time and Memory Peter Krusche Alexander Tiskin Department of Computer Science University of Warwick, Coventry, CV4 7AL, UK PPAM 2009 What is in this

More information

Dynamic Programming: 1D Optimization. Dynamic Programming: 2D Optimization. Fibonacci Sequence. Crazy 8 s. Edit Distance

Dynamic Programming: 1D Optimization. Dynamic Programming: 2D Optimization. Fibonacci Sequence. Crazy 8 s. Edit Distance Dynamic Programming: 1D Optimization Fibonacci Sequence To efficiently calculate F [x], the xth element of the Fibonacci sequence, we can construct the array F from left to right (or bottom up ). We start

More information

Lecture 5: Data Streaming Algorithms

Lecture 5: Data Streaming Algorithms Great Ideas in Theoretical Computer Science Summer 2013 Lecture 5: Data Streaming Algorithms Lecturer: Kurt Mehlhorn & He Sun In the data stream scenario, the input arrive rapidly in an arbitrary order,

More information

Primal Dual Schema Approach to the Labeling Problem with Applications to TSP

Primal Dual Schema Approach to the Labeling Problem with Applications to TSP 1 Primal Dual Schema Approach to the Labeling Problem with Applications to TSP Colin Brown, Simon Fraser University Instructor: Ramesh Krishnamurti The Metric Labeling Problem has many applications, especially

More information

4. Suffix Trees and Arrays

4. Suffix Trees and Arrays 4. Suffix Trees and Arrays Let T = T [0..n) be the text. For i [0..n], let T i denote the suffix T [i..n). Furthermore, for any subset C [0..n], we write T C = {T i i C}. In particular, T [0..n] is the

More information

Algorithms for Data Science

Algorithms for Data Science Algorithms for Data Science CSOR W4246 Eleni Drinea Computer Science Department Columbia University Thursday, October 1, 2015 Outline 1 Recap 2 Shortest paths in graphs with non-negative edge weights (Dijkstra

More information

York University CSE 2001 Unit 4.0 Context Free Grammars and Parsers and Context Sensitive Grammars Instructor: Jeff Edmonds

York University CSE 2001 Unit 4.0 Context Free Grammars and Parsers and Context Sensitive Grammars Instructor: Jeff Edmonds York University CSE 2001 Unit 4.0 Context Free Grammars and Parsers and Context Sensitive Grammars Instructor: Jeff Edmonds Don t cheat by looking at these answers prematurely. 1. Consider the following

More information

9.1 Cook-Levin Theorem

9.1 Cook-Levin Theorem CS787: Advanced Algorithms Scribe: Shijin Kong and David Malec Lecturer: Shuchi Chawla Topic: NP-Completeness, Approximation Algorithms Date: 10/1/2007 As we ve already seen in the preceding lecture, two

More information

15-451/651: Design & Analysis of Algorithms January 26, 2015 Dynamic Programming I last changed: January 28, 2015

15-451/651: Design & Analysis of Algorithms January 26, 2015 Dynamic Programming I last changed: January 28, 2015 15-451/651: Design & Analysis of Algorithms January 26, 2015 Dynamic Programming I last changed: January 28, 2015 Dynamic Programming is a powerful technique that allows one to solve many different types

More information