Text Algorithms (6EAP): Similarity measures. Exact vs approximate search. Problem. Similarity. Edit distance (Levenshtein distance). 10/11/10
Transcription
Text Algorithms (6EAP)
Similarity measures
Jaak Vilo, 2010 fall (MTAT.03.190 Text Algorithms)

Exact vs approximate search

- In exact search we searched for a string or a set of strings in a long text.
- Plenty of applications require approximate search.
- Approximate matching means finding those regions in a long text that are similar to the query string.
- E.g. find substrings of S that have edit distance < k to the query string m.

Problem

Given P and S, find all approximate occurrences of P in S.

Similarity

How can we measure the similarity of two strings? When are two things almost the same?

Edit distance (Levenshtein distance)

The smallest number of edit operations needed to convert one string into the other. Example pair: INDUSTRY and INTEREST.

Definition

The edit distance $D(A,B)$ between strings $A$ and $B$ is the minimal number of edit operations needed to change $A$ into $B$. The allowed edit operations are deletion of a single letter, insertion of a letter, and replacing one letter with another.

Let $A = a_1 a_2 \dots a_m$ and $B = b_1 b_2 \dots b_n$.

- E1: Deletion $a_i \to \varepsilon$
- E2: Insertion $\varepsilon \to b_j$
- E3: Substitution $a_i \to b_j$ (if $a_i \ne b_j$)

Other possible variants:

- E4: Transposition $a_i a_{i+1} \to b_j b_{j+1}$, where $a_i = b_{j+1}$ and $a_{i+1} = b_j$ (e.g. lecture → letcure)
How can we calculate this? How can we calculate this efficiently?

Let $\alpha$ and $\beta$ be strings and $a$, $b$ single letters. Then $D(\alpha a, \beta b)$ is the minimum of:

1. $D(\alpha, \beta)$, if $a = b$
2. $D(\alpha, \beta) + 1$, if $a \ne b$
3. $D(\alpha a, \beta) + 1$
4. $D(\alpha, \beta b) + 1$

For strings $S$ of length $n$ and $T$ of length $m$, $D(S,T)$ is the minimum of:

1. $D(S[1..n-1], T[1..m-1]) + (S[n] = T[m] \;?\; 0 : 1)$
2. $D(S[1..n], T[1..m-1]) + 1$
3. $D(S[1..n-1], T[1..m]) + 1$

Define $d(i,j) = D(S[1..i], T[1..j])$. Then $d(i,j)$ is the minimum of:

1. $d(i-1,j-1) + (S[i] = T[j] \;?\; 0 : 1)$
2. $d(i,j-1) + 1$
3. $d(i-1,j) + 1$

Recursion?

$F(0) = 1$, $F(1) = 1$, $F(n) = F(n-1) + F(n-2)$, giving 1, 1, 2, 3, 5, 8, ...

    sub fib(int x)
        if (x < 3) return 1;
        else return fib(x-1) + fib(x-2);

[Figure: cell $d(i,j)$ of the matrix depends on its neighbours $d(i-1,j-1)$, $d(i-1,j)$ and $d(i,j-1)$.]

Algorithm: edit distance $D(A,B)$ using dynamic programming (DP)

Input: $A = a_1 a_2 \dots a_m$, $B = b_1 b_2 \dots b_n$
Output: value $d_{mn}$ in matrix $(d_{ij})$, $0 \le i \le m$, $0 \le j \le n$

    for i = 0 to m do d[i,0] = i
    for j = 0 to n do d[0,j] = j
    for j = 1 to n do
        for i = 1 to m do
            d[i,j] = min( d[i-1,j-1] + (if a_i == b_j then 0 else 1),
                          d[i-1,j] + 1,
                          d[i,j-1] + 1 )
    return d[m,n]
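The matrix algorithm above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code; the function name `edit_distance` is an assumption made for the example, and strings are 0-indexed, so `a[i - 1]` plays the role of $a_i$.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the full (m+1) x (n+1) matrix d."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i              # delete the first i letters of a
    for j in range(n + 1):
        d[0][j] = j              # insert the first j letters of b
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + cost,  # match or substitution
                d[i - 1][j] + 1,         # deletion of a_i
                d[i][j - 1] + 1,         # insertion of b_j
            )
    return d[m][n]


if __name__ == "__main__":
    print(edit_distance("industry", "interest"))
```

Unlike the naive recursion, every cell is computed exactly once, so the running time is proportional to the number of cells, $(m+1)(n+1)$.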
Dynamic Programming

[Figure: the $(m+1) \times (n+1)$ matrix, with cell $d(i,j)$ computed from $d(i-1,j-1)$, $d(i-1,j)$ and $d(i,j-1)$.]

Edit distance is a metric

It can be shown that $D(A,B)$ is a metric:

- $D(A,B) \ge 0$, and $D(A,B) = 0$ iff $A = B$
- $D(A,B) = D(B,A)$
- $D(A,C) \le D(A,B) + D(B,C)$

Alignment

    indust-r-y-
    in---terest

Path of edit operations

- The optimal solution can be calculated afterwards.
- This is quite typical in dynamic programming.
- Memorize the sets pred[i,j], depending on from where $d_{ij}$ was reached.
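Recovering one alignment from the filled matrix can be sketched as below. As an assumption for brevity, the sketch does not store the pred[i,j] sets; it re-checks at each cell which neighbour produced the minimum and follows the first one that fits, which yields a single optimal path. The name `align` and the gap symbol `-` are illustrative choices, and the inputs are assumed not to contain `-` themselves.

```python
def align(a: str, b: str) -> tuple[int, str, str]:
    """Return (distance, gapped a, gapped b) for one optimal alignment."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)

    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        diag_ok = (
            i > 0 and j > 0
            and d[i][j] == d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
        )
        if diag_ok:                                  # match or substitution
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:   # deletion from a
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:                                        # insertion from b
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    # The path was collected in reverse order, so flip it.
    return d[m][n], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    dist, row_a, row_b = align("industry", "interest")
    print(dist)
    print(row_a)
    print(row_b)
```

Preferring the diagonal step is only one possible tie-breaking rule; other rules give other, equally optimal alignments.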
Three possible minimizing paths

Add into pred[i,j]:

- $(i-1, j-1)$ if $d_{ij} = d_{i-1,j-1} + (\text{if } a_i = b_j \text{ then } 0 \text{ else } 1)$
- $(i-1, j)$ if $d_{ij} = d_{i-1,j} + 1$
- $(i, j-1)$ if $d_{ij} = d_{i,j-1} + 1$

The path (in reverse order): $\varepsilon \to c_6$, $b_5 \to b_5$, $c_4 \to c_4$, $a_3 \to a_3$, $a_2 \to b_2$, $b_1 \to a_1$.

Multiple paths possible

- All paths are correct.
- There can be many (how many?) paths.

What are the other questions in edit distance calculations?

- Space complexity
- Time complexity
- Other ways to look at the algorithm(s)
- Applications
- More complex notions of similarity

Space can be reduced

Calculation of $D(A,B)$ in space $\Theta(m)$.

Input: $A = a_1 a_2 \dots a_m$, $B = b_1 b_2 \dots b_n$ (choose $m \le n$)
Output: $d_{mn} = D(A,B)$

    for i = 0 to m do C[i] = i
    for j = 1 to n do
        c = C[0]; C[0] = j
        for i = 1 to m do
            d = min( c + (if a_i == b_j then 0 else 1),
                     C[i-1] + 1,
                     C[i] + 1 )
            c = C[i]    // memorize new diagonal value
            C[i] = d
    write C[m]

Here C is the array holding the current column and c is a scalar holding the previous diagonal value. Time complexity is $\Theta(mn)$, since C[0..m] is filled $n$ times.
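The single-column variant might look like this in Python. It is a sketch under the same conventions as the pseudocode above: `col` stands for the array C and `diag` for the remembered diagonal value; both names, and the function name, are chosen here for illustration. Swapping the arguments relies on the symmetry $D(A,B) = D(B,A)$.

```python
def edit_distance_linear_space(a: str, b: str) -> int:
    """Levenshtein distance keeping only one column of the matrix."""
    if len(a) > len(b):
        a, b = b, a                      # make a the shorter string (m <= n)
    m = len(a)
    col = list(range(m + 1))             # column j = 0: d[i][0] = i
    for j, bj in enumerate(b, start=1):
        diag = col[0]                    # d[0][j-1]
        col[0] = j                       # d[0][j]
        for i in range(1, m + 1):
            cost = 0 if a[i - 1] == bj else 1
            new = min(
                diag + cost,             # d[i-1][j-1] + cost
                col[i - 1] + 1,          # d[i-1][j] + 1 (already updated)
                col[i] + 1,              # d[i][j-1] + 1 (not yet updated)
            )
            diag = col[i]                # becomes d[i][j-1], the next diagonal
            col[i] = new
    return col[m]


if __name__ == "__main__":
    print(edit_distance_linear_space("industry", "interest"))
```

The price of the space saving is that the matrix is no longer available for reading off an alignment afterwards.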
Shortest path in the graph

http://en.wikipedia.org/wiki/shortest_path_problem

[Figure: all nodes at distance 1 from the source.]

Edit distance = shortest path in the graph

Observations?

- The shortest path is close to the diagonal, if a short-distance path exists.
- Values along any diagonal can only increase (by at most 1).

Diagonal lemma

Property of any diagonal: on any specific diagonal, the values of matrix $(d_{ij})$ either increase by 1 or stay the same.

Diagonals: diagonal $k$, $-m \le k \le n$, contains only those $d_{ij}$ for which $j - i = k$. For example, diagonal nr. 2 consists of $d_{02}, d_{13}, d_{24}, d_{35}, d_{46}$.

Lemma: for each $d_{ij}$, $1 \le i \le m$, $1 \le j \le n$, it holds that $d_{ij} = d_{i-1,j-1}$ or $d_{ij} = d_{i-1,j-1} + 1$ (notice that $d_{ij}$ and $d_{i-1,j-1}$ are on the same diagonal).

Proof: since $d_{ij}$ is an integer, it suffices to show:

1. $d_{ij} \le d_{i-1,j-1} + 1$
2. $d_{ij} \ge d_{i-1,j-1}$

Claim 1 holds directly from the definition of edit distance, since the minimization includes the term $d_{i-1,j-1} + 1$ (or $d_{i-1,j-1}$).

Claim 2 is shown by induction on $i + j$. The basis is trivial when $i = 0$ or $j = 0$ (if we agree that $d_{-1,j-1} = d_{0j}$). For the induction step there are 3 possibilities:

1. On minimization, $d_{ij}$ is calculated from entry $d_{i-1,j-1}$; then $d_{ij} \ge d_{i-1,j-1}$.
2. On minimization, $d_{ij}$ is calculated from entry $d_{i-1,j}$; then $d_{ij} = d_{i-1,j} + 1 \ge d_{i-2,j-1} + 1 \ge d_{i-1,j-1}$. The first inequality is the induction hypothesis on the diagonal of $d_{i-1,j}$, the second is claim 1.
3. On minimization, $d_{ij}$ is calculated from entry $d_{i,j-1}$; analogous to case 2.

Hence $d_{i-1,j-1} \le d_{ij}$.
Transform the matrix into $(f_{kp})$

For each diagonal, only record the position (row index) where the value increases by 1. One can also restrict the matrix $(d_{ij})$ to the part where $d_{ij} \le d_{mn}$, since only those $d_{ij}$ can lie on the shortest path.

We use the matrix $(f_{kp})$ that represents the diagonals of $(d_{ij})$: $f_{kp}$ is the row index $i$ such that on diagonal $k$ the value $p$ reaches row $i$ ($d_{ij} = p$ and $j - i = k$).

Initialization: $f_{0,-1} = -1$, and otherwise $f_{kp} = -\infty$ when $p \le |k| - 1$.
Result: $d_{mn} = p$ such that $f_{n-m,p} = m$.

Calculating matrix $(f_{kp})$ by columns

Assume column $p-1$ of $(f_{kp})$ has been calculated and we want to calculate $f_{kp}$ (the region where $d_{ij} = p$).

On diagonal $k$ the values $p$ reach at least the row $t = \max(f_{k,p-1} + 1,\; f_{k-1,p-1},\; f_{k+1,p-1} + 1)$, if the diagonal $k$ reaches that far. The three terms come from:

- $f_{k,p-1}$: the same diagonal
- $f_{k-1,p-1}$: the diagonal below
- $f_{k+1,p-1}$: the diagonal above

If on row $t+1$ additionally $a_i = b_j$ on the same diagonal, then $d_{ij}$ cannot increase, and value $p$ reaches row $t+1$. Repeat this step until $a_i \ne b_j$ on diagonal $k$.

Algorithm A(): calculate $f_{kp}$

    A(k,p)
    1. t = max( f[k,p-1] + 1, f[k-1,p-1], f[k+1,p-1] + 1 )
    2. while a[t+1] == b[t+1+k] do t = t+1
    3. f[k,p] = if t > m or t+k > n then undefined else t

Example (from the slide figure): for $f(0,2)$, $t = \max(3, 2, 3) = 3$; the following letters on diagonal 0 match up to A[5] = B[5], hence $f(0,2) = 5$. The figure continues with $f(1,2)$.
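A compact Python sketch of the diagonal method follows. It departs from the pseudocode in two labelled ways, both assumptions of this illustration: missing entries are treated as $-\infty$ through a dictionary lookup with a default, and instead of marking entries "undefined" the row index is clamped to the end of the diagonal, so that $f_{kp}$ is read as "value at most $p$ reaches row $t$". Only the previous column is kept. The function name is hypothetical.

```python
import math


def edit_distance_diagonal(a: str, b: str) -> int:
    """Edit distance via furthest-reaching rows f[k][p] on diagonals k = j - i."""
    m, n = len(a), len(b)
    neg = -math.inf
    prev = {0: -1}                        # column p = -1: f(0,-1) = -1
    p = -1
    while True:
        p += 1
        cur = {}
        # Value p can only occur on diagonals -p..p that exist in the matrix.
        for k in range(max(-p, -m), min(p, n) + 1):
            t = max(
                prev.get(k, neg) + 1,     # same diagonal: substitution
                prev.get(k - 1, neg),     # diagonal below: insertion
                prev.get(k + 1, neg) + 1, # diagonal above: deletion
            )
            t = min(t, m, n - k)          # stay inside the matrix
            # Slide along the diagonal while the letters match (0-indexed).
            while t < m and t + k < n and a[t] == b[t + k]:
                t += 1
            cur[k] = t
        if cur.get(n - m) == m:           # corner d[m][n] reached with value p
            return p
        prev = cur


if __name__ == "__main__":
    print(edit_distance_diagonal("industry", "interest"))
```

The outer loop runs once per distance value, so similar strings finish after only a few columns of $(f_{kp})$.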
Algorithm: diagonal method by columns

    p = -1
    while f[n-m,p] != m
        p = p+1
        for k = -p to p do
            // f[k,p] = A(k,p)
            t = max( f[k,p-1] + 1, f[k-1,p-1], f[k+1,p-1] + 1 )
            while a[t+1] == b[t+1+k] do t = t+1
            f[k,p] = if t > m or t+k > n then undefined else t

The value $p$ can only occur on diagonals $-p \le k \le p$.

The method can be improved, since $k$ is often such that $f_{kp}$ is undefined. We can decrease the range of $k$. The diagonal numbers satisfy $-m \le k \le n$. Let $m \le n$ and let $d_{ij}$ be on diagonal $k$:

- if $-m \le k \le 0$ then $|k| \le d_{ij} \le m$
- if $1 \le k \le n$ then $k \le d_{ij} \le k + m$

Hence $-m \le k \le m$ if $p \le m$, and $p - m \le k \le p$ if $p \ge m$.

Some notes

- In applications, small $D(A,B)$ are the most interesting.
- The algorithm can be modified by providing a maximum $t$. Hence $O(tm)$: the smaller the $t$, the faster the algorithm.
- Space can be reduced by keeping only the previous column.
- How to output the shortest path? Relatively simple, in time $O(s)$; outputs a single path.

Extensions to basic edit distance

- New operations
- Variable costs

Transposition (ab → ba)

E4: Transposition $a_i a_{i+1} \to b_j b_{j+1}$, s.t. $a_i = b_{j+1}$ and $a_{i+1} = b_j$ (e.g. lecture → letcure)

$d(i,j)$ is the minimum of:

1. $d(i-1,j-1) + (S[i] = T[j] \;?\; 0 : 1)$
2. $d(i,j-1) + 1$
3. $d(i-1,j) + 1$
4. $d(i-2,j-2) + (\text{if } S[i-1,i] = T[j,j-1] \text{ then } 1 \text{ else } \infty)$
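The recurrence with the extra transposition term can be sketched as below. With this recurrence each pair of adjacent letters may be swapped at most once and not edited further, a variant commonly called the optimal string alignment (restricted Damerau-Levenshtein) distance; the function name `osa_distance` is an assumption of this example.

```python
def osa_distance(s: str, t: str) -> int:
    """Edit distance with insertion, deletion, substitution and
    transposition of two adjacent letters (cost 1 each)."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            best = min(d[i - 1][j - 1] + cost, d[i][j - 1] + 1, d[i - 1][j] + 1)
            # Term 4: S[i-1,i] equals T[j,j-1], i.e. the last two letters swapped.
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                best = min(best, d[i - 2][j - 2] + 1)
            d[i][j] = best
    return d[m][n]


if __name__ == "__main__":
    print(osa_distance("lecture", "letcure"))  # one transposition: 1
```

Without the transposition term, the same pair needs two substitutions, so the plain edit distance is 2.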
Longest common subsequences

- The edit distance algorithm can be changed easily.
- Space efficiency can also be achieved using the 2 last columns, hence still $O(m)$.
- The diagonal method can be modified, since the diagonal lemma holds.
- The algorithms can be modified in a relatively straightforward manner.

Definition. String $C = c_1 c_2 \dots c_r$ is a subsequence (Estonian: alamjada) of $A = a_1 a_2 \dots a_m$ if $C$ can be obtained by removing zero or more characters from $A$.

String $C = c_1 c_2 \dots c_r$ is the longest common subsequence, LCS (Estonian: pikim ühine alamjada), of $A = a_1 a_2 \dots a_m$ and $B = b_1 b_2 \dots b_n$ if $C$ is the longest string that is a subsequence of both $A$ and $B$. We write $C = LCS(A,B)$.

The length of $LCS(A,B)$, $|C|$, can be used as a similarity measure for $A$ and $B$.

$LCS(A,B)$ can be calculated similarly to edit distance. Let $D'(A,B)$ be the edit distance where the only allowed operations are insertion and deletion (no replace).

Theorem

a) $|LCS(A,B)| = (|A| + |B| - D'(A,B)) / 2$

b) Let $D'(A,B)$ be realized by two sets of operations with an optimal number of changes:

1. $a_{i_1} \to \varepsilon, a_{i_2} \to \varepsilon, \dots, a_{i_p} \to \varepsilon$ (deletions from $A$), and
2. $\varepsilon \to b_{j_1}, \varepsilon \to b_{j_2}, \dots, \varepsilon \to b_{j_r}$ (insertions into $B$).

Then $LCS(A,B) = C$ can be constructed such that $C$ is $A$ after the deletions of 1., and $C$ is $B$ after deleting all insertions of 2. (the insertions in 2. are, in reverse, deletions from $B$).

Proof b) By construction, $C$ is uniquely defined and is a subsequence of $A$ as well as of $B$. If $C$ were not the longest, there would be $C'$ with $|C| < |C'|$ s.t. $C' = LCS(A,B)$. But then $D'(A,B) \le |A| - |C'| + |B| - |C'| < |A| - |C| + |B| - |C| = D'(A,B)$, which is a contradiction. Hence $C = LCS(A,B)$.

Proof a) According to b), $|LCS(A,B)| = |A| - p$ and $|LCS(A,B)| = |B| - r$, so $2\,|LCS(A,B)| = |A| + |B| - (p + r) = |A| + |B| - D'(A,B)$.

Example. LCS(england, inglismaa) = ngla. D'(england, inglismaa) = 8, and |ngla| = 4 = (7 + 9 - 8)/2.

The diagonal lemma holds, but the increase always occurs by two.

Time complexity is $O(mn)$; with the diagonal method $O(\min(s,m) \cdot s)$, where $s = D'(A,B)$, $m = |A|$, $n = |B|$.

There exist other algorithms for LCS (e.g. Hunt-Szymanski).

The Unix command diff compares files row by row and searches for the deviations from the LCS of the two files.
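The relation between LCS length and the insertion/deletion-only distance $D'$ can be checked with a short sketch. Both functions below are illustrative (the names `indel_distance` and `lcs` are assumptions), and `lcs` returns just one longest common subsequence even when several exist.

```python
def indel_distance(a: str, b: str) -> int:
    """D'(A,B): edit distance with insertions and deletions only."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = min(d[i - 1][j] + 1, d[i][j - 1] + 1)
            if a[i - 1] == b[j - 1]:
                best = min(best, d[i - 1][j - 1])   # free match, no replace
            d[i][j] = best
    return d[m][n]


def lcs(a: str, b: str) -> str:
    """One longest common subsequence of a and b."""
    m, n = len(a), len(b)
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    out = []
    i, j = m, n
    while i > 0 and j > 0:                           # trace back from the corner
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    a, b = "england", "inglismaa"
    c = lcs(a, b)
    print(c, indel_distance(a, b), (len(a) + len(b) - indel_distance(a, b)) // 2)
```

On the example pair from the text, the two quantities agree with part a) of the theorem.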
Generalized edit distance

Use more operations E1...En, and provide different costs for each.

Definition. Let $x, y \in \Sigma^*$. Then every $x \to y$ is an edit operation. The edit operation replaces $x$ by $y$: if $A = uxv$, then after the operation $A = uyv$.

We denote by $w(x \to y)$ the cost or weight of the operation. The cost may depend on $x$ and/or $y$, but we assume $w(x \to y) \ge 0$.

Generalized edit distance

- If operations can only be applied in parallel, i.e. the part already changed cannot be modified again, then we can use dynamic programming.
- Otherwise it is an algorithmically unsolvable problem, since the question "can A be transformed into B using the operations of G?" is unsolvable.
- The diagonal method may not be applicable in general. But since the cost increases slightly with each diversion from the diagonal, one can stay within a narrow region around the diagonal.

Applications of generalized edit distance

Examples:

- Historic documents, names
- Human language and dialects
- Transliteration rules from one alphabet to another, e.g. Tõugu => Tyugu (via Russian)
- ...
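For the "parallel" case, where a rewritten part is never modified again, the dynamic programming idea can be sketched as follows. Everything specific here is an assumption for illustration: the function name, the representation of operations as `(x, y, weight)` triples, the integer weights, and the default cost of 3 for single-letter insertion, deletion and substitution. Rules are matched by direct suffix comparison rather than by any specialised multi-pattern matcher.

```python
def generalized_edit_distance(a, b, ops, unit_cost=3):
    """Minimum total weight to turn a into b, where each part of a is
    rewritten at most once. ops is a list of (x, y, weight) triples with
    x and y not both empty; single-letter edits cost unit_cost."""
    m, n = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:
                best = min(best, d[i - 1][j] + unit_cost)      # deletion
            if j > 0:
                best = min(best, d[i][j - 1] + unit_cost)      # insertion
            if i > 0 and j > 0:
                step = 0 if a[i - 1] == b[j - 1] else unit_cost
                best = min(best, d[i - 1][j - 1] + step)       # match / substitution
            for x, y, w in ops:
                # Rule x -> y applies if a[..i] ends with x and b[..j] ends with y.
                if len(x) <= i and len(y) <= j \
                        and a[i - len(x):i] == x and b[j - len(y):j] == y:
                    best = min(best, d[i - len(x)][j - len(y)] + w)
            d[i][j] = best
    return d[m][n]


if __name__ == "__main__":
    # Hypothetical rule list with made-up weights.
    rules = [("hw", "f", 1), ("a", "aa", 1), ("w", "v", 1), ("e", "ä", 1)]
    print(generalized_edit_distance("ahwrika", "aafrika", rules))
```

Cheap multi-letter rules let spelling variants score as close even when their plain edit distance is larger.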
Examples of old and modern Estonian spellings, with the edit operations involved:

- näituseks - näiteks (tuseks -> teks)
- Ahwrika - Aafrika (a -> aa, hw -> f)
- weikese - väikese (w -> v, e -> ä)
- materjaali - materjali (aa -> a)

How?

- Apply Aho-Corasick to match for all possible edit operations.
- Use the minimum over all possible such operations and costs.

Smarter search (kavalam otsimine)

- Dush, dušš, dushsh?
- Gorbatšov, Gorbatshov, Горбачов, Gorbachev
- režiim, rezhiim, riim

Implementation: Reina Käärik

Possible problems/tasks

- Manually create sensible lists of operations: for English, Russian, etc., and for old language.
- Improve the speed of the algorithm (testing).
- Train for automatic extraction of edit operations and their respective costs from examples of matching words.
More informationTwo Dimensional Dictionary Matching
Two Dimensional Dictionary Matching Amihood Amir Martin Farach Georgia Tech DIMACS September 10, 1992 Abstract Most traditional pattern matching algorithms solve the problem of finding all occurrences
More informationVERTEX MAPS FOR TREES: ALGEBRA AND PERIODS OF PERIODIC ORBITS. Chris Bernhardt
VERTEX MAPS FOR TREES: ALGEBRA AND PERIODS OF PERIODIC ORBITS CHRIS BERNHARDT Abstract. Let T be a tree with n vertices. Let f : T T be continuous and suppose that the n vertices form a periodic orbit
More informationMore Dynamic Programming
CS 374: Algorithms & Models of Computation, Fall 2015 More Dynamic Programming Lecture 12 October 8, 2015 Chandra & Manoj (UIUC) CS374 1 Fall 2015 1 / 43 What is the running time of the following? Consider
More informationExternal Memory. Philip Bille
External Memory Philip Bille Outline Computationals models Modern computers (word) RAM I/O Cache-oblivious Shortest path in implicit grid graphs RAM algorithm I/O algorithms Cache-oblivious algorithm Computational
More informationIN101: Algorithmic techniques Vladimir-Alexandru Paun ENSTA ParisTech
IN101: Algorithmic techniques Vladimir-Alexandru Paun ENSTA ParisTech License CC BY-NC-SA 2.0 http://creativecommons.org/licenses/by-nc-sa/2.0/fr/ Outline Previously on IN101 Python s anatomy Functions,
More informationApproximation Algorithms
Approximation Algorithms Group Members: 1. Geng Xue (A0095628R) 2. Cai Jingli (A0095623B) 3. Xing Zhe (A0095644W) 4. Zhu Xiaolu (A0109657W) 5. Wang Zixiao (A0095670X) 6. Jiao Qing (A0095637R) 7. Zhang
More informationUmans Complexity Theory Lectures
Introduction Umans Complexity Theory Lectures Lecture 5: Boolean Circuits & NP: - Uniformity and Advice, - NC hierarchy Power from an unexpected source? we know P EXP, which implies no polytime algorithm
More informationLectures 12 and 13 Dynamic programming: weighted interval scheduling
Lectures 12 and 13 Dynamic programming: weighted interval scheduling COMP 523: Advanced Algorithmic Techniques Lecturer: Dariusz Kowalski Lectures 12-13: Dynamic Programming 1 Overview Last week: Graph
More informationParallel Longest Increasing Subsequences in Scalable Time and Memory
Parallel Longest Increasing Subsequences in Scalable Time and Memory Peter Krusche Alexander Tiskin Department of Computer Science University of Warwick, Coventry, CV4 7AL, UK PPAM 2009 What is in this
More informationDynamic Programming: 1D Optimization. Dynamic Programming: 2D Optimization. Fibonacci Sequence. Crazy 8 s. Edit Distance
Dynamic Programming: 1D Optimization Fibonacci Sequence To efficiently calculate F [x], the xth element of the Fibonacci sequence, we can construct the array F from left to right (or bottom up ). We start
More informationLecture 5: Data Streaming Algorithms
Great Ideas in Theoretical Computer Science Summer 2013 Lecture 5: Data Streaming Algorithms Lecturer: Kurt Mehlhorn & He Sun In the data stream scenario, the input arrive rapidly in an arbitrary order,
More informationPrimal Dual Schema Approach to the Labeling Problem with Applications to TSP
1 Primal Dual Schema Approach to the Labeling Problem with Applications to TSP Colin Brown, Simon Fraser University Instructor: Ramesh Krishnamurti The Metric Labeling Problem has many applications, especially
More information4. Suffix Trees and Arrays
4. Suffix Trees and Arrays Let T = T [0..n) be the text. For i [0..n], let T i denote the suffix T [i..n). Furthermore, for any subset C [0..n], we write T C = {T i i C}. In particular, T [0..n] is the
More informationAlgorithms for Data Science
Algorithms for Data Science CSOR W4246 Eleni Drinea Computer Science Department Columbia University Thursday, October 1, 2015 Outline 1 Recap 2 Shortest paths in graphs with non-negative edge weights (Dijkstra
More informationYork University CSE 2001 Unit 4.0 Context Free Grammars and Parsers and Context Sensitive Grammars Instructor: Jeff Edmonds
York University CSE 2001 Unit 4.0 Context Free Grammars and Parsers and Context Sensitive Grammars Instructor: Jeff Edmonds Don t cheat by looking at these answers prematurely. 1. Consider the following
More information9.1 Cook-Levin Theorem
CS787: Advanced Algorithms Scribe: Shijin Kong and David Malec Lecturer: Shuchi Chawla Topic: NP-Completeness, Approximation Algorithms Date: 10/1/2007 As we ve already seen in the preceding lecture, two
More information15-451/651: Design & Analysis of Algorithms January 26, 2015 Dynamic Programming I last changed: January 28, 2015
15-451/651: Design & Analysis of Algorithms January 26, 2015 Dynamic Programming I last changed: January 28, 2015 Dynamic Programming is a powerful technique that allows one to solve many different types
More information