Introduction to Algorithms

Similar documents
Algorithms and Data Structures

String Matching. Geetha Patil ID: Reference: Introduction to Algorithms, by Cormen, Leiserson and Rivest

Problem Set 9 Solutions

Algorithms and Data Structures Lesson 3

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42

Introduction to. Algorithms. Lecture 6. Prof. Constantinos Daskalakis. CLRS: Chapter 17 and 32.2.

END-TERM EXAMINATION

Chapter 4. Transform-and-conquer

Lecture 6: Hashing Steven Skiena

String Algorithms. CITS3001 Algorithms, Agents and Artificial Intelligence. 2017, Semester 2. CLRS Chapter 32

CS/COE 1501

String matching algorithms تقديم الطالب: سليمان ضاهر اشراف المدرس: علي جنيدي

String Matching Algorithms

Assignment 2 (programming): Problem Description

String Matching Algorithms

Data Structures and Algorithms. Course slides: String Matching, Algorithms growth evaluation

CSCI S-Q Lecture #13 String Searching 8/3/98

String quicksort solves this problem by processing the obtained information immediately after each symbol comparison.

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

Hash Functions. Kuan-Yu Chen ( 陳冠宇 ) TR-212, NTUST

Knuth-Morris-Pratt. Kranthi Kumar Mandumula Indiana State University Terre Haute IN, USA. December 16, 2011

Introduction to Algorithms

Announcements. Programming assignment 1 posted - need to submit a.sh file

kvjlixapejrbxeenpphkhthbkwyrwamnugzhppfx

Questions. 6. Suppose we were to define a hash code on strings s by:

String Processing Workshop

CS5112: Algorithms and Data Structures for Applications

SORTING. Practical applications in computing require things to be in order. To consider: Runtime. Memory Space. Stability. In-place algorithms???

COMP Online Algorithms. k-server Problem & Advice. Shahin Kamali. Lecture 13 - Oct. 24, 2017 University of Manitoba

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms

Consider the actions taken on positive integers when considering decimal values shown in Table 1 where the division discards the remainder.

Goal of the course: The goal is to learn to design and analyze an algorithm. More specifically, you will learn:

Module 2: Classical Algorithm Design Techniques

Programming Project, CS378, Spring 2013 Implementing ElGamal Encryption

CSE 214 Computer Science II Searching

EE 486 Winter The role of arithmetic. EE 486 : lecture 1, the integers. SIA Roadmap - 2. SIA Roadmap - 1

CSC Design and Analysis of Algorithms. Lecture 8. Transform and Conquer II Algorithm Design Technique. Transform and Conquer

Lecture 6: Arithmetic and Threshold Circuits

Introduction to Algorithms 6.046J/18.401J

Chapter 7. Space and Time Tradeoffs. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

String matching algorithms

Data Structure and Algorithm Midterm Reference Solution TA

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

A Multipattern Matching Algorithm Using Sampling and Bit Index

Question Score Points Out Of 25

CSC263 Week 5. Larry Zhang.

Introduction to Algorithms

University of Waterloo CS240R Fall 2017 Review Problems

Application of String Matching in Auto Grading System

CSC152 Algorithm and Complexity. Lecture 7: String Match

Exact String Matching. The Knuth-Morris-Pratt Algorithm

A string is a sequence of characters. In the field of computer science, we use strings more often as we use numbers.

Lecture 7 February 26, 2010

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

CpSc 421 Final Solutions

University of Waterloo CS240R Winter 2018 Help Session Problems

Hashing 1. Searching Lists

University of Waterloo CS240 Spring 2018 Help Session Problems

UNIVERSITY of OSLO. Faculty of Mathematics and Natural Sciences. INF 4130/9135: Algoritmer: Design og effektivitet Date of exam: 14th December 2012

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

Algorithms for Grid Graphs in the MapReduce Model

What we still don t know about addition and multiplication

Today s Outline. CS 561, Lecture 8. Direct Addressing Problem. Hash Tables. Hash Tables Trees. Jared Saia University of New Mexico

III Data Structures. Dynamic sets

Indexing and Searching

Cryptography: More Primitives

Modular Arithmetic. Marizza Bailey. December 14, 2015

CS369G: Algorithmic Techniques for Big Data Spring

Multiple-choice (35 pt.)

Lecture 18 April 12, 2005

Introduction to. Algorithms. Lecture 5

Practice Midterm Exam Solutions

CS Prof. Clarkson Fall 2014

A note on some interesting test results for the Rabin-Karp string matching algorithm

Package VeryLargeIntegers

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

Introduction to Algorithms

What we still don t know about addition and multiplication

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

CS103 Handout 14 Winter February 8, 2013 Problem Set 5

of characters from an alphabet, then, the hash function could be:

Given a text file, or several text files, how do we search for a query string?

Outline for Today. How can we speed up operations that work on integer data? A simple data structure for ordered dictionaries.

CS 251, LE 2 Fall MIDTERM 2 Tuesday, November 1, 2016 Version 00 - KEY

Data Structures and Algorithms(4)

Lecture 5 Using Data Structures to Improve Dijkstra s Algorithm. (AM&O Sections and Appendix A)

Lecture 7: Efficient Collections via Hashing

Introduction to Algorithms

Implementation of Pattern Matching Algorithm on Antivirus for Detecting Virus Signature

Introduction to Algorithms

SORTING AND SEARCHING

LING/C SC/PSYC 438/538. Lecture 18 Sandiway Fong

DATA STRUCTURES AND ALGORITHMS

Data Structures and Algorithms Course Introduction. Google Camp Fall 2016 Yanqiao ZHU

CS1800 Discrete Structures Final Version B

Algebra 1. Standard 11 Operations of Expressions. Categories Combining Expressions Multiply Expressions Multiple Operations Function Knowledge

Modular Arithmetic. is just the set of remainders we can get when we divide integers by n

CS103 Handout 13 Fall 2012 May 4, 2012 Problem Set 5

Introduction to Algorithms Third Edition

Transcription:

Introduction to Algorithms 6.046J/18.401J Lecture 22 Prof. Piotr Indyk

Today String matching problems HKN Evaluations (last 15 minutes) Graded Quiz 2 (outside) Piotr Indyk Introduction to Algorithms December 1, 2004 L22.2

String Matching Input: Two strings T[1 n] and P[1 m], containing symbols from alphabet Σ. E.g. : Σ={a,b,,z} T[1 18]= to be or not to be P[1..2]= be Goal: find all shifts 0 s n-m such that T[s+1 s+m]=p E.g. 3, 16 Piotr Indyk Introduction to Algorithms December 1, 2004 L22.3

Simple Algorithm for s 0 to n-m Match 1 for j 1 to m if T[s+j] P[j] then Match 0 exit loop if Match=1 then output s Piotr Indyk Introduction to Algorithms December 1, 2004 L22.4

Results Running time of the simple algorithm: Worst-case: O(nm) Average-case (random text): O(n) T s = time spent on checking shift s E[T s ] 2 E [ s T s ] = s E[T s ] = O(n) Piotr Indyk Introduction to Algorithms December 1, 2004 L22.5

Worst-case Is it possible to achieve O(n) for any input? Knuth-Morris-Pratt 77: deterministic Karp-Rabin 81: randomized Piotr Indyk Introduction to Algorithms December 1, 2004 L22.6

Karp-Rabin Algorithm A very elegant use of an idea that we have encountered before, namely HASHING! Idea: Hash all substrings T[1 m], T[2 m+1],, T[m-n+1 n] Hash the pattern P[1 m] Report the substrings that hash to the same value as P Problem: how to hash n-m substrings, each of length m, in O(n) time? Piotr Indyk Introduction to Algorithms December 1, 2004 L22.7

Attempt 0 In Lecture 7, we have seen h a (x)= i a i x i mod q where a=(a 1,,a r ), x=(x 1,,x r ) To implement it, we would need to compute h a ( T[s s+m-1] )= i a i T[s+i] mod q for s=0 n-m How to compute it in O(n) time? A big open problem! Piotr Indyk Introduction to Algorithms December 1, 2004 L22.8

Attempt 1 Assume Σ={0,1} Think about each T s =T[s+1 s+m] as a number in binary representation, i.e., t s =T[s+1]2 m-1 +T[s+2]2 m-2 + +T[s+m]2 0 Find a fast way of computing t s+1 given t s Output all s such that t s is equal to the number p represented by P Piotr Indyk Introduction to Algorithms December 1, 2004 L22.9

The great formula How to transform t s =T[s+1]2 m-1 +T[s+2]2 m-2 + +T[s+m]2 0 into t s+1 =T[s+2]2 m-1 +T[s+3]2 m-2 + +T[s+m+1]2 0? Three steps: Subtract T[s+1]2 m-1 Multiply by 2 (i.e., shift the bits by one position) Add T[s+m+1]2 0 Therefore: t s+1 = (t s -T[s+1]2 m-1 )*2 + T[s+m+1]2 0 Piotr Indyk Introduction to Algorithms December 1, 2004 L22.10

Algorithm t s+1 = (t s - T[s+1]2 m-1 )*2 + T[s+m+1]2 0 Can compute t s+1 from t s using 3 arithmetic operations Therefore, we can compute all t 0,t 1,,t n-m using O(n) arithmetic operations We can compute a number corresponding to P using O(m) arithmetic operations Are we done? Piotr Indyk Introduction to Algorithms December 1, 2004 L22.11

Problem To get O(n) time, we would need to perform each arithmetic operation in O(1) time However, the arguments are m-bit long! If m large, it is unreasonable to assume that operations on such big numbers can be done in O(1) time We need to reduce the number range to something more managable Piotr Indyk Introduction to Algorithms December 1, 2004 L22.12

Attempt 2: Hashing We will instead compute t s =T[s+1]2 m-1 +T[s+2]2 m-2 + +T[s+m]2 0 mod q where q is an appropriate prime number One can still compute t s+1 from t s : t s+1 = (t s -T[s+1]2 m-1 )*2+T[s+m+1]2 0 mod q If q is not large, i.e., has O(log n) bits, we can compute all t s (and p ) in O(n) time Piotr Indyk Introduction to Algorithms December 1, 2004 L22.13

Problem Unfortunately, we can have false positives, i.e., T s P but t s mod q = p mod q Need to use a random q We will show that the probability of a false positive is small randomized algorithm Piotr Indyk Introduction to Algorithms December 1, 2004 L22.14

False positives Consider any t s p. We know that both numbers are in the range {0 2 m -1} How many primes q are there such that t s mod q = p mod q (t s -p) =0 mod q? Such prime has to divide x=(t s -p) 2 m Represent x=p e1 p e2 2 p ek 1 k, p i prime, e i 1 What is the largest possible value of k? Since 2 p i, we have x 2 k But x 2 m k m There are m primes dividing x Piotr Indyk Introduction to Algorithms December 1, 2004 L22.15

Algorithm Algorithm: Let be a set of 2nm primes, each having O(log n) bits Choose q uniformly at random from Compute t 0 mod q, t 1 mod q,., and p mod q Report s such that t s mod q = p mod q Analysis: For each s, the probability that T s P but t s mod q =p mod q is at most m/2nm = 1/2n The probability of any false positive is at most (n-m)/2n 1/2 Piotr Indyk Introduction to Algorithms December 1, 2004 L22.16

Details How do we know that such exists? (That is, a set of 2nm primes, each having O(log n) bits) How do we choose a random prime from in O(n) time? Piotr Indyk Introduction to Algorithms December 1, 2004 L22.17

Prime density Primes are dense. I.e., if PRIMES(N) is the set of primes smaller than N, then asymptotically PRIMES(N) /N ~ 1/ln N If N large enough, then PRIMES(N) N/(2ln N) Proof: Trust me. Piotr Indyk Introduction to Algorithms December 1, 2004 L22.18

Prime density continued Set N=C mn ln(mn) There exists C=O(1) such that N/(2ln N) 2mn (Note: for such N we have PRIMES(N) 2mn ) Proof: C mn ln(mn) / [2 ln(c mn ln(mn)) ] C mn ln(mn) / [2 ln(c (mn) 2 ) ] = C mn ln(mn) / 4[ ln(c) + ln(mn)] All elements of PRIMES(N) are log N = O(log n) bits long Piotr Indyk Introduction to Algorithms December 1, 2004 L22.19

Prime selection Still need to find a random element of PRIMES(N) Solution: Choose a random element from {1 N} Check if it is prime If not, repeat Piotr Indyk Introduction to Algorithms December 1, 2004 L22.20

Prime selection analysis A random element q from {1 N} is prime with probability ~1/ln N We can check if q is prime in time polynomial in log N : Randomized: Rabin, Solovay-Strassen in 1976 Deterministic: Agrawal et al in 2002 Therefore, we can generate random prime q in o(n) time Piotr Indyk Introduction to Algorithms December 1, 2004 L22.21

Final Algorithm Set N=C mn ln(mn) Repeat Choose q uniformly at random from {1 N} Until q is prime Compute t 0 mod q, t 1 mod q,., and p mod q Report s such that t s mod q = p mod q Piotr Indyk Introduction to Algorithms December 1, 2004 L22.22