Assignment 2 (programming): Problem Description

Size: px
Start display at page:

Download "Assignment 2 (programming): Problem Description"

Transcription

1 CS2210b Data Structures and Algorithms Due: Monday, February 14th Assignment 2 (programming): Problem Description 1 Overview The purpose of this assignment is for students to practice on hashing techniques and hash table implementations. The application is an adapted version of the String Matching problem in the Intel 2009 Threading Challenge 1. Briefly, the String Matching problem in this contest is to write a (parallel) program to search a large set of DNA sequences (represented as strings of characters) for a given set of other DNA sequences. Dr. Bradley C. Kuszmaul (Massachusetts Institute of Technology) won the contest with a program written in Cilk++, an extension of C++ for parallel programing. Dr. Kuszmaul used Rabin-Karp hashing 2. In this assignment, students will implement a sequential Java program to perform the DNA sequence matching functionalities following Dr. Kuszmaul s techniques. The test suites will be built from the chromosome of Canis familiaris (domestic dog), available at 2 Problem Description The following is based on a report by Dr. Kuszmaul. The problem, as stated on Intel s web site is as follows. Write a threaded program to search a database of DNA sequences, represented as strings of characters, to find matches of other DNA subsequences. Both the database and the query strings are made up of only four characters: A, C, G and T. The output must report the location of an exact match within any given input DNA sequence for each input search query string. If the query string matches within multiple sequences within the database, each result must be reported; and if the query string matches multiple locations within the same database sequence, the earliest position that matches exactly must be reported. The file names for this problem (database file, queries file, output results) will be given on the command line. File formats: The input database and query files will have the same format. Each sequence will be prefixed by a line starting with a greater than character ( > ) followed by a description of the origin of the sequence of no more than 131 characters. The sequence will then begin on the next line for some number of lines. Each line will contain exactly string search algorithm 1

2 characters from the set A, C, G and T, except the last line, which may hold fewer than 80 characters. Following this last line will be the descriptor line from the next sequence until the end of the file has been reached, which will be signified by the descriptor ( >EOF ). For each query string contained within the second input file, the output file should print the descriptor of the query sequence and the descriptors of any database sequences that contain a match as well as the position within the database sequence of that exact match. If the query sequence string is not found within any database sequences, a message to that effect should be printed after the query descriptor. Thus the program takes as input two sets of strings: the Database strings and the Query strings. We will refer to these two sets of strings as the database and the querybase respectively. We make some assumptions about the input: Each string is less than 1,000,000 bytes 3. The entire database fits in main memory, but the entire querybase may be too big to fit in main memory 4. There are less than 2 32 strings in a database or a querybase 5. The output format is described in( -matching/topic/65663/). In that example, the database is: >DB description string 1 AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACT GGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAG ACAGATAAAAATTACA >DB description string 2 AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAG GTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGC CGATATTCTGGAAAGCAATGC >EOF and an example querybase is: >Query description string 1 CATTCTGACTGCAA >Query description string 2 AAAAAAG >Query description string 3 GTAA >Query description string 4 AGAGAGAGAGAGAGAGAGAGAGAGAGAGAG >EOF

3 The output would then be Query description string 1 [DB description string 1] at offset 7 Query description string 2 [DB description string 1] at offset 47 [DB description string 2] at offset 33 Query description string 3 [DB description string 1] at offset 94 [DB description string 2] at offset 79 Query description string 4 NOT FOUND 3 Notation For describing the algorithm and analyzing performance, we define the following variables. If s is a string then s is the length of s and s i is the ith letter (starting from 0) of s while s <i,j> is the substring of s starting at position i (inclusive) through position j (exclusive). Q is the set of all query strings. D is the set of all database strings. T(Q) is the total size of all the query strings. That is T(Q) = q Q q T(D) is the total size of all the database strings. Q is the number of query strings. D is the number of database strings. O is Big-Oh notation. 4 Naive Serial Algorithm The simplest algorithm for implementing this string matching is to loop over combinations of the query strings and database strings and compare the strings. The algorithm is as follows. 3

4 Naive() 1 for q Q 2 for d D 3 for i below d q +1 4 for j below q 5 if d i+j q j then 6 continue to the next iteration of the i loop. 7 Match found for q at offset i of d. 8 continue to the next iteration of the outer loop. 9 No match found for q. In the worst case, the total number of instructions for the naive algorithm is O(T(D) T(Q)). In the worst case, essentially every character in the querybase needs to be compared to essentially every character in the database. Thus the total number of instructions can be quadratic in the input size. This case occurs when the querybase and the database are about the same size. 5 Rabin-Karp Matching There are algorithms for string matching that are much faster than the naive algorithm. For example, to match a single query string, the Knuth-Morris-Pratt algorithm [1] can be performed in O(T(Q) + T(D)) instructions, and the Boyer-Moore algorithm [2] is often even faster. However, it is not clear how to use those algorithms to search for multiple query strings quickly. Therefore, Dr. Kuszmaul used Rabin-Karp hashing [3]. 5.1 Rabin-Karp rolling hashing Rabin-Karp hashing is a technique for computing a hash of a string. Given the Rabin-Karp hash of a k-element substring of a text, one can quickly compute the hash of successive k-element substrings. For example, if we have the string ABCDEFG, we can quickly compute the hashes of ABCD, of BCDE, CDEF, and DEFG respectively. This kind of hash is sometimes called a rolling hash. Rabin-Karp hashing works as follows. First we chose a prime number P. The hash of a string q is computed as h(q i )P q i 1 (1) 0 i< q where h(q i ) is a non-negative integer less than P. In this case, since our alphabet consists of the four letters A, C, T, G we can choose P = 5, and h(a) = 0, h(c) = 1, h(t) = 2, h(a) = 3. Thus, if we have the string we can encode it as TATGTGAGAAGA,

5 To compute the hash of the first four characters we have hash(2023) = = = 261. To compute the hash of the next 4-character subsequence we have hash(0232) = = = 57. But that can be computed incrementally by multiplying the previous hash by 5, adding in the newly added character and subtracting the newly removed character. Thus in general we have hash(0232) = hash(2023) = = 57 hash(s <i+1,j+1> ) = (hash(s <i,j> )) P h(s i ) P j i +h(s j+1 ). (2) Since we know the length of the s in advance (4 in this example), we can compute P s in advance (in this example 5 4 = 625). Furthermore, we can do this arithmetic in a fixed-size word (e.g., in a 32-bit int in Java), and it will work out. 5.2 Rabin-Karp Searching We can thus implement the search as SimpleRabinKarp() 1 for q Q 2 h q hash(q) 3 for d D 4 for i below d q +1 5 h d hash(d <i,i+ q > ) 6 if h q = h d then 7 Match may be found for q at offset i of d. 8 Verify that it is really a match. 9 if it is really a match then 10 continue to the next iteration of the outer loop 11 No match found for q. If we are very unlucky, the hash will match when the strings do not really match (a so-called false match). In that case we ll do the work to verify a string that isn t actually a 5

6 match. We don t have to worry about the work to verify a string that actually is a match, because for each query string we stop searching d after the first match. In the databases that Dr. Kuszmaul worked with, he never encountered a false match on an entire query string. For SimpleRabinKarp, in the usual case, the number of instructions executed is O(T(Q)+ Q T(D)), since the database must be scanned once for every query string. 5.3 Dr. Kuszmaul s Rabin-Karp Searching We can do even better in many cases, however. For example, if all the query strings are the same size then we can compute hash(d <i,i+ q > ) only once and compare it to all the query hashes. We can store the query hashes in a hash table, so that all of those comparisons can run in O(1) instructions. In this case the total number of instructions executed is O(T(Q) + T(D)), which is asymptotically optimal. In practice, query strings are not all the same size. Dr. Kuszmaul applied the following heuristic to reduce the number of passes through each database string. He organized the query strings into groups, where the strings within each group are all within a factor of two of each other in size. Thus there are O(log Q ) groups. For a group G, if s is the shortest string G, then he used the hashes of the prefixes of length s for every string in the group. For example, if we have a group {ATAT, GATTA, CTAAT}, then the shortest string is of length 4. Instead of looking for hash matches against every element of the group, we look for matches against the length-4 prefixes of the group. So we look for hashes matching {ATAT, GATT, CTAA}. This increases the number of false matches, but reduces the number of passes over the database strings from Q to log Q. Dr. Kuszmaul found that for DNA sequences using a prefix is a good tradeoff, making the program run faster. This grouping is implemented by sorting the query strings by length. The system looks at the shortest string, and uses its length m. The system then includes all strings of length less than m in a group. The group is called an m-group. Then the system looks at the shortest string not in any group. That string s length n is guaranteed to 2m n. The system creates an m-group. And so forth, producing at most O(logN) groups where N is the longest string in the querybase. When looking for any string in an m-group the system hashes only the first m characters, and then simply compares the rest. One problem of using a prefix is that there are some bad cases that act like the naive algorithm. This can happen when we are using only the first k characters of a string s, but that substring is repeated several times. For example if we had 9 A s and one T as AAAAAAAAAT, and that we are only using the first 5 A s to match. We would encounter false matches many times (4 in this case) on the substring at offsets 0, 1, 2, and 3. We can avoid this false prefix matching by adopting the rule that if the grouping would produce a situation where a string s is in an m-group, and the substring s <0,m> is repeated as later in s, then the system creates a s -group. This gives us the property we need. If a string in an m-group is matched on the first m characters, then the hash won t match a later substring, so the system will not do much work comparing the same characters over and over. If the query strings don t exhibit too much self-matching, but all the queries are different sizes, then the resulting instruction count is O(T(Q)+T(D)log Q ). 6

7 Integrate the above techniques into Algorithm SimpleRabinKarp, we get the following improved algorithm for DNA sequence matching. KuszmaulRabinKarp() //Encode strings 1 Read database D into main memory (and find lengths). 2 Read querybase Q into main memory (and find lengths). //Preprocess the strings 3 for q Q 4 Compute the fingerprint of q. 5 Sort the querybase into ascending length. 6 Group the querybase strings by size and self-matchingness. 7 for q Q 8 L the length of the shortest string in q s group. 9 s q hash(s <0,L> ) 10 Store s q in a hash table. // Now start the actual search 11 for d D 12 for L {the unique shortest lengths} 13 for i below d L 14 h hash(d <i,i+l> ) 15 for q in the hash table matching h 16 if q is not already matched in d and i+ q d and the strings match then 17 Record the match of q at i on d. // Print matches 18 Sort the matches. 19 Print the sorted matches. 5.4 Special Case of Rabin-Karp Searching From the above introduction we notice that a special case where all the query strings are the samesizeisafundamental one. Wenameithereas equal-sizequerystrings, inshortesqs. If allthequery strings arethesame size q, we canhashthequery strings andstorethe query hashes in a hash table. For each string d in the database, we can compute hash(d <i,i+ q > ) only once and compare it to all the query hashes, and each comparison can run in O(1) instructions. In this case the total number of instructions executed is O(T(Q)+T(D)), which is asymptotically optimal. Dr. Kuszmaul s trick is to reduce the general case where query strings are not all the same size to the one which can be handled like ESQS s. Therefore, we focus on the matching of ESQS s, since it is the core of Algorithm KuszmaulRabinKarp. In this assignment, we will implement the following special algorithm for DNA sequence matching based on Algorithm KuszmaulRabinKarp. 7

8 EsqsRabinKarp() //Encode strings 1 Read database D into main memory (encode strings and find their lengths). 2 Read querybase Q into main memory (encode strings and find their lengths). //Hash the query strings 3 L the length of the query string 4 for q Q 5 s q hash(s <0,L> ) 6 Store q with hash value of s q in a hash table. // Now start the actual search 7 for d D 8 for i below d L 9 h hash(d <i,i+l> ) 10 for q in the hash table matching h 11 if q is not already matched in d and i+l d and the strings match then 12 Record the match of q at i on d. 13 Print the matches. References [1] Donald E. Knuth, J.H. Morris, and Vaughn B. Pratt. Fast pattern matching in strings. SIAM Journal of Computing, 6(2):323350, [2] Robert S. Boyer and J.S. Moore. A fast string searching algorithm. Communications of the ACM, 20(10):762772, October [3] Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31, March

Algorithms and Data Structures Lesson 3

Algorithms and Data Structures Lesson 3 Algorithms and Data Structures Lesson 3 Michael Schwarzkopf https://www.uni weimar.de/de/medien/professuren/medieninformatik/grafische datenverarbeitung Bauhaus University Weimar May 30, 2018 Overview...of

More information

Data Structures and Algorithms. Course slides: String Matching, Algorithms growth evaluation

Data Structures and Algorithms. Course slides: String Matching, Algorithms growth evaluation Data Structures and Algorithms Course slides: String Matching, Algorithms growth evaluation String Matching Basic Idea: Given a pattern string P, of length M Given a text string, A, of length N Do all

More information

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42 String Matching Pedro Ribeiro DCC/FCUP 2016/2017 Pedro Ribeiro (DCC/FCUP) String Matching 2016/2017 1 / 42 On this lecture The String Matching Problem Naive Algorithm Deterministic Finite Automata Knuth-Morris-Pratt

More information

String Processing Workshop

String Processing Workshop String Processing Workshop String Processing Overview What is string processing? String processing refers to any algorithm that works with data stored in strings. We will cover two vital areas in string

More information

CS/COE 1501

CS/COE 1501 CS/COE 1501 www.cs.pitt.edu/~nlf4/cs1501/ String Pattern Matching General idea Have a pattern string p of length m Have a text string t of length n Can we find an index i of string t such that each of

More information

String matching algorithms تقديم الطالب: سليمان ضاهر اشراف المدرس: علي جنيدي

String matching algorithms تقديم الطالب: سليمان ضاهر اشراف المدرس: علي جنيدي String matching algorithms تقديم الطالب: سليمان ضاهر اشراف المدرس: علي جنيدي للعام الدراسي: 2017/2016 The Introduction The introduction to information theory is quite simple. The invention of writing occurred

More information

Given a text file, or several text files, how do we search for a query string?

Given a text file, or several text files, how do we search for a query string? CS 840 Fall 2016 Text Search and Succinct Data Structures: Unit 4 Given a text file, or several text files, how do we search for a query string? Note the query/pattern is not of fixed length, unlike key

More information

Algorithms and Data Structures

Algorithms and Data Structures Algorithms and Data Structures Charles A. Wuethrich Bauhaus-University Weimar - CogVis/MMC May 11, 2017 Algorithms and Data Structures String searching algorithm 1/29 String searching algorithm Introduction

More information

Lecture 7 February 26, 2010

Lecture 7 February 26, 2010 6.85: Advanced Data Structures Spring Prof. Andre Schulz Lecture 7 February 6, Scribe: Mark Chen Overview In this lecture, we consider the string matching problem - finding all places in a text where some

More information

Introduction to. Algorithms. Lecture 6. Prof. Constantinos Daskalakis. CLRS: Chapter 17 and 32.2.

Introduction to. Algorithms. Lecture 6. Prof. Constantinos Daskalakis. CLRS: Chapter 17 and 32.2. 6.006- Introduction to Algorithms Lecture 6 Prof. Constantinos Daskalakis CLRS: Chapter 17 and 32.2. LAST TIME Dictionaries, Hash Tables Dictionary: Insert, Delete, Find a key can associate a whole item

More information

Introduction to Algorithms

Introduction to Algorithms Introduction to Algorithms 6.046J/18.401J Lecture 22 Prof. Piotr Indyk Today String matching problems HKN Evaluations (last 15 minutes) Graded Quiz 2 (outside) Piotr Indyk Introduction to Algorithms December

More information

Indexing and Searching

Indexing and Searching Indexing and Searching Introduction How to retrieval information? A simple alternative is to search the whole text sequentially Another option is to build data structures over the text (called indices)

More information

CSCI S-Q Lecture #13 String Searching 8/3/98

CSCI S-Q Lecture #13 String Searching 8/3/98 CSCI S-Q Lecture #13 String Searching 8/3/98 Administrivia Final Exam - Wednesday 8/12, 6:15pm, SC102B Room for class next Monday Graduate Paper due Friday Tonight Precomputation Brute force string searching

More information

String Algorithms. CITS3001 Algorithms, Agents and Artificial Intelligence. 2017, Semester 2. CLRS Chapter 32

String Algorithms. CITS3001 Algorithms, Agents and Artificial Intelligence. 2017, Semester 2. CLRS Chapter 32 String Algorithms CITS3001 Algorithms, Agents and Artificial Intelligence Tim French School of Computer Science and Software Engineering The University of Western Australia CLRS Chapter 32 2017, Semester

More information

1 Introduciton. 2 Tries /651: Algorithms CMU, Spring Lecturer: Danny Sleator

1 Introduciton. 2 Tries /651: Algorithms CMU, Spring Lecturer: Danny Sleator 15-451/651: Algorithms CMU, Spring 2015 Lecture #25: Suffix Trees April 22, 2015 (Earth Day) Lecturer: Danny Sleator Outline: Suffix Trees definition properties (i.e. O(n) space) applications Suffix Arrays

More information

String Matching Algorithms

String Matching Algorithms String Matching Algorithms 1. Naïve String Matching The naïve approach simply test all the possible placement of Pattern P[1.. m] relative to text T[1.. n]. Specifically, we try shift s = 0, 1,..., n -

More information

Improved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture

Improved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture Improved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture Parth Shah 1 and Rachana Oza 2 1 Chhotubhai Gopalbhai Patel Institute of Technology, Bardoli, India parthpunita@yahoo.in

More information

Knuth-Morris-Pratt. Kranthi Kumar Mandumula Indiana State University Terre Haute IN, USA. December 16, 2011

Knuth-Morris-Pratt. Kranthi Kumar Mandumula Indiana State University Terre Haute IN, USA. December 16, 2011 Kranthi Kumar Mandumula Indiana State University Terre Haute IN, USA December 16, 2011 Abstract KMP is a string searching algorithm. The problem is to find the occurrence of P in S, where S is the given

More information

Document Compression and Ciphering Using Pattern Matching Technique

Document Compression and Ciphering Using Pattern Matching Technique Document Compression and Ciphering Using Pattern Matching Technique Sawood Alam Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India, ibnesayeed@gmail.com Abstract This paper describes

More information

SORTING. Practical applications in computing require things to be in order. To consider: Runtime. Memory Space. Stability. In-place algorithms???

SORTING. Practical applications in computing require things to be in order. To consider: Runtime. Memory Space. Stability. In-place algorithms??? SORTING + STRING COMP 321 McGill University These slides are mainly compiled from the following resources. - Professor Jaehyun Park slides CS 97SI - Top-coder tutorials. - Programming Challenges book.

More information

Chapter 7. Space and Time Tradeoffs. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

Chapter 7. Space and Time Tradeoffs. Copyright 2007 Pearson Addison-Wesley. All rights reserved. Chapter 7 Space and Time Tradeoffs Copyright 2007 Pearson Addison-Wesley. All rights reserved. Space-for-time tradeoffs Two varieties of space-for-time algorithms: input enhancement preprocess the input

More information

String matching algorithms

String matching algorithms String matching algorithms Deliverables String Basics Naïve String matching Algorithm Boyer Moore Algorithm Rabin-Karp Algorithm Knuth-Morris- Pratt Algorithm Copyright @ gdeepak.com 2 String Basics A

More information

String Matching Algorithms

String Matching Algorithms String Matching Algorithms Georgy Gimel farb (with basic contributions from M. J. Dinneen, Wikipedia, and web materials by Ch. Charras and Thierry Lecroq, Russ Cox, David Eppstein, etc.) COMPSCI 369 Computational

More information

COMPARATIVE ANALYSIS ON EFFICIENCY OF SINGLE STRING PATTERN MATCHING ALGORITHMS

COMPARATIVE ANALYSIS ON EFFICIENCY OF SINGLE STRING PATTERN MATCHING ALGORITHMS International Journal of Latest Trends in Engineering and Technology Special Issue SACAIM 2016, pp. 221-225 e-issn:2278-621x COMPARATIVE ANALYSIS ON EFFICIENCY OF SINGLE STRING PATTERN MATCHING ALGORITHMS

More information

Lecture 6: Hashing Steven Skiena

Lecture 6: Hashing Steven Skiena Lecture 6: Hashing Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.stonybrook.edu/ skiena Dictionary / Dynamic Set Operations Perhaps

More information

String Matching in Scribblenauts Unlimited

String Matching in Scribblenauts Unlimited String Matching in Scribblenauts Unlimited Jordan Fernando / 13510069 Program Studi Teknik Informatika Sekolah Teknik Elektro dan Informatika Institut Teknologi Bandung, Jl. Ganesha 10 Bandung 40132, Indonesia

More information

Applied Databases. Sebastian Maneth. Lecture 14 Indexed String Search, Suffix Trees. University of Edinburgh - March 9th, 2017

Applied Databases. Sebastian Maneth. Lecture 14 Indexed String Search, Suffix Trees. University of Edinburgh - March 9th, 2017 Applied Databases Lecture 14 Indexed String Search, Suffix Trees Sebastian Maneth University of Edinburgh - March 9th, 2017 2 Recap: Morris-Pratt (1970) Given Pattern P, Text T, find all occurrences of

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Introduction to Algorithms December 8, 2004 Massachusetts Institute of Technology 6.046J/18.410J Professors Piotr Indyk and Charles E. Leiserson Handout 34 Problem Set 9 Solutions Reading: Chapters 32.1

More information

CSC152 Algorithm and Complexity. Lecture 7: String Match

CSC152 Algorithm and Complexity. Lecture 7: String Match CSC152 Algorithm and Complexity Lecture 7: String Match Outline Brute Force Algorithm Knuth-Morris-Pratt Algorithm Rabin-Karp Algorithm Boyer-Moore algorithm String Matching Aims to Detecting the occurrence

More information

Exact String Matching. The Knuth-Morris-Pratt Algorithm

Exact String Matching. The Knuth-Morris-Pratt Algorithm Exact String Matching The Knuth-Morris-Pratt Algorithm Outline for Today The Exact Matching Problem A simple algorithm Motivation for better algorithms The Knuth-Morris-Pratt algorithm The Exact Matching

More information

Technical University of Denmark

Technical University of Denmark page 1 of 12 pages Technical University of Denmark Written exam, December 11, 2015. Course name: Algorithms and data structures. Course number: 02110. Aids allowed: All written materials are permitted.

More information

CS/COE 1501

CS/COE 1501 CS/COE 1501 www.cs.pitt.edu/~nlf4/cs1501/ Introduction Meta-notes These notes are intended for use by students in CS1501 at the University of Pittsburgh. They are provided free of charge and may not be

More information

Fast Substring Matching

Fast Substring Matching Fast Substring Matching Andreas Klein 1 2 3 4 5 6 7 8 9 10 Abstract The substring matching problem occurs in several applications. Two of the well-known solutions are the Knuth-Morris-Pratt algorithm (which

More information

Department of Computer Applications. MCA 312: Design and Analysis of Algorithms. [Part I : Medium Answer Type Questions] UNIT I

Department of Computer Applications. MCA 312: Design and Analysis of Algorithms. [Part I : Medium Answer Type Questions] UNIT I MCA 312: Design and Analysis of Algorithms [Part I : Medium Answer Type Questions] UNIT I 1) What is an Algorithm? What is the need to study Algorithms? 2) Define: a) Time Efficiency b) Space Efficiency

More information

University of Waterloo CS240 Spring 2018 Help Session Problems

University of Waterloo CS240 Spring 2018 Help Session Problems University of Waterloo CS240 Spring 2018 Help Session Problems Reminder: Final on Wednesday, August 1 2018 Note: This is a sample of problems designed to help prepare for the final exam. These problems

More information

A New String Matching Algorithm Based on Logical Indexing

A New String Matching Algorithm Based on Logical Indexing The 5th International Conference on Electrical Engineering and Informatics 2015 August 10-11, 2015, Bali, Indonesia A New String Matching Algorithm Based on Logical Indexing Daniar Heri Kurniawan Department

More information

HASH TABLES. Hash Tables Page 1

HASH TABLES. Hash Tables Page 1 HASH TABLES TABLE OF CONTENTS 1. Introduction to Hashing 2. Java Implementation of Linear Probing 3. Maurer s Quadratic Probing 4. Double Hashing 5. Separate Chaining 6. Hash Functions 7. Alphanumeric

More information

University of Waterloo CS240R Winter 2018 Help Session Problems

University of Waterloo CS240R Winter 2018 Help Session Problems University of Waterloo CS240R Winter 2018 Help Session Problems Reminder: Final on Monday, April 23 2018 Note: This is a sample of problems designed to help prepare for the final exam. These problems do

More information

Data Structure and Algorithm Midterm Reference Solution TA

Data Structure and Algorithm Midterm Reference Solution TA Data Structure and Algorithm Midterm Reference Solution TA email: dsa1@csie.ntu.edu.tw Problem 1. To prove log 2 n! = Θ(n log n), it suffices to show N N, c 1, c 2 > 0 such that c 1 n ln n ln n! c 2 n

More information

Parallel and Sequential Data Structures and Algorithms Lecture (Spring 2012) Lecture 25 Suffix Arrays

Parallel and Sequential Data Structures and Algorithms Lecture (Spring 2012) Lecture 25 Suffix Arrays Lecture 25 Suffix Arrays Parallel and Sequential Data Structures and Algorithms, 15-210 (Spring 2012) Lectured by Kanat Tangwongsan April 17, 2012 Material in this lecture: The main theme of this lecture

More information

ECE 122. Engineering Problem Solving Using Java

ECE 122. Engineering Problem Solving Using Java ECE 122 Engineering Problem Solving Using Java Lecture 27 Linear and Binary Search Overview Problem: How can I efficiently locate data within a data structure Searching for data is a fundamental function

More information

CS/COE 1501 cs.pitt.edu/~bill/1501/ Introduction

CS/COE 1501 cs.pitt.edu/~bill/1501/ Introduction CS/COE 1501 cs.pitt.edu/~bill/1501/ Introduction Meta-notes These notes are intended for use by students in CS1501 at the University of Pittsburgh. They are provided free of charge and may not be sold

More information

Clever Linear Time Algorithms. Maximum Subset String Searching. Maximum Subrange

Clever Linear Time Algorithms. Maximum Subset String Searching. Maximum Subrange Clever Linear Time Algorithms Maximum Subset String Searching Maximum Subrange Given an array of numbers values[1..n] where some are negative and some are positive, find the subarray values[start..end]

More information

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Introduction to Algorithms Preface xiii 1 Introduction 1 1.1 Algorithms 1 1.2 Analyzing algorithms 6 1.3 Designing algorithms 1 1 1.4 Summary 1 6

More information

Analysis of Algorithms. 5-Dec-16

Analysis of Algorithms. 5-Dec-16 Analysis of Algorithms 5-Dec-16 Time and space To analyze an algorithm means: developing a formula for predicting how fast an algorithm is, based on the size of the input (time complexity), and/or developing

More information

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. MySQL UC 2010 How Fractal Trees Work 1

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. MySQL UC 2010 How Fractal Trees Work 1 MySQL UC 2010 How Fractal Trees Work 1 How TokuDB Fractal TreeTM Indexes Work Bradley C. Kuszmaul MySQL UC 2010 How Fractal Trees Work 2 More Information You can download this talk and others at http://tokutek.com/technology

More information

University of Waterloo CS240R Fall 2017 Review Problems

University of Waterloo CS240R Fall 2017 Review Problems University of Waterloo CS240R Fall 2017 Review Problems Reminder: Final on Tuesday, December 12 2017 Note: This is a sample of problems designed to help prepare for the final exam. These problems do not

More information

Analysis of algorithms

Analysis of algorithms Analysis of algorithms Time and space To analyze an algorithm means: developing a formula for predicting how fast an algorithm is, based on the size of the input (time complexity), and/or developing a

More information

Introduction to Analysis of Algorithms

Introduction to Analysis of Algorithms Introduction to Analysis of Algorithms Analysis of Algorithms To determine how efficient an algorithm is we compute the amount of time that the algorithm needs to solve a problem. Given two algorithms

More information

Quiz 1 Practice Problems

Quiz 1 Practice Problems Introduction to Algorithms: 6.006 Massachusetts Institute of Technology March 7, 2008 Professors Srini Devadas and Erik Demaine Handout 6 1 Asymptotic Notation Quiz 1 Practice Problems Decide whether these

More information

University of Waterloo CS240R Fall 2017 Solutions to Review Problems

University of Waterloo CS240R Fall 2017 Solutions to Review Problems University of Waterloo CS240R Fall 2017 Solutions to Review Problems Reminder: Final on Tuesday, December 12 2017 Note: This is a sample of problems designed to help prepare for the final exam. These problems

More information

Week 12: Running Time and Performance

Week 12: Running Time and Performance Week 12: Running Time and Performance 1 Most of the problems you have written in this class run in a few seconds or less Some kinds of programs can take much longer: Chess algorithms (Deep Blue) Routing

More information

Clever Linear Time Algorithms. Maximum Subset String Searching

Clever Linear Time Algorithms. Maximum Subset String Searching Clever Linear Time Algorithms Maximum Subset String Searching Maximum Subrange Given an array of numbers values[1..n] where some are negative and some are positive, find the subarray values[start..end]

More information

Algorithm Analysis. (Algorithm Analysis ) Data Structures and Programming Spring / 48

Algorithm Analysis. (Algorithm Analysis ) Data Structures and Programming Spring / 48 Algorithm Analysis (Algorithm Analysis ) Data Structures and Programming Spring 2018 1 / 48 What is an Algorithm? An algorithm is a clearly specified set of instructions to be followed to solve a problem

More information

CPSC 211, Sections : Data Structures and Implementations, Honors Final Exam May 4, 2001

CPSC 211, Sections : Data Structures and Implementations, Honors Final Exam May 4, 2001 CPSC 211, Sections 201 203: Data Structures and Implementations, Honors Final Exam May 4, 2001 Name: Section: Instructions: 1. This is a closed book exam. Do not use any notes or books. Do not confer with

More information

DO NOT REPRODUCE. CS61B, Fall 2008 Test #3 (revised) P. N. Hilfinger

DO NOT REPRODUCE. CS61B, Fall 2008 Test #3 (revised) P. N. Hilfinger CS6B, Fall 2008 Test #3 (revised) P. N. Hilfinger. [7 points] Please give short answers to the following, giving reasons where called for. Unless a question says otherwise, time estimates refer to asymptotic

More information

Multi-Pattern String Matching with Very Large Pattern Sets

Multi-Pattern String Matching with Very Large Pattern Sets Multi-Pattern String Matching with Very Large Pattern Sets Leena Salmela L. Salmela, J. Tarhio and J. Kytöjoki: Multi-pattern string matching with q-grams. ACM Journal of Experimental Algorithmics, Volume

More information

Algorithms: Design & Practice

Algorithms: Design & Practice Algorithms: Design & Practice Deepak Kumar Bryn Mawr College Spring 2018 Course Essentials Algorithms Design & Practice How to design Learn some good ones How to implement practical considerations How

More information

CSED233: Data Structures (2017F) Lecture12: Strings and Dynamic Programming

CSED233: Data Structures (2017F) Lecture12: Strings and Dynamic Programming (2017F) Lecture12: Strings and Dynamic Programming Daijin Kim CSE, POSTECH dkim@postech.ac.kr Strings A string is a sequence of characters Examples of strings: Python program HTML document DNA sequence

More information

17 dicembre Luca Bortolussi SUFFIX TREES. From exact to approximate string matching.

17 dicembre Luca Bortolussi SUFFIX TREES. From exact to approximate string matching. 17 dicembre 2003 Luca Bortolussi SUFFIX TREES From exact to approximate string matching. An introduction to string matching String matching is an important branch of algorithmica, and it has applications

More information

Lecture Notes for Chapter 2: Getting Started

Lecture Notes for Chapter 2: Getting Started Instant download and all chapters Instructor's Manual Introduction To Algorithms 2nd Edition Thomas H. Cormen, Clara Lee, Erica Lin https://testbankdata.com/download/instructors-manual-introduction-algorithms-2ndedition-thomas-h-cormen-clara-lee-erica-lin/

More information

CSC Design and Analysis of Algorithms. Lecture 9. Space-For-Time Tradeoffs. Space-for-time tradeoffs

CSC Design and Analysis of Algorithms. Lecture 9. Space-For-Time Tradeoffs. Space-for-time tradeoffs CSC 8301- Design and Analysis of Algorithms Lecture 9 Space-For-Time Tradeoffs Space-for-time tradeoffs Two varieties of space-for-time algorithms: input enhancement -- preprocess input (or its part) to

More information

What is an Algorithm?

What is an Algorithm? What is an Algorithm? Step-by-step procedure used to solve a problem These steps should be capable of being performed by a machine Must eventually stop and so produce an answer Types of Algorithms Iterative

More information

Chapter 11. Analysis of Algorithms

Chapter 11. Analysis of Algorithms Chapter 11 Analysis of Algorithms Chapter Scope Efficiency goals The concept of algorithm analysis Big-Oh notation The concept of asymptotic complexity Comparing various growth functions Java Foundations,

More information

COMP 161 Lecture Notes 16 Analyzing Search and Sort

COMP 161 Lecture Notes 16 Analyzing Search and Sort COMP 161 Lecture Notes 16 Analyzing Search and Sort In these notes we analyze search and sort. Counting Operations When we analyze the complexity of procedures we re determine the order of the number of

More information

CSE332 Summer 2010: Final Exam

CSE332 Summer 2010: Final Exam CSE332 Summer 2010: Final Exam Closed notes, closed book; calculator ok. Read the instructions for each problem carefully before answering. Problems vary in point-values, difficulty and length, so you

More information

Computer Science 210 Data Structures Siena College Fall Topic Notes: Complexity and Asymptotic Analysis

Computer Science 210 Data Structures Siena College Fall Topic Notes: Complexity and Asymptotic Analysis Computer Science 210 Data Structures Siena College Fall 2017 Topic Notes: Complexity and Asymptotic Analysis Consider the abstract data type, the Vector or ArrayList. This structure affords us the opportunity

More information

Chapter 2: Complexity Analysis

Chapter 2: Complexity Analysis Chapter 2: Complexity Analysis Objectives Looking ahead in this chapter, we ll consider: Computational and Asymptotic Complexity Big-O Notation Properties of the Big-O Notation Ω and Θ Notations Possible

More information

A Practical Distributed String Matching Algorithm Architecture and Implementation

A Practical Distributed String Matching Algorithm Architecture and Implementation A Practical Distributed String Matching Algorithm Architecture and Implementation Bi Kun, Gu Nai-jie, Tu Kun, Liu Xiao-hu, and Liu Gang International Science Index, Computer and Information Engineering

More information

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio Project Proposal ECE 526 Spring 2006 Modified Data Structure of Aho-Corasick Benfano Soewito, Ed Flanigan and John Pangrazio 1. Introduction The internet becomes the most important tool in this decade

More information

One of Mike s early tests cases involved the following code, which produced the error message about something being really wrong:

One of Mike s early tests cases involved the following code, which produced the error message about something being really wrong: Problem 1 (3 points) While working on his solution to project 2, Mike Clancy encountered an interesting bug. His program includes a LineNumber class that supplies, among other methods, a constructor that

More information

Sorting Pearson Education, Inc. All rights reserved.

Sorting Pearson Education, Inc. All rights reserved. 1 19 Sorting 2 19.1 Introduction (Cont.) Sorting data Place data in order Typically ascending or descending Based on one or more sort keys Algorithms Insertion sort Selection sort Merge sort More efficient,

More information

Today s Outline. CSE 326: Data Structures Asymptotic Analysis. Analyzing Algorithms. Analyzing Algorithms: Why Bother? Hannah Takes a Break

Today s Outline. CSE 326: Data Structures Asymptotic Analysis. Analyzing Algorithms. Analyzing Algorithms: Why Bother? Hannah Takes a Break Today s Outline CSE 326: Data Structures How s the project going? Finish up stacks, queues, lists, and bears, oh my! Math review and runtime analysis Pretty pictures Asymptotic analysis Hannah Tang and

More information

CSCI-UA /002, Fall 2012 RK Lab: Approximiate document matching Due: Sep 26 11:59PM

CSCI-UA /002, Fall 2012 RK Lab: Approximiate document matching Due: Sep 26 11:59PM CSCI-UA.0201-001/002, Fall 2012 RK Lab: Approximiate document matching Due: Sep 26 11:59PM 1 Introduction In many scenarios, we would like to know how similar two documents are are to each other. For example,

More information

String Matching. Geetha Patil ID: Reference: Introduction to Algorithms, by Cormen, Leiserson and Rivest

String Matching. Geetha Patil ID: Reference: Introduction to Algorithms, by Cormen, Leiserson and Rivest String Matching Geetha Patil ID: 312410 Reference: Introduction to Algorithms, by Cormen, Leiserson and Rivest Introduction: This paper covers string matching problem and algorithms to solve this problem.

More information

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1 Hash Open Indexing Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1 Warm Up Consider a StringDictionary using separate chaining with an internal capacity of 10. Assume our buckets are implemented

More information

Groups of two-state devices are used to represent data in a computer. In general, we say the states are either: high/low, on/off, 1/0,...

Groups of two-state devices are used to represent data in a computer. In general, we say the states are either: high/low, on/off, 1/0,... Chapter 9 Computer Arithmetic Reading: Section 9.1 on pp. 290-296 Computer Representation of Data Groups of two-state devices are used to represent data in a computer. In general, we say the states are

More information

Data Structure and Algorithm Homework #1 Due: 1:20pm, Tuesday, March 21, 2017 TA === Homework submission instructions ===

Data Structure and Algorithm Homework #1 Due: 1:20pm, Tuesday, March 21, 2017 TA   === Homework submission instructions === Data Structure and Algorithm Homework #1 Due: 1:20pm, Tuesday, March 21, 2017 TA email: dsa1@csie.ntu.edu.tw === Homework submission instructions === For Problem 1-3, please put all your solutions in a

More information

CS 103 Unit 8b Slides

CS 103 Unit 8b Slides 1 CS 103 Unit 8b Slides Algorithms Mark Redekopp ALGORITHMS 2 3 How Do You Find a Word in a Dictionary Describe an efficient method Assumptions / Guidelines Let target_word = word to lookup N pages in

More information

String quicksort solves this problem by processing the obtained information immediately after each symbol comparison.

String quicksort solves this problem by processing the obtained information immediately after each symbol comparison. Lcp-Comparisons General (non-string) comparison-based sorting algorithms are not optimal for sorting strings because of an imbalance between effort and result in a string comparison: it can take a lot

More information

BUNDLED SUFFIX TREES

BUNDLED SUFFIX TREES Motivation BUNDLED SUFFIX TREES Luca Bortolussi 1 Francesco Fabris 2 Alberto Policriti 1 1 Department of Mathematics and Computer Science University of Udine 2 Department of Mathematics and Computer Science

More information

CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims. Lecture 10: Asymptotic Complexity and

CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims. Lecture 10: Asymptotic Complexity and CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims Lecture 10: Asymptotic Complexity and What Makes a Good Algorithm? Suppose you have two possible algorithms or

More information

Fundamental Algorithms

Fundamental Algorithms Fundamental Algorithms Chapter 7: Hash Tables Michael Bader Winter 2011/12 Chapter 7: Hash Tables, Winter 2011/12 1 Generalised Search Problem Definition (Search Problem) Input: a sequence or set A of

More information

Boyer-Moore. Ben Langmead. Department of Computer Science

Boyer-Moore. Ben Langmead. Department of Computer Science Boyer-Moore Ben Langmead Department of Computer Science Please sign guestbook (www.langmead-lab.org/teaching-materials) to tell me briefly how you are using the slides. For original Keynote files, email

More information

COS 226 Algorithms and Data Structures Spring Final

COS 226 Algorithms and Data Structures Spring Final COS 226 Algorithms and Data Structures Spring 2015 Final This exam has 14 questions worth a total of 100 points. You have 180 minutes. The exam is closed book, except that you are allowed to use a one

More information

Questions. 6. Suppose we were to define a hash code on strings s by:

Questions. 6. Suppose we were to define a hash code on strings s by: Questions 1. Suppose you are given a list of n elements. A brute force method to find duplicates could use two (nested) loops. The outer loop iterates over position i the list, and the inner loop iterates

More information

END-TERM EXAMINATION

END-TERM EXAMINATION (Please Write your Exam Roll No. immediately) Exam. Roll No... END-TERM EXAMINATION Paper Code : MCA-205 DECEMBER 2006 Subject: Design and analysis of algorithm Time: 3 Hours Maximum Marks: 60 Note: Attempt

More information

DO NOT. In the following, long chains of states with a single child are condensed into an edge showing all the letters along the way.

DO NOT. In the following, long chains of states with a single child are condensed into an edge showing all the letters along the way. CS61B, Fall 2009 Test #3 Solutions P. N. Hilfinger Unless a question says otherwise, time estimates refer to asymptotic bounds (O( ), Ω( ), Θ( )). Always give the simplest bounds possible (O(f(x)) is simpler

More information

Final Exam Solutions

Final Exam Solutions COS 226 Algorithms and Data Structures Spring 2014 Final Exam Solutions 1. Analysis of algorithms. (8 points) (a) 20,000 seconds The running time is cubic. (b) 48N bytes. Each Node object uses 48 bytes:

More information

58093 String Processing Algorithms. Lectures, Autumn 2013, period II

58093 String Processing Algorithms. Lectures, Autumn 2013, period II 58093 String Processing Algorithms Lectures, Autumn 2013, period II Juha Kärkkäinen 1 Contents 0. Introduction 1. Sets of strings Search trees, string sorting, binary search 2. Exact string matching Finding

More information

Basic operators, Arithmetic, Relational, Bitwise, Logical, Assignment, Conditional operators. JAVA Standard Edition

Basic operators, Arithmetic, Relational, Bitwise, Logical, Assignment, Conditional operators. JAVA Standard Edition Basic operators, Arithmetic, Relational, Bitwise, Logical, Assignment, Conditional operators JAVA Standard Edition Java - Basic Operators Java provides a rich set of operators to manipulate variables.

More information

Lecture L16 April 19, 2012

Lecture L16 April 19, 2012 6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture L16 April 19, 2012 1 Overview In this lecture, we consider the string matching problem - finding some or all places in a text where

More information

What Every Programmer Should Know About Floating-Point Arithmetic

What Every Programmer Should Know About Floating-Point Arithmetic What Every Programmer Should Know About Floating-Point Arithmetic Last updated: October 15, 2015 Contents 1 Why don t my numbers add up? 3 2 Basic Answers 3 2.1 Why don t my numbers, like 0.1 + 0.2 add

More information

B+ Tree Review. CSE332: Data Abstractions Lecture 10: More B Trees; Hashing. Can do a little better with insert. Adoption for insert

B+ Tree Review. CSE332: Data Abstractions Lecture 10: More B Trees; Hashing. Can do a little better with insert. Adoption for insert B+ Tree Review CSE2: Data Abstractions Lecture 10: More B Trees; Hashing Dan Grossman Spring 2010 M-ary tree with room for L data items at each leaf Order property: Subtree between keys x and y contains

More information

Data structure and algorithm in Python

Data structure and algorithm in Python Data structure and algorithm in Python Algorithm Analysis Xiaoping Zhang School of Mathematics and Statistics, Wuhan University Table of contents 1. Experimental studies 2. The Seven Functions used in

More information

COS 226 Algorithms and Data Structures Fall Final

COS 226 Algorithms and Data Structures Fall Final COS 226 Algorithms and Data Structures Fall 2018 Final This exam has 16 questions (including question 0) worth a total of 100 points. You have 180 minutes. This exam is preprocessed by a computer when

More information

Lecture 3: February Local Alignment: The Smith-Waterman Algorithm

Lecture 3: February Local Alignment: The Smith-Waterman Algorithm CSCI1820: Sequence Alignment Spring 2017 Lecture 3: February 7 Lecturer: Sorin Istrail Scribe: Pranavan Chanthrakumar Note: LaTeX template courtesy of UC Berkeley EECS dept. Notes are also adapted from

More information

PARALLEL PROCESSING UNIT 3. Dr. Ahmed Sallam

PARALLEL PROCESSING UNIT 3. Dr. Ahmed Sallam PARALLEL PROCESSING 1 UNIT 3 Dr. Ahmed Sallam FUNDAMENTAL GPU ALGORITHMS More Patterns Reduce Scan 2 OUTLINES Efficiency Measure Reduce primitive Reduce model Reduce Implementation and complexity analysis

More information

CS 61B Summer 2005 (Porter) Midterm 2 July 21, SOLUTIONS. Do not open until told to begin

CS 61B Summer 2005 (Porter) Midterm 2 July 21, SOLUTIONS. Do not open until told to begin CS 61B Summer 2005 (Porter) Midterm 2 July 21, 2005 - SOLUTIONS Do not open until told to begin This exam is CLOSED BOOK, but you may use 1 letter-sized page of notes that you have created. Problem 0:

More information

Study of Selected Shifting based String Matching Algorithms

Study of Selected Shifting based String Matching Algorithms Study of Selected Shifting based String Matching Algorithms G.L. Prajapati, PhD Dept. of Comp. Engg. IET-Devi Ahilya University, Indore Mohd. Sharique Dept. of Comp. Engg. IET-Devi Ahilya University, Indore

More information