CSC 421: Algorithm Design & Analysis. Spring Space vs. time

Similar documents
CSC 421: Algorithm Design Analysis. Spring 2017

CSC Design and Analysis of Algorithms. Lecture 9. Space-For-Time Tradeoffs. Space-for-time tradeoffs

Chapter 7. Space and Time Tradeoffs. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

CSC 321: Data Structures. Fall 2016

CSC 321: Data Structures. Fall 2017

THINGS WE DID LAST TIME IN SECTION

CSC 421: Algorithm Design Analysis. Spring 2013

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42

Review. CSE 143 Java. A Magical Strategy. Hash Function Example. Want to implement Sets of objects Want fast contains( ), add( )

CSC 321: Data Structures. Fall 2016

CSC 222: Object-Oriented Programming. Fall 2015

CSC 427: Data Structures and Algorithm Analysis. Fall 2004

Algorithms: Design & Practice

Hash table basics mod 83 ate. ate. hashcode()

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

Hash[ string key ] ==> integer value

University of Waterloo CS240 Spring 2018 Help Session Problems

CSC 321: Data Structures. Fall 2017

Hash table basics. ate à. à à mod à 83

Cpt S 223. School of EECS, WSU

Hash table basics mod 83 ate. ate

Module 2: Classical Algorithm Design Techniques

Announcements. Hash Functions. Hash Functions 4/17/18 HASHING

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Lecture 13: AVL Trees and Binary Heaps

Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix

Figure 1. The Suffix Trie Representing "BANANAS".

Topic 22 Hash Tables

NAME: c. (true or false) The median is always stored at the root of a binary search tree.

COMP-202 Unit 7: More Advanced OOP. CONTENTS: ArrayList HashSet (Optional) HashMap (Optional)

CMSC 341 Lecture 14: Priority Queues, Heaps

CSC 222: Object-Oriented Programming Spring 2012

String Processing Workshop

University of Waterloo CS240R Fall 2017 Review Problems

COMP4128 Programming Challenges

Lecture 7 February 26, 2010

Topics Applications Most Common Methods Serial Search Binary Search Search by Hashing (next lecture) Run-Time Analysis Average-time analysis Time anal

Data Structure. A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.

University of Waterloo CS240R Winter 2018 Help Session Problems

Garbage Collection (1)

HEAPS & PRIORITY QUEUES

CS350: Data Structures Heaps and Priority Queues

Algorithms and Data Structures Lesson 3

EXAMINATIONS 2015 COMP103 INTRODUCTION TO DATA STRUCTURES AND ALGORITHMS

Indexing and Searching

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

Parsing and Pattern Recognition

Hash table basics mod 83 ate. ate

CSC152 Algorithm and Complexity. Lecture 7: String Match

SORTING. Practical applications in computing require things to be in order. To consider: Runtime. Memory Space. Stability. In-place algorithms???

CSCI S-Q Lecture #13 String Searching 8/3/98

Announcements. Container structures so far. IntSet ADT interface. Sets. Today s topic: Hashing (Ch. 10) Next topic: Graphs. Break around 11:45am

CS 261 Data Structures

Sorting. Quicksort analysis Bubble sort. November 20, 2017 Hassan Khosravi / Geoffrey Tien 1

Optimization of Boyer-Moore-Horspool-Sunday Algorithm

Announcements. Submit Prelim 2 conflicts by Thursday night A6 is due Nov 7 (tomorrow!)

DO NOT. In the following, long chains of states with a single child are condensed into an edge showing all the letters along the way.

Friday Four Square! 4:15PM, Outside Gates

DATA STRUCTURES + SORTING + STRING

Exam Data structures DIT960/DAT036

ECE 122. Engineering Problem Solving Using Java

EXAMINATIONS 2017 TRIMESTER 2

Applied Databases. Sebastian Maneth. Lecture 14 Indexed String Search, Suffix Trees. University of Edinburgh - March 9th, 2017

Announcements. Today s topic: Hashing (Ch. 10) Next topic: Graphs. Break around 11:45am

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

String Matching Algorithms

A New String Matching Algorithm Based on Logical Indexing

Taking Stock. IE170: Algorithms in Systems Engineering: Lecture 7. (A subset of) the Collections Interface. The Java Collections Interfaces

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Announcements. Programming assignment 1 posted - need to submit a.sh file

Points off Total off Net Score. CS 314 Final Exam Spring Your Name Your UTEID

Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum

CPS222 Lecture: Sets. 1. Projectable of random maze creation example 2. Handout of union/find code from program that does this

Boyer-Moore. Ben Langmead. Department of Computer Science

University of Waterloo CS240R Fall 2017 Solutions to Review Problems

CSCI-1200 Data Structures Fall 2018 Lecture 23 Priority Queues II

Lecture L16 April 19, 2012

COS 226 Midterm Exam, Spring 2011

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio

University of Illinois at Urbana-Champaign Department of Computer Science. Final Examination

Second Examination Solution

Practice Midterm Exam Solutions

Exam Data structures DAT036/DAT037/DIT960

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

Answer any FIVE questions 5 x 10 = 50. Graph traversal algorithms process all the vertices of a graph in a systematic fashion.

Lecture 18 April 12, 2005

CSE373: Data Structures & Algorithms Lecture 5: Dictionary ADTs; Binary Trees. Linda Shapiro Spring 2016

BINARY HEAP cs2420 Introduction to Algorithms and Data Structures Spring 2015

Trees & Tree-Based Data Structures. Part 4: Heaps. Definition. Example. Properties. Example Min-Heap. Definition

An introduction to suffix trees and indexing

CSE373: Data Structures & Algorithms Lecture 9: Priority Queues and Binary Heaps. Linda Shapiro Spring 2016

Lecture 18. Collision Resolution

Avi Sharma # Remo Reuben #

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Summer Final Exam Review Session August 5, 2009

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

Chapter 10. Sorting and Searching Algorithms. Fall 2017 CISC2200 Yanjun Li 1. Sorting. Given a set (container) of n elements

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

Lecture 7. Transform-and-Conquer

Transcription:

CSC 421: Algorithm Design & Analysis Spring 2015 Space vs. time space/time tradeoffs examples: heap sort, data structure redundancy, hashing string matching brute force, Horspool algorithm, Boyer-Moore algorithm 1

Space vs. time for many applications, it is possible to trade memory for speed i.e., additional storage can allow for a more efficient algorithm ArrayList wastes space when it expands (doubles in size) each expansion yields a list with (slightly less than) half empty improves overall performance since only log N expansions when adding N items linked structures sacrifice space (reference fields) for improved performance e.g., a linked list takes twice the space, but provides O(1) add/remove from either end hash tables can obtain O(1) add/remove/search if willing to waste space to minimize the chance of collisions, must keep the load factor low HashSet & HashMap resize (& rehash) if load factor every reaches 75% 2

Heap sort can utilize a heap to efficiently sort a list of values start with the ArrayList to be sorted construct a heap out of the elements repeatedly, remove min element and put back into the ArrayList public static <E extends Comparable<? super E>> void heapsort(arraylist<e> items) { MinHeap<E> itemheap = new MinHeap<E>(); } for (int i = 0; i < items.size(); i++) { itemheap.add(items.get(i)); } for (int i = 0; i < items.size(); i++) { items.set(i, itemheap.minvalue()); itemheap.remove(); } N items in list, each insertion can require O(log N) swaps to reheapify à construct heap in O(N log N) N items in heap, each removal can require O(log N) swap to reheapify à copy back in O(N log N) can get the same effect by copying the values into a TreeSet, then iterating 3

Other space vs. time examples we have made similar space/time tradeoffs often in code AnagramFinder (find all anagrams of a given word) can preprocess the dictionary & build a map of words, keyed by sorted letters then, each anagram lookup is O(1) BinaryTree (implement a binary tree & eventually extend to BST) kept track of the number of nodes in a field (updated on each add/remove) turned the O(N) size operation into O(1) KWIC Index (store and display titles based on keywords) to allow for efficient searches/displays, store a copy for each keyword other examples? 4

String matching many useful applications involve searching text for patterns word processing (e.g., global find and replace) data mining (e.g., looking for common parameters) genomics (e.g., searching for gene sequences) TTAAGGACCGCATGCCCTCGAATAGGCTTGAGCTTGCAATTAACGCCCAC in general given a (potentially long) string S and need to find (relatively small) pattern P an obvious, brute force solution exists: O( S P ) however, utilizing preprocessing and additional memory can reduce this to O( S ) in practice, it is often < S HOW CAN THIS BE?!? 5

Brute force String: Pattern: BIZ the brute force approach would shift the pattern along, looking for a match: at worst: S - P +1 passes through the pattern each pass requires at most P comparisons è O( S P ) 6

Smart shifting BIZ suppose we look at the rightmost letter in the attempted match here, compare Z in the pattern with O in the string since there is no O in the pattern, can skip the next two comparisons FOOBARBIZBBAZ BIZ again, no R in BIZ, so can skip another two comparisons FOOBARBIZBBAZ BIZ 7

Horspool algorithm given a string S and pattern P, 1. Construct a shift table containing each letter of the alphabet. if the letter appears in P (other than in the last index) à store distance from end otherwise, store P e.g., P = BIZ A B C I X Y Z 3 2 3 1 3 3 3 2. Align the pattern with the start of the string S, i.e., with S 0 S 1 S P -1 3. Repeatedly, compare from right-to-left with the aligned sequence S i S i+1 S i+ P -1 if all P letters match, then FOUND. if not, then shift P to the right by shifttable[s i+ P -1 ] if the shift falls off the end, then NOT FOUND 8

Horspool example 1 S = P=BIZ shifttable for "BIZ" A B C I X Y Z 3 2 3 1 3 3 3 BIZ Z and O do not match, so shift shifttable[o] = 3 spots BIZ Z and R do not match, so shift shifttable[r] = 3 spots BIZ pattern is FOUND total number of comparisons = 5 9

Horspool example 2 S = "FOZIZBARBIZBAZ" shifttable for "BIZ" FOZIZBARBIZBAZ BIZ P="BIZ" A B C I X Y Z 3 2 3 1 3 3 3 Z and Z match, but not I and O, so shift shifttable[z] = 3 spots FOZIZBARBIZBAZ BIZ Z and B do not match, so shift shifttable[b] = 2 spots FOZIZBARBIZBAZ BIZ Z and R do not match, so shift shifttable[r] = 3 spots FOZIZBARBIZBAZ BIZ pattern is FOUND total number of comparisons = 7 suppose we are looking for BIZ in a string with no Z's how many comparisons? 10

Horspool exercise String: BANDANABABANANAN Pattern: BANANA shift table for BANANA? steps? A B C D N Y Z BANDANABABANANAN BANANA 11

Horspool analysis space & time requires storing shift table whose size is the alphabet since the alphabet is usually fixed, table requires O(1) space worst case is O( S P ) this occurs when skips are infrequent & close matches to the pattern appear often for random data, however, only O( S ) Horspool algorithm is a simplification of a more complex (and well-known) algorithm: Boyer-Moore algorithm in practice, Horspool is often faster however, Boyer-Moore has O( S ) worst case, instead of O( S P ) 12

Boyer-Moore algorithm based on two kinds of shifts (both compare right-to-left, find first mismatch) the first is bad-symbol shift (based on the symbol in S that caused the mismatch) FIZBIZ F and B don't match, shift to align F FIZBIZ I and Z don't match, shift to align I FIZBIZ I and Z don't match, shift to align I FIZBIZ F and Z don't match, shift to align F FIZBIZ FOUND 13

Bad symbol shift bad symbol table is alphabet P kth row contains shift amount if mismatch occurred at index k (from right) bad symbol table for FIZBIZ: A B C F I Y Z 0 6 2 6 5 1 6-1 5 1 5 4-5 2 2 4-4 3 2 4 1 3 3 3 3 2 1 3-4 2 2 2 1-2 2 FIZBIZ F and B don't match à badsymboltable(f,2) = 3 FIZBIZ I and Z don't match à badsymboltable(i, 0) = 1 FIZBIZ I and Z don't match à badsymboltable(i, 3) = 1 FIZBIZ F and Z don't match à badsymboltable(f, 0) = 5 FIZBIZ FOUND 14

Good suffix shift find the longest suffix that matches if that suffix appears to the left in P preceded by a different char, shift to align if not, then shift the entire length of the word FIZBIZ IZ suffix matches, IZ appears to left so shift to align FIZBIZ no suffix match, so shift 1 spot FIZBIZ BIZ suffix matches, doesn't appear again so full shift FIZBIZ FOUND 15

Good suffix shift good suffix shift table is P assume that suffix matches but char to the left does not if that suffix appears to the left preceded by a different char, shift to align if not, then can shift the entire length of the word good suffix table for FIZBIZ IZBIZ ZBIZ BIZ IZ Z 6 6 6 3 6 1 FIZBIZ IZ suffix matches à goodsuffixtable(iz) = 3 FIZBIZ no suffix match à goodsuffixtable() = 1 FIZBIZ BIZ suffix matches à goodsuffixtable(biz) = 6 FIZBIZ FOUND 16

Boyer-Moore string search algorithm 1. calculate the bad symbol and good suffix shift tables 2. while match not found and not off the edge a) compare pattern with string section b) shift1 = bad symbol shift of rightmost non-matching char c) shift2 = good suffix shift of longest matching suffix d) shift string section for comparison by max(shift1, shift2) the algorithm has been proven to require at most 3* S comparisons so, O( S ) in practice, can require fewer than S comparisons requires storing O( P ) bad symbol shift table and O( P ) good suffix shift table 17

Boyer-Moore exercise String: BANDANABABANANAN Pattern: BANANA bad symbol table for BANANA? good suffix table for BANANA? A B C D N Y Z 0 1 2 3 4 steps? ANANA NANA ANA NA A BANDANABABANANAN BANANA 18

Interesting YouTube videos 15 Sorting Algorithms in 6 Minutes Algorithms Are Taking Over The World: Christopher Steiner at TEDxOrangeCoast What is Google Pagerank Algorithm & How It Works? 19