

CSC 8301: Design and Analysis of Algorithms
Lecture 9: Space-for-Time Tradeoffs

Space-for-time tradeoffs

Two varieties of space-for-time algorithms:
- input enhancement: preprocess the input (or part of it) to store some information to be used later in solving the problem
    - string processing algorithms
- prestructuring: preprocess the input to make accessing its elements easier
    - hashing

String matching

- pattern: a string of m characters to search for
- text: a (long) string of n characters to search in

Brute-force algorithm:
Step 1  Align the pattern at the beginning of the text.
Step 2  Moving from left to right, compare each character of the pattern to the corresponding character in the text until either all characters are found to match (successful search) or a mismatch is detected.
Step 3  While a mismatch is detected and the text is not yet exhausted, realign the pattern one position to the right and repeat Step 2.

String searching algorithms

Several string searching algorithms are based on the input-enhancement idea of preprocessing the pattern:
- The Boyer-Moore algorithm stores information in two tables.
- Horspool's algorithm simplifies the Boyer-Moore algorithm by using just one table.
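The three brute-force steps can be sketched in Python as follows. This is a minimal illustration, not code from the lecture; the function name `brute_force_match` and the choice to return -1 on failure are assumptions made for the example.

```python
def brute_force_match(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    for i in range(n - m + 1):            # Steps 1 and 3: try each alignment in turn
        j = 0
        # Step 2: compare left to right until a mismatch or a full match
        while j < m and pattern[j] == text[i + j]:
            j += 1
        if j == m:                        # all m characters matched
            return i
    return -1                             # text exhausted without a match


print(brute_force_match("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))  # prints 16
```

No extra space is used here; the algorithms that follow spend space on precomputed tables to skip alignments.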

Horspool's algorithm

A simplified version of the Boyer-Moore algorithm:
- preprocesses the pattern to generate a shift table that determines how much to shift the pattern when a mismatch occurs
- compares characters right to left
- always makes a shift based on the text's character c aligned with the last character in the pattern, according to the shift table's entry for c

How far to shift?

Look at the first (rightmost) character in the text that was compared. Three cases:

1. The character is not in the pattern:

       .....C.....     (C not in pattern)
       BAOBAB

2. The character is in the pattern (but not at the rightmost position):

       .....O.....     (O occurs once in pattern)
       BAOBAB

       .....A.....     (A occurs twice in pattern)
       BAOBAB

3. The rightmost characters do match:

       .....B.....
       BAOBAB

Shift table

Shift sizes t(c) can be precomputed, by scanning the pattern before the search begins, as
- the pattern's length m, if c is not among the first m-1 characters of the pattern;
- the distance from the rightmost c among the first m-1 characters of the pattern to its right end, otherwise.

The shift table is indexed by the characters of the text and pattern alphabet. E.g., for BAOBAB:

c     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
t(c)  1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6

Shift sizes can be computed by initializing all entries to the pattern's length and then scanning the pattern left to right.

Tracing Horspool's algorithm

Pattern: BAOBAB
Text:    BARD LOVED BANANAS
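A possible Python sketch of the shift table and of Horspool's search is given below. The names `shift_table` and `horspool` are illustrative assumptions. The table is stored as a dictionary holding only the characters that occur among the first m-1 pattern characters; any other character falls back to the default shift m, which is equivalent to initializing every entry to m.

```python
def shift_table(pattern):
    """Map each character among the first m-1 pattern characters to its shift."""
    m = len(pattern)
    table = {}
    for i in range(m - 1):                # left-to-right scan; rightmost occurrence wins
        table[pattern[i]] = m - 1 - i     # distance to the pattern's right end
    return table


def horspool(pattern, text):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    table = shift_table(pattern)
    i = m - 1                             # text index aligned with the pattern's last character
    while i < n:
        k = 0                             # number of characters matched so far
        while k < m and pattern[m - 1 - k] == text[i - k]:   # compare right to left
            k += 1
        if k == m:
            return i - m + 1
        i += table.get(text[i], m)        # shift by t(c); default is m
    return -1


print(shift_table("BAOBAB"))                       # {'B': 2, 'A': 1, 'O': 3}
print(horspool("BAOBAB", "BARD LOVED BANANAS"))    # prints -1 (no occurrence)
```

Note that the shift always uses the text character under the pattern's last position, regardless of where the mismatch occurred.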

Boyer-Moore algorithm

Based on the same two ideas:
- comparing pattern characters to the text from right to left
- precomputing shift sizes in two tables:
    - a bad-symbol table that determines how much to shift the pattern when a mismatch occurs
    - a good-suffix table with the same idea applied to the matched part (suffix) of the pattern

Bad-symbol shift in the Boyer-Moore algorithm

- If the rightmost character of the pattern doesn't match, the BM algorithm acts as Horspool's.
- If the rightmost character of the pattern does match, BM compares the preceding characters right to left until either all of the pattern's characters match or a mismatch on the text's character c is encountered after k > 0 matches.

Bad-symbol shift: $d_1 = \max\{t_1(c) - k,\ 1\}$

Good-suffix shift in the Boyer-Moore algorithm

The good-suffix shift $d_2$ is applied after 0 < k < m last characters were matched.

$d_2(k)$ = the distance between the matched suffix of size k and its rightmost occurrence in the pattern that is not preceded by the same character as the suffix.

Example: CABABA    $d_2(1)$ = ?    $d_2(2)$ = ?    $d_2(3)$ = ?
Example: ABCBAB    $d_2(1)$ = ?    $d_2(2)$ = ?    $d_2(3)$ = ?

If there is no such occurrence, find the longest prefix of size p < k that matches the p-character suffix; $d_2$ is the distance between this prefix and the p-character suffix.

If there are no such suffix-prefix matches, $d_2(k) = m$.

k   pattern   $d_2$
1   WOWWOW
2   WOWWOW
3   WOWWOW
4   WOWWOW
5   WOWWOW
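The definition of the good-suffix shift can be turned into code almost word for word. The sketch below is a direct, unoptimized transcription; the name `good_suffix_table` is an assumption, and an occurrence at the very start of the pattern (with no preceding character at all) is treated as "not preceded by the same character".

```python
def good_suffix_table(pattern):
    """Return {k: d2(k)} for 0 < k < m, computed straight from the definition."""
    m = len(pattern)
    table = {}
    for k in range(1, m):
        suffix = pattern[m - k:]          # the matched suffix of size k
        before = pattern[m - k - 1]       # the character preceding that suffix
        shift = None
        # Look for the rightmost other occurrence with a different preceding character.
        for s in range(m - k - 1, -1, -1):
            if pattern[s:s + k] == suffix and (s == 0 or pattern[s - 1] != before):
                shift = (m - k) - s       # distance between the two occurrences
                break
        if shift is None:
            shift = m                     # default when no suffix-prefix match exists
            # Otherwise: longest prefix of size p < k that equals the p-character suffix.
            for p in range(k - 1, 0, -1):
                if pattern[:p] == pattern[m - p:]:
                    shift = m - p
                    break
        table[k] = shift
    return table


print(good_suffix_table("BAOBAB"))   # {1: 2, 2: 5, 3: 5, 4: 5, 5: 5}
```

The same function can be called on CABABA, ABCBAB, or WOWWOW to check hand-computed answers for the examples above.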

Boyer-Moore algorithm

Step 1  Fill in the bad-symbol shift table.
Step 2  Fill in the good-suffix shift table.
Step 3  Align the pattern against the beginning of the text.
Step 4  Repeat until a matching substring is found or the text ends: compare the corresponding characters right to left.
        - If no characters match, retrieve entry $t_1(c)$ from the bad-symbol table for the text's character c causing the mismatch and shift the pattern to the right by $t_1(c)$.
        - If 0 < k < m characters are matched, retrieve entry $t_1(c)$ from the bad-symbol table for the text's character c causing the mismatch and entry $d_2(k)$ from the good-suffix table, and shift the pattern to the right by $d = \max\{d_1, d_2\}$, where $d_1 = \max\{t_1(c) - k,\ 1\}$.

Example of Boyer-Moore application

Bad-symbol table for BAOBAB:

c         A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _
$t_1(c)$  1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6 6

Good-suffix table for BAOBAB:

k   pattern   $d_2$
1   BAOBAB    2
2   BAOBAB    5
3   BAOBAB    5
4   BAOBAB    5
5   BAOBAB    5

Trace:

B E S S _ K N E W _ A B O U T _ B A O B A B S
B A O B A B                                      $d_1 = t_1(K) = 6$
            B A O B A B                          $d_1 = t_1(\_) - 2 = 4$, $d_2(2) = 5$
                      B A O B A B                $d_1 = t_1(\_) - 1 = 5$, $d_2(1) = 2$
                                B A O B A B      (success)
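Steps 1 to 4 can be put together in a short Python sketch. All function names are illustrative assumptions, and the good-suffix table is computed by a plain transcription of its definition rather than by an optimized method. The search also returns the list of shifts it made, so the trace above can be checked.

```python
def bad_symbol_table(pattern):
    """t1(c) for characters among the first m-1 pattern characters (default is m)."""
    m = len(pattern)
    # In a left-to-right scan, later occurrences overwrite earlier ones.
    return {pattern[i]: m - 1 - i for i in range(m - 1)}


def good_suffix_table(pattern):
    """d2(k) for 0 < k < m, computed directly from the definition."""
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        suffix, before = pattern[m - k:], pattern[m - k - 1]
        d2[k] = m                                   # default: no match of either kind
        for s in range(m - k - 1, -1, -1):          # rightmost other occurrence first
            if pattern[s:s + k] == suffix and (s == 0 or pattern[s - 1] != before):
                d2[k] = m - k - s
                break
        else:                                       # no such occurrence was found
            for p in range(k - 1, 0, -1):           # longest prefix equal to a suffix
                if pattern[:p] == pattern[m - p:]:
                    d2[k] = m - p
                    break
    return d2


def boyer_moore(pattern, text):
    """Return (index of first match or -1, list of shifts performed)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, []
    t1, d2 = bad_symbol_table(pattern), good_suffix_table(pattern)
    shifts = []
    i = m - 1                                       # text index under the pattern's last character
    while i < n:
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:   # right-to-left comparison
            k += 1
        if k == m:
            return i - m + 1, shifts
        c = text[i - k]                             # text character causing the mismatch
        d1 = max(t1.get(c, m) - k, 1)               # bad-symbol shift
        d = d1 if k == 0 else max(d1, d2[k])        # good-suffix shift applies only if k > 0
        shifts.append(d)
        i += d
    return -1, shifts


print(boyer_moore("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))   # (16, [6, 5, 5])
```

With k = 0 the expression for `d1` reduces to $t_1(c)$, which is exactly Horspool's shift.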

Boyer-Moore algorithm

After successfully matching 0 < k < m characters, the algorithm shifts the pattern right by $d = \max\{d_1, d_2\}$, where
- $d_1 = \max\{t_1(c) - k,\ 1\}$ is the bad-symbol shift;
- $d_2(k)$ is the good-suffix shift.

Example: find the pattern AT_THAT in

WHICH_FINALLY_HALTS. AT_THAT
AT_THAT

Bad-symbol table:

c      pattern   $t_1$
A      AT_THAT
H      AT_THAT
T      AT_THAT
_      AT_THAT
else   AT_THAT

Good-suffix table:

k   pattern   $d_2$
1   AT_THAT
2   AT_THAT
3   AT_THAT
4   AT_THAT
5   AT_THAT
6   AT_THAT

Hashing

A very efficient method for implementing a dictionary, i.e., a set with the operations:
- find
- insert
- delete

Important applications:
- symbol tables (compilers)
- cryptography (message digests, passwords)
- databases

Hash tables and hash functions

The idea of hashing is to map keys of a given file of size n into a table of size m, called the hash table, by using a predefined function, called the hash function:

h: K -> location (bucket) in the hash table

Example: student records, key = SSN. Hash function: h(K) = K mod m, where m is some integer (typically, prime).

If m = 1000, where is the record with SSN = 314159265 stored?

Generally, a hash function should:
- be easy to compute
- distribute keys about evenly throughout the hash table
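As a quick check of the SSN example, the modular hash function is a one-liner in Python. The function name `h` mirrors the slide's notation; the second key below is made up purely to show two keys landing in the same bucket.

```python
def h(key, m):
    """Modular hash function h(K) = K mod m."""
    return key % m


print(h(314159265, 1000))   # prints 265: the record goes to bucket 265

# With m = 1000 only the last three digits matter, so any two keys
# ending in the same three digits share a bucket (hypothetical second key):
print(h(271828265, 1000))   # prints 265 as well
```

This dependence on the low-order digits alone is one reason a prime m is usually preferred over a power of ten.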

Collisions

If $h(K_1) = h(K_2)$, there is a collision.
- Good hash functions result in fewer collisions.
- Collisions can never be completely eliminated.

Two types of hashing handle collisions differently:
- Open hashing: each bucket points to a linked list of all keys hashing to it.
- Closed hashing: one key per bucket; in case of a collision, find another bucket for one of the keys (a collision resolution strategy is needed):
    - linear probing: use the next free bucket
    - double hashing: use a second hash function to compute the increment

Open hashing (separate chaining)

Keys are stored in linked lists attached to the cells of a hash table. Each list contains all the keys hashed to its cell.

Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED

h(K) = (sum of the positions of K's letters in the alphabet) mod 13

letter     A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
position   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

key      A   FOOL   AND   HIS   MONEY   ARE   SOON   PARTED
bucket   1   9      6     10    7       11    11     12
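The bucket numbers in the example can be reproduced with a few lines of Python. The name `letter_hash` is an assumption, and the sketch only handles keys made of the letters A to Z.

```python
def letter_hash(word, m=13):
    """Sum of the alphabet positions of the letters (A=1, ..., Z=26), mod m."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper()) % m


words = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]
buckets = {w: letter_hash(w) for w in words}
print(buckets)
# {'A': 1, 'FOOL': 9, 'AND': 6, 'HIS': 10, 'MONEY': 7, 'ARE': 11, 'SOON': 11, 'PARTED': 12}
```

ARE and SOON both hash to bucket 11, which is the one collision in this example.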

Open hashing (separate chaining)

Keys are stored in linked lists outside a hash table whose elements serve as the lists' headers.

Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED

h(K) = (sum of the positions of K's letters in the alphabet) mod 13

Key    A   FOOL   AND   HIS   MONEY   ARE   SOON   PARTED
h(K)   1   9      6     10    7       11    11     12

 0:
 1: A
 2:
 3:
 4:
 5:
 6: AND
 7: MONEY
 8:
 9: FOOL
10: HIS
11: ARE -> SOON
12: PARTED

Search for KID: h(KID) = (11 + 9 + 4) mod 13 = 11, so only the list in bucket 11 has to be scanned.

Open hashing (cont.)

- If the hash function distributes keys uniformly, the average length of a linked list will be $\alpha = n/m$. This ratio is called the load factor.
- Average number of probes in successful (S) and unsuccessful (U) searches: $S \approx 1 + \alpha/2$, $U = \alpha$.
- The load factor $\alpha$ is typically kept small (ideally, about 1).
- Open hashing still works if n > m.
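A minimal sketch of a separate-chaining dictionary for this example follows. The class name `ChainedHashTable` and its methods are assumptions, and Python lists stand in for the linked lists; `find` also reports how many keys it compared, so the search for KID can be followed.

```python
class ChainedHashTable:
    """Open hashing: each bucket holds a chain (here a Python list) of keys."""

    def __init__(self, m=13):
        self.m = m
        self.n = 0                                   # number of stored keys
        self.buckets = [[] for _ in range(m)]

    def _hash(self, key):
        # Letter-position hash from the example; assumes keys use only A-Z.
        return sum(ord(ch) - ord("A") + 1 for ch in key.upper()) % self.m

    def find(self, key):
        """Return (found, number of key comparisons made)."""
        chain = self.buckets[self._hash(key)]
        for comparisons, stored in enumerate(chain, start=1):
            if stored == key:
                return True, comparisons
        return False, len(chain)

    def insert(self, key):
        if not self.find(key)[0]:                    # ignore duplicates
            self.buckets[self._hash(key)].append(key)
            self.n += 1

    def delete(self, key):
        chain = self.buckets[self._hash(key)]
        if key in chain:
            chain.remove(key)
            self.n -= 1

    def load_factor(self):
        return self.n / self.m                       # alpha = n / m


table = ChainedHashTable()
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    table.insert(word)

print(table.buckets[11])      # ['ARE', 'SOON']
print(table.find("KID"))      # (False, 2): both keys in bucket 11 are compared
print(table.find("SOON"))     # (True, 2)
```

Because chains can grow without bound, nothing in this structure breaks when n exceeds m; searches only get slower as the load factor rises.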

Closed hashing (open addressing)

Keys are stored inside the hash table itself.

Key    A   FOOL   AND   HIS   MONEY   ARE   SOON   PARTED
h(K)   1   9      6     10    7       11    11     12

State of the table after each insertion (linear probing):

0       1  2  3  4  5  6    7      8  9     10   11   12
        A
        A                             FOOL
        A              AND            FOOL
        A              AND            FOOL  HIS
        A              AND  MONEY     FOOL  HIS
        A              AND  MONEY     FOOL  HIS  ARE
        A              AND  MONEY     FOOL  HIS  ARE  SOON
PARTED  A              AND  MONEY     FOOL  HIS  ARE  SOON

Closed hashing (cont.)

- Does not work if n > m.
- Avoids pointers.
- Deletions are not straightforward.
- The number of probes to find/insert/delete a key depends on the load factor $\alpha = n/m$ (hash table density) and on the collision resolution strategy. For linear probing:

  $S \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$ and $U \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$

- As the table gets filled ($\alpha$ approaches 1), the performance deteriorates.
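The insertion sequence and the probe-count formulas can be reproduced with the sketch below. The class and function names are assumptions; deletion is left out on purpose, since it is not straightforward with open addressing.

```python
def letter_hash(word, m=13):
    """Sum of the alphabet positions of the letters (A=1, ..., Z=26), mod m."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper()) % m


class LinearProbingTable:
    """Closed hashing: keys live in the table; collisions use the next free cell."""

    def __init__(self, m=13):
        self.cells = [None] * m

    def insert(self, key):
        """Store key and return the index of the cell it ended up in."""
        m = len(self.cells)
        i = letter_hash(key, m)
        for _ in range(m):
            if self.cells[i] is None or self.cells[i] == key:
                self.cells[i] = key
                return i
            i = (i + 1) % m                  # wrap around past the last cell
        raise OverflowError("hash table is full (n cannot exceed m)")

    def find(self, key):
        """Return (found, number of cells probed)."""
        m = len(self.cells)
        i = letter_hash(key, m)
        for probes in range(1, m + 1):
            if self.cells[i] is None:        # an empty cell ends the search
                return False, probes
            if self.cells[i] == key:
                return True, probes
            i = (i + 1) % m
        return False, m


def successful_probes(alpha):
    """Approximate average probes for a successful search (linear probing)."""
    return 0.5 * (1 + 1 / (1 - alpha))


def unsuccessful_probes(alpha):
    """Approximate average probes for an unsuccessful search (linear probing)."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)


table = LinearProbingTable()
placed = {w: table.insert(w)
          for w in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]}
print(placed["SOON"], placed["PARTED"])      # prints 12 0

for alpha in (0.5, 0.75, 0.9):
    print(alpha, round(successful_probes(alpha), 1), round(unsuccessful_probes(alpha), 1))
```

SOON is displaced from cell 11 to cell 12, which in turn pushes PARTED around to cell 0. The loop at the end evaluates the two formulas at load factors of 50%, 75%, and 90%, giving roughly 1.5 and 2.5, then 2.5 and 8.5, then 5.5 and 50.5 probes.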

Homework

Reading: Sections 7.2, 7.3
Exercises 7.2: 1-9 odd
Exercises 7.3: 1, 2, 7, 8

Next time: Dynamic Programming (Chapter 8)