Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

Similar documents
Topic HashTable and Table ADT

HASH TABLES. Goal is to store elements k,v at index i = h k

Hash Tables. Hashing Probing Separate Chaining Hash Function

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum

Understand how to deal with collisions

CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

TABLES AND HASHING. Chapter 13

CSE 214 Computer Science II Searching

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Open Addressing: Linear Probing (cont.)

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

DATA STRUCTURES AND ALGORITHMS

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

Today s Outline. CS 561, Lecture 8. Direct Addressing Problem. Hash Tables. Hash Tables Trees. Jared Saia University of New Mexico

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

Hashing Algorithms. Hash functions Separate Chaining Linear Probing Double Hashing

Dictionaries-Hashing. Textbook: Dictionaries ( 8.1) Hash Tables ( 8.2)

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Comp 335 File Structures. Hashing

lecture23: Hash Tables

Hashing Techniques. Material based on slides by George Bebis

AAL 217: DATA STRUCTURES

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

Hash table basics. ate à. à à mod à 83

Hash Table and Hashing

Algorithms and Data Structures

Hashing. 5/1/2006 Algorithm analysis and Design CS 007 BE CS 5th Semester 2

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

CS 10: Problem solving via Object Oriented Programming Winter 2017

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

Data Structures And Algorithms

Cpt S 223. School of EECS, WSU

Hash Tables. Johns Hopkins Department of Computer Science Course : Data Structures, Professor: Greg Hager

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

Hash Tables. Gunnar Gotshalks. Maps 1

Hash table basics mod 83 ate. ate

Hashing HASHING HOW? Ordered Operations. Typical Hash Function. Why not discard other data structures?

Hashing. Hashing Procedures

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

COMP171. Hashing.

Hash table basics mod 83 ate. ate

HASH TABLES.

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

Standard ADTs. Lecture 19 CS2110 Summer 2009

Algorithms and Data Structures

The physical database. Contents - physical database design DATABASE DESIGN I - 1DL300. Introduction to Physical Database Design

Data and File Structures Chapter 11. Hashing

Data Structures and Algorithms. Chapter 7. Hashing

Hash Table. A hash function h maps keys of a given type into integers in a fixed interval [0,m-1]

Hash Tables. Hash functions Open addressing. November 24, 2017 Hassan Khosravi / Geoffrey Tien 1

Outline. hash tables hash functions open addressing chained hashing

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

Worst-case running time for RANDOMIZED-SELECT

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

HASH TABLES. Hash Tables Page 1

CSC263 Week 5. Larry Zhang.

Lecture 4. Hashing Methods

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

CS2210 Data Structures and Algorithms

Data Structures and Algorithms. Roberto Sebastiani

9/24/ Hash functions

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo

CMSC 341 Hashing. Based on slides from previous iterations of this course

Lecture 7: Efficient Collections via Hashing

Fundamental Algorithms

Fast Lookup: Hash tables

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

Question Bank Subject: Advanced Data Structures Class: SE Computer

Hashing as a Dictionary Implementation

UNIT III BALANCED SEARCH TREES AND INDEXING

Hash Tables Hash Tables Goodrich, Tamassia

CS 310 Hash Tables, Page 1. Hash Tables. CS 310 Hash Tables, Page 2

Priority Queue Sorting

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

Hash tables. hashing -- idea collision resolution. hash function Java hashcode() for HashMap and HashSet big-o time bounds applications

Hash[ string key ] ==> integer value

Module 2: Classical Algorithm Design Techniques

Questions. 6. Suppose we were to define a hash code on strings s by:

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

Lecture Notes on Hash Tables

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions. Kevin Quinn Fall 2015

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

CS 2412 Data Structures. Chapter 10 Sorting and Searching

Hashing. October 19, CMPE 250 Hashing October 19, / 25

Transcription:

Announcements HW1 PAST DUE HW2 online: 7 questions, 60 points Nat l Inst visit Thu, ok? Last time: Continued PA1 Walk Through Dictionary ADT: Unsorted Hashing Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

Hash Table and Hashing Reference items in a table (the hash table) by keys Use arithmetic operations to transform keys into table addresses (buckets) Two issues in hashing: 1. construction of the hash function 2. policy for collision resolution

Components of Hash Functions Hash function (h()): transforms the search key into a bucket index hash code: t(key) intmap maps the key (which could be a string, etc.) into an integer (intmap) compression map: c(intmap) bucket maps the intmap into an address within the table, i.e., into the range [0, M 1], for table of size M Given a key: h(key) c(t(key)) bucket/index

Hash Functions Common parts of hashing function: truncation (hash code): remove parts common across keys, or extract bits from keys that are more likely to be unique acros keys. Examples: o 734 763 1583 o 141.213.8.193 o cello.eecs.umich.edu folding (hash code): partition the key into several parts and combine them in a convenient way modulo arithmetic (compression map)

Example of Folding: Polynomial Hash Code Polynomial hash code (t) takes positional info into account: t(x[]) = x[0]a k 1 + x[1]a k 2 +... + x[k 2]a + x[k 1] If a = 33, the intmap s are: t( tops ) = t( pots ) = Good choices of a for English words: {33,37,39,41} What does it mean for a to be a good choice? Why are these particular values good?

Compression Mapping division method: intmap mod M, M table size (prime) MAD (multiply and divide) method: a intmap + b mod M, a and b non-negative integers

Hashing: Collision Resolution With hashing, collision is inevitable What do you do when two items hash into the same bucket/bin? Methods of collision resolution: separate chaining scatter table o coalesced chaining o open addressing: - linear probing - quadratic probing - double hashing extensible hashing: dynamic hashing

Collision Resolution Separate chaining: let each bin points to a linked list of elements (see Fig. 8.3) Coalesced chaining: store linked list in unused portion of the table if item hashes to an already occupied bin, follow the linked list and add item to the end (coalesced chaining, see Fig. 8.4) hash table can only hold as many items as table size Open addressing: if there s a collision, apply another hash function repeatedly until there s no collision. Set of hash functions: {h 0, h 1, h 2,...}

Open Addressing To probe ::= to look at an entry and compare its key with target Linear probing: h i (key) = (h 0 (key) + i) mod M do a linear search from h 0 (key) until you find an empty slot (see Fig. 8.6, compare against Fig. 8.4) Clustering: when contiguous bins are all occupied Why is clustering undesirable? Assuming input size N, table size 2N: What is the best-case cluster distribution? What is the worst-case cluster distribution? What s the average time to find an empty slot in both cases?

Open Addressing (cont) Quadratic probing: h i (key) = (h 0 (key) + i 2 ) mod M less likely to form clusters, but only works when table is less than half full because it cannot hit every possible table address Double hashing: h i (key) = (h(key) + ih (key)) mod M use 2 distinct hash functions Removal on scatter tables: complicated, don t want to move an element up the table beyond its actual hash location; must rehash the rest of chain/cluster, or otherwise mark deleted entry as deleted (see Fig. 8.5)

Timing Analysis Differentiate between successful(s()) and unsuccessful(u()) search Example: passwd lookup wants success to be quick, failure can be (is desirable to be) slow What s the time complexity of search for separate chaining? Worst case: S() = U() =

Average Case: Load Factor Load factor: λ = N/M, where N = number of items, M = hash table size empty table, λ = 0 half full, λ = 0.5 full, λ = 1 can λ > 1? If hash function is good, length of average chain in separate chaining is λ = N/M (number of comparison reduced by a factor of M if using separate chaining) Given λ, average unsuccessful search time U(λ) = 1 + λ average successful search time S(λ) = 1 + λ/2

Average Case Search Times (FYI) Coalesced chaining: U(λ) = 1 1 4 (e2λ 1 2λ); For λ = 1, U(1) = 2.1 S(λ) = 1 8λ 1 (e2λ 1 2λ) + 4 λ ; For λ = 1, S(1) = 1.8 Linear probing: U(λ) = 1 2 (1 + ( 1 1 λ )2 ) S(λ) = 1 2 (1 + 1 1 λ ) Quadratic probing and double hashing: U(λ) = 1 1 λ S(λ) = 1 λ log 1 1 λ

Hashing Search Time Separate chaining: increases gradually Scatter table: increases dramatically as table fills, when table is full, no more keys may be inserted Extensible hashing (dynamic hashing): Double table size when it is half full Old entries must be re-hashed into larger table This is an expensive, but hopefully infrequent, operation Doubling cost is amortized over individual search/insert

When NOT to use Hashing Rank search: return the k th largest item Range search: return values between h and k Sort: return the values in order Readings: done with Preiss Ch. 8

Dictionary ADT Table lookup: look up a key in a table The key is associated with some other information (value) Key space is usually more structured/regular than value space, so easier to search An entry is a <key, value> pair For example: <title, mp3 file> Usually the ADT stores pointers to entries

Runtime Unsorted list: insert: O() search: O() Sorted list: insert: O() search: O()

Binary Search: Iterative Version Given a sorted list, with unique keys, perform a Divide and Conquer algorithm to find a key Find 31 in a[20 27 29 31 35 38 42 53 59 63 67 78] Write an iterative version of binary search: ibinsearch(int *a, int n, int key), a[] sorted, n array size, return index of key What is the time complexity of the algorithm?

Accounting Rule 5 Rule 5: Divide and Conquer: An algorithm is O(log n) if each constant time O(1) operation (e.g., CMP) can cut the problem size by a fraction (e.g., half) Corollary: If constant time is required to reduce the problem size by a constant amount, the algorithm is O(N)

Recursive Function What is a recursive function? Must have termination condition Example recursive function: n! Iterative version (assume n >= 0): ifact(int n) { for (int i = n-1; i > 0; i--) n *= i; return (n? n : 1); } What is its big-oh complexity? Recursive version (assume n >= 0): rfact(int n) { return (n? n*(rfact(n-1)) : 1); } How to compute its big-oh?

Time Complexity of rfact(n) Let T(n) be the operation-count complexity of rfact(n) Which operation shall we count? What is the value of T(n)? What is the big-oh complexity of rfact(n)? Any constant operation count can be replaced by 1 And the space complexity?

Recurrence Relation (Accounting Rule 6) Definition: a recurrence relation is a function of x that is recursively defined in terms of the same function on x, where x < x Examples: T(n) = 1 + T(n 1), T(0) = 1 T(n) = 2 T(n/2) + n, T(1) = 1 etc. Accounting Rule 6: Recurrence relations are natural descriptions of the timing complexity of recursive algorithms

Binary Search: Recursive Version Write a recursive version of binary search: rbinsearch(int *a, int left, int right, int key), a[] sorted, left index to first element in array, right index to last element in array, return index of key What is the time complexity of the algorithm? Recall: any constant per-level cost can be represented as 1

Divide and Conquer Divide and conquer doesn t work with linked list, unfortunately So why use linked-list at all? How can we speed up search and use dynamically allocated space?