STANDARD ADTS Lecture 17 CS2110 Spring 2013

Similar documents
Abstract Data Types (ADTs) Queues & Priority Queues. Sets. Dictionaries. Stacks 6/15/2011

Standard ADTs. Lecture 19 CS2110 Summer 2009

Stack ADT. ! push(x) puts the element x on top of the stack! pop removes the topmost element from the stack.

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Hash table basics mod 83 ate. ate. hashcode()

Review. CSE 143 Java. A Magical Strategy. Hash Function Example. Want to implement Sets of objects Want fast contains( ), add( )

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

Fast Lookup: Hash tables

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

EXAMINATIONS 2015 COMP103 INTRODUCTION TO DATA STRUCTURES AND ALGORITHMS

Hash table basics mod 83 ate. ate

DATA STRUCTURES AND ALGORITHMS

Habanero Extreme Scale Software Research Project

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

COMP 103 Introduction to Data Structures and Algorithms

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

CS 307 Final Spring 2009

Hash table basics mod 83 ate. ate

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

General Idea. Key could be an integer, a string, etc e.g. a name or Id that is a part of a large employee structure

Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum

EXAMINATIONS 2005 END-YEAR. COMP 103 Introduction to Data Structures and Algorithms

Linked lists (6.5, 16)

Hash Tables. Johns Hopkins Department of Computer Science Course : Data Structures, Professor: Greg Hager

Hash tables. hashing -- idea collision resolution. hash function Java hashcode() for HashMap and HashSet big-o time bounds applications

Lecture 7: Efficient Collections via Hashing

CSE 100: HASHING, BOGGLE

Hash table basics. ate à. à à mod à 83

PRIORITY QUEUES AND HEAPS

Hash Tables. Computer Science S-111 Harvard University David G. Sullivan, Ph.D. Data Dictionary Revisited

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Hashing as a Dictionary Implementation

CS 310: Hash Table Collision Resolution

NAME: c. (true or false) The median is always stored at the root of a binary search tree.

(f) Given what we know about linked lists and arrays, when would we choose to use one data structure over the other?

HASH TABLES cs2420 Introduction to Algorithms and Data Structures Spring 2015

1.00 Lecture 32. Hashing. Reading for next time: Big Java Motivation

CS 3410 Ch 20 Hash Tables

stacks operation array/vector linked list push amortized O(1) Θ(1) pop Θ(1) Θ(1) top Θ(1) Θ(1) isempty Θ(1) Θ(1)

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

Hash[ string key ] ==> integer value

CITS2200 Data Structures and Algorithms. Topic 15. Hash Tables

EXAMINATIONS 2016 TRIMESTER 2

COMP 250. Lecture 27. hashing. Nov. 10, 2017

Symbol Tables. ASU Textbook Chapter 7.6, 6.5 and 6.3. Tsan-sheng Hsu.

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Algorithms and Data Structures

Lecture 10 March 4. Goals: hashing. hash functions. closed hashing. application of hashing

DUKE UNIVERSITY Department of Computer Science. Test 2: CompSci 100

Abstract Data Types Spring 2018 Exam Prep 5: February 12, 2018

Dynamic Dictionaries. Operations: create insert find remove max/ min write out in sorted order. Only defined for object classes that are Comparable

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

This lecture. Iterators ( 5.4) Maps. Maps. The Map ADT ( 8.1) Comparison to java.util.map

CS/ENGRD 2110 SPRING Lecture 7: Interfaces and Abstract Classes

References and Homework ABSTRACT DATA TYPES; LISTS & TREES. Abstract Data Type (ADT) 9/24/14. ADT example: Set (bunch of different values)

PRIORITY QUEUES AND HEAPS

Unit #5: Hash Functions and the Pigeonhole Principle

Topic HashTable and Table ADT

Collections, Maps and Generics

Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Collections. Powered by Pentalog. by Vlad Costel Ungureanu for Learn Stuff

Questions. 6. Suppose we were to define a hash code on strings s by:

CS 310: Hash Table Collision Resolution Strategies

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

DATA STRUCTURES AND ALGORITHMS

EXAMINATIONS 2017 TRIMESTER 2

Lecture 16: HashTables 10:00 AM, Mar 2, 2018

Hash Tables. Hashing Probing Separate Chaining Hash Function

Hashing. October 19, CMPE 250 Hashing October 19, / 25

UNIT III BALANCED SEARCH TREES AND INDEXING

EXAMINATIONS 2011 Trimester 2, MID-TERM TEST. COMP103 Introduction to Data Structures and Algorithms SOLUTIONS

Section 05: Midterm Review

Cpt S 223. School of EECS, WSU

DUKE UNIVERSITY Department of Computer Science. Test 2: CompSci 100e

CSC 321: Data Structures. Fall 2016

Logic, Algorithms and Data Structures ADT:s & Hash tables. By: Jonas Öberg, Lars Pareto

CS 310: Hash Table Collision Resolution

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

1B1b Implementing Data Structures Lists, Hash Tables and Trees

CSE 332: Data Structures & Parallelism Lecture 7: Dictionaries; Binary Search Trees. Ruth Anderson Winter 2019

11/27/12. CS202 Fall 2012 Lecture 11/15. Hashing. What: WiCS CS Courses: Inside Scoop When: Monday, Nov 19th from 5-7pm Where: SEO 1000

CSC 321: Data Structures. Fall 2017

Data Structures. COMS W1007 Introduction to Computer Science. Christopher Conway 1 July 2003

Announcements. Hash Functions. Hash Functions 4/17/18 HASHING

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

6.7 b. Show that a heap of eight elements can be constructed in eight comparisons between heap elements. Tournament of pairwise comparisons

CS 302 ALGORITHMS AND DATA STRUCTURES

Open Addressing: Linear Probing (cont.)

Algorithms and Data Structures

Announcements HEAPS & PRIORITY QUEUES. Abstract vs concrete data structures. Concrete data structures. Abstract data structures

Priority Queue. 03/09/04 Lecture 17 1

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

Object Oriented Programming and Design in Java. Session 19 Instructor: Bert Huang

EXAMINATIONS 2012 MID YEAR. COMP103 Introduction to Data Structures and Algorithms SOLUTIONS

HEAPS & PRIORITY QUEUES

Transcription:

STANDARD ADTS Lecture 17 CS2110 Spring 2013

Abstract Data Types (ADTs) 2 A method for achieving abstraction for data structures and algorithms ADT = model + operations In Java, an interface corresponds well to an ADT The interface describes the operations, but says nothing at all about how they are implemented Describes what each operation does, but not how it does it An ADT is independent of its implementation Example: Stack interface/adt public interface Stack { public void push(object x); public Object pop(); public Object peek(); public boolean isempty(); public void clear(); }

Queues & Priority Queues 3 ADT Queue Operations: void add(object x); Object poll(); Object peek(); boolean isempty(); void clear(); Where used: Simple job scheduler (e.g., print queue) Wide use within other algorithms ADT PriorityQueue Operations: void insert(object x); Object getmax(); Object peekatmax(); boolean isempty(); void clear(); Where used: Job scheduler for OS Event-driven simulation Can be used for sorting Wide use within other algorithms A (basic) queue is first in, first out. A priority queue ranks objects: getmax() returns the largest according to the comparator interface.

Sets 4 ADT Set Operations: void insert(object element); boolean contains(object element); void remove(object element); boolean isempty(); void clear(); for(object o: myset) {... } Where used: Wide use within other algorithms Note: no duplicates allowed A set with duplicates is sometimes called a multiset or bag A set makes no promises about ordering, but you can still iterate over it.

Dictionaries 5 ADT Dictionary (aka Map) Operations: void insert(object key, Object value); void update(object key, Object value); Object find(object key); void remove(object key); boolean isempty(); void clear(); Think of: key = word; value = definition Where used: Symbol tables Wide use within other algorithms A HashMap is a particular implementation of the Map interface

Data Structure Building Blocks 6 These are implementation building blocks that are often used to build more-complicated data structures Arrays Linked Lists Singly linked Doubly linked Binary Trees Graphs Adjacency matrix Adjacency list

From interface to implementation 7 Given that we want to support some interface, the designer still faces a choice What will be the best way to implement this interface for my expected type of use? Choice of implementation can reflect many considerations Major factors we think about Speed for typical use case Storage space required

Array Implementation of Stack 8 class ArrayStack implements Stack { private Object[] array; //Array that holds the Stack private int index = 0; //First empty slot in Stack public ArrayStack(int maxsize) { array = new Object[maxSize]; } public void push(object x) { array[index++] = x; } public Object pop() { return array[--index]; } public Object peek() { return array[index-1]; } public boolean isempty() { return index == 0; } public void clear() { index = 0; } } index max-1 4 3 2 1 0 O(1) worstcase time for each operation Question: What can go wrong?. What if maxsize is too small?

Linked List Implementation of Stack 9 class ListStack implements Stack { private Node head = null; //Head of list that //holds the Stack public void push(object x) { head = new Node(x, head); } public Object pop() { Node temp = head; head = head.next; return temp.data; } public Object peek() { return head.data; } public boolean isempty() { return head == null; } public void clear() { head = null; } } O(1) worst-case time for each operation (but constant is larger) Note that array implementation can overflow, but the linked list version cannot head

Queue Implementations 10 Possible implementations head Linked List head last last Array with head always at A[0] (poll( ) becomes expensive) (can overflow) Array with wraparound (can overflow) last Recall: operations are add, poll, peek, For linked-list All operations are O(1) For array with head at A[0] poll takes time O(n) Other ops are O(1) Can overflow For array with wraparound All operations are O(1) Can overflow

A Queue From 2 Stacks 11 Add pushes onto stack A Poll pops from stack B If B is empty, move all elements from stack A to stack B Some individual operations are costly, but still O(1) time per operations over the long run

Dealing with Overflow 12 For array implementations of stacks and queues, use table doubling Check for overflow with each insert op If table will overflow, Allocate a new table twice the size Copy everything over The operations that cause overflow are expensive, but still constant time per operation over the long run (proof later)

Goal: Design a Dictionary (aka Map) 13 Operations Array implementation: Using an array of (key,value) pairs void insert(key, value) void update(key, value) Object find(key) void remove(key) boolean isempty() void clear() Unsorted Sorted insert O(1) O(n) update O(n) O(log n) find O(n) O(log n) remove O(n) O(n) n is the number of items currently held in the dictionary

Hashing 14 Idea: compute an array index via a hash function h U is the universe of keys h: U [0,,m-1] where m = hash table size Usually U is much bigger than m, so collisions are possible (two elements with the same hash code) h should be easy to compute avoid collisions have roughly equal probability for each table position Typical situation: U = all legal identifiers Typical hash function: h converts each letter to a number, then compute a function of these numbers Best hash functions are highly random This is connected to cryptography We ll return to this in a few minutes

A Hashing Example 15 Suppose each word below has the following hashcode jan 7 feb 0 mar 5 apr 2 may 4 jun 7 jul 3 aug 7 sep 2 oct 5 How do we resolve collisions? use chaining: each table position is the head of a list for any particular problem, this might work terribly In practice, using a good hash function, we can assume each position is equally likely

Analysis for Hashing with Chaining 16 Analyzed in terms of load factor λ = n/m = (items in table)/(table size) We count the expected number of probes (key comparisons) Goal: Determine expected number of probes for an unsuccessful search Expected number of probes for an unsuccessful search = average number of items per table position = n/m = λ Expected number of probes for a successful search = 1 + λ = O(λ) Worst case is O(n)

Table Doubling 17 We know each operation takes time O(λ) where λ λ =n/m So it gets worse as n gets large relative to m Table Doubling: Set a bound for λ (call it λ 0 ) Whenever λ reaches this bound: Create a new table twice as big Then rehash all the data As before, operations usually take time O(1) But sometimes we copy the whole table

Analysis of Table Doubling 18 Suppose we reach a state with n items in a table of size m and that we have just completed a table doubling

Analysis of Table Doubling, Cont d 19 Total number of insert operations needed to reach current table = copying work + initial insertions of items = 2n + n = 3n inserts Each insert takes expected time O(λ 0 ) or O(1), so total expected time to build entire table is O(n) Thus, expected time per operation is O(1) Disadvantages of table doubling: Worst-case insertion time of O(n) is definitely achieved (but rarely) Thus, not appropriate for time critical operations

Concept: hash codes 20 Definition: a hash code is the output of a function that takes some input and maps it to a pseudorandom number (a hash) Input could be a big object like a string or an Animal or some other complex thing Same input always gives same out Idea is that hashcode for distinct objects will have a very low likelihood of collisions Used to create index data structures for finding an object given its hash code

Java Hash Functions 21 Most Java classes implement the hashcode() method hashcode() returns an int Java s HashMap class uses h(x) = X.hashCode() mod m h(x) in detail: int hash = X.hashCode(); int index = (hash & 0x7FFFFFFF) % m; What hashcode() returns: Integer: uses the int value Float: converts to a bit representation and treats it as an int Short Strings: 37*previous + value of next character Long Strings: sample of 8 characters; 39*previous + next value

hashcode() Requirements 22 Contract for hashcode() method: Whenever it is invoked in the same object, it must return the same result Two objects that are equal (in the sense of.equals(...)) must have the same hash code Two objects that are not equal should return different hash codes, but are not required to do so (i.e., collisions are allowed)

Hashtables in Java 23 java.util.hashmap java.util.hashset java.util.hashtable A node in each chain looks like this: Use chaining hashcode key value next Initial (default) size = 101 Load factor = 0 = 0.75 original hashcode (before mod m) Allows faster rehashing and (possibly) faster key comparison Uses table doubling (2*previous+1)

Linear & Quadratic Probing 24 These are techniques in which all data is stored directly within the hash table array Linear Probing Probe at h(x), then at h(x) + 1 h(x) + 2 h(x) + i Leads to primary clustering Long sequences of filled cells Quadratic Probing Similar to Linear Probing in that data is stored within the table Probe at h(x), then at h(x)+1 h(x)+4 h(x)+9 h(x)+ i 2 Works well when < 0.5 Table size is prime

Universal Hashing 25 In in doubt, choose a hash function at random from a large parameterized family of hash functions (e.g., h(x) = ax + b, where a and b are chosen at random) With high probability, it will be just as good as any custom-designed hash function you dream up

Dictionary Implementations 26 Ordered Array Better than unordered array because Binary Search can be used Unordered Linked List Ordering doesn t help Hashtables O(1) expected time for Dictionary operations

Aside: Comparators 27 When implementing a comparator interface you normally must Override compareto() method Override hashcode() Override equals() Easy to forget and if you make that mistake your code will be very buggy

hashcode() and equals() 28 We mentioned that the hash codes of two equal objects must be equal this is necessary for hashtable-based data structures such as HashMap and HashSet to work correctly In Java, this means if you override Object.equals(), you had better also override Object.hashCode() But how???

hashcode() and equals() 29 class Identifier { String name; String type; public boolean equals(object obj) { if (obj == null) return false; Identifier id; try { id = (Identifier)obj; } catch (ClassCastException cce) { return false; } return name.equals(id.name) && type.equals(id.type); } }

hashcode() and equals() 30 class Identifier { String name; String type; public boolean equals(object obj) { if (obj == null) return false; Identifier id; try { id = (Identifier)obj; } catch (ClassCastException cce) { return false; } return name.equals(id.name) && type.equals(id.type); } } public int hashcode() { return 37 * name.hashcode() + 113 * type.hashcode() + 42; }

hashcode() and equals() 31 class TreeNode { TreeNode left, right; String datum; public boolean equals(object obj) { if (obj == null!(obj instanceof TreeNode)) return false; TreeNode t = (TreeNode)obj; boolean leq = (left!= null)? left.equals(t.left) : t.left == null; boolean req = (right!= null)? right.equals(t.right) : t.right == null; return datum.equals(t.datum) && leq && req; } }

hashcode() and equals() 32 class TreeNode { TreeNode left, right; String datum; public boolean equals(object obj) { if (obj == null!(obj instanceof TreeNode)) return false; TreeNode t = (TreeNode)obj; boolean leq = (left!= null)? left.equals(t.left) : t.left == null; boolean req = (right!= null)? right.equals(t.right) : t.right == null; return datum.equals(t.datum) && leq && req; } } public int hashcode() { int lhc = (left!= null)? left.hashcode() : 298; int rhc = (right!= null)? right.hashcode() : 377; return 37 * datum.hashcode() + 611 * lhc - 43 * rhc; }

Professional quality hash codes? 33 For large objects we often compute an MD5 hash MD5 is the fifth of a series of standard message digest functions They are fast to compute (like an XOR over the bytes of the object) But they also use a cryptographic key: without the key you can t guess what the MD5 hashcode will be For example key could be a random number you pick when your program is launched Or it could be a password With a password key, an MD5 hash is a proof of authenticity If object is tampered with, the hashcode will reveal it!