Randomized Algorithms, Hash Functions

Similar documents
Randomized Algorithms: Element Distinctness

Randomized Algorithms: Selection

RAM with Randomization and Quick Sort

Fast Lookup: Hash tables

Data Structures and Algorithms 2018

COMP Analysis of Algorithms & Data Structures

CIS 121 Data Structures and Algorithms Midterm 3 Review Solution Sketches Fall 2018

Scribe: Sam Keller (2015), Seth Hildick-Smith (2016), G. Valiant (2017) Date: January 25, 2017

CS 137 Part 8. Merge Sort, Quick Sort, Binary Search. November 20th, 2017

Divide and Conquer Algorithms

Lecture 9: Sorting Algorithms

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

Searching Algorithms/Time Analysis

1 Probabilistic analysis and randomized algorithms

CSE 332 Winter 2015: Midterm Exam (closed book, closed notes, no calculators)

1 (15 points) LexicoSort

Selection (deterministic & randomized): finding the median in linear time

Worst-case running time for RANDOMIZED-SELECT

Randomized Algorithms, Quicksort and Randomized Selection

1 CSE 100: HASH TABLES

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

Lecturers: Sanjam Garg and Prasad Raghavendra March 20, Midterm 2 Solutions

Hash Tables. Hash Tables

Practical Session #11 - Sort properties, QuickSort algorithm, Selection

Question And Answer.

CSE 332 Spring 2014: Midterm Exam (closed book, closed notes, no calculators)

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Sorting. Sorting. Stable Sorting. In-place Sort. Bubble Sort. Bubble Sort. Selection (Tournament) Heapsort (Smoothsort) Mergesort Quicksort Bogosort

SFU CMPT Lecture: Week 8

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Quiz 1 Practice Problems

Hash Tables. Hashing Probing Separate Chaining Hash Function

Solutions to Problem Set 1

Algorithms and Data Structures for Mathematicians

CS 561, Lecture 1. Jared Saia University of New Mexico

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

Section 05: Solutions

Problem Set 4 Solutions

Analyze the obvious algorithm, 5 points Here is the most obvious algorithm for this problem: (LastLargerElement[A[1..n]:

COSC242 Lecture 7 Mergesort and Quicksort

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

Deterministic and Randomized Quicksort. Andreas Klappenecker

CHAPTER 11 SETS, AND SELECTION

Outline. CS 561, Lecture 6. Priority Queues. Applications of Priority Queue. For NASA, space is still a high priority, Dan Quayle

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs

Cuckoo Hashing for Undergraduates

(Refer Slide Time: 01.26)

k-selection Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong

Midterm 2 Solutions. CS70 Discrete Mathematics and Probability Theory, Spring 2009

CSE548, AMS542: Analysis of Algorithms, Fall 2012 Date: October 16. In-Class Midterm. ( 11:35 AM 12:50 PM : 75 Minutes )

Algorithms in Systems Engineering ISE 172. Lecture 12. Dr. Ted Ralphs

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

CMPSCI 105: Lecture #12 Searching, Sorting, Joins, and Indexing PART #1: SEARCHING AND SORTING. Linear Search. Binary Search.

Hashing for searching

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)

Practical Session 10 - Huffman code, Sort properties, QuickSort algorithm, Selection

Notes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 4 Notes. Class URL:

CSE 101. Algorithm Design and Analysis Miles Jones Office 4208 CSE Building Lecture 17: Divide and Conquer Design examples.

CS 161 Problem Set 4

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

Quicksort. Repeat the process recursively for the left- and rightsub-blocks.

Sorting is a problem for which we can prove a non-trivial lower bound.

Parallel and Sequential Data Structures and Algorithms Lecture (Spring 2012) Lecture 16 Treaps; Augmented BSTs

AAL 217: DATA STRUCTURES

n 1 x i = xn 1 x 1. i= (2n 1) = n 2.

EECS 2011M: Fundamentals of Data Structures

CSE 101 Homework 5. Winter 2015

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

Lecture 6: Hashing Steven Skiena

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Module 02 Lecture - 45 Memoization

CS 5321: Advanced Algorithms Sorting. Acknowledgement. Ali Ebnenasir Department of Computer Science Michigan Technological University

Recall from Last Time: Big-Oh Notation

COMP Analysis of Algorithms & Data Structures

Quiz 1 Practice Problems

We can use a max-heap to sort data.

INDIAN STATISTICAL INSTITUTE

Binary Search to find item in sorted array

Midterm 2. Read all of the following information before starting the exam:

CSCI E 119 Section Notes Section 11 Solutions

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

Greedy algorithms is another useful way for solving optimization problems.

Section 05: Solutions

Data Structures And Algorithms

Scan and Quicksort. 1 Scan. 1.1 Contraction CSE341T 09/20/2017. Lecture 7

- 1 - Handout #22S May 24, 2013 Practice Second Midterm Exam Solutions. CS106B Spring 2013

Quick Sort. CSE Data Structures May 15, 2002

Dr. Amotz Bar-Noy s Compendium of Algorithms Problems. Problems, Hints, and Solutions

CSE wi: Practice Midterm

Introduction to Randomized Algorithms

7.3 A randomized version of quicksort

Mergesort again. 1. Split the list into two equal parts

Encoding/Decoding, Counting graphs

Recursion: The Beginning

THE UNIVERSITY OF WESTERN AUSTRALIA

Problem Set 6 Due: 11:59 Sunday, April 29

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

CS126 Final Exam Review

BBM371& Data*Management. Lecture 6: Hash Tables

CSE 332: Data Structures & Parallelism Lecture 12: Comparison Sorting. Ruth Anderson Winter 2019

Hash Tables: Handling Collisions

Transcription:

Randomized Algorithms, Hash Functions Lecture A Tiefenbruck MWF 9-9:50am Center 212 Lecture B Jones MWF 2-2:50pm Center 214 Lecture C Tiefenbruck MWF 11-11:50am Center 212 http://cseweb.ucsd.edu/classes/wi16/cse21-abc/ March 7, 2016

Selection Problem: WHAT Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array.

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. What algorithm would you choose if i=1?

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. What algorithm would you choose in general?

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. What algorithm would you choose in general? Can sorting help? Algorithm: first sort list and then step through to find i th smallest. What's its runtime? A. B. C. D. E. None of the above

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. What algorithm would you choose in general? Different strategy Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking.

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking. ex. 17, 42, 3, 8, 19, 21, 2 i = 3

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking. ex. 17, 42, 3, 8, 19, 21, 2 i = 3 Random pivot: 17

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking. ex. 17, 42, 3, 8, 19, 21, 2 i = 3 Random pivot: 17 Smaller than 17: 3, 8, 2 Bigger than 17: 42, 19, 21

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking. ex. 17, 42, 3, 8, 19, 21, 2 i = 3 Random pivot: 17 Smaller than 17: 3, 8, 2 Bigger than 17: 42, 19, 21 Has 3 elements so third smallest must be in this set

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking. ex. 17, 42, 3, 8, 19, 21, 2 i = 3 Random pivot: 17 New list: 3, 8, 2 i = 3

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking. ex. 17, 42, 3, 8, 19, 21, 2 i = 3 Random pivot: 17 New list: 3, 8, 2 i = 3 Random pivot: 8

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking. ex. 17, 42, 3, 8, 19, 21, 2 i = 3 Random pivot: 17 New list: 3, 8, 2 i = 3 Random pivot: 8 Smaller than 8: 3, 2 Bigger than 8:

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking. ex. 17, 42, 3, 8, 19, 21, 2 i = 3 Random pivot: 17 New list: 3, 8, 2 i = 3 Random pivot: 8 Smaller than 8: 3, 2 Bigger than 8: Has 2 elements so third smallest must be "next" element, i.e. 8

Selection Problem: HOW Given list of distinct integers a 1, a 2,, a n and integer i, 1 <= i <= n, find the i th smallest element in the array. Pick random list element called pivot. Partition list into those smaller than pivot, those bigger than pivot. Using i and size of partition sets, determine in which set to continue looking. ex. 17, 42, 3, 8, 19, 21, 2 i = 3 Random pivot: 17 New list: 3, 8, 2 i = 3 Random pivot: 8 Smaller than 8: 3, 2 Bigger than 8: Return 8 compare to original list: 17, 42, 3, 8, 19, 21, 2

Selection Problem: HOW Given list of distinct integers A = a 1, a 2,, a n and integer i, 1 <= i <= n, Algorithm will incorporate both randomness and recursion!

Selection Problem: HOW Given list of distinct integers A = a 1, a 2,, a n and integer i, 1 <= i <= n, RandSelect(A,i) 1. If n=1 return a 1 What are we doing in this first line? A. Establishing the base case of the recursion. B. Establishing the induction step. C. Randomly picking a pivot. D. Randomly returning a list element. E. None of the above.

Selection Problem: HOW Given list of distinct integers A = a 1, a 2,, a n and integer i, 1 <= i <= n, RandSelect(A,i) 1. If n=1 return a 1 2. Initialize lists S and B. 3. Pick integer j uniformly at random from 1 to n. 4. For each index k from 1 to n (except j): 5. if a k < a j, add a k to the list S. 6. if a k > a j, add a k to the list B.

Selection Problem: HOW Given list of distinct integers A = a 1, a 2,, a n and integer i, 1 <= i <= n, RandSelect(A,i) 1. If n=1 return a 1 2. Initialize lists S and B. 3. Pick integer j uniformly at random from 1 to n. 4. For each index k from 1 to n (except j): 5. if a k < a j, add a k to the list S. 6. if a k > a j, add a k to the list B. 7. Let s be the size of S. 8. If s = i-1, return a j.

Selection Problem: HOW Given list of distinct integers A = a 1, a 2,, a n and integer i, 1 <= i <= n, RandSelect(A,i) 1. If n=1 return a 1 2. Initialize lists S and B. 3. Pick integer j uniformly at random from 1 to n. 4. For each index k from 1 to n (except j): 5. if a k < a j, add a k to the list S. 6. if a k > a j, add a k to the list B. 7. Let s be the size of S. 8. If s = i-1, return a j. 9. If s >= i, return RandSelect(S, i). 10. If s < i, return RandSelect(B,??? ). What's the right way to fill in this blank? A. i B. s C. i+s D. i-(s+1) E. None of the above.

Selection Problem: WHEN Given list of distinct integers A = a 1, a 2,, a n and integer i, 1 <= i <= n, RandSelect(A,i) 1. If n=1 return a 1 2. Initialize lists S and B. 3. Pick integer j uniformly at random from 1 to n. 4. For each index k from 1 to n (except j): 5. if a k < a j, add a k to the list S. 6. if a k > a j, add a k to the list B. 7. Let s be the size of S. 8. If s = i-1, return a j. 9. If s >= i, return RandSelect(S, i). 10. If s < i, return RandSelect(B, i-(s+1)). What input gives the best-case performance of this algorithm? A. When element we're looking for is the first in list. B. When element we're looking for is i th in list. C. When element we're looking for is in the middle of the list. D. When element we're looking for is last in list. E. None of the above.

Selection Problem: WHEN Given list of distinct integers A = a 1, a 2,, a n and integer i, 1 <= i <= n, RandSelect(A,i) 1. If n=1 return a 1 2. Initialize lists S and B. 3. Pick integer j uniformly at random from 1 to n. 4. For each index k from 1 to n (except j): 5. if a k < a j, add a k to the list S. 6. if a k > a j, add a k to the list B. 7. Let s be the size of S. 8. If s = i-1, return a j. 9. If s >= i, return RandSelect(S, i). 10. If s < i, return RandSelect(B, i-(s+1)). Performance depends on more than the input!

Selection Problem: WHEN Given list of distinct integers A = a 1, a 2,, a n and integer i, 1 <= i <= n, RandSelect(A,i) 1. If n=1 return a 1 2. Initialize lists S and B. 3. Pick integer j uniformly at random from 1 to n. 4. For each index k from 1 to n (except j): 5. if a k < a j, add a k to the list S. 6. if a k > a j, add a k to the list B. 7. Let s be the size of S. 8. If s = i-1, return a j. 9. If s >= i, return RandSelect(S, i). 10. If s < i, return RandSelect(B, i-(s+1)). Minimum time if we happen to pick pivot which is the i th smallest list element. In this case, what's the runtime? A. B. C. D. E. None of the above

Selection Problem: WHEN How can we give a time analysis for an algorithm that is allowed to pick and then use random numbers? T(x): a random variable that represents the runtime of the algorithm on input x Compute the worst-case expected time worst case over all inputs of size n average runtime incorporating random choices in the algorithm

Selection Problem: WHEN How can we give a time analysis for an algorithm that is allowed to pick and then use random numbers? T(x): a random variable that represents the runtime of the algorithm on input x Compute the worst-case expected time Recurrence equation unravelling

Selection Problem: WHEN Situation so far: Sort then search takes worst-case Randomized selection takes worst-case expected time

Selection Problem: WHEN Situation so far: Sort then search takes worst-case Randomized selection takes worst-case expected time How do we implement randomized algorithms? Are there deterministic algorithms that perform as well? For selection problem: Blum et al, yes! In general: open J

Element Distinctness: WHAT Given list of positive integers a 1, a 2,, a n decide whether all the numbers are distinct or whether there is a repetition, i.e. two positions i, j with 1 <= i < j <= n such that a i = a j. What algorithm would you choose in general?

Element Distinctness: HOW Given list of positive integers a 1, a 2,, a n decide whether all the numbers are distinct or whether there is a repetition, i.e. two positions i, j with 1 <= i < j <= n such that a i = a j. What algorithm would you choose in general? Can sorting help? Algorithm: first sort list and then step through to find duplicates. What's its runtime? A. B. C. D. E. None of the above

Element Distinctness: HOW Given list of positive integers a 1, a 2,, a n decide whether all the numbers are distinct or whether there is a repetition, i.e. two positions i, j with 1 <= i < j <= n such that a i = a j. What algorithm would you choose in general? Can sorting help? Algorithm: first sort list and then step through to find duplicates. How much memory does it require? A. B. C. D. E. None of the above

Element Distinctness: HOW Given list of positive integers a 1, a 2,, a n decide whether all the numbers are distinct or whether there is a repetition, i.e. two positions i, j with 1 <= i < j <= n such that a i = a j. What algorithm would you choose in general? What if we had unlimited memory?

Element Distinctness: HOW Given list of positive integers A = a 1, a 2,, a n, UnlimitedMemoryDistinctness(A) 1. For i = 1 to n, 2. If M[a i ] = 1 then return "Found repeat" 3. Else M[a i ] := 1 4. Return "Distinct elements" What's the runtime of this algorithm? A. B. C. D. E. None of the above M is an array of memory locations This is memory location indexed by a i

Element Distinctness: HOW Given list of positive integers A = a 1, a 2,, a n, UnlimitedMemoryDistinctness(A) 1. For i = 1 to n, 2. If M[a i ] = 1 then return "Found repeat" 3. Else M[a i ] := 1 4. Return "Distinct elements" M is an array of memory locations This is memory location indexed by a i What's the runtime of this algorithm? A. B. C. D. E. None of the above What's the memory use of this algorithm? A. B. C. D. E. None of the above

Element Distinctness: HOW To simulate having more memory locations: use Virtual Memory. Define hash function h: { desired memory locations } à { actual memory locations } Typically we want more memory than we have, so h is not one-to-one. How to implement h? CSE 12, CSE 100. Here, let's use hash functions in an algorithm for Element Distinctness.

For example, suppose you have a company of 5,000 employees and each is identified by their SSN. You want Virtual to be able to Memory access employee Applications records by their SSN. You don t want to keep a table of all possible SSN s so we ll use a virtual memory data structure to emulate having that huge table. Can you think of any other examples?

Ideal Hash Function Ideally, we could use a very unpredictable function called a hash function to assign random physical locations to each virtual location. Later we will discuss how to actually implement such hash functions. But for now assume that we have a function h so that for every virtual location v, h(v) is uniformly and randomly chosen among the physical locations. We call such an h an ideal hash function if its computable in constant time.

Element Distinctness: HOW Given list of positive integers A = a 1, a 2,, a n, and m memory locations available HashDistinctness(A, m) 1. Initialize array M[1,..,m] to all 0s. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. If M[ h(a i ) ] = 1 then return "Found repeat" 5. Else M[ h(a i ) ] := 1 6. Return "Distinct elements"

Element Distinctness: HOW Given list of positive integers A = a 1, a 2,, a n, and m memory locations available HashDistinctness(A, m) 1. Initialize array M[1,..,m] to all 0s. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. If M[ h(a i ) ] = 1 then return "Found repeat" 5. Else M[ h(a i ) ] := 1 6. Return "Distinct elements" What's the runtime of this algorithm? A. B. C. D. E. None of the above

Element Distinctness: HOW Given list of positive integers A = a 1, a 2,, a n, and m memory locations available HashDistinctness(A, m) 1. Initialize array M[1,..,m] to all 0s. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. If M[ h(a i ) ] = 1 then return "Found repeat" 5. Else M[ h(a i ) ] := 1 6. Return "Distinct elements" What's the memory use of this algorithm? A. B. C. D. E. None of the above

Element Distinctness: WHY Given list of positive integers A = a 1, a 2,, a n, and m memory locations available HashDistinctness(A, m) 1. Initialize array M[1,..,m] to all 0s. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. If M[ h(a i ) ] = 1 then return "Found repeat" 5. Else M[ h(a i ) ] := 1 6. Return "Distinct elements" But this algorithm might make a mistake!!! When?

Element Distinctness: WHY Given list of positive integers A = a 1, a 2,, a n, and m memory locations available HashDistinctness(A, m) 1. Initialize array M[1,..,m] to all 0s. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. If M[ h(a i ) ] = 1 then return "Found repeat" 5. Else M[ h(a i ) ] := 1 6. Return "Distinct elements" Correctness: Goal is If there is a repetition, algorithm finds it If there is no repetition, algorithm reports "Distinct elements"

Element Distinctness: WHY Given list of positive integers A = a 1, a 2,, a n, and m memory locations available HashDistinctness(A, m) 1. Initialize array M[1,..,m] to all 0s. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. If M[ h(a i ) ] = 1 then return "Found repeat" 5. Else M[ h(a i ) ] := 1 6. Return "Distinct elements" Correctness: Goal is If there is a repetition, algorithm finds it If there is no repetition, algorithm reports "Distinct elements" Hash Collisions

Element Distinctness: WHY Given list of positive integers A = a 1, a 2,, a n, and m memory locations available HashDistinctness(A, m) 1. Initialize array M[1,..,m] to all 0s. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. If M[ h(a i ) ] = 1 then return "Found repeat" 5. Else M[ h(a i ) ] := 1 6. Return "Distinct elements" When is our algorithm correct with high probability in the ideal hash model?

Days of the year = memory locations Where is the connection? h(person)=birthday collisions mean that two people share the same birthday.

General Birthday Paradox-type Phenomena We have n objects and m places. We are putting each object at random into one of the places. What is the probability that 2 objects occupy the same place?

Calculating the general rule

Calculating the general rule Probability the first object causes no collisions is 1

Calculating the general rule Probability the second object causes no collisions is 1-1/m

Calculating the general rule Probability the third object causes no collisions is (m-2)/m=1-2/m

Calculating the general rule Probability the ith object causes no collisions is 1-(i-1)/m

Conditional Probabilities Using conditional probabilities, the probability there is no collisions is [1(1-1/m)(1-2/m) (1-(n-1)/m)] ( p = # 1 i 1 m )*+ Then using the fact that 1 x e /0, ( p # e /)/+ 1 = e / 3 )/+ 456 1 )*+ = e / 3 7 1

Conditional Probabilities ( p # e /)/+ 1 = e / 3 )/+ 456 1 )*+ = e / 3 7 1 We want p to be close to 1 so 3 7 1 should be small, i.e. m ( 9 (7 9. For the birthday problem, this is when the number of people is about 2(365) 27 In the element distinctness algorithm, we need the number of memory locations to be at least Ω n 9.

Conditional Probabilities ( p # e /)/+ 1 = e / 3 )/+ 456 1 )*+ = e / 3 7 1 On the other hand, it is possible to show that if m>>n 9 then there are no collisions with high probability. i.e. p > 1 So if m is large then p is close to 1. ( 9 m

Element Distinctness: WHY Given list of positive integers A = a 1, a 2,, a n, and m memory locations available HashDistinctness(A, m) 1. Initialize array M[1,..,m] to all 0s. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. If M[ h(a i ) ] = 1 then return "Found repeat" 5. Else M[ h(a i ) ] := 1 6. Return "Distinct elements" What this means about this algorithm is that we can get time to be O(n) at the expense of using O(n 9 ) memory. Since we need to initialize the memory, this doesn t seem worthwhile because sorting uses less memory and slightly more time. So what can we do?

Resolving collisions with chaining Hash Table Each memory location holds a pointer to a linked list, initially empty. Collision means there is more than one item in this linked list Each linked list records the items that map to that memory location.

Element Distinctness: HOW Given list of positive integers A = a 1, a 2,, a n, and m memory locations available ChainHashDistinctness(A, m) 1. Initialize array M[1,..,m] to null lists. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. For each element j in M[ h(a i ) ], 5. If a j = a i then return "Found repeat" 6. Append i to the tail of the list M [ h(a i ) ] 7. Return "Distinct elements"

Element Distinctness: WHY Given list of positive integers A = a 1, a 2,, a n, and m memory locations available ChainHashDistinctness(A, m) 1. Initialize array M[1,..,m] to null lists. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. For each element j in M[ h(a i ) ], 5. If a j = a i then return "Found repeat" 6. Append i to the tail of the list M [ h(a i ) ] 7. Return "Distinct elements" Correctness: Goal is If there is a repetition, algorithm finds it If there is no repetition, algorithm reports "Distinct elements"

Element Distinctness: MEMORY Given list of positive integers A = a 1, a 2,, a n, and m memory locations available ChainHashDistinctness(A, m) 1. Initialize array M[1,..,m] to null lists. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. For each element j in M[ h(a i ) ], 5. If a j = a i then return "Found repeat" 6. Append i to the tail of the list M [ h(a i ) ] 7. Return "Distinct elements" What's the memory use of this algorithm?

Element Distinctness: MEMORY Given list of distinct integers A = a 1, a 2,, a n, and m memory locations available ChainHashDistinctness(A, m) 1. Initialize array M[1,..,m] to null lists. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. For each element j in M[ h(a i ) ], 5. If a j = a i then return "Found repeat" 6. Append i to the tail of the list M [ h(a i ) ] 7. Return "Distinct elements" What's the memory use of this algorithm? Size of M: O(m). Total size of all the linked lists: O(n). Total memory: O(m+n).

Element Distinctness: WHEN ChainHashDistinctness(A, m) 1. Initialize array M[1,..,m] to null lists. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. For each element j in M[ h(a i ) ], 5. If a j = a i then return "Found repeat" 6. Append i to the tail of the list M [ h(a i ) ] 7. Return "Distinct elements"

Element Distinctness: WHEN ChainHashDistinctness(A, m) 1. Initialize array M[1,..,m] to null lists. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. For each element j in M[ h(a i ) ], 5. If a j = a i then return "Found repeat" 6. Append i to the tail of the list M [ h(a i ) ] 7. Return "Distinct elements" Worst case is when we don't find a i : O( 1 + size of list M[ h(a i ) ] )

Element Distinctness: WHEN ChainHashDistinctness(A, m) 1. Initialize array M[1,..,m] to null lists. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. For each element j in M[ h(a i ) ], 5. If a j = a i then return "Found repeat" 6. Append i to the tail of the list M [ h(a i ) ] 7. Return "Distinct elements" Worst case is when we don't find a i : O( 1 + size of list M[ h(a i ) ] ) = O( 1 + # j<i with h(a j )=h(a i ) )

Element Distinctness: WHEN ChainHashDistinctness(A, m) 1. Initialize array M[1,..,m] to null lists. 2. Pick a hash function h from all positive integers to 1,..,m. 3. For i = 1 to n, 4. For each element j in M[ h(a i ) ], 5. If a j = a i then return "Found repeat" 6. Append i to the tail of the list M [ h(a i ) ] 7. Return "Distinct elements" Worst case is when we don't find a i : O( 1 + size of list M[ h(a i ) ] ) = O( 1 + # j<i with h(a j )=h(a i ) ) Total time: O(n + # collisions between pairs a i and a j, where j<i ) = O(n + total # collisions)

Element Distinctness: WHEN Total time: O(n + # collisions between pairs a i and a j, where j<i ) = O(n + total # collisions) What's the expected total number of collisions?

Element Distinctness: WHEN Total time: O(n + # collisions between pairs a i and a j, where j<i ) = O(n + total # collisions) What's the expected total number of collisions? For each pair (i,j) with j<i, define: X i,j = 1 if h(a i )=h(a j ) and X i,j =0 otherwise. Total # of collisions =

Element Distinctness: WHEN Total time: O(n + # collisions between pairs a i and a j, where j<i ) = O(n + total # collisions) What's the expected total number of collisions? For each pair (i,j) with j<i, define: X i,j = 1 if h(a i )=h(a j ) and X i,j =0 otherwise. Total # of collisions = So by linearity of expectation: E( total # of collisions ) =

Element Distinctness: WHEN Total time: O(n + # collisions between pairs a i and a j, where j<i ) = O(n + total # collisions) What's the expected total number of collisions? For each pair (i,j) with j<i, define: X i,j = 1 if h(a i )=h(a j ) and X i,j =0 otherwise. Total # of collisions = What's E(X i,j )? A. 1/n B. 1/m C. 1/n 2 D. 1/m 2 E. None of the above.

Element Distinctness: WHEN Total time: O(n + # collisions between pairs a i and a j, where j<i ) = O(n + total # collisions) What's the expected total number of collisions? For each pair (i,j) with j<i, define: Total # of collisions = X i,j = 1 if h(a i )=h(a j ) and X i,j =0 otherwise. How many terms are in the sum? That is, how many pairs (i,j) with j<i are there? A. n B. n 2 C. C(n,2) D. n(n-1)

Element Distinctness: WHEN Total time: O(n + # collisions between pairs a i and a j, where j<i ) = O(n + total # collisions) What's the expected total number of collisions? For each pair (i,j) with j<i, define: X i,j = 1 if h(a i )=h(a j ) and X i,j =0 otherwise. So by linearity of expectation: E( total # of collisions ) = =

Element Distinctness: WHEN Total time: O(n + # collisions between pairs a i and a j, where j<i ) = O(n + total # collisions) Total expected time: O(n + n 2 /m) In ideal hash model, as long as m>n the total expected time is O(n).