CSE 373: Data Structures and Algorithms

Similar documents
CSE373: Data Structures & Algorithms Lecture 12: Amortized Analysis and Memory Locality. Lauren Milne Spring 2015

CSE373: Data Structures & Algorithms Lecture 12: Amortized Analysis and Memory Locality. Linda Shapiro Winter 2015

CSE373: Data Structures & Algorithms Lecture 9: Priority Queues. Aaron Bauer Winter 2014

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

CSE 373: Data Structures and Algorithms

CSE 373: Data Structures and Algorithms

B+ Tree Review. CSE332: Data Abstractions Lecture 10: More B Trees; Hashing. Can do a little better with insert. Adoption for insert

Module 1: Asymptotic Time Complexity and Intro to Abstract Data Types

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

CSE373: Data Structures & Algorithms Lecture 5: Dictionary ADTs; Binary Trees. Linda Shapiro Spring 2016

University of Waterloo Department of Electrical and Computer Engineering ECE250 Algorithms and Data Structures Fall 2014

UNIVERSITY REGULATIONS

CPSC 221: Algorithms and Data Structures Lecture #1: Stacks and Queues

CSE 373: Data Structures and Algorithms

CSE373: Data Structures & Algorithms Lecture 6: Hash Tables

CSE373: Data Structures & Algorithms Lecture 17: Hash Collisions. Kevin Quinn Fall 2015

ECE250: Algorithms and Data Structures Midterm Review

CSE 332: Data Structures & Parallelism Lecture 7: Dictionaries; Binary Search Trees. Ruth Anderson Winter 2019

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

Today s Outline. CSE 326: Data Structures Asymptotic Analysis. Analyzing Algorithms. Analyzing Algorithms: Why Bother? Hannah Takes a Break

Faculty of Science FINAL EXAMINATION COMP-250 A Introduction to Computer Science School of Computer Science, McGill University

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

Amortized Analysis. Ric Glassey

Recall from Last Time: Big-Oh Notation

CSE 373: Data Structures and Algorithms

CSE 373: Data Structures and Algorithms

CSE 373: Data Structures and Algorithms

CSE 146. Asymptotic Analysis Interview Question of the Day Homework 1 & Project 1 Work Session

CS Prof. Clarkson Fall 2014

CSE 332 Spring 2014: Midterm Exam (closed book, closed notes, no calculators)

CSE 373: Data Structures and Algorithms

CSE 326 Team. Today s Outline. Course Information. Instructor: Steve Seitz. Winter Lecture 1. Introductions. Web page:

csci 210: Data Structures Stacks and Queues

CSE373 Fall 2013, Second Midterm Examination November 15, 2013

CPSC 221: Algorithms and Data Structures ADTs, Stacks, and Queues

Stacks and queues (chapters 6.6, 15.1, 15.5)

Data Structure. Recitation VII

Linked lists (6.5, 16)

CSE373: Data Structures & Algorithms Lecture 11: Implementing Union-Find. Lauren Milne Spring 2015

Overview of Data. 1. Array 2. Linked List 3. Stack 4. Queue

You must include this cover sheet. Either type up the assignment using theory3.tex, or print out this PDF.

CSE 332: Data Abstractions. Ruth Anderson Spring 2014 Lecture 1

CSE373: Data Structures and Algorithms Lecture 1: Introduction; ADTs; Stacks/Queues. Lauren Milne Summer 2015

CSE 332: Data Structures & Parallelism Lecture 12: Comparison Sorting. Ruth Anderson Winter 2019

Practice Midterm Exam Solutions

Queues. Queue ADT. Queue Specification. An ordered sequence A queue supports operations for inserting (enqueuing) and removing (dequeuing) elements.

Algorithm Design and Analysis Homework #4

Standard ADTs. Lecture 19 CS2110 Summer 2009

Assume you are given a Simple Linked List (i.e. not a doubly linked list) containing an even number of elements. For example L = [A B C D E F].

CPSC 221: Algorithms and Data Structures Lecture #0: Introduction. Come up and say hello! Fibonacci. Fibonacci. Fibonacci. Fibonacci. (Welcome!

HASH TABLES. CSE 332 Data Abstractions: B Trees and Hash Tables Make a Complete Breakfast. An Ideal Hash Functions.

Outline for Today. Why could we construct them in time O(n)? Analyzing data structures over the long term.

CS24 Week 4 Lecture 2

Discussion 2C Notes (Week 3, January 21) TA: Brian Choi Section Webpage:

Computer Science 210 Data Structures Siena College Fall Topic Notes: Linear Structures

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

III Data Structures. Dynamic sets

CSE 332: Data Structures. Spring 2016 Richard Anderson Lecture 1

Lecture 17: Hash Tables, Maps, Finish Linked Lists

UNIT 6A Organizing Data: Lists. Last Two Weeks

Algorithms in Systems Engineering IE172. Midterm Review. Dr. Ted Ralphs

Adam Blank Lecture 1 Winter 2017 CSE 332. Data Abstractions

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

University of Waterloo Department of Electrical and Computer Engineering ECE250 Algorithms and Data Structures Fall 2017

CSci 231 Final Review

Data Structures and Algorithms CSE 465

Implementing Hash and AVL

CSE373: Data Structures & Algorithms Lecture 28: Final review and class wrap-up. Nicki Dell Spring 2014

University of Illinois at Urbana-Champaign Department of Computer Science. Second Examination

UNIVERSITY REGULATIONS

CSE 373: Data Structures and Algorithms

CSE wi: Practice Midterm

LIFO : Last In First Out

Unit #3: Recursion, Induction, and Loop Invariants

University of Illinois at Urbana-Champaign Department of Computer Science. Second Examination

Reasoning About Programs Panagiotis Manolios

Course goals. exposure to another language. knowledge of specific data structures. impact of DS design & implementation on program performance

Object Oriented Programming and Design in Java. Session 19 Instructor: Bert Huang

CS S-04 Stacks and Queues 1. An Abstract Data Type is a definition of a type based on the operations that can be performed on it.

Data Structures and Algorithms

CSE 332 Winter 2015: Midterm Exam (closed book, closed notes, no calculators)

Section 05: Solutions

CSE 373: Data Structures and Algorithms

Outline for Today. How can we speed up operations that work on integer data? A simple data structure for ordered dictionaries.

stacks operation array/vector linked list push amortized O(1) Θ(1) pop Θ(1) Θ(1) top Θ(1) Θ(1) isempty Θ(1) Θ(1)

CSE 373 Data Structures and Algorithms. Lecture 2: Queues

Cpt S 223 Fall Cpt S 223. School of EECS, WSU

0.1 Welcome. 0.2 Insertion sort. Jessica Su (some portions copied from CLRS)

CSE 214 Computer Science II Searching

University of Illinois at Urbana-Champaign Department of Computer Science. Final Examination

CSE 332: Data Structures & Parallelism Lecture 3: Priority Queues. Ruth Anderson Winter 2019

COURSE MECHANICS. Welcome! CSE 332 Data Abstractions: Introduction and ADTs. Concise to-do list. Today in Class. Instructor: Kate Deibel

Apply to be a Meiklejohn! tinyurl.com/meikapply

CPSC 211, Sections : Data Structures and Implementations, Honors Final Exam May 4, 2001

Module 2: Classical Algorithm Design Techniques

You must include this cover sheet. Either type up the assignment using theory4.tex, or print out this PDF.

CS 241 Analysis of Algorithms

INSTITUTE OF AERONAUTICAL ENGINEERING

Summer Final Exam Review Session August 5, 2009

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)

Transcription:

CSE 373: Data Structures and Algorithms Lecture 6: Finishing Amortized Analysis; Dictionaries ADT; Introduction to Hash Tables Instructor: Lilian de Greef Quarter: Summer 2017

Today: Finish up Amortized Analysis Dictionary ADT Introduce Hash Tables

Reminder: No class on Monday! Unofficial holiday have a good 4-day weekend! J Will ask you to do a ~30 minute activity to make up for last class time Remember that homework 2 is also due the day after we re back

Homework 2 update: There was a typo (woops!) for problem 7 The website now has a corrected version.

Amortized Analysis How we calculate the average time!

Amortized Cost The amortized cost of n operations is the worst-case total cost of the operations divided by n. Shorthand: If T(n) = worst-case (upper bound) of total cost for n = number of operations Amortized Cost = T(n) / n

Example: Array Stack What s the amortized cost of calling push() n times if we double the array size when it s full? n operations: n pushes at O(1) each -> total cost = n cost of resizing = n + n/2 + n/4 + n/8 + 2n -> total cost 2n -> T(n) = n + 2n = 3n -> Amortized cost = T(n)/n = 3n/n = 3 -> Amortized Running time = O(1) The amortized cost of n operations is the worstcase total cost of the operations divided by n.

Another Perspective: Paying and Saving Currency A B C D 1 operation costs us 1$ to the computer A B C D E 123456789

Another Perspective: Paying and Saving Currency A B C D A B C D 12340 Potential Function Use $2 for each push: $1 to computer, $1 to bank E Spend our savings in the bank to resize. That way it only costs $1 to push(e)! 123456789 0

Example #2: Queue made of Stacks A sneaky way to implement Queue using two Stacks Example walk-through: enqueue A enqueue B enqueue C dequeue enqueue D enqueue E dequeue dequeue dequeue in out

Example #2: Queue made of Stacks A sneaky way to implement Queue using two Stacks class Queue<E> { Stack<E> in = new Stack<E>(); Stack<E> out = new Stack<E>(); void enqueue(e x){ in.push(x); } E dequeue(){ if(out.isempty()) { while(!in.isempty()) { out.push(in.pop()); } } return out.pop(); } } in out Wouldn t it be nice to have a queue of t-shirts to wear instead of a stack (like in your dresser)? So have two stacks in: stack of t-shirts go after you wash them out: stack of t-shirts to wear if out is empty, reverse in into out

Example #2: Queue made of Stacks (Analysis) class Queue<E> { Stack<E> in = new Stack<E>(); Stack<E> out = new Stack<E>(); void enqueue(e x){ in.push(x); } E dequeue(){ if(out.isempty()) { while(!in.isempty()) { out.push(in.pop()); } } return out.pop(); } } in out Assume stack operations are (amortized) O(1). What s the worst-case for dequeue()? What operations did we need to do to reach that condition (starting with an empty Queue)? Hence, what is the amortized cost? So the average time for dequeue() is:

Example #2: Using Currency Analogy Spend $2 for every enqueue $1 to the computer, $1 to the bank. Example walk-through: enqueue A enqueue B enqueue C enqueue D enqueue E dequeue in out Potential Function

Example #3: (Parody / Joke Example) Lectures are 1 hour long, 3 times per week, so I m supposed to lecture for 27 hours this quarter. If I end the first 26 lectures 5 minutes early, then I d have saved up 130 minutes worth of extra lecture time. Then I could spend it all on the last lecture and can keep you here for 3 hours (bwahahahaha)! (After all, each lecture would still be 1 hour amortized time)

Wrapping up Amortized Analysis In what cases do we care more about the average / amortized run time? In what cases do we care more about the worst-case run time?

Taking a step back (Take a deep breath)

What have we covered so far? Abstract Data Types (ADTs) Two data structures Stacks (both using arrays and linked-lists) Queues (including circular queues) Asymptotic Analysis Intuition for Big-O Formally proving Big-O using Inductive Proofs Calculating Big-O for recursive methods using Recurrence Relations Big-O s cousins: Big-Ω, Big-θ, little-o, little-ω Average running time using Asymptotic Analysis

Whew! That was a *lot* of algorithm analysis. Now shifting gears completely on to some new data structures!

Dictionary ADT

Dictionary ADT

Uses of Dictionary ADT Used to store information with some key and retrieve it efficiently lots of programs do that! Examples: Contacts in a phone (name: number, email) Orca/Husky cards (account number: balance) Genome maps (DNA sequence: location on genome) Lilian s database of your grades (student ID: assignments, grades) Networks (router tables), Operating Systems (page tables), Compilers (symbol tables), Databases and so much more! Possibly the most widely used ADT!

Motivating Hash Tables Creative thinking time: how could you implement a dictionary using what you know so far (namely, linked-lists and arrays)? e.g. map names (key) to phone numbers (value)

Motivating Hash Tables Creative thinking time: how could you implement a dictionary using what you know so far (namely, linked-lists and arrays)? e.g. map names (key) to phone numbers (value)

Motivating Hash Tables Running times for Dictionary operations with n (key, value) pairs: insert find delete Magic Array

Magic Array Use key to compute array index for an item in O(1) time Example: phone contacts (name, number) name index = computeindex(name) array[index] = (name, number)

Magic Array Use key to compute array index for an item in O(1) time Example: phone contacts (name, number) name index = computeindex(name) array[index] = (name, number) What would be important about the indices from computeindex?

Introducing Hash Tables! Closest thing to our magic array

Hash Tables: closest thing to our Magic Array Average case O(1) find, insert, and delete (when under some often-reasonable assumptions) Our computeindex function is called a The index from the hash function is called a all possible keys hash function: index = h(key) Hash Table 0 tablesize 1

Hash Tables: Example Illustration

Hash Functions Hash functions need to For a person s name, would it be a good hash function to Use the ASCII values of first and second letter? Use the number of letters in the name?

Example Hash Function Hash function djb2 :

Hash Functions Many datatypes and Objects are hashable When writing a class, can make it hashable! Do so by implementing hashcode method We ll focus on ints and Strings in this class.

Collisions Happens when two elements get the same index (unavoidable in practice) Homework: come up with a strategy, write it down on paper, and bring it to class on Weds

Hash Table roles When hash tables are a reusable library, the division of responsibility generally breaks down into two roles: client hash table library E int table-index collision? collision resolution We will learn both roles, but most programmers in the real world spend more time as clients while understanding the library

Hash Tables There are m possible keys (m typically large, even infinite) We expect our table to have only n items n is much less than m (often written n << m) Many dictionaries have this property Compiler: All possible identifiers allowed by the language vs. those used in some file of one program Database: All possible student names vs. students enrolled AI: All possible chess-board configurations vs. those considered by the current player

Hash Table Size How can we keep hash values (i.e. the indices) within the table size? _ Table size usually prime Real-life data tends to have a pattern "Multiples of 61" probably less likely than "multiples of 60" Helpful for a collision-handling strategy we'll see next week

Review: Hash Tables thus far The hash table is one of the most important data structures Supports only find, insert, and delete efficiently Have to search entire table for other operations Important to use a good hash function Important to keep hash table at a good size Side-comment: hash functions have uses beyond hash tables Examples: Cryptography, check-sums Big remaining topic: Handling collision

Homework Come up with a collision-resolution strategy, write it down on paper, and bring it to class on Wednesday Goal: prime our brains for learning the most common collision-resolution strategies.