IS 2610: Data Structures

Similar documents
Hashing Algorithms. Hash functions Separate Chaining Linear Probing Double Hashing

Binary Search Trees. Manolis Koubarakis. Data Structures and Programming Techniques

Search Trees. The term refers to a family of implementations, that may have different properties. We will discuss:

Lecture P9: Trees. Overview. Searching a Database. Searching a Database

CSE 214 Computer Science II Searching

Hash Tables. Hashing Probing Separate Chaining Hash Function

Lecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40

Hash Table and Hashing

COMP171. Hashing.

9/24/ Hash functions

Hashing. Hashing Procedures

Search trees, binary trie, patricia trie Marko Berezovský Radek Mařík PAL 2012

Today s Outline. CS 561, Lecture 8. Direct Addressing Problem. Hash Tables. Hash Tables Trees. Jared Saia University of New Mexico

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Hash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1

Dictionary. Dictionary. stores key-value pairs. Find(k) Insert(k, v) Delete(k) List O(n) O(1) O(n) Sorted Array O(log n) O(n) O(n)

Symbol Tables. Computing 2 COMP x1

CMSC 132: Object-Oriented Programming II. Hash Tables

BBM371& Data*Management. Lecture 6: Hash Tables

Hashing. Dr. Ronaldo Menezes Hugo Serrano. Ronaldo Menezes, Florida Tech

Hashing Techniques. Material based on slides by George Bebis

CSE 2320 Notes 13: Hashing

Hashing. October 19, CMPE 250 Hashing October 19, / 25

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

Data Structures And Algorithms

Understand how to deal with collisions

Binary Search Trees. Analysis of Algorithms

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

Question Bank Subject: Advanced Data Structures Class: SE Computer

Analysis of Algorithms

Algorithms and Data Structures

HASH TABLES.

UNIT III BALANCED SEARCH TREES AND INDEXING

Cpt S 223. School of EECS, WSU

Trees. (Trees) Data Structures and Programming Spring / 28

CS 350 Algorithms and Complexity

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

HO #13 Fall 2015 Gary Chan. Hashing (N:12)

Hashing 1. Searching Lists

CS/COE 1501

AAL 217: DATA STRUCTURES

Topic HashTable and Table ADT

TABLES AND HASHING. Chapter 13

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

! A Hash Table is used to implement a set, ! The table uses a function that maps an. ! The function is called a hash function.

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

OPPA European Social Fund Prague & EU: We invest in your future.

Balanced Binary Search Trees. Victor Gao

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

Hashing. Manolis Koubarakis. Data Structures and Programming Techniques

csci 210: Data Structures Maps and Hash Tables

Direct Addressing Hash table: Collision resolution how handle collisions Hash Functions:

Lecture 6: Hashing Steven Skiena

Data Structures and Algorithms. Chapter 7. Hashing

Hashing. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Section 1: True / False (1 point each, 15 pts total)

CS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table

AP Programming - Chapter 20 Lecture page 1 of 17

Test 1 - Closed Book Last 4 Digits of Student ID # Multiple Choice. Write your answer to the LEFT of each problem. 3 points each

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University

Solutions to Exam Data structures (X and NV)

Hashing. 1. Introduction. 2. Direct-address tables. CmSc 250 Introduction to Algorithms

Tables. The Table ADT is used when information needs to be stored and acessed via a key usually, but not always, a string. For example: Dictionaries

Fast Lookup: Hash tables

We assume uniform hashing (UH):

Hash[ string key ] ==> integer value

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

Algorithms with numbers (2) CISC4080, Computer Algorithms CIS, Fordham Univ. Acknowledgement. Support for Dictionary

Hash Tables. Hash functions Open addressing. March 07, 2018 Cinda Heeren / Geoffrey Tien 1

Data Structures and Algorithms. Roberto Sebastiani

Tirgul 7. Hash Tables. In a hash table, we allocate an array of size m, which is much smaller than U (the set of keys).

lecture23: Hash Tables

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

Hash-Based Indexes. Chapter 11

Lecture 10 March 4. Goals: hashing. hash functions. closed hashing. application of hashing

CS 251, LE 2 Fall MIDTERM 2 Tuesday, November 1, 2016 Version 00 - KEY

CS 112 Final May 8, 2008 (Lightly edited for 2012 Practice) Name: BU ID: Instructions

HASH TABLES. Hash Tables Page 1

Successful vs. Unsuccessful

ECE242 Data Structures and Algorithms Fall 2008

Second Semester - Question Bank Department of Computer Science Advanced Data Structures and Algorithms...

HashTable CISC5835, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Fall 2018

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo

Data Structure. A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.

TREES. Trees - Introduction

Fundamental Algorithms

Lecture 17. Improving open-addressing hashing. Brent s method. Ordered hashing CSE 100, UCSD: LEC 17. Page 1 of 19

Open Addressing: Linear Probing (cont.)

CSE Data Structures and Introduction to Algorithms... In Java! Instructor: Fei Wang. Binary Search Trees. CSE2100 DS & Algorithms 1

Search Trees - 2. Venkatanatha Sarma Y. Lecture delivered by: Assistant Professor MSRSAS-Bangalore. M.S Ramaiah School of Advanced Studies - Bangalore

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

CS 310 Hash Tables, Page 1. Hash Tables. CS 310 Hash Tables, Page 2

Questions. 6. Suppose we were to define a hash code on strings s by:

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

CS Transform-and-Conquer

CSC Design and Analysis of Algorithms

Splay Trees. (Splay Trees) Data Structures and Programming Spring / 27

INSTITUTE OF AERONAUTICAL ENGINEERING

Transcription:

IS 2610: Data Structures Searching March 29, 2004

Symbol Table A symbol table is a data structure of items with keys that supports two basic operations: insert a new item, and return an item with a given key Examples: Account information in banks Airline reservations

Symbol Table ADT Key operations Insert a new item Search for an item with a given key Delete a specified item Select the k th smallest item Sort the symbol table Join two symbol tables void STinit(int); int STcount(); void STinsert(Item); Item STsearch(Key); void STdelete(Item); Item STselect(int); void STsort(void (*visit)(item));

Key-indexed ST Simplest search algorithm is based on storing items in an array, indexed by the keys static Item *st; static int M = maxkey; void STinit(int maxn) { int i; st = malloc((m+1)*sizeof(item)); for (i = 0; i <= M; i++) st[i] = NULLitem; } int STcount() { int i, N = 0; for (i = 0; i < M; i++) if (st[i]!= NULLitem) N++; return N; } void STinsert(Item item) { st[key(item)] = item; } Item STsearch(Key v) { return st[v]; } void STdelete(Item item) { st[key(item)] = NULLitem; } Item STselect(int k) { int i; for (i = 0; i < M; i++) if (st[i]!= NULLitem) if (k-- == 0) return st[i]; } void STsort(void (*visit)(item)) { int i; for (i = 0; i < M; i++) if (st[i]!= NULLitem) visit(st[i]); }

Sequential Search based ST When a new item is inserted, we put it into the array by moving the larger elements over one position (as in insertion sort) To search for an element Look through the array sequentially If we encounter a key larger than the search key we report an error

Binary Search Divide and conquer methodology Divide the items into two parts Determine which part the search key belongs to and concentrate on that part Keep the items sorted Use the indices to delimit the part searched. Item search(int l, int r, Key v) { int m = (l+r)/2; if (l > r) return NULLitem; if eq(v, key(st[m])) return st[m]; if (l == r) return NULLitem; if less(v, key(st[m])) return search(l, m-1, v); else return search(m+1, r, v); } Item STsearch(Key v) { return search(0, N-1, v); }

Binary Search Tree NST is a binary tree A key is associated with each of its internal nodes Key in any node is larger than (or equal to) the keys in all nodes in that node s left subtree is smaller than (or equal to) the keys in all nodes in that node s right subtree What is the output of inorder traversal on BST?

BST insertion Insert L!! T O G S P N A E R A I M X void STinsert(Item item) { Key v = key(item); link p = head, x = p; if (head == NULL) { head = NEW(item, NULL, NULL, 1); return; } while (x!= NULL) { p = x; x->n++; x = less(v, key(x->item))? x->l : x->r; } x = NEW(item, NULL, NULL, 1); if (less(v, key(p->item))) p->l = x; else p->r = x; } link insertr(link h, Item item) { Key v = key(item), t = key(h->item); if (h == z) return NEW(item, z, z, 1); if less(v, t) h->l = insertr(h->l, item); else h->r = insertr(h->r, item); (h->n)++; return h; } void STinsert(Item item) { head = insertr(head, item); }

BST Complexities Best and worst case heights ln N and N Search costs Internal path length is related to search hit External path length is related to search miss N random keys Average: Insertion, Search hit and Search miss require about 2 ln N comparisons Worst case search: N comparisons

Basic Rotations Transformations to rearrange nodes in a tree Maintain BST Changes three pointers link rotl(link h) { link x = h->r; h->r = x->l; x->l = h; return x; } link rotr(link h) { link x = h->l; h->l = x->r; x->r = h; return x; }

Balanced Trees BST worst case is bad!! Keep trees balanced so that searches can be done in less than ln N + 1 comparisons Maintenance cost incurred! Splay trees (Self-adjusting) Tree automatically reorganizes itself after each op When insert or search for x, rotate x up to root using double rotations Tree remains balanced without explicitly storing any balance information

Splay trees Check two links above current node ZIG-ZAG: if orientations differ, same as root insertion ZIG-ZIG: if orientations match, do top rotation first (unlike bottom rotation in root insertion using basic rotations)

2-3-4 Trees Nodes can hold more than one key 2-nodes : 1 key; two links 3-nodes : 2 keys; three links 4-nodes : 3 keys; four links A balanced 2-3-4 tree Links to empty trees are at the same hieght R R R C, R A S A, C S A, C, H S A H S

2-3-4 Trees How doe you Search? Insert Search to bottom for key 2-node at bottom: convert to 3-node 3-node at bottom: convert to 4-node 4-node at bottom split Whenever root becomes 4 node split it into a triangle of three 2-nodes Add E

Red black trees Represent 2-3-4 trees as binary trees

Hashing Save items in a key-indexed table Index is a function of the key Hash function function to compute table index from search key Collision resolution strategy Algorithms and data structures to handle two keys that hash to the same index One approach use linked list

Hashing Time-space complexity No space limitation Any search can be done in one memory access No time limitation Use limited memory and do sequential search Limitation on both Hashing to balance

Hash function: h Given a hash table of size M h(key) is a value in [0,.., M] Ideally, for each input, every output should be equally likely Simple methods Modular hash function h(k) = K mod M; choose M as prime Multiplicative and modular methods h(k) = (Kα) mod M; choose M as prime A popular choice is α= 0.618033 (golden ration)

Hash Function: h Strings of characters 26 4.5 Million 4-char keys Table size M = 101 Binary 01100001 01100010 01100011 01100100 Hex 61 62 63 64 Dec 97 98 99 100 ascii a b c d abcd hashes to 11 0x61626364 % 101 = 16338831724 % 101 = 11 dcba hashes to 57 Collision is inevitable

Hash function: h Horner s method Binary 0x61626364 = 256*(256*(256*97+98) + 99)+100 0x61626364 mod 101 = 256*(256*(256*97+98) + 99)+100 mod 101 Can take mod after each op (256*97+98) mod 101 = 84 (256*84+99) mod 101 = 90 (256*90+100) mod 101 = 11 N add, multiply and mod ops 01100001 01100010 01100011 01100100 int hash(char *v, int M) { int h = 0, a = 127; for (; *v!= '\0'; v++) h = (a*h + *v) % M; return h; } Why 127 instead of 128? Hex 61 62 63 64 Dec 97 98 99 100 ascii a b c d

Universal Hashing and collision Universal function Chance of collision for two distinct keys for table size M is precisely 1/M How to handle the case when two keys hash to the same value Separate chaining Open addressing linear probe Double hashing Dynamic hash increase table size dynamically int hashu(char *v, int M) { int h, a = 31415, b = 27183; for (h = 0; *v!= '\0'; v++, a = a*b % (M-1)) h = (a*h + *v) % M; return h; } Performs well in practice!

Separate Chaining A linked list for each hash address M linked lists M much smaller than N Property 14.1: Number of comparisons Reduced by factor of M Average length of the lists is N/M Search the list Unordered: insert takes constant time Search is proportional to N/M

Open Addressing Open addressing M is much larger than N Plenty of empty table slots When a new key collides find an empty slot Complex collision patterns Linear Probing When collision occurs, check (probe) the next position in the table Wrap around the table to find an empty slot

Linear Probing A S E R C H I N G X M P Load factor 7 3 9 9 8 4 11 7 10 12 0 8 α - fraction of the table positions that are occupied (less than 1) Search increases with the value of α Search loops infinitely when α = 1 Insert: ½(1+ (/(1- α)2)

Double Hashing Avoid clustering using second hash Take hash function relatively prime to avoid from probe sequence to be very short Make M prime Choose second has value that returns values less than M A useful second hash: (k mod 97) + 1