DATA STRUCTURES (Struktur Data). By: Sri Rezeki Candra Nursari. 2 SKS



Topics
1. Data and Data Structures
2. Array
3. Structs and Records
4. Pointer
5. Linked List
6. Stack
7. Queue
8. Tree
9. AVL Tree
10. Heap and B-Tree
11. Sorting
12. Search
13. Hashing
14. Graph

HASHING. Meeting 15, 2 SKS

Outline
- Hashing: definition, hash function
- Collision resolution
  - Open hashing: separate chaining
  - Closed hashing (open addressing): linear probing, quadratic probing, double hashing
  - Primary clustering, secondary clustering
- Access: insert, find, delete

Hash Tables. Hashing is used to store relatively large amounts of data in a table called a hash table. The table size, H-size, is usually fixed and larger than the amount of data we want to store. The load factor $\lambda$ is the ratio of the number of stored items to the size of the hash table, $\lambda = N / \text{H-size}$. [Figure: a hash function maps an item's key to an index in the range 0 to H-size - 1 of the hash table.]

Hash Tables (2). Hashing is a technique for performing insertions, deletions, and finds in constant average time. Each element has a key, and a function called the hash function computes from that key the location of the element in the table. A hash table is a fixed-size array of cells holding the data, or the keys corresponding to the data; the hash function maps each key to a number in the range 0 to H-size - 1.

Hash Function. A hashing function should have the following features:
- It is easy to compute.
- Two distinct keys map to two different cells in the array. This is not true in general: the set of possible keys is usually much larger than the table, so some distinct keys must share a cell. It can be achieved with a direct-address table when the universal set of keys is reasonably small.
- It distributes the keys evenly among the cells.
One simple hashing function is the mod function with a prime table size. Any manipulation of the digits that is cheap to compute and gives a good distribution can be used.
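A minimal C++ sketch of the mod function for non-negative integer keys. The names `modHash` and `slotsFor` and the sample keys are illustrative assumptions; the sketch shows why a prime table size helps: keys that share a factor with the table size pile into a few cells.

```cpp
#include <cassert>
#include <vector>

// Division-method hash for a non-negative integer key: h(k) = k mod tableSize.
int modHash(int key, int tableSize) {
    assert(key >= 0 && tableSize > 0);
    return key % tableSize;
}

// Returns the cell chosen for each key, so the spread of a key set can be inspected.
std::vector<int> slotsFor(const std::vector<int>& keys, int tableSize) {
    std::vector<int> slots;
    for (int k : keys) slots.push_back(modHash(k, tableSize));
    return slots;
}
```

With the keys 10, 20, 30, 40, 50, a table of size 10 puts every key in cell 0, while the prime size 11 spreads them over cells 10, 9, 8, 7, 6.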

Hash Function: Truncation. Part of the key is simply ignored, and the remaining digits are concatenated to form the index. In the examples below, the 2nd, 4th, and 7th digits are kept:
  Phone no.   Index
  731-3018    338
  539-2309    329
  428-1397    217
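The truncation example can be reproduced with the sketch below. It assumes 7-digit phone numbers and hard-codes the digit positions read off the examples above; `truncationHash` is a hypothetical name, not a standard function.

```cpp
#include <cassert>
#include <string>

// Truncation hash: keep the 2nd, 4th and 7th digits of a 7-digit phone
// number (non-digits such as '-' are skipped) and concatenate them.
// The positions match the slide's examples; they are not a general rule.
int truncationHash(const std::string& phone) {
    std::string digits;
    for (char c : phone)
        if (c >= '0' && c <= '9') digits += c;
    assert(digits.size() == 7);
    std::string kept = {digits[1], digits[3], digits[6]};
    return std::stoi(kept);
}
```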

Hash Function: Folding. The data can be split into smaller chunks, which are then folded together in some form, here by adding them:
  Phone no.   3 groups      Index
  7313018     73+13+018     104
  5392309     53+92+309     454
  4281397     42+81+397     520
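A sketch of the folding example, assuming a 7-digit key split into groups of 2, 2, and 3 digits as in the table above; `foldingHash` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Folding hash: split the 7 digits into groups of 2, 2 and 3 digits
// (the grouping used in the example) and add the groups together.
int foldingHash(const std::string& digits) {
    assert(digits.size() == 7);
    int a = std::stoi(digits.substr(0, 2));
    int b = std::stoi(digits.substr(2, 2));
    int c = std::stoi(digits.substr(4, 3));  // "018" parses as 18
    return a + b + c;
}
```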

Hash Function: Modular arithmetic. Convert the data into an integer, divide it by the size of the hash table, and take the remainder as the index. With a table size of 100:
  Groups      Sum     Index
  731+3018    3749    3749 % 100 = 49
  539+2309    2848    2848 % 100 = 48
  428+1397    1825    1825 % 100 = 25
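A sketch of the modular example, assuming the same split into the first 3 and last 4 digits and taking the table size as a parameter; `modularHash` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Modular hash: add the first 3 and the last 4 digits of a 7-digit key,
// then take the remainder modulo the table size.
int modularHash(const std::string& digits, int tableSize) {
    assert(digits.size() == 7 && tableSize > 0);
    int sum = std::stoi(digits.substr(0, 3)) + std::stoi(digits.substr(3, 4));
    return sum % tableSize;
}
```

Called with a table size of 100, it gives 49, 48, and 25 for the three keys in the table.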

Choosing a hash function. A good hash function should satisfy two criteria: 1. It should be quick to compute. 2. It should minimize the number of collisions.

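Example of hash function. For a string of characters $A_3 A_2 A_1 A_0$ and $X = 128$, the hash function is $h = A_3 X^3 + A_2 X^2 + A_1 X^1 + A_0 X^0 = ((A_3 X + A_2) X + A_1) X + A_0$, where the right-hand form (Horner's rule) needs only one multiplication and one addition per character. The result is much larger than the size of the table, so it must be taken modulo the size of the hash table.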

Example of hash function:

    #include <string>
    int hash(const std::string& key, int tableSize) {
        int hashVal = 0;
        for (unsigned char c : key)   // Horner's rule, reduced at every step
            hashVal = (hashVal * 128 + c) % tableSize;
        return hashVal;
    }

Modulo identities:
(A + B) % C = (A % C + B % C) % C
(A * B) % C = (A % C * B % C) % C
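The modulo identities are what make the step-by-step reduction safe. The sketch below, with hypothetical function names, compares the reduce-every-step loop with the exact polynomial reduced once at the end; the exact version is limited to keys of at most 8 characters so that the value fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Horner's rule, reduced modulo tableSize at every step.
int hashIncremental(const std::string& key, int tableSize) {
    assert(tableSize > 0);
    long long h = 0;
    for (unsigned char c : key) h = (h * 128 + c) % tableSize;
    return static_cast<int>(h);
}

// The same polynomial evaluated exactly and reduced only once at the end.
// Keys of up to 8 characters keep the value below 2^64.
int hashExact(const std::string& key, int tableSize) {
    assert(tableSize > 0 && key.size() <= 8);
    std::uint64_t h = 0;
    for (unsigned char c : key) h = h * 128 + c;
    return static_cast<int>(h % static_cast<std::uint64_t>(tableSize));
}
```

For example, "ab" gives 97 * 128 + 98 = 12514, and 12514 mod 101 = 91 with either function.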

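Example of hash function (multiplier 37):

    #include <string>
    int hash(const std::string& key, int tableSize) {
        unsigned int hashVal = 0;
        for (unsigned char c : key)
            hashVal = hashVal * 37 + c;   // may overflow; unsigned arithmetic wraps around
        return static_cast<int>(hashVal % tableSize);
    }

For long keys hashVal * 37 overflows. The accumulator is unsigned because unsigned overflow wraps around harmlessly, whereas signed overflow is undefined in C++. With a signed accumulator (as in Java, where int overflow wraps), a negative result has to be corrected with if (hashVal < 0) hashVal += tableSize.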

Example of hash function (sum of characters):

    #include <string>
    int hash(const std::string& key, int tableSize) {
        int hashVal = 0;
        for (unsigned char c : key)
            hashVal += c;              // add the character codes
        return hashVal % tableSize;
    }
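A sketch of the character-sum hash with an unsigned accumulator, illustrating two weaknesses of this choice: anagrams always collide, and short keys produce small sums, so most cells of a large table are never used. The name `additiveHash`, the table size 10007, and the sample keys are illustrative assumptions.

```cpp
#include <cassert>
#include <string>

// Character-sum hash: add the character codes, then reduce modulo tableSize.
int additiveHash(const std::string& key, int tableSize) {
    assert(tableSize > 0);
    unsigned int sum = 0;
    for (unsigned char c : key) sum += c;
    return static_cast<int>(sum % static_cast<unsigned int>(tableSize));
}
```

"stop" and "pots" land in the same cell, and an 8-letter lowercase key sums to at most 8 * 122 = 976, so cells above 976 in a table of size 10007 stay empty.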

Collision resolution. When two keys map to the same cell, we get a collision. Collisions can occur on insertion, so we need a procedure (collision resolution) to resolve them.

Closed Hashing. On a collision, try to find an alternative cell within the table. Closed hashing is also known as open addressing. For insertion, we try cells in sequence using the incremented function $h_i(x) = (\mathrm{hash}(x) + f(i)) \bmod \text{H-size}$, with $f(0) = 0$. The function $f$ is the collision resolution strategy. The table must be bigger than the number of data items. Methods for choosing $f$: linear probing, quadratic probing, double hashing.
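The probe sequence formula can be sketched with the collision resolution function passed in as a parameter. `probeSequence` is a hypothetical helper, and `homeSlot` stands for hash(x); plugging in f(i) = i or f(i) = i² gives linear or quadratic probing.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Closed-hashing probe sequence: h_i(x) = (hash(x) + f(i)) mod tableSize,
// with f(0) = 0. f is assumed to return non-negative offsets.
std::vector<int> probeSequence(int homeSlot, int tableSize, int probes,
                               const std::function<long long(int)>& f) {
    assert(tableSize > 0 && homeSlot >= 0 && homeSlot < tableSize);
    std::vector<int> seq;
    for (int i = 0; i < probes; ++i)
        seq.push_back(static_cast<int>((homeSlot + f(i)) % tableSize));
    return seq;
}
```

From home slot 5 in a table of size 7, f(i) = i probes 5, 6, 0, 1, and f(i) = i² probes 5, 6, 2, 0.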

Linear probing. Use the linear function $f(i) = i$: starting from the key's home cell, try the following cells in order until a free cell (for insertion) or the key (for a search) is found, so the key is stored as close as possible to its home position. It is the least complex strategy, but it may cause primary clustering: elements that hash to different locations probe the same alternative cells. Its cost depends on the load factor $\lambda$; linear probing should not be used if $\lambda > 0.5$.
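A minimal sketch of linear probing for non-negative integer keys, assuming hash(x) = x mod table size and using -1 to mark an empty cell. The class name `LinearProbingSet` is hypothetical, and deletion is left out.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

constexpr int kEmpty = -1;  // marker for an unused cell (keys must be >= 0)

// Closed hashing with linear probing, f(i) = i, and hash(x) = x mod table size.
class LinearProbingSet {
public:
    explicit LinearProbingSet(int size) : cells_(size, kEmpty) { assert(size > 0); }

    // Inserts key and returns the index of the cell that holds it.
    int insert(int key) {
        assert(key >= 0);
        const int n = static_cast<int>(cells_.size());
        for (int i = 0; i < n; ++i) {
            int idx = (key % n + i) % n;  // h_i(x) = (hash(x) + i) mod n
            if (cells_[idx] == kEmpty || cells_[idx] == key) {
                cells_[idx] = key;
                return idx;
            }
        }
        throw std::runtime_error("hash table is full");
    }

    // Probes from the home cell until the key or an empty cell is found.
    bool contains(int key) const {
        assert(key >= 0);
        const int n = static_cast<int>(cells_.size());
        for (int i = 0; i < n; ++i) {
            int idx = (key % n + i) % n;
            if (cells_[idx] == kEmpty) return false;
            if (cells_[idx] == key) return true;
        }
        return false;
    }

private:
    std::vector<int> cells_;
};
```

In a table of size 10, inserting 12, 22, and 32 fills cells 2, 3, and 4; key 3 then hashes to cell 3, finds it taken, and ends up in cell 5. This is primary clustering: a key with a different home cell probes the same cluster.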

Hashing - insert [Figure: a 16-cell table (indices 0-15) holding alpha, dawn, emerald, flamingo, hallmark, and moon; crystal and marigold are being inserted.]

Hashing - lookup [Figure: the table (indices 0-15) holding alpha, crystal, dawn, emerald, flamingo, hallmark, moon, marigold, and private; lookups for cobalt, marigold, and private.]

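Hashing - delete (lazy deletion) [Figure: emerald and moon are deleted, leaving alpha, crystal, dawn, flamingo, hallmark, marigold, and private.] Why lazy deletion? Deleted cells are only marked as deleted, not emptied, because an empty cell ends the probe sequence: emptying it would make keys stored further along the sequence unreachable.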

Hashing - operation after delete [Figure: the table holding alpha, crystal, dawn, flamingo, hallmark, marigold, and private; custom is inserted and marigold is looked up.]
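A sketch of lazy deletion on top of linear probing, assuming integer keys and hash(x) = x mod table size; the class and enum names are hypothetical. A removed key's cell becomes DELETED rather than EMPTY, so searches keep probing past it, while insertion may reuse it.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Cell states for lazy deletion: DELETED cells keep probe sequences intact.
enum class State { Empty, Occupied, Deleted };

struct Cell {
    int key = 0;
    State state = State::Empty;
};

// Linear probing with lazy deletion and hash(x) = x mod table size (keys >= 0).
class LazyDeleteSet {
public:
    explicit LazyDeleteSet(int size) : cells_(size) { assert(size > 0); }

    // Inserts key if absent, reusing the first non-occupied cell on its probe path.
    void insert(int key) {
        if (contains(key)) return;
        const int n = static_cast<int>(cells_.size());
        for (int i = 0; i < n; ++i) {
            Cell& c = cells_[(key % n + i) % n];
            if (c.state != State::Occupied) {
                c.key = key;
                c.state = State::Occupied;
                return;
            }
        }
        throw std::runtime_error("hash table is full");
    }

    bool contains(int key) const { return find(key) >= 0; }

    // Marks the key's cell as DELETED instead of emptying it.
    bool remove(int key) {
        const int idx = find(key);
        if (idx < 0) return false;
        cells_[idx].state = State::Deleted;
        return true;
    }

private:
    std::vector<Cell> cells_;

    // Returns the index holding key, or -1; only an EMPTY cell ends the search.
    int find(int key) const {
        assert(key >= 0);
        const int n = static_cast<int>(cells_.size());
        for (int i = 0; i < n; ++i) {
            const int idx = (key % n + i) % n;
            if (cells_[idx].state == State::Empty) return -1;
            if (cells_[idx].state == State::Occupied && cells_[idx].key == key) return idx;
        }
        return -1;
    }
};
```

After inserting 12 and 22 into a table of size 10 (cells 2 and 3) and removing 12, a search for 22 still succeeds because cell 2 is DELETED, not EMPTY; inserting 32 then reuses cell 2.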

Primary Clustering. Elements that hash to different locations probe the same alternative cells. [Figure: the new keys cobalt, canary, dark, and custom probe through the same cluster of cells holding alpha, crystal, dawn, flamingo, hallmark, marigold, and private.]

Quadratic probing. Quadratic probing eliminates primary clustering by choosing $f(i) = i^2$. It runs into more problems when the hash table is more than half full. The table size must be chosen carefully: with a size of 16, for example, $i^2 \bmod 16$ only takes the values 0, 1, 4, and 9, so most cells are never probed. It can be proved that if the table size is a prime number and the table is at least half empty, quadratic probing always finds a location for a new element. The probe positions can be computed incrementally, since $f(i) = i^2 = f(i-1) + 2i - 1$. Elements that hash to the same location still probe the same alternative cells (secondary clustering).
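A sketch of the incremental computation $f(i) = f(i-1) + 2i - 1$, where each probe position is obtained from the previous one by an addition. `quadraticProbes` is a hypothetical helper that lists the first `count` probe positions from a given home slot.

```cpp
#include <cassert>
#include <vector>

// Quadratic probing, f(i) = i^2, computed incrementally: since
// f(i) - f(i-1) = 2i - 1, each position is the previous one plus 2i - 1.
std::vector<int> quadraticProbes(int homeSlot, int tableSize, int count) {
    assert(tableSize > 0 && homeSlot >= 0 && homeSlot < tableSize);
    std::vector<int> probes;
    long long pos = homeSlot;
    for (int i = 0; i < count; ++i) {
        if (i > 0) pos = (pos + 2LL * i - 1) % tableSize;
        probes.push_back(static_cast<int>(pos));
    }
    return probes;
}
```

From home slot 0 in a table of size 7, the probes are 0, 1, 4, 2. In a table of size 16, every probe from slot 0 falls in {0, 1, 4, 9}.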

Double hashing. The collision resolution function is another hash function, $f(i) = i \cdot \mathrm{hash}_2(x)$: each probe adds another multiple of $\mathrm{hash}_2(x)$. The second hash function must be chosen carefully so that it never evaluates to zero and so that the probes reach all the cells. A prime-size hash table is essential.
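A sketch of the double hashing probe sequence. The choices hash1(x) = x mod H-size and hash2(x) = R - (x mod R), with R a prime smaller than the table size, are a common textbook pattern assumed here for illustration; since x mod R < R, hash2(x) is never zero.

```cpp
#include <cassert>
#include <vector>

// Assumed hash functions: hash1(x) = x mod tableSize and
// hash2(x) = r - (x mod r) with r prime and r < tableSize, so hash2(x) is in [1, r].
int hash1(int x, int tableSize) { return x % tableSize; }
int hash2(int x, int r) { return r - (x % r); }

// Double hashing: h_i(x) = (hash1(x) + i * hash2(x)) mod tableSize.
std::vector<int> doubleHashProbes(int key, int tableSize, int r, int count) {
    assert(key >= 0 && r > 0 && r < tableSize);
    std::vector<int> probes;
    const long long home = hash1(key, tableSize);
    const long long step = hash2(key, r);
    for (int i = 0; i < count; ++i)
        probes.push_back(static_cast<int>((home + i * step) % tableSize));
    return probes;
}
```

With a table of size 11 and R = 7, keys 12 and 23 share home slot 1 but step by 2 and 5 respectively (probes 1, 3, 5, 7 versus 1, 6, 0, 5), so they do not follow the same alternative cells. Because 11 is prime, any nonzero step eventually visits all 11 cells.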

Double Hashing [Figure: cobalt, custom, canary, and dark inserted with double hashing into the table holding alpha, crystal, dawn, flamingo, hallmark, marigold, and private.]

Open Hashing. Collisions are resolved by putting all elements that hash to the same bucket into a single collection. In open hashing (separate chaining), we keep a linked list of all the elements that hash to the same cell: each cell of the hash table holds a pointer to a linked list containing the data. Functions and analysis of open hashing: to insert a new element, we add it at the beginning or the end of the appropriate linked list. The choice depends on whether we check for duplicates (which requires traversing the list anyway) and on how often we expect the most recently added elements to be accessed (inserting at the front makes them the quickest to find).

Open Hashing [Figure: table cells 0-5, each pointing to a linked list of the keys hashed to it.]

Open Hashing (cont.). To search, we use the hash function to determine which linked list holds the element and then traverse that list to find it. To delete, we find the element in the appropriate linked list and remove it from that list. Other structures, such as a tree or another hash table, could be used in each cell instead of a list to resolve collisions. The main advantage of this method is that it can handle any amount of data (dynamic expansion). The main disadvantage is the extra memory used in each cell for the list pointers.
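A minimal separate-chaining sketch using `std::list` for each cell. Inserting at the front after a duplicate check is one of the options described above; the multiplier 37 in the index function and the class name `ChainedSet` are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Separate chaining: each cell holds a linked list of the keys hashed to it.
class ChainedSet {
public:
    explicit ChainedSet(int size) : buckets_(size) { assert(size > 0); }

    // Inserts at the front of the list, after checking for a duplicate.
    bool insert(const std::string& key) {
        auto& chain = bucket(key);
        if (std::find(chain.begin(), chain.end(), key) != chain.end()) return false;
        chain.push_front(key);
        ++count_;
        return true;
    }

    // Hash to a cell, then traverse only that cell's list.
    bool contains(const std::string& key) const {
        const auto& chain = bucket(key);
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    bool remove(const std::string& key) {
        auto& chain = bucket(key);
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        --count_;
        return true;
    }

    // Number of keys / number of cells, which is also the average chain length.
    double loadFactor() const { return static_cast<double>(count_) / buckets_.size(); }

private:
    std::vector<std::list<std::string>> buckets_;
    std::size_t count_ = 0;

    std::size_t index(const std::string& key) const {
        unsigned int h = 0;
        for (unsigned char c : key) h = h * 37 + c;  // unsigned arithmetic wraps
        return h % buckets_.size();
    }
    std::list<std::string>& bucket(const std::string& key) { return buckets_[index(key)]; }
    const std::list<std::string>& bucket(const std::string& key) const {
        return buckets_[index(key)];
    }
};
```

Since the load factor is the number of stored keys divided by the number of cells, `loadFactor()` also gives the average chain length.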

Analysis of Open Hashing. The average length of a list equals the load factor $\lambda$. Insertion costs the time to evaluate the hash function plus the cost of inserting into the linked list, which depends on where the insertion is done. Search costs the constant time to evaluate the hash function plus the time to traverse the list. The worst case for search is O(n), when all keys land in one list; the average case depends on $\lambda$ and is $O(1 + \lambda)$ when the hash function spreads the keys evenly. The general rule for open hashing is to make $\lambda \approx 1$. Open hashing suits data of dynamic size.

Issues. Issues common to all closed hashing strategies:
- Deletion is awkward (cells must be marked as deleted rather than emptied).
- Simpler than open hashing.
- Good if we do not expect many collisions.
- An unsuccessful search may have to examine the whole table.
- The table must be large compared to the expected amount of data.

Summary
- Hash table: an array.
- Hash function: a function that maps a key to a number in [0, size of hash table).
- Collision resolution:
  - Open hashing: separate chaining
  - Closed hashing (open addressing): linear probing, quadratic probing, double hashing
- Primary clustering, secondary clustering

Summary (cont.)
Advantage:
- Running time O(1) + O(collision resolution)
Disadvantages:
- Difficult (not efficient) to print all elements in the hash table
- Inefficient to find the minimum or maximum element
- Not growable (for closed hashing / open addressing)
- Wastes some space (load factor)