
Search Engine Report

Momin Irfan, Luke Wood
May 02 2016

Description of Data Structures

AVL Tree: An AVL Tree is a specialization of the binary search tree. Like a binary search tree, it stores data in nodes whose positions depend on the data they contain. By convention, if the data being added is less than the data of an existing node, it is stored in that node's left subtree. Conversely, if it is greater than the data of an existing node, it is stored in that node's right subtree. This ordering (known as the binary search property) lets the computer discard close to half of the remaining data at each step of a search or insertion, provided the tree is reasonably balanced. This leads to the conclusion that searching and inserting in a balanced binary search tree is O(lg(n)).

However, given a certain data set, this can degenerate to O(n). Say, for example, a user enters values into a tree in increasing order. Each value will always be added to the right of the previous one, and therefore the tree will be no different from a linked list.

To combat this, there exists the AVL Tree. The AVL Tree keeps itself from becoming lopsided, or node heavy on one side. It accomplishes this by maintaining the height of each node. If the heights of a node's left and right subtrees differ by 2 or more, the AVL Tree rebalances while still adhering to the binary search property. There are four ways it does this: a single rotation of the alpha node (the unbalanced node) with its left child, a single rotation with its right child, a double rotation with its left child, or a double rotation with its right child. Each of these operations restores balance. This guarantees O(lg(n)) for searching, insertion, and deletion, as no degeneration can occur.
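The rebalancing described above can be sketched in a few lines. This is a minimal illustration and not the project's actual code: the names Node, insert, rotate_left, rotate_right, and build are assumptions made for the example, and duplicate keys are simply ignored.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # a leaf has height 1; an empty subtree counts as 0


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def balance(node):
    # Positive means left-heavy, negative means right-heavy.
    return height(node.left) - height(node.right)


def rotate_right(alpha):
    # Single rotation of the unbalanced node with its left child.
    child = alpha.left
    alpha.left = child.right
    child.right = alpha
    update(alpha)
    update(child)
    return child


def rotate_left(alpha):
    # Single rotation of the unbalanced node with its right child.
    child = alpha.right
    alpha.right = child.left
    child.left = alpha
    update(alpha)
    update(child)
    return child


def insert(node, key):
    # Ordinary binary search tree insertion, followed by a balance check.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # assumption: duplicates are ignored in this sketch
    update(node)
    b = balance(node)
    if b >= 2:
        if balance(node.left) < 0:  # double rotation with the left child
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if b <= -2:
        if balance(node.right) > 0:  # double rotation with the right child
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def in_order(node):
    return in_order(node.left) + [node.key] + in_order(node.right) if node else []


# Increasing input would turn a plain binary search tree into a linked list.
root = build(range(1, 1024))
print(height(root))  # stays logarithmic in the number of keys
```

Inserting keys in increasing order is the worst case for an unbalanced tree, which is why it makes a useful check here: the rotations keep the height close to lg(n) instead of n.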

Hash Table: A Hash Table is a structure that works very similarly to an array. In both, each element is stored at an index. Where arrays and Hash Tables differ is in how that index is chosen. With an array, the caller picks the index (for example, by order of addition), and the element's location is found by computing index * element size (given homogeneous typing). A Hash Table instead passes the key through a hash function, which takes the key and churns out a large number. The same key always produces the same number, and the Hash Table takes advantage of this repeatable relationship. It mods (%) that number with its table size to find an index in the array. To locate the value again, you run the same process on the key and look in the resulting bucket.

This raises the question of how to deal with collisions, or two elements mapping to the same bucket. One way is linear probing, which is to just move the new object to the next available spot. Although this is a simple solution, it creates clusters in the Hash Table, and having to probe linearly through a long cluster (in the worst case, the entire array) is inefficient. To combat this we use separate chaining, in which each slot of the array holds a secondary data structure containing every element that hashes to that slot. In this project we use the AVL Trees we created. If we can guarantee that these AVL Trees get no larger than a certain size, we have a data structure that searches and inserts at an O(1) rate. Although an individual insert that triggers re-hashing takes O(n) time, amortizing that cost over all inserts still yields an O(1) search time and an O(1) insert time.
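A small sketch may help make separate chaining and re-hashing concrete. It is an illustration under stated assumptions, not the project's implementation: the class name ChainedHashTable, the max_load threshold, and the doubling policy are all hypothetical, Python's built-in hash() stands in for the project's hash function, and plain lists stand in for the AVL Tree buckets to keep the example short.

```python
class ChainedHashTable:
    def __init__(self, capacity=8, max_load=1.0):
        # One bucket per slot. The report uses an AVL Tree per bucket;
        # a list is used here only to keep the sketch short.
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load  # assumed threshold for re-hashing

    def _index(self, key):
        # Hash the key to a large number, then mod by the table size.
        return hash(key) % len(self._buckets)

    def insert(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self._max_load:
            self._rehash()

    def search(self, key):
        # Run the same hash-and-mod process, then look inside one bucket.
        for existing, value in self._buckets[self._index(key)]:
            if existing == key:
                return value
        return None

    def _rehash(self):
        # O(n) when it happens, but doubling makes it rare enough that
        # the cost averaged over all inserts stays constant.
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for key, value in bucket:
                self._buckets[self._index(key)].append((key, value))

    def __len__(self):
        return self._size


table = ChainedHashTable(capacity=4)
for i in range(100):
    table.insert(f"word{i}", i)
print(table.search("word42"), len(table))
```

Keeping the load factor bounded is what keeps each bucket small, which is the condition the paragraph above relies on for O(1) search and insert.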

[Figure: an example of a Hash Table that uses separate chaining, as well as our implemented AVL Tree.]

AVL Tree vs. Hash Table

Given these descriptions, it is easy to picture that the Hash Table is the more efficient data structure. We expect constant time access for the Hash Table vs. log(n) based access for the AVL Tree. Based on the timings we measured, this theory seems to be true. The following is a table timing the parsing into each data structure.

Data Set    AVL Tree Time (s)    Hash Table Time (s)
1000        2                    3
10000       10                   12
100000      140                  110
~300000     192                  120

What this data shows is that as the data set gets larger, the Hash Table becomes the more efficient structure. For small data sets, however, the AVL Tree is the winner. This is most likely because, although the Hash Table is O(1), its hidden constant is very large: computing the hash function and creating space in the array are more expensive than just dereferencing and traversing a few pointers. An AVL Tree has a smaller hidden constant, making it better on smaller data sets. This shows that although Hash Tables are the more efficient data structure asymptotically, they only pay off past a certain size. The hidden constant in the big O can cause these subtle changes.
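The crossover can be read directly off the measured timings. The short script below only restates the numbers from the table above and computes the ratio of Hash Table time to AVL Tree time for each data set; the helper name faster_structure is an assumption made for this example.

```python
# Timings copied from the report's table, in seconds:
# (data set, AVL Tree time, Hash Table time)
timings = [
    ("1000", 2, 3),
    ("10000", 10, 12),
    ("100000", 140, 110),
    ("~300000", 192, 120),
]


def faster_structure(avl_s, hash_s):
    if avl_s < hash_s:
        return "AVL Tree"
    if hash_s < avl_s:
        return "Hash Table"
    return "tie"


rows = [
    (label, hash_s / avl_s, faster_structure(avl_s, hash_s))
    for label, avl_s, hash_s in timings
]

for label, ratio, winner in rows:
    # A ratio above 1 means the Hash Table was slower than the AVL Tree.
    print(f"{label:>8}  hash/AVL = {ratio:.2f}  faster: {winner}")
```

The ratio falls from 1.5 on the smallest data set to below 1 on the two largest, so the Hash Table overtakes the AVL Tree somewhere between 10000 and 100000 entries in these measurements.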