Data Structure. A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.

DATA STRUCTURES
COMP 321, McGill University
These slides are mainly compiled from the following resources:
- Professor Jaehyun Park's slides (CS 97SI).
- TopCoder tutorials.
- Programming Challenges book.

Data Structure
- A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.

Data Structures
- Basic data structures (built-in libraries):
  - Linear DS.
  - Non-linear DS.
- Data structures (own libraries):
  - Graphs.
  - Union-Find structures.
  - Segment tree.

Data Structures: basic data structures (built-in libraries)
- Linear DS (the elements are ordered sequentially):
  - Static array (array in C/C++ and in Java).
  - Resizable array (C++ STL <vector> and Java ArrayList).
  - Linked list (C++ STL <list> and Java LinkedList).
  - Stack (C++ STL <stack> and Java Stack).
  - Queue (C++ STL <queue> and Java Queue).

Data Structures: basic data structures (built-in libraries)
- Non-linear DS:
  - Balanced binary search tree (C++ STL <map>/<set> and Java TreeMap/TreeSet).
    - AVL and Red-Black trees are balanced BSTs.
    - <map> stores (key -> data) pairs, whereas <set> stores only the key.
  - Heap (C++ STL <queue>: priority_queue, and Java PriorityQueue).
    - Complete binary tree.
    - Heap property vs. BST property.
  - Hash table (Java HashMap/HashSet/Hashtable).
    - Non-synchronized (HashMap) vs. synchronized (Hashtable).
    - Nulls allowed (HashMap) vs. nulls not allowed (Hashtable).
    - Predictable iteration order (using LinkedHashMap) vs. unpredictable.

Question for you
- Basic data structures (built-in libraries).
- Non-linear DS (non-sequential ordering):
  - Balanced binary search tree (C++ STL <map>/<set> and Java TreeMap/TreeSet). AVL and Red-Black trees are balanced BSTs. <map> stores (key -> data) pairs, whereas <set> stores only the key.
  - Heap (C++ STL <queue> and Java PriorityQueue). Heap property vs. BST property. Complete BST.
  - Hash table.
- (Figure labels: BST, HEAP.)
- Do you recognize the problem?

Deciding the Order of the Tasks
- Returns the newest task (stack).
- Returns the oldest task (queue).
- Returns the most urgent task (priority queue).
- Returns the easiest task (priority queue).

STACK
- Last in, first out (LIFO).
- Stacks model piles of objects (such as dinner plates).
- Supports three constant-time operations:
  - Push(x): inserts x into the stack.
  - Pop(): removes the newest item.
  - Top(): returns the newest item.
- Very easy to implement using an array.

STACK
- Have a large enough array s[] and a counter k, which starts at zero.
  - Push(x): set s[k] = x and increment k by 1.
  - Pop(): decrement k by 1.
  - Top(): return s[k - 1] (error if k is zero).
- C++ and Java have implementations of the stack: stack (C++), Stack (Java).
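The array-plus-counter recipe above can be sketched in a few lines of Python. This is only an illustration: the class name `ArrayStack` and the `capacity` parameter are assumptions made for the example, and the error on popping an empty stack is an added safety check, not something the slide states.

```python
class ArrayStack:
    """Fixed-capacity stack backed by an array s[] and a counter k (hypothetical name)."""

    def __init__(self, capacity):
        self.s = [None] * capacity  # "large enough" array
        self.k = 0                  # number of stored items

    def push(self, x):
        if self.k == len(self.s):
            raise OverflowError("stack is full")
        self.s[self.k] = x
        self.k += 1

    def pop(self):
        if self.k == 0:
            raise IndexError("pop from empty stack")  # added check (assumption)
        self.k -= 1

    def top(self):
        if self.k == 0:
            raise IndexError("top of empty stack")
        return self.s[self.k - 1]


stack = ArrayStack(8)
stack.push("a")
stack.push("b")
stack.pop()          # removes "b", the newest item
newest = stack.top() # "a"
```

In everyday Python code a plain `list` with `append` and `pop` plays the same role.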

STACK
- Useful for:
  - Processing nested formulas.
  - Depth-first graph traversal.
  - Data storage in recursive algorithms.

QUEUE
- First in, first out (FIFO).
- Supports three constant-time operations:
  - Enqueue(x): inserts x into the queue.
  - Dequeue(): removes the oldest item.
  - Front(): returns the oldest item.
- Implementation is similar to that of the stack.

QUEUE
- Assume that you know the total number of elements that enter the queue, which allows you to use an array for the implementation.
  - If not, you can use linked lists or doubly linked lists.
- Maintain two indices, head and tail:
  - Dequeue() increments head.
  - Enqueue() increments tail.
  - Use the value of tail - head to check emptiness.
- You can use queue (C++) and Queue (Java).
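A minimal sketch of the head/tail scheme above, assuming the total number of enqueued elements is known in advance. The names `ArrayQueue`, `capacity` and `is_empty` are made up for this example.

```python
class ArrayQueue:
    """Array-backed FIFO queue with head and tail indices (hypothetical name)."""

    def __init__(self, capacity):
        self.q = [None] * capacity  # room for every element that will ever enter
        self.head = 0               # index of the oldest item
        self.tail = 0               # index of the next free slot

    def is_empty(self):
        return self.tail - self.head == 0

    def enqueue(self, x):
        if self.tail == len(self.q):
            raise OverflowError("queue capacity exceeded")
        self.q[self.tail] = x
        self.tail += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        self.head += 1

    def front(self):
        if self.is_empty():
            raise IndexError("front of empty queue")
        return self.q[self.head]


queue = ArrayQueue(8)
queue.enqueue("first")
queue.enqueue("second")
queue.dequeue()          # removes "first", the oldest item
oldest = queue.front()   # "second"
```

When the number of elements is not known in advance, Python's `collections.deque` is the usual ready-made alternative.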

QUEUE
- Useful for:
  - Implementing buffers.
  - Simulating waiting lists.
  - Shuffling cards.

PRIORITY QUEUE
- Each element in a PQ has a priority value.
- Three operations:
  - Insert(x, p): inserts x, whose priority is p, into the PQ.
  - RemoveTop(): removes the element with the highest priority.
  - Top(): returns the element with the highest priority.
- All operations can be done quickly if the PQ is implemented using a heap (otherwise, a sorted array can be used).
- priority_queue (C++), PriorityQueue (Java).
- Useful for:
  - Maintaining schedules / calendars.
  - Simulating events.
  - Sweepline geometric algorithms.
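As a rough illustration of the three operations, here is a sketch built on Python's standard `heapq` module. `heapq` is a min-heap, so the priority is negated to get "highest priority first"; the class name `PriorityQueue`, the method names and the tie-breaking counter are assumptions made for this example.

```python
import heapq
import itertools


class PriorityQueue:
    """Max-priority queue on top of heapq (names are illustrative)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker so items are never compared

    def insert(self, x, p):
        # Negate p because heapq always keeps the smallest tuple at index 0.
        heapq.heappush(self._heap, (-p, next(self._counter), x))

    def top(self):
        return self._heap[0][2]

    def remove_top(self):
        heapq.heappop(self._heap)


tasks = PriorityQueue()
tasks.insert("answer email", 1)
tasks.insert("fix production bug", 5)
tasks.insert("lunch", 3)
most_urgent = tasks.top()  # "fix production bug"
```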

HEAP
- Complete binary tree with the heap property:
  - The value of a node ≥ the values of its children.
  - What is the difference between full and complete?
- The root node has the maximum value.
  - Constant-time top() operation.
- Inserting/removing a node can be done in O(log n) time without breaking the heap property.
  - May need rearrangement of some nodes.

HEAP
- Start from the root and number the nodes 1, 2, ... level by level, from left to right.
- Given a node k, it is easy to compute the indices of its parent and children:
  - Parent node: floor(k/2).
  - Children: 2k, 2k + 1.

Heap: Inserting a Node
1. Make a new node in the last level, as far left as possible.
   - If the last level is full, make a new one.
2. If the new node breaks the heap property, swap it with its parent node.
   - The new node moves up the tree, which may introduce another conflict.
3. Repeat step 2 until all conflicts are resolved.
- Running time = tree height = O(log n).

Heap: Deleting a Node
1. Remove the root, and bring the last node (the rightmost node in the last level) to the root.
2. If the root breaks the heap property, look at its children and swap it with the larger one.
   - Swapping can introduce another conflict.
3. Repeat step 2 until all conflicts are resolved.
- Running time = O(log n).
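The insertion and deletion steps above, together with the 1-based numbering (parent floor(k/2), children 2k and 2k + 1), can be sketched as follows. The class name `MaxHeap` and its method names are assumptions for this example; index 0 of the array is left unused so that the index formulas match the slides.

```python
class MaxHeap:
    """Array-based max-heap with 1-based node numbering (illustrative sketch)."""

    def __init__(self):
        self.a = [None]  # a[0] is unused so that the root is node 1

    def top(self):
        return self.a[1]

    def insert(self, x):
        a = self.a
        a.append(x)                  # new node: last level, as far left as possible
        k = len(a) - 1
        while k > 1 and a[k // 2] < a[k]:
            a[k // 2], a[k] = a[k], a[k // 2]  # swap with the parent
            k //= 2

    def remove_top(self):
        a = self.a
        top = a[1]
        last = a.pop()               # rightmost node in the last level
        n = len(a) - 1               # number of remaining nodes
        if n >= 1:
            a[1] = last              # bring the last node to the root
            k = 1
            while True:
                largest = k
                for child in (2 * k, 2 * k + 1):
                    if child <= n and a[child] > a[largest]:
                        largest = child
                if largest == k:
                    break            # heap property holds again
                a[k], a[largest] = a[largest], a[k]  # swap with the larger child
                k = largest
        return top


heap = MaxHeap()
for value in [3, 9, 4, 7, 1]:
    heap.insert(value)
in_order = [heap.remove_top() for _ in range(5)]  # [9, 7, 4, 3, 1]
```

Both loops move along a single root-to-leaf path, which is where the O(log n) bound comes from.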

BINARY SEARCH TREE (BST)
- The underlying idea is that each node has at most two children.
- A binary tree with the following property: for each node v,
  - value of v ≥ values in v's left subtree;
  - value of v < values in v's right subtree.

BST
- Supports three operations:
  - Insert(x): inserts a node with value x.
  - Delete(x): deletes a node with value x, if there is any.
  - Find(x): returns the node with value x, if there is any.
- Many extensions are possible:
  - Count(x): counts the number of nodes with value less than or equal to x.
  - GetNext(x): returns the smallest node with value ≥ x.
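A small unbalanced BST sketch may make Insert, Find and GetNext concrete. It assumes that equal values go to the left subtree and that GetNext(x) returns the smallest node with value at least x; the names `Node`, `insert`, `find` and `get_next` are chosen for this example, and Delete is left out to keep it short.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, x):
    """Insert x and return the (possibly new) root. No balancing is done."""
    if root is None:
        return Node(x)
    if x <= root.value:
        root.left = insert(root.left, x)
    else:
        root.right = insert(root.right, x)
    return root


def find(root, x):
    """Return the node holding x, or None if x is absent."""
    while root is not None and root.value != x:
        root = root.left if x < root.value else root.right
    return root


def get_next(root, x):
    """Return the node with the smallest value >= x, or None."""
    best = None
    while root is not None:
        if root.value >= x:
            best = root        # candidate; a smaller one may exist on the left
            root = root.left
        else:
            root = root.right
    return best


root = None
for value in [5, 2, 8, 1, 9]:
    root = insert(root, value)
next_after_six = get_next(root, 6).value  # 8
```

Every operation walks one path from the root, so its cost is proportional to the tree height.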

BST
- A simple implementation cannot guarantee efficiency.
  - In the worst case, the tree height becomes n (which makes the BST useless).
  - Guaranteeing O(log n) running time per operation requires balancing the tree (hard to implement), for example with AVL or Red-Black trees. (We will skip the details of these balanced trees, but you should review them.)
  - What does balanced mean? In an AVL tree, the heights of the two child subtrees of any node differ by at most one.
- Use the standard library implementations: set, map (C++); TreeSet, TreeMap (Java).

Question for you
- Why is a binary search tree preferable to a sorted array of values?
- What is the cost, O(?), of finding a given key?

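Question for you
- Why is a binary search tree preferable to a sorted array of values?
- Finding a given key takes O(log n) in both cases: traversing the (balanced) BST, or binary search on the array.
- The problem with the sorted array is adding a new item: the elements after the insertion point must be shifted, which takes O(n).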
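The trade-off can be seen with Python's standard `bisect` module: the lookup is a binary search, while the insertion has to shift elements inside the list. The helper name `contains` and the sample values are made up for this illustration.

```python
import bisect


def contains(sorted_values, key):
    """Binary search in a sorted list: O(log n) comparisons."""
    i = bisect.bisect_left(sorted_values, key)
    return i < len(sorted_values) and sorted_values[i] == key


values = [2, 5, 8, 13]

# Finding the insertion point is O(log n), but insort then shifts every
# later element one slot to the right, so the insertion as a whole is O(n).
bisect.insort(values, 7)

found = contains(values, 7)    # True
missing = contains(values, 6)  # False
```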

Hash Tables
- A key is used as an index to locate the associated value: content-based retrieval, unlike position-based retrieval.
- Hashing is the process of computing a hash value (the table index) from a key.
- An ideal hash function distributes the hash values evenly, so the buckets tend to fill up evenly, which makes searching fast.
- A collision occurs when more than one key maps to the same hash bucket.
  - Open addressing: a simple rule decides where to put a new item when the desired slot is already occupied.
  - Chaining: a linked list is associated with each table location.
- Hash tables are excellent dictionary data structures.
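To make chaining concrete, here is a minimal sketch in which each table location holds a chain of (key, value) pairs. It uses Python lists as the chains instead of hand-written linked lists, relies on the built-in `hash`, and never resizes the table; the class name `ChainedHashTable` and the bucket count are assumptions for the example.

```python
class ChainedHashTable:
    """Hash table with separate chaining (illustrative, fixed number of buckets)."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash value of the key selects the table location.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))       # collisions simply extend the chain

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(num_buckets=8)
table.put(1, "one")
table.put(9, "nine")   # 1 and 9 collide: both map to bucket 1 when there are 8 buckets
value = table.get(9)   # "nine"
```

Python's built-in `dict` is the production-quality counterpart of this sketch.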

Hash Function
- A function that takes a string and outputs a number.
- A good hash function has few collisions, i.e., if $x \neq y$, then $H(x) \neq H(y)$ with high probability.
- An easy and powerful hash function is a polynomial mod some prime $p$.
  - Consider each letter as a number (the ASCII value is fine).
  - $H(x_1 \dots x_k) = x_1 a^{k-1} + x_2 a^{k-2} + \dots + x_{k-1} a + x_k \pmod{p}$
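The polynomial can be evaluated left to right with Horner's rule, reducing mod p at every step so the intermediate values stay small. In the sketch below the base `a = 31` and the prime `p = 1_000_000_007` are arbitrary illustrative choices, not values given on the slides, and `poly_hash` is a made-up name.

```python
def poly_hash(s, a=31, p=1_000_000_007):
    """Polynomial hash of the string s, modulo the prime p."""
    h = 0
    for ch in s:
        # After processing x_1..x_i, h equals x_1*a^(i-1) + ... + x_i (mod p).
        h = (h * a + ord(ch)) % p
    return h


h_ab = poly_hash("ab")  # (97 * 31 + 98) % p = 3105
h_ba = poly_hash("ba")  # (98 * 31 + 97) % p = 3135
```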

Data Structures
- Data structures (own libraries):
  - Graphs. Let's talk about graphs later.
  - Union-Find disjoint sets.
  - Segment tree.

Union-Find Structure
- Used to store disjoint sets.
  - What is a disjoint set? Sets whose intersection is the empty set.
- Supports two types of operations efficiently:
  - Find(x): returns the representative of the set that x belongs to.
  - Union(x, y): merges the two sets that contain x and y.
- Both operations can be done in (essentially) constant time.
- Super-short implementation!
- Useful for problems involving partitioning, e.g., keeping track of connected components, or Kruskal's algorithm (minimum spanning tree).

Union-Find Structure
- Main idea: represent each set by a rooted tree.
  - Every node maintains a link to its parent.
  - A root node is the representative of the corresponding set.
  - Example: two sets {x, y, z} and {a, b, c, d}.

Union-Find Structure
- Find(x): follow the links from x until a node points to itself.
  - This can take O(n) time, but we will make it faster.
- Union(x, y): run Find(x) and Find(y) to find the corresponding root nodes, and direct one to the other.
- Assume that the links are stored in an array L[].

Union-Find Structure
- In a bad case, the trees can become too deep, which slows down future operations.
- Path compression makes the trees shallower every time Find() is called.
- We don't care what a tree looks like as long as the root stays the same.
  - After Find(x) returns the root, backtrack to x and reroute all the links to the root.

Question for you
- How can you implement the operation issameset(i, j)?

Question for you
- How can you implement the operation issameset(i, j)?
- Simply call findset(i) and findset(j) and check whether both return the same representative.
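Putting the union-find slides together, a minimal sketch might look like this. The parent links live in an array `L`, as on the slides; the class name `UnionFind` and the method names are assumptions for the example. Only path compression is implemented here; union by size or rank, which is commonly combined with it, is not shown.

```python
class UnionFind:
    """Disjoint sets stored as rooted trees in the link array L (illustrative)."""

    def __init__(self, n):
        self.L = list(range(n))  # every element starts as its own root

    def find(self, x):
        # Follow the links from x until a node points to itself.
        root = x
        while self.L[root] != root:
            root = self.L[root]
        # Path compression: backtrack and reroute every link to the root.
        while self.L[x] != root:
            parent = self.L[x]
            self.L[x] = root
            x = parent
        return root

    def union(self, x, y):
        # Direct one root to the other.
        self.L[self.find(x)] = self.find(y)

    def is_same_set(self, i, j):
        return self.find(i) == self.find(j)


sets = UnionFind(7)
sets.union(0, 1)
sets.union(1, 2)
sets.union(3, 4)
connected = sets.is_same_set(0, 2)  # True
separate = sets.is_same_set(0, 3)   # False
```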

Segment Tree
- A DS to efficiently answer dynamic range queries.
- Range Minimum Query (RMQ): finding the index of the minimum element of an array within a given range [i..j].
  - Ex. RMQ(1, 3) = 2, RMQ(3, 4) = 4, RMQ(0, 0) = 0, RMQ(0, 1) = 1, and RMQ(0, 6) = 5.
- Iterating over the range takes O(n); let's make it faster using a binary tree similar to a heap, but usually not a complete binary tree (a.k.a. a segment tree).

Segment Tree
- Binary tree.
  - Each node is associated with some interval of the array.
  - Each non-leaf node has two children whose associated intervals are disjoint.
  - Each child's interval has approximately half the size of the parent's interval.

Segment Tree
- The root covers [0, N - 1], and each segment [l, r] is split into [l, (l + r) / 2] and [(l + r) / 2 + 1, r] until l = r.

Question for you
- What is the complexity, O(?), of built_segment_tree?
- With the segment tree ready, what is the complexity of answering an RMQ?
- Can you give the worst case? RMQ(?, ?)

Question for you
- What is the complexity of built_segment_tree? O(n): there are 2n - 1 nodes in total.
- With the segment tree ready, what is the complexity of answering an RMQ? O(log n) (two root-to-leaf paths).
  - Ex. RMQ(4, 6) = blue line.
  - Ex. RMQ(1, 3) = red line.
  - Ex. RMQ(3, 4) = worst case: one path from [0, 6] to [3, 3] and another from [0, 6] to [4, 4].

Segment Tree
- If the array A is static, use a Dynamic Programming solution that requires O(n log n) pre-processing and O(1) per RMQ.
- The segment tree becomes useful if the array A is frequently updated.
  - Ex. updating A[5] takes O(log n), vs. the O(n log n) required to rebuild the DP table.
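A compact sketch of a segment tree for RMQ, covering build, query and point update. The transcript does not include the example array, so `A = [18, 17, 13, 19, 15, 11, 20]` below is an assumed array chosen only because it reproduces the answers quoted earlier (RMQ(1, 3) = 2, RMQ(3, 4) = 4, RMQ(0, 6) = 5). The class name `SegmentTree`, the heap-style numbering of the nodes (children of p at 2p and 2p + 1) and the `4 * n` array size are implementation choices made for this example.

```python
class SegmentTree:
    """Segment tree storing, per node, the index of the minimum of its interval."""

    def __init__(self, A):
        self.A = list(A)              # assumes a non-empty array
        self.n = len(self.A)
        self.st = [0] * (4 * self.n)  # generous upper bound on the node count
        self._build(1, 0, self.n - 1)

    def _better(self, i, j):
        return i if self.A[i] <= self.A[j] else j

    def _build(self, p, l, r):
        if l == r:
            self.st[p] = l
            return
        mid = (l + r) // 2
        self._build(2 * p, l, mid)
        self._build(2 * p + 1, mid + 1, r)
        self.st[p] = self._better(self.st[2 * p], self.st[2 * p + 1])

    def rmq(self, i, j):
        return self._rmq(1, 0, self.n - 1, i, j)

    def _rmq(self, p, l, r, i, j):
        if i > r or j < l:
            return -1                 # node interval is outside the query
        if i <= l and r <= j:
            return self.st[p]         # node interval is fully inside the query
        mid = (l + r) // 2
        left = self._rmq(2 * p, l, mid, i, j)
        right = self._rmq(2 * p + 1, mid + 1, r, i, j)
        if left == -1:
            return right
        if right == -1:
            return left
        return self._better(left, right)

    def update(self, idx, value):
        self.A[idx] = value
        self._update(1, 0, self.n - 1, idx)

    def _update(self, p, l, r, idx):
        if l == r:
            return
        mid = (l + r) // 2
        if idx <= mid:
            self._update(2 * p, l, mid, idx)
        else:
            self._update(2 * p + 1, mid + 1, r, idx)
        self.st[p] = self._better(self.st[2 * p], self.st[2 * p + 1])


tree = SegmentTree([18, 17, 13, 19, 15, 11, 20])  # assumed example array
before = tree.rmq(0, 6)   # 5
tree.update(5, 99)        # only one root-to-leaf path is recomputed
after = tree.rmq(0, 6)    # 2
```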

Fenwick Tree
- Full binary tree with at least n leaf nodes.
  - We will use n = 8 for our example.
- The kth leaf node stores the value of item k.
- Each internal node stores the sum of the values of its children.
  - E.g., the red node stores item[5] + item[6].

Summing Consecutive Values
- Main idea: choose the minimal set of nodes whose sum gives the desired value.
- At most one node is chosen at each level, so we look at roughly $\log_2 n$ nodes in total, and the sum can be computed in O(log n) time.

Summing Consecutive Values
- Figures: Sum(7), Sum(8), Sum(6), and Sum(3); in each one, the result is the sum of the values of the gold-colored nodes.

Summing Consecutive Values
- Say we want to compute Sum(k).
- Maintain a pointer P, which initially points at leaf k.
- Climb the tree using the following procedure:
  - If P is pointing to a left child of some node:
    - Add the value of P.
    - Set P to the parent node of P's left neighbor.
    - If P has no left neighbor, terminate.
  - Otherwise:
    - Set P to the parent node of P.
- Use an array to implement.

Updating a Value
- Say we want to do Set(k, x) (set the value of leaf k to x).
1. Start at leaf k and change its value to x.
2. Go to its parent and recompute its value.
3. Repeat step 2 until the root.
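The climbing procedure for Sum(k) and the Set(k, x) update can be sketched with the heap-style array layout (root at index 1, children of p at 2p and 2p + 1). The sketch follows the full-binary-tree picture described on these slides; the class name `SumTree`, the method names, the padding to a power of two and the sample item values are all assumptions. The procedure does not say what to do when P reaches the root (which is nobody's left child), so the code adds the root's value and stops there; that rule is an assumption too.

```python
class SumTree:
    """Full binary tree of sums stored in an array (items are numbered from 1)."""

    def __init__(self, items):
        n = 1
        while n < len(items):
            n *= 2                     # at least len(items) leaves, power of two
        self.n = n
        self.tree = [0] * (2 * n)
        for k, value in enumerate(items, start=1):
            self.tree[n + k - 1] = value           # kth leaf stores item k
        for p in range(n - 1, 0, -1):
            self.tree[p] = self.tree[2 * p] + self.tree[2 * p + 1]

    def prefix_sum(self, k):
        """Sum of items 1..k."""
        p = self.n + k - 1             # leaf k
        total = 0
        while True:
            if p == 1:                 # reached the root (assumed rule)
                total += self.tree[1]
                break
            if p % 2 == 0:             # p is a left child
                total += self.tree[p]
                if (p & (p - 1)) == 0: # leftmost node of its level: no left neighbor
                    break
                p = (p - 1) // 2       # parent of the left neighbor
            else:                      # p is a right child
                p //= 2
        return total

    def set(self, k, x):
        p = self.n + k - 1
        self.tree[p] = x
        p //= 2
        while p >= 1:                  # recompute every ancestor up to the root
            self.tree[p] = self.tree[2 * p] + self.tree[2 * p + 1]
            p //= 2


sums = SumTree([5, 2, 9, 3, 5, 20, 10, 6])  # made-up item values, n = 8
sum_7 = sums.prefix_sum(7)  # 54
sums.set(5, 1)
sum_6 = sums.prefix_sum(6)  # 40
```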