CSC 421: Algorithm Design Analysis. Spring 2017

Similar documents
CSC 421: Algorithm Design & Analysis. Spring Space vs. time

CSC 421: Algorithm Design Analysis. Spring 2013

CSC 321: Data Structures. Fall 2016

CSC 321: Data Structures. Fall 2017

Lecture 7. Transform-and-Conquer

CSC Design and Analysis of Algorithms. Lecture 8. Transform and Conquer II Algorithm Design Technique. Transform and Conquer

2. Instance simplification Solve a problem s instance by transforming it into another simpler/easier instance of the same problem

CSC Design and Analysis of Algorithms. Lecture 8. Transform and Conquer II Algorithm Design Technique. Transform and Conquer

THINGS WE DID LAST TIME IN SECTION

Module 2: Classical Algorithm Design Techniques

CPS 616 TRANSFORM-AND-CONQUER 7-1

CSC Design and Analysis of Algorithms

Chapter 7. Space and Time Tradeoffs. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

CSC Design and Analysis of Algorithms. Lecture 7. Transform and Conquer I Algorithm Design Technique. Transform and Conquer

Data Structure. A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.

Lecture 13: AVL Trees and Binary Heaps

MA/CSSE 473 Day 23. Binary (max) Heap Quick Review

Priority Queues. e.g. jobs sent to a printer, Operating system job scheduler in a multi-user environment. Simulation environments

9. Heap : Priority Queue

COSC 2007 Data Structures II Final Exam. Part 1: multiple choice (1 mark each, total 30 marks, circle the correct answer)

CS 240 Fall Mike Lam, Professor. Priority Queues and Heaps

Design and Analysis of Algorithms - Chapter 6 6

String Matching. Pedro Ribeiro 2016/2017 DCC/FCUP. Pedro Ribeiro (DCC/FCUP) String Matching 2016/ / 42

Transform & Conquer. Presorting

CS350: Data Structures Heaps and Priority Queues

Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix

MA/CSSE 473 Day 20. Recap: Josephus Problem

CSC Design and Analysis of Algorithms. Lecture 7. Transform and Conquer I Algorithm Design Technique. Transform and Conquer

CMSC 341 Lecture 14: Priority Queues, Heaps

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

HEAPS & PRIORITY QUEUES

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs

Announcements. Midterm exam 2, Thursday, May 18. Today s topic: Binary trees (Ch. 8) Next topic: Priority queues and heaps. Break around 11:45am

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

Hash table basics mod 83 ate. ate. hashcode()

EXAMINATIONS 2015 COMP103 INTRODUCTION TO DATA STRUCTURES AND ALGORITHMS

Priority Queues. 1 Introduction. 2 Naïve Implementations. CSci 335 Software Design and Analysis III Chapter 6 Priority Queues. Prof.

BINARY HEAP cs2420 Introduction to Algorithms and Data Structures Spring 2015

Priority Queues Heaps Heapsort

CSC Design and Analysis of Algorithms. Lecture 9. Space-For-Time Tradeoffs. Space-for-time tradeoffs

Recall: Properties of B-Trees

Hierarchical data structures. Announcements. Motivation for trees. Tree overview

CSC 1700 Analysis of Algorithms: Heaps

CSCI2100B Data Structures Heaps

Computer Science 210 Data Structures Siena College Fall Topic Notes: Priority Queues and Heaps

CSE100 Practice Final Exam Section C Fall 2015: Dec 10 th, Problem Topic Points Possible Points Earned Grader

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

Taking Stock. IE170: Algorithms in Systems Engineering: Lecture 7. (A subset of) the Collections Interface. The Java Collections Interfaces

Course Review for Finals. Cpt S 223 Fall 2008

Friday Four Square! 4:15PM, Outside Gates

Trees & Tree-Based Data Structures. Part 4: Heaps. Definition. Example. Properties. Example Min-Heap. Definition

Data Structures and Algorithms

Course Review. Cpt S 223 Fall 2009

CSC 321: Data Structures. Fall 2016

CSC 321: Data Structures. Fall 2017

CMSC 341 Priority Queues & Heaps. Based on slides from previous iterations of this course

COMP : Trees. COMP20012 Trees 219

Self-Balancing Search Trees. Chapter 11

Priority Queues and Binary Heaps. See Chapter 21 of the text, pages

CSE 214 Computer Science II Heaps and Priority Queues

Hash table basics mod 83 ate. ate

DATA STRUCTURES + SORTING + STRING

182 review 1. Course Goals

CSCI-1200 Data Structures Fall 2018 Lecture 23 Priority Queues II

Hash table basics. ate à. à à mod à 83

Balanced Search Trees. CS 3110 Fall 2010

AP Computer Science 4325

Sorted Arrays. Operation Access Search Selection Predecessor Successor Output (print) Insert Delete Extract-Min

Advanced Tree Data Structures

LECTURE 3 ALGORITHM DESIGN PARADIGMS

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

Priority Queues and Binary Heaps. See Chapter 21 of the text, pages

UNIT III BALANCED SEARCH TREES AND INDEXING

Final Examination CSE 100 UCSD (Practice)

Binary Trees. BSTs. For example: Jargon: Data Structures & Algorithms. root node. level: internal node. edge.

CSC 8301 Design and Analysis of Algorithms: Heaps

Binary Search Trees Treesort

Data Structures Question Bank Multiple Choice

CSE373: Data Structures & Algorithms Lecture 9: Priority Queues and Binary Heaps. Linda Shapiro Spring 2016

Lecture 9: Balanced Binary Search Trees, Priority Queues, Heaps, Binary Trees for Compression, General Trees

Balanced Binary Search Trees

CSE373: Data Structures & Algorithms Lecture 28: Final review and class wrap-up. Nicki Dell Spring 2014

Algorithms: Design & Practice

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

CSE 373: Data Structures and Algorithms

University of Waterloo CS240 Spring 2018 Help Session Problems

CSE100. Advanced Data Structures. Lecture 8. (Based on Paul Kube course materials)

Abstract Data Types Data Structure Grand Tour Java Collections.

Binary Heaps. COL 106 Shweta Agrawal and Amit Kumar

Course Review. Cpt S 223 Fall 2010

CSE373: Data Structures & Algorithms Lecture 5: Dictionary ADTs; Binary Trees. Linda Shapiro Spring 2016

Question Bank Subject: Advanced Data Structures Class: SE Computer

Announcements HEAPS & PRIORITY QUEUES. Abstract vs concrete data structures. Concrete data structures. Abstract data structures

Test 1 - Closed Book Last 4 Digits of Student ID # Multiple Choice. Write your answer to the LEFT of each problem. 3 points each

Some Search Structures. Balanced Search Trees. Binary Search Trees. A Binary Search Tree. Review Binary Search Trees

Priority Queues and Heaps. Heaps of fun, for everyone!

Balanced Binary Search Trees. Victor Gao

Summer Final Exam Review Session August 5, 2009

University of Waterloo CS240R Fall 2017 Solutions to Review Problems

Red-black trees (19.5), B-trees (19.8), trees

Transcription:

CSC 421: Algorithm Design Analysis Spring 2017 Transform & conquer transform-and-conquer approach presorting, balanced search trees, heaps, Horner's Rule problem reduction space/time tradeoffs heap sort, data structure redundancy, hashing string matching: Horspool algorithm, Boyer-Moore algorithm 1 Transform & conquer the idea behind transform-and-conquer is to transform the given problem into a slightly different problem that suffices e.g., presorting data in a list can simplify many algorithms suppose we want to determine if a list contains any duplicates BRUTE FORCE: compare each item with every other item (N-1) + (N-2) + + 1 = (N-1)N/2 à O(N 2 ) TRANSFORM & CONQUER: first sort the list, then make a single pass through the list and check for adjacent duplicates O(N log N) + O(N) à O(N log N) finding the mode of a list, finding closest points, 2 1

Balanced search trees recall binary search trees we need to keep the tree balanced to ensure O(N log N) search/add/remove OR DO WE? it suffices to ensure O(log N) height, not necessarily minimal height transform the problem of "tree balance" to "relative tree balance" several specialized structures/algorithms exist: AVL trees 2-3 trees red-black trees 3 Red-black trees a red-black tree is a binary search tree in which each node is assigned a color (either red or black) such that 1. the root is black 2. a red node never has a red child 3. every path from root to leaf has the same number of black nodes add & remove preserve these properties (complex, but still O(log N)) red-black properties ensure that tree height < 2 log(n+1) à O(log N) search 4 2

TreeSets & TreeMaps java.util.treeset uses red-black trees to store values è O(log N) efficiency on add, remove, contains java.util.treemap uses red-black trees to store the key-value pairs è O(log N) efficiency on put, get, containskey thus, the original goal of an efficient tree structure is met even though the subgoal of balancing a tree was transformed into "relatively balancing" a tree 5 Scheduling & priority queues many real-world applications involve optimal scheduling balancing transmission of multiple signals over limited bandwidth selecting a job from a printer queue selecting the next disk sector to access from among a backlog multiprogramming/multitasking a priority queue encapsulates these three optimal scheduling operations: ü add item (with a given priority) ü find highest priority item ü remove highest priority item can be implemented as an unordered list à add is O(1), findhighest is O(N), removehighest is O(N) can be implemented as an ordered list à add is O(N), findhighest is O(1), removehighest is O(1) 6 3

Heaps Java provides a java.util.priorityqueue class the underlying data structure is not a list or queue at all it is a tree structure called a heap a complete tree is a tree in which all leaves are on the same level or else on 2 adjacent levels all leaves at the lowest level are as far left as possible note: a complete tree with N nodes will have minimal height = log2 N +1 a heap is complete binary tree in which for every node, the value stored is the values stored in both subtrees (technically, this is a min-heap -- can also define a max-heap where the value is ) 7 Implementing a heap a heap provides for O(1) find min, O(log N) insertion and min removal also has a simple, List-based implementation since there are no holes in a heap, can store nodes in an ArrayList, level-by-level root is at index 0 last leaf is at index size()-1 for a node at index i, children are at 2*i+1 and 2*i+2 30 34 60 36 71 66 71 83 40 94 to add at next available leaf, simply add at end 8 4

Horner's rule polynomials are used extensively in mathematics and algorithm analysis p(x) = a n x n + a n-1 x n-1 + a n-2 x n-2 +... + a 1 x + a 0 how many multiplications would it take to evaluate this function for some value of x? W.G. Horner devised a new formula that transforms the problem p(x) = ( (((a n x + a n-1 )x + a n-2 )x +... + a 1 )x + a 0 can evaluate in only n multiplications and n additions 9 Problem reduction in CSC321, we looked at a number of examples of reducing a problem from one form to another e.g., generate the powerset (set of all subsets) of an N element set S = { x 1, x 2, x 3, x 4 } powerset S = { {}, {x 1 }, {x 2 }, {x 3 }, {x 4 }, {x 1, x 2 }, {x 1, x 3 }, {x 1, x 4 }, {x 2, x 3 }, {x 2, x 4 }, {x 3, x 4 }, {x 1, x 2, x 3 }, {x 1, x 2, x 4 }, {x 1, x 3, x 4 }, {x 2, x 3, x 4 }, {x 1, x 2, x 3, x 4 } } PROBLEM REDUCTION: simplify by reducing it to a problem about bit sequences can map each subset into a sequence of N bits: b i = 1 à x i in subset { x 1, x 4, x 5 } ßà 10011000 0 much simpler to generate all possible N-bit sequences { 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111 } 10 5

lcm & gcd consider calculating the least common multiple of two numbers m & n BRUTE FORCE: reduce each number to its prime factors then multiply (factors in both m & n) (factors only in m) (factors only in n) 24 = 2 2 2 3 60 = 2 2 3 5 lcm(24, 60) = (2 2 3) (2) (5) = 12 2 5 = 120 PROBLEM REDUCTION: can recognize a relationship between lcm & gcd lcm(m, n) = m x n / gcd(m, n) lcm(24, 60) = 24 60 / 12 = 2 60 = 120 gcd can be calculated efficiently using Euclid's algorithm: gcd(a, 0) = a gcd(a, b) = gcd(b, a % b) 11 Reduction to graph searches many problems can be transformed into graph problems 25 0 N Q 5 D D 15 N D 35 D Q N N N N 10 Q N 30 D 20 D 12 6

Water jug problem recall from Die Hard with a Vengeance you have two empty water jugs (4 gallon & 3 gallon capacity) & water supply want to end up with exactly 2 gallons in a jug 0-0 4-0 0-3 3-0 1-3 4-3 4-2 3-3 0-2 13 Other interesting examples multiplication can be transformed into squaring a b = ( (a + b) 2 (a b) 2 ) / 4 find the shortest word ladder construct a graph: vertices are the words in the dictionary there is an edge between v i and v j if they differ by one letter perform a breadth first search, from start word to end word Andrew's algorithm for complex hull transforms the graph into a sorted list of coordinates, then traverses & collects 14 7

Space vs. time for many applications, it is possible to trade memory for speed i.e., additional storage can allow for a more efficient algorithm ArrayList wastes space when it expands (doubles in size) each expansion yields a list with (slightly less than) half empty improves overall performance since only log N expansions when adding N items linked structures sacrifice space (reference fields) for improved performance e.g., a linked list takes twice the space, but provides O(1) add/remove from either end hash tables can obtain O(1) add/remove/search if willing to waste space to minimize the chance of collisions, must keep the load factor low HashSet & HashMap resize (& rehash) if load factor every reaches 75% 15 Heap sort can utilize a heap to efficiently sort a list of values start with the ArrayList to be sorted construct a heap out of the elements repeatedly, remove min element and put back into the ArrayList public static <E extends Comparable<? super E>> void heapsort(arraylist<e> items) { MinHeap<E> itemheap = new MinHeap<E>(); } for (int i = 0; i < items.size(); i++) { itemheap.add(items.get(i)); } for (int i = 0; i < items.size(); i++) { items.set(i, itemheap.minvalue()); itemheap.remove(); } N items in list, each insertion can require O(log N) swaps to reheapify àconstruct heap in O(N log N) N items in heap, each removal can require O(log N) swap to reheapify à copy back in O(N log N) can get the same effect by copying the values into a TreeSet, then iterating 16 8

Other space vs. time examples we have made similar space/time tradeoffs often in code AnagramFinder (find all anagrams of a given word) can preprocess the dictionary & build a map of words, keyed by sorted letters then, each anagram lookup is O(1) BinaryTree (implement a binary tree & eventually extend to BST) kept track of the number of nodes in a field (updated on each add/remove) turned the O(N) size operation into O(1) HoopsStats (HW1) could redundantly store team stats and individual stats other examples? 17 String matching many useful applications involve searching text for patterns word processing (e.g., global find and replace) data mining (e.g., looking for common parameters) genomics (e.g., searching for gene sequences) TTAAGGACCGCATGCCCTCGAATAGGCTTGAGCTTGCAATTAACGCCCAC in general given a (potentially long) string S and need to find (relatively small) pattern P an obvious, brute force solution exists: O( S P ) however, utilizing preprocessing and additional memory can reduce this to O( S ) in practice, it is often < S HOW CAN THIS BE?!? 18 9

Brute force String: Pattern: BIZ the brute force approach would shift the pattern along, looking for a match: at worst: S - P +1 passes through the pattern each pass requires at most P comparisons è O( S P ) 19 Smart shifting BIZ suppose we look at the rightmost letter in the attempted match here, compare Z in the pattern with O in the string since there is no O in the pattern, can skip the next two comparisons FOOBARBIZBBAZ BIZ again, no R in BIZ, so can skip another two comparisons FOOBARBIZBBAZ BIZ 20 10

Horspool algorithm given a string S and pattern P, 1. Construct a shift table containing each letter of the alphabet. if the letter appears in P (other than in the last index) à store distance from end otherwise, store P e.g., P = BIZ A B C I X Y Z 3 2 3 1 3 3 3 2. Align the pattern with the start of the string S, i.e., with S 0 S 1 S P -1 3. Repeatedly, compare from right-to-left with the aligned sequence S i S i+1 S i+ P -1 if all P letters match, then FOUND. if not, then shift P to the right by shifttable[s i+ P -1 ] if the shift falls off the end, then NOT FOUND 21 Horspool example 1 S = P=BIZ shifttable for "BIZ" A B C I X Y Z 3 2 3 1 3 3 3 BIZ Z and O do not match, so shift shifttable[o] = 3 spots BIZ Z and R do not match, so shift shifttable[r] = 3 spots BIZ pattern is FOUND total number of comparisons = 5 22 11

Horspool example 2 S = "FOZIZBARBIZBAZ" shifttable for "BIZ" FOZIZBARBIZBAZ BIZ P="BIZ" A B C I X Y Z 3 2 3 1 3 3 3 Z and Z match, but not I and O, so shift shifttable[z] = 3 spots FOZIZBARBIZBAZ BIZ Z and B do not match, so shift shifttable[b] = 2 spots FOZIZBARBIZBAZ BIZ Z and R do not match, so shift shifttable[r] = 3 spots FOZIZBARBIZBAZ BIZ pattern is FOUND total number of comparisons = 7 suppose we are looking for BIZ in a string with no Z's how many comparisons? 23 Horspool exercise String: BANDANABABANANAN Pattern: BANANA shift table for BANANA? steps? A B C D N Y Z BANDANABABANANAN BANANA 24 12

Horspool analysis space & time requires storing shift table whose size is the alphabet since the alphabet is usually fixed, table requires O(1) space worst case is O( S P ) this occurs when skips are infrequent & close matches to the pattern appear often for random data, however, only O( S ) Horspool algorithm is a simplification of a more complex (and well-known) algorithm: Boyer-Moore algorithm in practice, Horspool is often faster however, Boyer-Moore has O( S ) worst case, instead of O( S P ) 25 Boyer-Moore algorithm based on two kinds of shifts (both compare right-to-left, find first mismatch) the first is bad-symbol shift (based on the symbol in S that caused the mismatch) FIZBIZ F and B don't match, shift to align F FIZBIZ I and Z don't match, shift to align I FIZBIZ I and Z don't match, shift to align I FIZBIZ F and Z don't match, shift to align F FIZBIZ FOUND 26 13

Bad symbol shift bad symbol table is alphabet P kth row contains shift amount if mismatch occurred at index k (from right) bad symbol table for FIZBIZ: A B C F I Y Z 0 6 2 6 5 1 6-1 5 1 5 4-5 2 2 4-4 3 2 4 1 3 3 3 3 2 1 3-4 2 2 2 1-2 2 FIZBIZ F and B don't match à badsymboltable(f,2) = 3 FIZBIZ I and Z don't match à badsymboltable(i, 0) = 1 FIZBIZ I and Z don't match à badsymboltable(i, 3) = 1 FIZBIZ F and Z don't match à badsymboltable(f, 0) = 5 FIZBIZ FOUND 27 Good suffix shift find the longest suffix that matches if that suffix appears to the left in P preceded by a different char, shift to align if not, then shift the entire length of the word FIZBIZ IZ suffix matches, IZ appears to left so shift to align FIZBIZ no suffix match, so shift 1 spot FIZBIZ BIZ suffix matches, doesn't appear again so full shift FIZBIZ FOUND 28 14

Good suffix shift good suffix shift table is P assume that suffix matches but char to the left does not if that suffix appears to the left preceded by a different char, shift to align if not, then can shift the entire length of the word good suffix table for FIZBIZ IZBIZ ZBIZ BIZ IZ Z 6 6 6 3 6 1 FIZBIZ IZ suffix matches à goodsuffixtable(iz) = 3 FIZBIZ no suffix match à goodsuffixtable() = 1 FIZBIZ BIZ suffix matches à goodsuffixtable(biz) = 6 FIZBIZ FOUND 29 Boyer-Moore string search algorithm 1. calculate the bad symbol and good suffix shift tables 2. while match not found and not off the edge a) compare pattern with string section b) shift1 = bad symbol shift of rightmost non-matching char c) shift2 = good suffix shift of longest matching suffix d) shift string section for comparison by max(shift1, shift2) the algorithm has been proven to require at most 3* S comparisons so, O( S ) in practice, can require fewer than S comparisons requires storing O( P ) bad symbol shift table and O( P ) good suffix shift table 30 15

Boyer-Moore exercise String: BANDANABABANANAN Pattern: BANANA bad symbol table for BANANA? good suffix table for BANANA? A B C D N Y Z 0 1 2 3 4 steps? ANANA NANA ANA NA A BANDANABABANANAN BANANA 31 16