Scan Primitives for GPU Computing
|
|
- Alexis Watkins
- 6 years ago
- Views:
Transcription
1 Scan Primitives for GPU Computing
2 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 2
3 Prefix sum Prefix sum Input: An ordered set: [x 0, x 1, x 2, x 3, ] Array or linked-list A binary associative i operator: + Associative: (a+b)+c = a+(b+c) Don t need to be commutative: it s okay that a+b b+a (String concatenation for example.) Identity element exists (additive identity : 0 ) 0+a=a+0=a Examples: add, multiply, max, min, AND, OR, Minkowski sum, etc. Output: [x 0, x 0 +x 1, x 0 +x 1 +x 2, x 0 +x 1 +x 2 +x 3, ] 3
4 Scan Scan is array-based prefix sum Elements are located continuously in the memory Easy for GPU implementation Inclusive scan [x 0, x 0 +x 1, x 0 +x 1 +x 2, x 0 +x 1 +x 2 +x 3, ] Exclusive scan (prescan) [0, x 0, x 0 +x 1, x 0 +x 1 +x 2, ] 4
5 Sequential Implementation of Inclusive Scan Step Complexity: O(n) Total number of steps Work Complexity: O(n) Total number of operations 5
6 Work Complexity Work complexity matters in parallel algorithm Look at a simple parallel problem: copy n 2 data source: destination: 1 2 n n 2 Step complexity = 1; work complexity = n 2 If we have n 2 processors, we only need 1 parallel l step. If we have n processors, each process needs to do n copies. So we need n parallel steps. (Brent 1974) step complexity S, work complexity W, p processors 6 # parallel steps W + S p
7 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 7
8 Horn s GPU Implementation
9 Horn s GPU Implementation x i = x i-1 + x i
10 Horn s GPU Implementation x i = x i-1 + x i x i = x i-2 + x i
11 Horn s GPU Implementation x i = x i-1 + x i x i = x i-2 + x i x i = x i-4 + x i
12 If we have 16 numbers x i = x i-1 + x i x i = x i-2 + x i x i = x i-4 + x i x i = x i-8 + x i 12
13 Horn s GPU Implementation Horn, Daniel. Stream reduction operations for GPGPU applications. In GPU Germs 2. Step Complexity: O(logn) Step complexity of sequential algorithm is O(n) Efficient Work Complexity: = (n-1)+(n-2)+(n-4)+ = log n d= d ( n 2 ) 0 = nlogn-n+1 n+1 = O(nlogn) Work complexity of sequential algorithm is O(n) Not work-efficient 13
14 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 14
15 A Work-Efficient Scan Algorithm Exclusive Scan Step Complexity: O(logn) Work Complexity: O(n) Use (Implicit) Balanced Binary Tree Just a concept. Don t need to build a real tree. Perform calculations on the array directly. Easy for implementation on GPU. Two passes over the array (tree) Reduce (up-sweep) Down-sweep 15
16 Balanced Binary Tree A binary tree where the depth of all the leaves differs by at most 1. The height of a balanced binary tree is minimized. Take reduction for an example
17 Up-sweep (Balanced Tree) Parent = Left + Right
18 Down-sweep (Balanced Tree) Root =
19 Down-sweep (Balanced Tree) Right = Parent + Left Left = Parent
20 Down-sweep (Balanced Tree) Right = Parent + Left Left = Parent
21 Down-sweep (Balanced Tree) Right = Parent + Left Left = Parent
22 Why It Works? (up-sweep) Parent = Left + Right v subtree(v) After reduction (up-sweep), each node contains the partial sum of its subtree 22
23 Why It Works? (down-sweep) Right = Parent + Left Left = Parent L P R pre(p), pre(l) pre(r) After down-sweep, each node contains the sum of all the leaves that t precede it in the preorder traversal. 23
24 Pre-order traversal Traverse the tree recursively in the following order: Visit the root Visit the left subtree Visit the right subtree post-order and in-order also give a feasible algorithm? Items preceding the green node (9) are marked in yellow. What are items preceding the left (10) and right (13) child of the green node (9)? Right = Parent + Left Left = Parent Why we reset the root to zero at the beginning of down-sweep? 24
25 Why pre-order? Actually in-order and post-order also work! Try to derive their formulas But pre-order gives the simplest formula. Post-order gives a beautiful and concise formula for inclusive scan; however it requires to define a corresponding minus operator, which is not always trivial (like Minkowski sum) In-order also requires minus operator. In addition, its formula involves three continuous levels (parent, child, grandchild) Pre-order doesn t need minus operator, because the children always come after the parent. 25
26 In-place computation without a tree Balanced Binary Tree is just a concept. We don t really build such a tree. All the computations are performed on the array directly. Given an item in the array, we know the exact index of its parent or left/right child. So there is no need to record the parent-child relationship like a tree data structure does. 26
27 Up-sweep (in-place)
28 Up-sweep (in-place) Parent = Left + Right 28
29 Up-sweep (in-place) Parent = Left + Right 29
30 Up-sweep (in-place) Parent = Left + Right 30
31 Down-sweep (in-place)
32 Down-sweep (in-place) Root = 0 32
33 Down-sweep (in-place) Right = Parent + Left Left = Parent 33
34 Down-sweep (in-place) Right = Parent + Left Left = Parent 34
35 Down-sweep (in-place) Right = Parent + Left Left = Parent 35
36 Pseudocode Up-sweep (reduce): Down-sweep: 36
37 Complexity Analysis Step complexity Up-sweep: log(n) Down-sweep: log(n) Work complexity Up-sweep n log n 1 d= 0 d+ 1 2 Down-sweep = n 1 log 1 1 n n + 3 = 1 3 n 2 + d= 0 2 d 37
38 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 38
39 Segmented Scan Input array is broken into arbitrary segments; Scan is performed in each segment; Additional input: flag array (1: start of a new segment) Flag: [ ] Data: [ ] => Result: [ ] Still O(n) work complexity and O(logn) step complexity; Only three times slower than scan; Algorithm is slightly different, but more tricky. 39
40 Segmented Scan Scan 40
41 Why segmented scan? Why not scan in each segment independently? Suppose we have m arrays, each of which has n numbers. Scanning them independently takes O(mlogn) step complexity. If we combine them into a single array with mn numbers, and do a segmented scan, it takes only O(logmn) step complexity. Segmented scan has lot of applications, such as quicksort, sparse matrix-vector multiplication, processor allocation, etc. 41
42 Convert segmented scan into scan Given flag array: f i (0 or 1) Given data array: x i Define the array of pairs: c i = [f i, x i ] Define a new binary operator on c i c1 c2 = [ f1, x1] [ f2, x2] { [ f1 f2, x1 + x2 ] if f2 = 0 = [ f f, x ] if f = 1 Prove that is associative The identity element for is (0, 0 ) Now scan over array c i with operator 42
43 Reduce [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] 43
44 Reduce [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] 44
45 Reduce [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] 45
46 Reduce [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] 46
47 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] 47
48 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] 48
49 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] [1,4] [1,6] [0,1] [0,0] [0,0] [0,2] [1,1] [1,3] 49
50 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] [1,4] [1,6] [0,1] [0,0] [0,0] [0,2] [1,1] [1,3] [1,4] [0,0] [0,1] [1,6] [0,0] [1,3] [1,1] [1,5] 50
51 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] [1,4] [1,6] [0,1] [0,0] [0,0] [0,2] [1,1] [1,3] [1,4] [0,0] [0,1] [1,6] [0,0] [1,3] [1,1] [1,5] [0,0] [1,4] [1,6] [1,7] [1,3] [1,3] [1,5] [1,1] 51
52 Now What We Get Input: [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] Output: [0,0] [1,4] [1,6] [1,7] [1,3] [1,3] [1,5] [1,1] Each output value is the exclusive prefix sum ( ). Seems correct. Should be [0,0] But something is WRONG 52
53 How to correct this? Need to cut off the sum across different segments. Note that the root is propagated along the left side the tree. Reset the top node of a subtree to zero when its leftmost leave is the start of a new segment, just like we reset the top root to zero at the beginning of the downsweep. left subtree right subtree Adjustment to down-sweep: P if the ORIGINAL flag at the position to the right of the left child is 1, the right child is L R set to 0. start of a new segment 53
54 Adjustment to Down-sweep Input: [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] [1,4] [1,6] [0,1] [0,0] [0,0] [0,2] [1,1] [1,3] [1,4] [0,0] [0,1] [1,6] [0,0] [1,3] [1,1] [1,5] [0,0] [0,0] [1,4] [1,6] [1,7] [1,3] [1,3] [1,5] [1,1] [0,0] [0,0] 54
55 Segmented Scan Algorithm: Revisit Reduce: Down-sweep: Adjustment Scan over (flag, data) pairs 55
56 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 56
57 Application Other Primitives Enumerate Distribute Split Rdi Radix Sort Quick Sort Recurrence Equations Sparse Matrix-Vector Multiply Tridiagonal Matrix Solver Multi-precision addition 57
58 Split Data: [ ] Flag: [ ] Result: [ ] Steps: [ ] # e [ ] # f = scan over e 4 # NF = add last elements of e and f [ ] # id [ ] # t = id f + NF [ ] # addr = e? f : t [ ] # out[addr] = in 58
59 Split (hands-on) Data: [ ] Flag: [ ] Result: [ ] Steps: [ ] ] # e [ ] # f = scan over e [ ] # id [ ] # t = id f + NF [ ] # addr = e? f : t [ ] # out[addr] = in # NF = add last elements of e and f 59
60 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 60
61 Implementation on CUDA All the threads should be in the same thread block so that they can share memory and synchronize with each other. On G80 GPUs, this limits us to a maximum of 1024 elements. Extend to large arrays by dividing the array into blocks. Each block is scanned by a single thread block. 61
62 Multi-block Segmented Scan First Level Reduce Store last element of each block Set last element of each block Second Level Segmented Scan First Level Down-Sweep 62
63 CUDA Source codes for scan 63
PARALLEL PROCESSING UNIT 3. Dr. Ahmed Sallam
PARALLEL PROCESSING 1 UNIT 3 Dr. Ahmed Sallam FUNDAMENTAL GPU ALGORITHMS More Patterns Reduce Scan 2 OUTLINES Efficiency Measure Reduce primitive Reduce model Reduce Implementation and complexity analysis
More informationScan Algorithm Effects on Parallelism and Memory Conflicts
Scan Algorithm Effects on Parallelism and Memory Conflicts 11 Parallel Prefix Sum (Scan) Definition: The all-prefix-sums operation takes a binary associative operator with identity I, and an array of n
More informationScan Primitives for GPU Computing
Scan Primitives for GPU Computing Shubho Sengupta, Mark Harris *, Yao Zhang, John Owens University of California Davis, *NVIDIA Corporation Motivation Raw compute power and bandwidth of GPUs increasing
More informationData parallel algorithms, algorithmic building blocks, precision vs. accuracy
Data parallel algorithms, algorithmic building blocks, precision vs. accuracy Robert Strzodka Architecture of Computing Systems GPGPU and CUDA Tutorials Dresden, Germany, February 25 2008 2 Overview Parallel
More informationSection 5.5. Left subtree The left subtree of a vertex V on a binary tree is the graph formed by the left child L of V, the descendents
Section 5.5 Binary Tree A binary tree is a rooted tree in which each vertex has at most two children and each child is designated as being a left child or a right child. Thus, in a binary tree, each vertex
More informationGPGPU: Parallel Reduction and Scan
Administrivia GPGPU: Parallel Reduction and Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Assignment 3 due Wednesday 11:59pm on Blackboard Assignment 4 handed out Monday, 02/14 Final Wednesday
More informationTrees. (Trees) Data Structures and Programming Spring / 28
Trees (Trees) Data Structures and Programming Spring 2018 1 / 28 Trees A tree is a collection of nodes, which can be empty (recursive definition) If not empty, a tree consists of a distinguished node r
More informationSection 4 SOLUTION: AVL Trees & B-Trees
Section 4 SOLUTION: AVL Trees & B-Trees 1. What 3 properties must an AVL tree have? a. Be a binary tree b. Have Binary Search Tree ordering property (left children < parent, right children > parent) c.
More informationData-Parallel Algorithms on GPUs. Mark Harris NVIDIA Developer Technology
Data-Parallel Algorithms on GPUs Mark Harris NVIDIA Developer Technology Outline Introduction Algorithmic complexity on GPUs Algorithmic Building Blocks Gather & Scatter Reductions Scan (parallel prefix)
More informationEE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 13 Parallelism in Software I
EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 13 Parallelism in Software I Mattan Erez The University of Texas at Austin N EE382N: Parallelilsm and Locality, Spring
More informationSorting and Selection
Sorting and Selection Introduction Divide and Conquer Merge-Sort Quick-Sort Radix-Sort Bucket-Sort 10-1 Introduction Assuming we have a sequence S storing a list of keyelement entries. The key of the element
More informationLecture 7. Transform-and-Conquer
Lecture 7 Transform-and-Conquer 6-1 Transform and Conquer This group of techniques solves a problem by a transformation to a simpler/more convenient instance of the same problem (instance simplification)
More informationBinary Trees. BSTs. For example: Jargon: Data Structures & Algorithms. root node. level: internal node. edge.
Binary Trees 1 A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from
More informationCSCI2100B Data Structures Trees
CSCI2100B Data Structures Trees Irwin King king@cse.cuhk.edu.hk http://www.cse.cuhk.edu.hk/~king Department of Computer Science & Engineering The Chinese University of Hong Kong Introduction General Tree
More informationBinary Trees
Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what
More informationEE 368. Weeks 5 (Notes)
EE 368 Weeks 5 (Notes) 1 Chapter 5: Trees Skip pages 273-281, Section 5.6 - If A is the root of a tree and B is the root of a subtree of that tree, then A is B s parent (or father or mother) and B is A
More information& ( D. " mnp ' ( ) n 3. n 2. ( ) C. " n
CSE Name Test Summer Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to multiply two n " n matrices is: A. " n C. "% n B. " max( m,n, p). The
More informationA6-R3: DATA STRUCTURE THROUGH C LANGUAGE
A6-R3: DATA STRUCTURE THROUGH C LANGUAGE NOTE: 1. There are TWO PARTS in this Module/Paper. PART ONE contains FOUR questions and PART TWO contains FIVE questions. 2. PART ONE is to be answered in the TEAR-OFF
More informationCS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms
CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms Additional sources for this lecture include: 1. Umut Acar and Guy Blelloch, Algorithm Design: Parallel and Sequential
More informationMulti-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25
Multi-way Search Trees (Multi-way Search Trees) Data Structures and Programming Spring 2017 1 / 25 Multi-way Search Trees Each internal node of a multi-way search tree T: has at least two children contains
More informationCS102 Binary Search Trees
CS102 Binary Search Trees Prof Tejada 1 To speed up insertion, removal and search, modify the idea of a Binary Tree to create a Binary Search Tree (BST) Binary Search Trees Binary Search Trees have one
More informationMarch 20/2003 Jayakanth Srinivasan,
Definition : A simple graph G = (V, E) consists of V, a nonempty set of vertices, and E, a set of unordered pairs of distinct elements of V called edges. Definition : In a multigraph G = (V, E) two or
More informationECE 242 Data Structures and Algorithms. Trees IV. Lecture 21. Prof.
ECE 22 Data Structures and Algorithms http://www.ecs.umass.edu/~polizzi/teaching/ece22/ Trees IV Lecture 2 Prof. Eric Polizzi Summary previous lectures Implementations BST 5 5 7 null 8 null null 7 null
More informationBinary Trees, Binary Search Trees
Binary Trees, Binary Search Trees Trees Linear access time of linked lists is prohibitive Does there exist any simple data structure for which the running time of most operations (search, insert, delete)
More informationINF2220: algorithms and data structures Series 1
Universitetet i Oslo Institutt for Informatikk A. Maus, R.K. Runde, I. Yu INF2220: algorithms and data structures Series 1 Topic Trees & estimation of running time (Exercises with hints for solution) Issued:
More informationHeaps Outline and Required Reading: Heaps ( 7.3) COSC 2011, Fall 2003, Section A Instructor: N. Vlajic
1 Heaps Outline and Required Reading: Heaps (.3) COSC 2011, Fall 2003, Section A Instructor: N. Vlajic Heap ADT 2 Heap binary tree (T) that stores a collection of keys at its internal nodes and satisfies
More informationTHE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan
THE B+ TREE INDEX CS 564- Spring 2018 ACKs: Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? The B+ tree index Basics Search/Insertion/Deletion Design & Cost 2 INDEX RECAP We have the following query:
More informationCS 171: Introduction to Computer Science II. Binary Search Trees
CS 171: Introduction to Computer Science II Binary Search Trees Binary Search Trees Symbol table applications BST definitions and terminologies Search and insert Traversal Ordered operations Delete Symbol
More informationData Structure and Algorithm Midterm Reference Solution TA
Data Structure and Algorithm Midterm Reference Solution TA email: dsa1@csie.ntu.edu.tw Problem 1. To prove log 2 n! = Θ(n log n), it suffices to show N N, c 1, c 2 > 0 such that c 1 n ln n ln n! c 2 n
More informationCS 179: GPU Programming. Lecture 7
CS 179: GPU Programming Lecture 7 Week 3 Goals: More involved GPU-accelerable algorithms Relevant hardware quirks CUDA libraries Outline GPU-accelerated: Reduction Prefix sum Stream compaction Sorting(quicksort)
More informationBinary heaps (chapters ) Leftist heaps
Binary heaps (chapters 20.3 20.5) Leftist heaps Binary heaps are arrays! A binary heap is really implemented using an array! 8 18 29 20 28 39 66 Possible because of completeness property 37 26 76 32 74
More informationCSC Design and Analysis of Algorithms. Lecture 8. Transform and Conquer II Algorithm Design Technique. Transform and Conquer
CSC 301- Design and Analysis of Algorithms Lecture Transform and Conuer II Algorithm Design Techniue Transform and Conuer This group of techniues solves a problem by a transformation to a simpler/more
More information(Refer Slide Time 04:53)
Programming and Data Structure Dr.P.P.Chakraborty Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture 26 Algorithm Design -1 Having made a preliminary study
More informationECE 498AL. Lecture 12: Application Lessons. When the tires hit the road
ECE 498AL Lecture 12: Application Lessons When the tires hit the road 1 Objective Putting the CUDA performance knowledge to work Plausible strategies may or may not lead to performance enhancement Different
More informationRange Queries. Kuba Karpierz, Bruno Vacherot. March 4, 2016
Range Queries Kuba Karpierz, Bruno Vacherot March 4, 2016 Range query problems are of the following form: Given an array of length n, I will ask q queries. Queries may ask some form of question about a
More informationCSCI-401 Examlet #5. Name: Class: Date: True/False Indicate whether the sentence or statement is true or false.
Name: Class: Date: CSCI-401 Examlet #5 True/False Indicate whether the sentence or statement is true or false. 1. The root node of the standard binary tree can be drawn anywhere in the tree diagram. 2.
More information( ) n 5. Test 1 - Closed Book
Name Test 1 - Closed Book Student ID # Multiple Choice. Write your answer to the LEFT of each problem. points each 1. During which operation on a leftist heap may subtree swaps be needed? A. DECREASE-KEY
More informationComputational Geometry
Range searching and kd-trees 1 Database queries 1D range trees Databases Databases store records or objects Personnel database: Each employee has a name, id code, date of birth, function, salary, start
More informationCE 221 Data Structures and Algorithms
CE 221 Data Structures and Algorithms Chapter 4: Trees (Binary) Text: Read Weiss, 4.1 4.2 Izmir University of Economics 1 Preliminaries - I (Recursive) Definition: A tree is a collection of nodes. The
More informationChapter 4 Trees. Theorem A graph G has a spanning tree if and only if G is connected.
Chapter 4 Trees 4-1 Trees and Spanning Trees Trees, T: A simple, cycle-free, loop-free graph satisfies: If v and w are vertices in T, there is a unique simple path from v to w. Eg. Trees. Spanning trees:
More informationBalanced Search Trees
Balanced Search Trees Computer Science E-22 Harvard Extension School David G. Sullivan, Ph.D. Review: Balanced Trees A tree is balanced if, for each node, the node s subtrees have the same height or have
More informationCSC Design and Analysis of Algorithms. Lecture 8. Transform and Conquer II Algorithm Design Technique. Transform and Conquer
CSC 301- Design and Analysis of Algorithms Lecture Transform and Conquer II Algorithm Design Technique Transform and Conquer This group of techniques solves a problem by a transformation to a simpler/more
More informationBalanced Binary Search Trees. Victor Gao
Balanced Binary Search Trees Victor Gao OUTLINE Binary Heap Revisited BST Revisited Balanced Binary Search Trees Rotation Treap Splay Tree BINARY HEAP: REVIEW A binary heap is a complete binary tree such
More informationPrefix Scan and Minimum Spanning Tree with OpenCL
Prefix Scan and Minimum Spanning Tree with OpenCL U. VIRGINIA DEPT. OF COMP. SCI TECH. REPORT CS-2013-02 Yixin Sun and Kevin Skadron Dept. of Computer Science, University of Virginia ys3kz@virginia.edu,
More information) $ f ( n) " %( g( n)
CSE 0 Name Test Spring 008 Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to compute the sum of the n elements of an integer array is: # A.
More informationLecture Notes for Advanced Algorithms
Lecture Notes for Advanced Algorithms Prof. Bernard Moret September 29, 2011 Notes prepared by Blanc, Eberle, and Jonnalagedda. 1 Average Case Analysis 1.1 Reminders on quicksort and tree sort We start
More informationProgramming II (CS300)
1 Programming II (CS300) Chapter 12: Heaps and Priority Queues MOUNA KACEM mouna@cs.wisc.edu Fall 2018 Heaps and Priority Queues 2 Priority Queues Heaps Priority Queue 3 QueueADT Objects are added and
More informationARC 088/ABC 083. A: Libra. DEGwer 2017/12/23. #include <s t d i o. h> i n t main ( )
ARC 088/ABC 083 DEGwer 2017/12/23 A: Libra A, B, C, D A + B C + D #include i n t main ( ) i n t a, b, c, d ; s c a n f ( %d%d%d%d, &a, &b, &c, &d ) ; i f ( a + b > c + d ) p r i n t f (
More informationGPU Programming. Parallel Patterns. Miaoqing Huang University of Arkansas 1 / 102
1 / 102 GPU Programming Parallel Patterns Miaoqing Huang University of Arkansas 2 / 102 Outline Introduction Reduction All-Prefix-Sums Applications Avoiding Bank Conflicts Segmented Scan Sorting 3 / 102
More informationWarped parallel nearest neighbor searches using kd-trees
Warped parallel nearest neighbor searches using kd-trees Roman Sokolov, Andrei Tchouprakov D4D Technologies Kd-trees Binary space partitioning tree Used for nearest-neighbor search, range search Application:
More informationFriday Four Square! 4:15PM, Outside Gates
Binary Search Trees Friday Four Square! 4:15PM, Outside Gates Implementing Set On Monday and Wednesday, we saw how to implement the Map and Lexicon, respectively. Let's now turn our attention to the Set.
More informationAlgorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs
Algorithms in Systems Engineering ISE 172 Lecture 16 Dr. Ted Ralphs ISE 172 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms
More informationFINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard
FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 The data of the problem is of 2GB and the hard disk is of 1GB capacity, to solve this problem we should Use better data structures
More informationCSE 530A. B+ Trees. Washington University Fall 2013
CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key
More informationCS301 - Data Structures Glossary By
CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm
More informationLower Bound on Comparison-based Sorting
Lower Bound on Comparison-based Sorting Different sorting algorithms may have different time complexity, how to know whether the running time of an algorithm is best possible? We know of several sorting
More informationCSE100. Advanced Data Structures. Lecture 13. (Based on Paul Kube course materials)
CSE100 Advanced Data Structures Lecture 13 (Based on Paul Kube course materials) CSE 100 Priority Queues in Huffman s algorithm Heaps and Priority Queues Time and space costs of coding with Huffman codes
More informationMID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct.
MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct. In linked list the elements are necessarily to be contiguous In linked list the elements may locate at far positions
More informationCSE332: Data Abstractions Lecture 21: Parallel Prefix and Parallel Sorting. Tyler Robison Summer 2010
CSE332: Data Abstractions Lecture 21: Parallel Preix and Parallel Sorting Tyler Robison Summer 2010 1 What next? 2 Done: Now: Simple ways to use parallelism or counting, summing, inding elements Analysis
More informationLecture 6. Binary Search Trees and Red-Black Trees
Lecture Binary Search Trees and Red-Black Trees Sorting out the past couple of weeks: We ve seen a bunch of sorting algorithms InsertionSort - Θ n 2 MergeSort - Θ n log n QuickSort - Θ n log n expected;
More information16/10/2008. Today s menu. PRAM Algorithms. What do you program? What do you expect from a model? Basic ideas for the machine model
Today s menu 1. What do you program? Parallel complexity and algorithms PRAM Algorithms 2. The PRAM Model Definition Metrics and notations Brent s principle A few simple algorithms & concepts» Parallel
More informationComputational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs
Computational Optimization ISE 407 Lecture 16 Dr. Ted Ralphs ISE 407 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms in
More informationAdvanced Tree Data Structures
Advanced Tree Data Structures Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park Binary trees Traversal order Balance Rotation Multi-way trees Search Insert Overview
More informationAn AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time
B + -TREES MOTIVATION An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations finish within O(log N) time The theoretical conclusion
More informationDay 10. COMP1006/1406 Summer M. Jason Hinek Carleton University
Day 10 COMP1006/1406 Summer 2016 M. Jason Hinek Carleton University today s agenda assignments Only the Project is left! Recursion Again Efficiency 2 last time... recursion... binary trees... 3 binary
More informationCSE506: Operating Systems CSE 506: Operating Systems
CSE 506: Operating Systems Block Cache Address Space Abstraction Given a file, which physical pages store its data? Each file inode has an address space (0 file size) So do block devices that cache data
More informationCSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)
Name: Email address: Quiz Section: CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators) Instructions: Read the directions for each question carefully before answering. We will
More informationParallel Prefix Sum (Scan) with CUDA. Mark Harris
Parallel Prefix Sum (Scan) with CUDA Mark Harris mharris@nvidia.com March 2009 Document Change History Version Date Responsible Reason for Change February 14, 2007 Mark Harris Initial release March 25,
More informationTrees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures.
Trees Q: Why study trees? : Many advance DTs are implemented using tree-based data structures. Recursive Definition of (Rooted) Tree: Let T be a set with n 0 elements. (i) If n = 0, T is an empty tree,
More informationParallel Patterns Ezio Bartocci
TECHNISCHE UNIVERSITÄT WIEN Fakultät für Informatik Cyber-Physical Systems Group Parallel Patterns Ezio Bartocci Parallel Patterns Think at a higher level than individual CUDA kernels Specify what to compute,
More informationSorting. Bubble Sort. Pseudo Code for Bubble Sorting: Sorting is ordering a list of elements.
Sorting Sorting is ordering a list of elements. Types of sorting: There are many types of algorithms exist based on the following criteria: Based on Complexity Based on Memory usage (Internal & External
More informationBinary Trees. Height 1
Binary Trees Definitions A tree is a finite set of one or more nodes that shows parent-child relationship such that There is a special node called root Remaining nodes are portioned into subsets T1,T2,T3.
More information9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology
Introduction Chapter 4 Trees for large input, even linear access time may be prohibitive we need data structures that exhibit average running times closer to O(log N) binary search tree 2 Terminology recursive
More informationA set of nodes (or vertices) with a single starting point
Binary Search Trees Understand tree terminology Understand and implement tree traversals Define the binary search tree property Implement binary search trees Implement the TreeSort algorithm 2 A set of
More information1 Leaffix Scan, Rootfix Scan, Tree Size, and Depth
Lecture 17 Graph Contraction I: Tree Contraction Parallel and Sequential Data Structures and Algorithms, 15-210 (Spring 2012) Lectured by Kanat Tangwongsan March 20, 2012 In this lecture, we will explore
More informationExercise 1 : B-Trees [ =17pts]
CS - Fall 003 Assignment Due : Thu November 7 (written part), Tue Dec 0 (programming part) Exercise : B-Trees [+++3+=7pts] 3 0 3 3 3 0 Figure : B-Tree. Consider the B-Tree of figure.. What are the values
More informationAnalysis of Algorithms
Algorithm An algorithm is a procedure or formula for solving a problem, based on conducting a sequence of specified actions. A computer program can be viewed as an elaborate algorithm. In mathematics and
More information6. Algorithm Design Techniques
6. Algorithm Design Techniques 6. Algorithm Design Techniques 6.1 Greedy algorithms 6.2 Divide and conquer 6.3 Dynamic Programming 6.4 Randomized Algorithms 6.5 Backtracking Algorithms Malek Mouhoub, CS340
More informationMultiple Choice. Write your answer to the LEFT of each problem. 3 points each
CSE 0-00 Test Spring 0 Name Last 4 Digits of Student ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. Suppose f ( x) is a monotonically increasing function. Which of the
More informationCPS 616 TRANSFORM-AND-CONQUER 7-1
CPS 616 TRANSFORM-AND-CONQUER 7-1 TRANSFORM AND CONQUER Group of techniques to solve a problem by first transforming the problem into one of: 1. a simpler/more convenient instance of the same problem (instance
More informationCSE 4500 (228) Fall 2010 Selected Notes Set 2
CSE 4500 (228) Fall 2010 Selected Notes Set 2 Alexander A. Shvartsman Computer Science and Engineering University of Connecticut Copyright c 2002-2010 by Alexander A. Shvartsman. All rights reserved. 2
More information1 Interlude: Is keeping the data sorted worth it? 2 Tree Heap and Priority queue
TIE-0106 1 1 Interlude: Is keeping the data sorted worth it? When a sorted range is needed, one idea that comes to mind is to keep the data stored in the sorted order as more data comes into the structure
More informationD. Θ nlogn ( ) D. Ο. ). Which of the following is not necessarily true? . Which of the following cannot be shown as an improvement? D.
CSE 0 Name Test Fall 00 Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to convert an array, with priorities stored at subscripts through n,
More informationTrees! Ellen Walker! CPSC 201 Data Structures! Hiram College!
Trees! Ellen Walker! CPSC 201 Data Structures! Hiram College! ADTʼs Weʼve Studied! Position-oriented ADT! List! Stack! Queue! Value-oriented ADT! Sorted list! All of these are linear! One previous item;
More informationComputational Geometry
Windowing queries Windowing Windowing queries Zoom in; re-center and zoom in; select by outlining Windowing Windowing queries Windowing Windowing queries Given a set of n axis-parallel line segments, preprocess
More informationAssignment 1. Stefano Guerra. October 11, The following observation follows directly from the definition of in order and pre order traversal:
Assignment 1 Stefano Guerra October 11, 2016 1 Problem 1 Describe a recursive algorithm to reconstruct an arbitrary binary tree, given its preorder and inorder node sequences as input. First, recall that
More informationAnalysis of Algorithms. Unit 4 - Analysis of well known Algorithms
Analysis of Algorithms Unit 4 - Analysis of well known Algorithms 1 Analysis of well known Algorithms Brute Force Algorithms Greedy Algorithms Divide and Conquer Algorithms Decrease and Conquer Algorithms
More informationCOMP 250 Fall recurrences 2 Oct. 13, 2017
COMP 250 Fall 2017 15 - recurrences 2 Oct. 13, 2017 Here we examine the recurrences for mergesort and quicksort. Mergesort Recall the mergesort algorithm: we divide the list of things to be sorted into
More informationVisualizing Data Structures. Dan Petrisko
Visualizing Data Structures Dan Petrisko What is an Algorithm? A method of completing a function by proceeding from some initial state and input, proceeding through a finite number of well defined steps,
More informationBinary Search Trees. Carlos Moreno uwaterloo.ca EIT https://ece.uwaterloo.ca/~cmoreno/ece250
Carlos Moreno cmoreno @ uwaterloo.ca EIT-4103 https://ece.uwaterloo.ca/~cmoreno/ece250 Previously, on ECE-250... We discussed trees (the general type) and their implementations. We looked at traversals
More informationCS 677: Parallel Programming for Many-core Processors Lecture 6
1 CS 677: Parallel Programming for Many-core Processors Lecture 6 Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Logistics Midterm: March 11
More informationCS 331 DATA STRUCTURES & ALGORITHMS BINARY TREES, THE SEARCH TREE ADT BINARY SEARCH TREES, RED BLACK TREES, THE TREE TRAVERSALS, B TREES WEEK - 7
Ashish Jamuda Week 7 CS 331 DATA STRUCTURES & ALGORITHMS BINARY TREES, THE SEARCH TREE ADT BINARY SEARCH TREES, RED BLACK TREES, THE TREE TRAVERSALS, B TREES OBJECTIVES: Red Black Trees WEEK - 7 RED BLACK
More informationIntroduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree
Chapter 4 Trees 2 Introduction for large input, even access time may be prohibitive we need data structures that exhibit running times closer to O(log N) binary search tree 3 Terminology recursive definition
More informationParallel Programming
Parallel Programming Midterm Exam Wednesday, April 27, 2016 Your points are precious, don t let them go to waste! Your Time All points are not equal. Note that we do not think that all exercises have the
More informationIn the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set.
1 In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set. This presentation presents how such a DAG structure can be accessed immediately
More informationIntroduction to Parallel Computing Errata
Introduction to Parallel Computing Errata John C. Kirk 27 November, 2004 Overview Book: Introduction to Parallel Computing, Second Edition, first printing (hardback) ISBN: 0-201-64865-2 Official book website:
More information/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17
601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 5.1 Introduction You should all know a few ways of sorting in O(n log n)
More informationInformation Coding / Computer Graphics, ISY, LiTH
Sorting on GPUs Revisiting some algorithms from lecture 6: Some not-so-good sorting approaches Bitonic sort QuickSort Concurrent kernels and recursion Adapt to parallel algorithms Many sorting algorithms
More informationSearch Trees. Undirected graph Directed graph Tree Binary search tree
Search Trees Undirected graph Directed graph Tree Binary search tree 1 Binary Search Tree Binary search key property: Let x be a node in a binary search tree. If y is a node in the left subtree of x, then
More information( ). Which of ( ) ( ) " #& ( ) " # g( n) ( ) " # f ( n) Test 1
CSE 0 Name Test Summer 006 Last Digits of Student ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to multiply two n x n matrices is: A. "( n) B. "( nlogn) # C.
More information