Scan Primitives for GPU Computing

Size: px
Start display at page:

Download "Scan Primitives for GPU Computing"

Transcription

1 Scan Primitives for GPU Computing

2 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 2

3 Prefix sum Prefix sum Input: An ordered set: [x 0, x 1, x 2, x 3, ] Array or linked-list A binary associative i operator: + Associative: (a+b)+c = a+(b+c) Don t need to be commutative: it s okay that a+b b+a (String concatenation for example.) Identity element exists (additive identity : 0 ) 0+a=a+0=a Examples: add, multiply, max, min, AND, OR, Minkowski sum, etc. Output: [x 0, x 0 +x 1, x 0 +x 1 +x 2, x 0 +x 1 +x 2 +x 3, ] 3

4 Scan Scan is array-based prefix sum Elements are located continuously in the memory Easy for GPU implementation Inclusive scan [x 0, x 0 +x 1, x 0 +x 1 +x 2, x 0 +x 1 +x 2 +x 3, ] Exclusive scan (prescan) [0, x 0, x 0 +x 1, x 0 +x 1 +x 2, ] 4

5 Sequential Implementation of Inclusive Scan Step Complexity: O(n) Total number of steps Work Complexity: O(n) Total number of operations 5

6 Work Complexity Work complexity matters in parallel algorithm Look at a simple parallel problem: copy n 2 data source: destination: 1 2 n n 2 Step complexity = 1; work complexity = n 2 If we have n 2 processors, we only need 1 parallel l step. If we have n processors, each process needs to do n copies. So we need n parallel steps. (Brent 1974) step complexity S, work complexity W, p processors 6 # parallel steps W + S p

7 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 7

8 Horn s GPU Implementation

9 Horn s GPU Implementation x i = x i-1 + x i

10 Horn s GPU Implementation x i = x i-1 + x i x i = x i-2 + x i

11 Horn s GPU Implementation x i = x i-1 + x i x i = x i-2 + x i x i = x i-4 + x i

12 If we have 16 numbers x i = x i-1 + x i x i = x i-2 + x i x i = x i-4 + x i x i = x i-8 + x i 12

13 Horn s GPU Implementation Horn, Daniel. Stream reduction operations for GPGPU applications. In GPU Germs 2. Step Complexity: O(logn) Step complexity of sequential algorithm is O(n) Efficient Work Complexity: = (n-1)+(n-2)+(n-4)+ = log n d= d ( n 2 ) 0 = nlogn-n+1 n+1 = O(nlogn) Work complexity of sequential algorithm is O(n) Not work-efficient 13

14 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 14

15 A Work-Efficient Scan Algorithm Exclusive Scan Step Complexity: O(logn) Work Complexity: O(n) Use (Implicit) Balanced Binary Tree Just a concept. Don t need to build a real tree. Perform calculations on the array directly. Easy for implementation on GPU. Two passes over the array (tree) Reduce (up-sweep) Down-sweep 15

16 Balanced Binary Tree A binary tree where the depth of all the leaves differs by at most 1. The height of a balanced binary tree is minimized. Take reduction for an example

17 Up-sweep (Balanced Tree) Parent = Left + Right

18 Down-sweep (Balanced Tree) Root =

19 Down-sweep (Balanced Tree) Right = Parent + Left Left = Parent

20 Down-sweep (Balanced Tree) Right = Parent + Left Left = Parent

21 Down-sweep (Balanced Tree) Right = Parent + Left Left = Parent

22 Why It Works? (up-sweep) Parent = Left + Right v subtree(v) After reduction (up-sweep), each node contains the partial sum of its subtree 22

23 Why It Works? (down-sweep) Right = Parent + Left Left = Parent L P R pre(p), pre(l) pre(r) After down-sweep, each node contains the sum of all the leaves that t precede it in the preorder traversal. 23

24 Pre-order traversal Traverse the tree recursively in the following order: Visit the root Visit the left subtree Visit the right subtree post-order and in-order also give a feasible algorithm? Items preceding the green node (9) are marked in yellow. What are items preceding the left (10) and right (13) child of the green node (9)? Right = Parent + Left Left = Parent Why we reset the root to zero at the beginning of down-sweep? 24

25 Why pre-order? Actually in-order and post-order also work! Try to derive their formulas But pre-order gives the simplest formula. Post-order gives a beautiful and concise formula for inclusive scan; however it requires to define a corresponding minus operator, which is not always trivial (like Minkowski sum) In-order also requires minus operator. In addition, its formula involves three continuous levels (parent, child, grandchild) Pre-order doesn t need minus operator, because the children always come after the parent. 25

26 In-place computation without a tree Balanced Binary Tree is just a concept. We don t really build such a tree. All the computations are performed on the array directly. Given an item in the array, we know the exact index of its parent or left/right child. So there is no need to record the parent-child relationship like a tree data structure does. 26

27 Up-sweep (in-place)

28 Up-sweep (in-place) Parent = Left + Right 28

29 Up-sweep (in-place) Parent = Left + Right 29

30 Up-sweep (in-place) Parent = Left + Right 30

31 Down-sweep (in-place)

32 Down-sweep (in-place) Root = 0 32

33 Down-sweep (in-place) Right = Parent + Left Left = Parent 33

34 Down-sweep (in-place) Right = Parent + Left Left = Parent 34

35 Down-sweep (in-place) Right = Parent + Left Left = Parent 35

36 Pseudocode Up-sweep (reduce): Down-sweep: 36

37 Complexity Analysis Step complexity Up-sweep: log(n) Down-sweep: log(n) Work complexity Up-sweep n log n 1 d= 0 d+ 1 2 Down-sweep = n 1 log 1 1 n n + 3 = 1 3 n 2 + d= 0 2 d 37

38 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 38

39 Segmented Scan Input array is broken into arbitrary segments; Scan is performed in each segment; Additional input: flag array (1: start of a new segment) Flag: [ ] Data: [ ] => Result: [ ] Still O(n) work complexity and O(logn) step complexity; Only three times slower than scan; Algorithm is slightly different, but more tricky. 39

40 Segmented Scan Scan 40

41 Why segmented scan? Why not scan in each segment independently? Suppose we have m arrays, each of which has n numbers. Scanning them independently takes O(mlogn) step complexity. If we combine them into a single array with mn numbers, and do a segmented scan, it takes only O(logmn) step complexity. Segmented scan has lot of applications, such as quicksort, sparse matrix-vector multiplication, processor allocation, etc. 41

42 Convert segmented scan into scan Given flag array: f i (0 or 1) Given data array: x i Define the array of pairs: c i = [f i, x i ] Define a new binary operator on c i c1 c2 = [ f1, x1] [ f2, x2] { [ f1 f2, x1 + x2 ] if f2 = 0 = [ f f, x ] if f = 1 Prove that is associative The identity element for is (0, 0 ) Now scan over array c i with operator 42

43 Reduce [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] 43

44 Reduce [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] 44

45 Reduce [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] 45

46 Reduce [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] 46

47 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [1,6] 47

48 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] 48

49 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] [1,4] [1,6] [0,1] [0,0] [0,0] [0,2] [1,1] [1,3] 49

50 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] [1,4] [1,6] [0,1] [0,0] [0,0] [0,2] [1,1] [1,3] [1,4] [0,0] [0,1] [1,6] [0,0] [1,3] [1,1] [1,5] 50

51 Down-sweep [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] [1,4] [1,6] [0,1] [0,0] [0,0] [0,2] [1,1] [1,3] [1,4] [0,0] [0,1] [1,6] [0,0] [1,3] [1,1] [1,5] [0,0] [1,4] [1,6] [1,7] [1,3] [1,3] [1,5] [1,1] 51

52 Now What We Get Input: [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] Output: [0,0] [1,4] [1,6] [1,7] [1,3] [1,3] [1,5] [1,1] Each output value is the exclusive prefix sum ( ). Seems correct. Should be [0,0] But something is WRONG 52

53 How to correct this? Need to cut off the sum across different segments. Note that the root is propagated along the left side the tree. Reset the top node of a subtree to zero when its leftmost leave is the start of a new segment, just like we reset the top root to zero at the beginning of the downsweep. left subtree right subtree Adjustment to down-sweep: P if the ORIGINAL flag at the position to the right of the left child is 1, the right child is L R set to 0. start of a new segment 53

54 Adjustment to Down-sweep Input: [1,4] [0,2] [0,1] [1,3] [0,0] [0,2] [1,1] [0,5] [1,4] [1,6] [0,1] [1,3] [0,0] [0,2] [1,1] [0,0] [1,4] [1,6] [0,1] [0,0] [0,0] [0,2] [1,1] [1,3] [1,4] [0,0] [0,1] [1,6] [0,0] [1,3] [1,1] [1,5] [0,0] [0,0] [1,4] [1,6] [1,7] [1,3] [1,3] [1,5] [1,1] [0,0] [0,0] 54

55 Segmented Scan Algorithm: Revisit Reduce: Down-sweep: Adjustment Scan over (flag, data) pairs 55

56 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 56

57 Application Other Primitives Enumerate Distribute Split Rdi Radix Sort Quick Sort Recurrence Equations Sparse Matrix-Vector Multiply Tridiagonal Matrix Solver Multi-precision addition 57

58 Split Data: [ ] Flag: [ ] Result: [ ] Steps: [ ] # e [ ] # f = scan over e 4 # NF = add last elements of e and f [ ] # id [ ] # t = id f + NF [ ] # addr = e? f : t [ ] # out[addr] = in 58

59 Split (hands-on) Data: [ ] Flag: [ ] Result: [ ] Steps: [ ] ] # e [ ] # f = scan over e [ ] # id [ ] # t = id f + NF [ ] # addr = e? f : t [ ] # out[addr] = in # NF = add last elements of e and f 59

60 Agenda What is scan A naïve parallel scan algorithm A work-efficient parallel scan algorithm Parallel segmented scan Applications of scan Implementation on CUDA 60

61 Implementation on CUDA All the threads should be in the same thread block so that they can share memory and synchronize with each other. On G80 GPUs, this limits us to a maximum of 1024 elements. Extend to large arrays by dividing the array into blocks. Each block is scanned by a single thread block. 61

62 Multi-block Segmented Scan First Level Reduce Store last element of each block Set last element of each block Second Level Segmented Scan First Level Down-Sweep 62

63 CUDA Source codes for scan 63

PARALLEL PROCESSING UNIT 3. Dr. Ahmed Sallam

PARALLEL PROCESSING UNIT 3. Dr. Ahmed Sallam PARALLEL PROCESSING 1 UNIT 3 Dr. Ahmed Sallam FUNDAMENTAL GPU ALGORITHMS More Patterns Reduce Scan 2 OUTLINES Efficiency Measure Reduce primitive Reduce model Reduce Implementation and complexity analysis

More information

Scan Algorithm Effects on Parallelism and Memory Conflicts

Scan Algorithm Effects on Parallelism and Memory Conflicts Scan Algorithm Effects on Parallelism and Memory Conflicts 11 Parallel Prefix Sum (Scan) Definition: The all-prefix-sums operation takes a binary associative operator with identity I, and an array of n

More information

Scan Primitives for GPU Computing

Scan Primitives for GPU Computing Scan Primitives for GPU Computing Shubho Sengupta, Mark Harris *, Yao Zhang, John Owens University of California Davis, *NVIDIA Corporation Motivation Raw compute power and bandwidth of GPUs increasing

More information

Data parallel algorithms, algorithmic building blocks, precision vs. accuracy

Data parallel algorithms, algorithmic building blocks, precision vs. accuracy Data parallel algorithms, algorithmic building blocks, precision vs. accuracy Robert Strzodka Architecture of Computing Systems GPGPU and CUDA Tutorials Dresden, Germany, February 25 2008 2 Overview Parallel

More information

Section 5.5. Left subtree The left subtree of a vertex V on a binary tree is the graph formed by the left child L of V, the descendents

Section 5.5. Left subtree The left subtree of a vertex V on a binary tree is the graph formed by the left child L of V, the descendents Section 5.5 Binary Tree A binary tree is a rooted tree in which each vertex has at most two children and each child is designated as being a left child or a right child. Thus, in a binary tree, each vertex

More information

GPGPU: Parallel Reduction and Scan

GPGPU: Parallel Reduction and Scan Administrivia GPGPU: Parallel Reduction and Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Assignment 3 due Wednesday 11:59pm on Blackboard Assignment 4 handed out Monday, 02/14 Final Wednesday

More information

Trees. (Trees) Data Structures and Programming Spring / 28

Trees. (Trees) Data Structures and Programming Spring / 28 Trees (Trees) Data Structures and Programming Spring 2018 1 / 28 Trees A tree is a collection of nodes, which can be empty (recursive definition) If not empty, a tree consists of a distinguished node r

More information

Section 4 SOLUTION: AVL Trees & B-Trees

Section 4 SOLUTION: AVL Trees & B-Trees Section 4 SOLUTION: AVL Trees & B-Trees 1. What 3 properties must an AVL tree have? a. Be a binary tree b. Have Binary Search Tree ordering property (left children < parent, right children > parent) c.

More information

Data-Parallel Algorithms on GPUs. Mark Harris NVIDIA Developer Technology

Data-Parallel Algorithms on GPUs. Mark Harris NVIDIA Developer Technology Data-Parallel Algorithms on GPUs Mark Harris NVIDIA Developer Technology Outline Introduction Algorithmic complexity on GPUs Algorithmic Building Blocks Gather & Scatter Reductions Scan (parallel prefix)

More information

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 13 Parallelism in Software I

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 13 Parallelism in Software I EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 13 Parallelism in Software I Mattan Erez The University of Texas at Austin N EE382N: Parallelilsm and Locality, Spring

More information

Sorting and Selection

Sorting and Selection Sorting and Selection Introduction Divide and Conquer Merge-Sort Quick-Sort Radix-Sort Bucket-Sort 10-1 Introduction Assuming we have a sequence S storing a list of keyelement entries. The key of the element

More information

Lecture 7. Transform-and-Conquer

Lecture 7. Transform-and-Conquer Lecture 7 Transform-and-Conquer 6-1 Transform and Conquer This group of techniques solves a problem by a transformation to a simpler/more convenient instance of the same problem (instance simplification)

More information

Binary Trees. BSTs. For example: Jargon: Data Structures & Algorithms. root node. level: internal node. edge.

Binary Trees. BSTs. For example: Jargon: Data Structures & Algorithms. root node. level: internal node. edge. Binary Trees 1 A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from

More information

CSCI2100B Data Structures Trees

CSCI2100B Data Structures Trees CSCI2100B Data Structures Trees Irwin King king@cse.cuhk.edu.hk http://www.cse.cuhk.edu.hk/~king Department of Computer Science & Engineering The Chinese University of Hong Kong Introduction General Tree

More information

Binary Trees

Binary Trees Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what

More information

EE 368. Weeks 5 (Notes)

EE 368. Weeks 5 (Notes) EE 368 Weeks 5 (Notes) 1 Chapter 5: Trees Skip pages 273-281, Section 5.6 - If A is the root of a tree and B is the root of a subtree of that tree, then A is B s parent (or father or mother) and B is A

More information

& ( D. " mnp ' ( ) n 3. n 2. ( ) C. " n

& ( D.  mnp ' ( ) n 3. n 2. ( ) C.  n CSE Name Test Summer Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to multiply two n " n matrices is: A. " n C. "% n B. " max( m,n, p). The

More information

A6-R3: DATA STRUCTURE THROUGH C LANGUAGE

A6-R3: DATA STRUCTURE THROUGH C LANGUAGE A6-R3: DATA STRUCTURE THROUGH C LANGUAGE NOTE: 1. There are TWO PARTS in this Module/Paper. PART ONE contains FOUR questions and PART TWO contains FIVE questions. 2. PART ONE is to be answered in the TEAR-OFF

More information

CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms

CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms Additional sources for this lecture include: 1. Umut Acar and Guy Blelloch, Algorithm Design: Parallel and Sequential

More information

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25 Multi-way Search Trees (Multi-way Search Trees) Data Structures and Programming Spring 2017 1 / 25 Multi-way Search Trees Each internal node of a multi-way search tree T: has at least two children contains

More information

CS102 Binary Search Trees

CS102 Binary Search Trees CS102 Binary Search Trees Prof Tejada 1 To speed up insertion, removal and search, modify the idea of a Binary Tree to create a Binary Search Tree (BST) Binary Search Trees Binary Search Trees have one

More information

March 20/2003 Jayakanth Srinivasan,

March 20/2003 Jayakanth Srinivasan, Definition : A simple graph G = (V, E) consists of V, a nonempty set of vertices, and E, a set of unordered pairs of distinct elements of V called edges. Definition : In a multigraph G = (V, E) two or

More information

ECE 242 Data Structures and Algorithms. Trees IV. Lecture 21. Prof.

ECE 242 Data Structures and Algorithms.  Trees IV. Lecture 21. Prof. ECE 22 Data Structures and Algorithms http://www.ecs.umass.edu/~polizzi/teaching/ece22/ Trees IV Lecture 2 Prof. Eric Polizzi Summary previous lectures Implementations BST 5 5 7 null 8 null null 7 null

More information

Binary Trees, Binary Search Trees

Binary Trees, Binary Search Trees Binary Trees, Binary Search Trees Trees Linear access time of linked lists is prohibitive Does there exist any simple data structure for which the running time of most operations (search, insert, delete)

More information

INF2220: algorithms and data structures Series 1

INF2220: algorithms and data structures Series 1 Universitetet i Oslo Institutt for Informatikk A. Maus, R.K. Runde, I. Yu INF2220: algorithms and data structures Series 1 Topic Trees & estimation of running time (Exercises with hints for solution) Issued:

More information

Heaps Outline and Required Reading: Heaps ( 7.3) COSC 2011, Fall 2003, Section A Instructor: N. Vlajic

Heaps Outline and Required Reading: Heaps ( 7.3) COSC 2011, Fall 2003, Section A Instructor: N. Vlajic 1 Heaps Outline and Required Reading: Heaps (.3) COSC 2011, Fall 2003, Section A Instructor: N. Vlajic Heap ADT 2 Heap binary tree (T) that stores a collection of keys at its internal nodes and satisfies

More information

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan THE B+ TREE INDEX CS 564- Spring 2018 ACKs: Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? The B+ tree index Basics Search/Insertion/Deletion Design & Cost 2 INDEX RECAP We have the following query:

More information

CS 171: Introduction to Computer Science II. Binary Search Trees

CS 171: Introduction to Computer Science II. Binary Search Trees CS 171: Introduction to Computer Science II Binary Search Trees Binary Search Trees Symbol table applications BST definitions and terminologies Search and insert Traversal Ordered operations Delete Symbol

More information

Data Structure and Algorithm Midterm Reference Solution TA

Data Structure and Algorithm Midterm Reference Solution TA Data Structure and Algorithm Midterm Reference Solution TA email: dsa1@csie.ntu.edu.tw Problem 1. To prove log 2 n! = Θ(n log n), it suffices to show N N, c 1, c 2 > 0 such that c 1 n ln n ln n! c 2 n

More information

CS 179: GPU Programming. Lecture 7

CS 179: GPU Programming. Lecture 7 CS 179: GPU Programming Lecture 7 Week 3 Goals: More involved GPU-accelerable algorithms Relevant hardware quirks CUDA libraries Outline GPU-accelerated: Reduction Prefix sum Stream compaction Sorting(quicksort)

More information

Binary heaps (chapters ) Leftist heaps

Binary heaps (chapters ) Leftist heaps Binary heaps (chapters 20.3 20.5) Leftist heaps Binary heaps are arrays! A binary heap is really implemented using an array! 8 18 29 20 28 39 66 Possible because of completeness property 37 26 76 32 74

More information

CSC Design and Analysis of Algorithms. Lecture 8. Transform and Conquer II Algorithm Design Technique. Transform and Conquer

CSC Design and Analysis of Algorithms. Lecture 8. Transform and Conquer II Algorithm Design Technique. Transform and Conquer CSC 301- Design and Analysis of Algorithms Lecture Transform and Conuer II Algorithm Design Techniue Transform and Conuer This group of techniues solves a problem by a transformation to a simpler/more

More information

(Refer Slide Time 04:53)

(Refer Slide Time 04:53) Programming and Data Structure Dr.P.P.Chakraborty Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture 26 Algorithm Design -1 Having made a preliminary study

More information

ECE 498AL. Lecture 12: Application Lessons. When the tires hit the road

ECE 498AL. Lecture 12: Application Lessons. When the tires hit the road ECE 498AL Lecture 12: Application Lessons When the tires hit the road 1 Objective Putting the CUDA performance knowledge to work Plausible strategies may or may not lead to performance enhancement Different

More information

Range Queries. Kuba Karpierz, Bruno Vacherot. March 4, 2016

Range Queries. Kuba Karpierz, Bruno Vacherot. March 4, 2016 Range Queries Kuba Karpierz, Bruno Vacherot March 4, 2016 Range query problems are of the following form: Given an array of length n, I will ask q queries. Queries may ask some form of question about a

More information

CSCI-401 Examlet #5. Name: Class: Date: True/False Indicate whether the sentence or statement is true or false.

CSCI-401 Examlet #5. Name: Class: Date: True/False Indicate whether the sentence or statement is true or false. Name: Class: Date: CSCI-401 Examlet #5 True/False Indicate whether the sentence or statement is true or false. 1. The root node of the standard binary tree can be drawn anywhere in the tree diagram. 2.

More information

( ) n 5. Test 1 - Closed Book

( ) n 5. Test 1 - Closed Book Name Test 1 - Closed Book Student ID # Multiple Choice. Write your answer to the LEFT of each problem. points each 1. During which operation on a leftist heap may subtree swaps be needed? A. DECREASE-KEY

More information

Computational Geometry

Computational Geometry Range searching and kd-trees 1 Database queries 1D range trees Databases Databases store records or objects Personnel database: Each employee has a name, id code, date of birth, function, salary, start

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 221 Data Structures and Algorithms Chapter 4: Trees (Binary) Text: Read Weiss, 4.1 4.2 Izmir University of Economics 1 Preliminaries - I (Recursive) Definition: A tree is a collection of nodes. The

More information

Chapter 4 Trees. Theorem A graph G has a spanning tree if and only if G is connected.

Chapter 4 Trees. Theorem A graph G has a spanning tree if and only if G is connected. Chapter 4 Trees 4-1 Trees and Spanning Trees Trees, T: A simple, cycle-free, loop-free graph satisfies: If v and w are vertices in T, there is a unique simple path from v to w. Eg. Trees. Spanning trees:

More information

Balanced Search Trees

Balanced Search Trees Balanced Search Trees Computer Science E-22 Harvard Extension School David G. Sullivan, Ph.D. Review: Balanced Trees A tree is balanced if, for each node, the node s subtrees have the same height or have

More information

CSC Design and Analysis of Algorithms. Lecture 8. Transform and Conquer II Algorithm Design Technique. Transform and Conquer

CSC Design and Analysis of Algorithms. Lecture 8. Transform and Conquer II Algorithm Design Technique. Transform and Conquer CSC 301- Design and Analysis of Algorithms Lecture Transform and Conquer II Algorithm Design Technique Transform and Conquer This group of techniques solves a problem by a transformation to a simpler/more

More information

Balanced Binary Search Trees. Victor Gao

Balanced Binary Search Trees. Victor Gao Balanced Binary Search Trees Victor Gao OUTLINE Binary Heap Revisited BST Revisited Balanced Binary Search Trees Rotation Treap Splay Tree BINARY HEAP: REVIEW A binary heap is a complete binary tree such

More information

Prefix Scan and Minimum Spanning Tree with OpenCL

Prefix Scan and Minimum Spanning Tree with OpenCL Prefix Scan and Minimum Spanning Tree with OpenCL U. VIRGINIA DEPT. OF COMP. SCI TECH. REPORT CS-2013-02 Yixin Sun and Kevin Skadron Dept. of Computer Science, University of Virginia ys3kz@virginia.edu,

More information

) $ f ( n) " %( g( n)

) $ f ( n)  %( g( n) CSE 0 Name Test Spring 008 Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to compute the sum of the n elements of an integer array is: # A.

More information

Lecture Notes for Advanced Algorithms

Lecture Notes for Advanced Algorithms Lecture Notes for Advanced Algorithms Prof. Bernard Moret September 29, 2011 Notes prepared by Blanc, Eberle, and Jonnalagedda. 1 Average Case Analysis 1.1 Reminders on quicksort and tree sort We start

More information

Programming II (CS300)

Programming II (CS300) 1 Programming II (CS300) Chapter 12: Heaps and Priority Queues MOUNA KACEM mouna@cs.wisc.edu Fall 2018 Heaps and Priority Queues 2 Priority Queues Heaps Priority Queue 3 QueueADT Objects are added and

More information

ARC 088/ABC 083. A: Libra. DEGwer 2017/12/23. #include <s t d i o. h> i n t main ( )

ARC 088/ABC 083. A: Libra. DEGwer 2017/12/23. #include <s t d i o. h> i n t main ( ) ARC 088/ABC 083 DEGwer 2017/12/23 A: Libra A, B, C, D A + B C + D #include i n t main ( ) i n t a, b, c, d ; s c a n f ( %d%d%d%d, &a, &b, &c, &d ) ; i f ( a + b > c + d ) p r i n t f (

More information

GPU Programming. Parallel Patterns. Miaoqing Huang University of Arkansas 1 / 102

GPU Programming. Parallel Patterns. Miaoqing Huang University of Arkansas 1 / 102 1 / 102 GPU Programming Parallel Patterns Miaoqing Huang University of Arkansas 2 / 102 Outline Introduction Reduction All-Prefix-Sums Applications Avoiding Bank Conflicts Segmented Scan Sorting 3 / 102

More information

Warped parallel nearest neighbor searches using kd-trees

Warped parallel nearest neighbor searches using kd-trees Warped parallel nearest neighbor searches using kd-trees Roman Sokolov, Andrei Tchouprakov D4D Technologies Kd-trees Binary space partitioning tree Used for nearest-neighbor search, range search Application:

More information

Friday Four Square! 4:15PM, Outside Gates

Friday Four Square! 4:15PM, Outside Gates Binary Search Trees Friday Four Square! 4:15PM, Outside Gates Implementing Set On Monday and Wednesday, we saw how to implement the Map and Lexicon, respectively. Let's now turn our attention to the Set.

More information

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs Algorithms in Systems Engineering ISE 172 Lecture 16 Dr. Ted Ralphs ISE 172 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms

More information

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 The data of the problem is of 2GB and the hard disk is of 1GB capacity, to solve this problem we should Use better data structures

More information

CSE 530A. B+ Trees. Washington University Fall 2013

CSE 530A. B+ Trees. Washington University Fall 2013 CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key

More information

CS301 - Data Structures Glossary By

CS301 - Data Structures Glossary By CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm

More information

Lower Bound on Comparison-based Sorting

Lower Bound on Comparison-based Sorting Lower Bound on Comparison-based Sorting Different sorting algorithms may have different time complexity, how to know whether the running time of an algorithm is best possible? We know of several sorting

More information

CSE100. Advanced Data Structures. Lecture 13. (Based on Paul Kube course materials)

CSE100. Advanced Data Structures. Lecture 13. (Based on Paul Kube course materials) CSE100 Advanced Data Structures Lecture 13 (Based on Paul Kube course materials) CSE 100 Priority Queues in Huffman s algorithm Heaps and Priority Queues Time and space costs of coding with Huffman codes

More information

MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct.

MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct. MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct. In linked list the elements are necessarily to be contiguous In linked list the elements may locate at far positions

More information

CSE332: Data Abstractions Lecture 21: Parallel Prefix and Parallel Sorting. Tyler Robison Summer 2010

CSE332: Data Abstractions Lecture 21: Parallel Prefix and Parallel Sorting. Tyler Robison Summer 2010 CSE332: Data Abstractions Lecture 21: Parallel Preix and Parallel Sorting Tyler Robison Summer 2010 1 What next? 2 Done: Now: Simple ways to use parallelism or counting, summing, inding elements Analysis

More information

Lecture 6. Binary Search Trees and Red-Black Trees

Lecture 6. Binary Search Trees and Red-Black Trees Lecture Binary Search Trees and Red-Black Trees Sorting out the past couple of weeks: We ve seen a bunch of sorting algorithms InsertionSort - Θ n 2 MergeSort - Θ n log n QuickSort - Θ n log n expected;

More information

16/10/2008. Today s menu. PRAM Algorithms. What do you program? What do you expect from a model? Basic ideas for the machine model

16/10/2008. Today s menu. PRAM Algorithms. What do you program? What do you expect from a model? Basic ideas for the machine model Today s menu 1. What do you program? Parallel complexity and algorithms PRAM Algorithms 2. The PRAM Model Definition Metrics and notations Brent s principle A few simple algorithms & concepts» Parallel

More information

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs Computational Optimization ISE 407 Lecture 16 Dr. Ted Ralphs ISE 407 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms in

More information

Advanced Tree Data Structures

Advanced Tree Data Structures Advanced Tree Data Structures Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park Binary trees Traversal order Balance Rotation Multi-way trees Search Insert Overview

More information

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time B + -TREES MOTIVATION An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations finish within O(log N) time The theoretical conclusion

More information

Day 10. COMP1006/1406 Summer M. Jason Hinek Carleton University

Day 10. COMP1006/1406 Summer M. Jason Hinek Carleton University Day 10 COMP1006/1406 Summer 2016 M. Jason Hinek Carleton University today s agenda assignments Only the Project is left! Recursion Again Efficiency 2 last time... recursion... binary trees... 3 binary

More information

CSE506: Operating Systems CSE 506: Operating Systems

CSE506: Operating Systems CSE 506: Operating Systems CSE 506: Operating Systems Block Cache Address Space Abstraction Given a file, which physical pages store its data? Each file inode has an address space (0 file size) So do block devices that cache data

More information

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators)

CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators) Name: Email address: Quiz Section: CSE 332 Autumn 2013: Midterm Exam (closed book, closed notes, no calculators) Instructions: Read the directions for each question carefully before answering. We will

More information

Parallel Prefix Sum (Scan) with CUDA. Mark Harris

Parallel Prefix Sum (Scan) with CUDA. Mark Harris Parallel Prefix Sum (Scan) with CUDA Mark Harris mharris@nvidia.com March 2009 Document Change History Version Date Responsible Reason for Change February 14, 2007 Mark Harris Initial release March 25,

More information

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures.

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures. Trees Q: Why study trees? : Many advance DTs are implemented using tree-based data structures. Recursive Definition of (Rooted) Tree: Let T be a set with n 0 elements. (i) If n = 0, T is an empty tree,

More information

Parallel Patterns Ezio Bartocci

Parallel Patterns Ezio Bartocci TECHNISCHE UNIVERSITÄT WIEN Fakultät für Informatik Cyber-Physical Systems Group Parallel Patterns Ezio Bartocci Parallel Patterns Think at a higher level than individual CUDA kernels Specify what to compute,

More information

Sorting. Bubble Sort. Pseudo Code for Bubble Sorting: Sorting is ordering a list of elements.

Sorting. Bubble Sort. Pseudo Code for Bubble Sorting: Sorting is ordering a list of elements. Sorting Sorting is ordering a list of elements. Types of sorting: There are many types of algorithms exist based on the following criteria: Based on Complexity Based on Memory usage (Internal & External

More information

Binary Trees. Height 1

Binary Trees. Height 1 Binary Trees Definitions A tree is a finite set of one or more nodes that shows parent-child relationship such that There is a special node called root Remaining nodes are portioned into subsets T1,T2,T3.

More information

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology Introduction Chapter 4 Trees for large input, even linear access time may be prohibitive we need data structures that exhibit average running times closer to O(log N) binary search tree 2 Terminology recursive

More information

A set of nodes (or vertices) with a single starting point

A set of nodes (or vertices) with a single starting point Binary Search Trees Understand tree terminology Understand and implement tree traversals Define the binary search tree property Implement binary search trees Implement the TreeSort algorithm 2 A set of

More information

1 Leaffix Scan, Rootfix Scan, Tree Size, and Depth

1 Leaffix Scan, Rootfix Scan, Tree Size, and Depth Lecture 17 Graph Contraction I: Tree Contraction Parallel and Sequential Data Structures and Algorithms, 15-210 (Spring 2012) Lectured by Kanat Tangwongsan March 20, 2012 In this lecture, we will explore

More information

Exercise 1 : B-Trees [ =17pts]

Exercise 1 : B-Trees [ =17pts] CS - Fall 003 Assignment Due : Thu November 7 (written part), Tue Dec 0 (programming part) Exercise : B-Trees [+++3+=7pts] 3 0 3 3 3 0 Figure : B-Tree. Consider the B-Tree of figure.. What are the values

More information

Analysis of Algorithms

Analysis of Algorithms Algorithm An algorithm is a procedure or formula for solving a problem, based on conducting a sequence of specified actions. A computer program can be viewed as an elaborate algorithm. In mathematics and

More information

6. Algorithm Design Techniques

6. Algorithm Design Techniques 6. Algorithm Design Techniques 6. Algorithm Design Techniques 6.1 Greedy algorithms 6.2 Divide and conquer 6.3 Dynamic Programming 6.4 Randomized Algorithms 6.5 Backtracking Algorithms Malek Mouhoub, CS340

More information

Multiple Choice. Write your answer to the LEFT of each problem. 3 points each

Multiple Choice. Write your answer to the LEFT of each problem. 3 points each CSE 0-00 Test Spring 0 Name Last 4 Digits of Student ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. Suppose f ( x) is a monotonically increasing function. Which of the

More information

CPS 616 TRANSFORM-AND-CONQUER 7-1

CPS 616 TRANSFORM-AND-CONQUER 7-1 CPS 616 TRANSFORM-AND-CONQUER 7-1 TRANSFORM AND CONQUER Group of techniques to solve a problem by first transforming the problem into one of: 1. a simpler/more convenient instance of the same problem (instance

More information

CSE 4500 (228) Fall 2010 Selected Notes Set 2

CSE 4500 (228) Fall 2010 Selected Notes Set 2 CSE 4500 (228) Fall 2010 Selected Notes Set 2 Alexander A. Shvartsman Computer Science and Engineering University of Connecticut Copyright c 2002-2010 by Alexander A. Shvartsman. All rights reserved. 2

More information

1 Interlude: Is keeping the data sorted worth it? 2 Tree Heap and Priority queue

1 Interlude: Is keeping the data sorted worth it? 2 Tree Heap and Priority queue TIE-0106 1 1 Interlude: Is keeping the data sorted worth it? When a sorted range is needed, one idea that comes to mind is to keep the data stored in the sorted order as more data comes into the structure

More information

D. Θ nlogn ( ) D. Ο. ). Which of the following is not necessarily true? . Which of the following cannot be shown as an improvement? D.

D. Θ nlogn ( ) D. Ο. ). Which of the following is not necessarily true? . Which of the following cannot be shown as an improvement? D. CSE 0 Name Test Fall 00 Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to convert an array, with priorities stored at subscripts through n,

More information

Trees! Ellen Walker! CPSC 201 Data Structures! Hiram College!

Trees! Ellen Walker! CPSC 201 Data Structures! Hiram College! Trees! Ellen Walker! CPSC 201 Data Structures! Hiram College! ADTʼs Weʼve Studied! Position-oriented ADT! List! Stack! Queue! Value-oriented ADT! Sorted list! All of these are linear! One previous item;

More information

Computational Geometry

Computational Geometry Windowing queries Windowing Windowing queries Zoom in; re-center and zoom in; select by outlining Windowing Windowing queries Windowing Windowing queries Given a set of n axis-parallel line segments, preprocess

More information

Assignment 1. Stefano Guerra. October 11, The following observation follows directly from the definition of in order and pre order traversal:

Assignment 1. Stefano Guerra. October 11, The following observation follows directly from the definition of in order and pre order traversal: Assignment 1 Stefano Guerra October 11, 2016 1 Problem 1 Describe a recursive algorithm to reconstruct an arbitrary binary tree, given its preorder and inorder node sequences as input. First, recall that

More information

Analysis of Algorithms. Unit 4 - Analysis of well known Algorithms

Analysis of Algorithms. Unit 4 - Analysis of well known Algorithms Analysis of Algorithms Unit 4 - Analysis of well known Algorithms 1 Analysis of well known Algorithms Brute Force Algorithms Greedy Algorithms Divide and Conquer Algorithms Decrease and Conquer Algorithms

More information

COMP 250 Fall recurrences 2 Oct. 13, 2017

COMP 250 Fall recurrences 2 Oct. 13, 2017 COMP 250 Fall 2017 15 - recurrences 2 Oct. 13, 2017 Here we examine the recurrences for mergesort and quicksort. Mergesort Recall the mergesort algorithm: we divide the list of things to be sorted into

More information

Visualizing Data Structures. Dan Petrisko

Visualizing Data Structures. Dan Petrisko Visualizing Data Structures Dan Petrisko What is an Algorithm? A method of completing a function by proceeding from some initial state and input, proceeding through a finite number of well defined steps,

More information

Binary Search Trees. Carlos Moreno uwaterloo.ca EIT https://ece.uwaterloo.ca/~cmoreno/ece250

Binary Search Trees. Carlos Moreno uwaterloo.ca EIT https://ece.uwaterloo.ca/~cmoreno/ece250 Carlos Moreno cmoreno @ uwaterloo.ca EIT-4103 https://ece.uwaterloo.ca/~cmoreno/ece250 Previously, on ECE-250... We discussed trees (the general type) and their implementations. We looked at traversals

More information

CS 677: Parallel Programming for Many-core Processors Lecture 6

CS 677: Parallel Programming for Many-core Processors Lecture 6 1 CS 677: Parallel Programming for Many-core Processors Lecture 6 Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Logistics Midterm: March 11

More information

CS 331 DATA STRUCTURES & ALGORITHMS BINARY TREES, THE SEARCH TREE ADT BINARY SEARCH TREES, RED BLACK TREES, THE TREE TRAVERSALS, B TREES WEEK - 7

CS 331 DATA STRUCTURES & ALGORITHMS BINARY TREES, THE SEARCH TREE ADT BINARY SEARCH TREES, RED BLACK TREES, THE TREE TRAVERSALS, B TREES WEEK - 7 Ashish Jamuda Week 7 CS 331 DATA STRUCTURES & ALGORITHMS BINARY TREES, THE SEARCH TREE ADT BINARY SEARCH TREES, RED BLACK TREES, THE TREE TRAVERSALS, B TREES OBJECTIVES: Red Black Trees WEEK - 7 RED BLACK

More information

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree Chapter 4 Trees 2 Introduction for large input, even access time may be prohibitive we need data structures that exhibit running times closer to O(log N) binary search tree 3 Terminology recursive definition

More information

Parallel Programming

Parallel Programming Parallel Programming Midterm Exam Wednesday, April 27, 2016 Your points are precious, don t let them go to waste! Your Time All points are not equal. Note that we do not think that all exercises have the

More information

In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set.

In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set. 1 In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set. This presentation presents how such a DAG structure can be accessed immediately

More information

Introduction to Parallel Computing Errata

Introduction to Parallel Computing Errata Introduction to Parallel Computing Errata John C. Kirk 27 November, 2004 Overview Book: Introduction to Parallel Computing, Second Edition, first printing (hardback) ISBN: 0-201-64865-2 Official book website:

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 5.1 Introduction You should all know a few ways of sorting in O(n log n)

More information

Information Coding / Computer Graphics, ISY, LiTH

Information Coding / Computer Graphics, ISY, LiTH Sorting on GPUs Revisiting some algorithms from lecture 6: Some not-so-good sorting approaches Bitonic sort QuickSort Concurrent kernels and recursion Adapt to parallel algorithms Many sorting algorithms

More information

Search Trees. Undirected graph Directed graph Tree Binary search tree

Search Trees. Undirected graph Directed graph Tree Binary search tree Search Trees Undirected graph Directed graph Tree Binary search tree 1 Binary Search Tree Binary search key property: Let x be a node in a binary search tree. If y is a node in the left subtree of x, then

More information

( ). Which of ( ) ( ) " #& ( ) " # g( n) ( ) " # f ( n) Test 1

( ). Which of ( ) ( )  #& ( )  # g( n) ( )  # f ( n) Test 1 CSE 0 Name Test Summer 006 Last Digits of Student ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to multiply two n x n matrices is: A. "( n) B. "( nlogn) # C.

More information