General Transformations for GPU Execution of Tree Traversals

1 General Transformations for GPU Execution of Tree Traversals. Michael Goldfarb*, Youngjoon Jo**, Milind Kulkarni. School of Electrical and Computer Engineering. * Now at Qualcomm; ** Now at Google

2 GPU execution of irregular programs: GPUs offer the promise of massive, energy-efficient parallelism. There has been much success in mapping regular applications to GPUs (regular memory accesses, predictable computation), but much less success in mapping irregular applications (pointer-based data structures; unpredictable, input-dependent computation and memory accesses).

3 Tree traversal algorithms: Many irregular algorithms are built around tree traversal, for example Barnes-Hut, nearest-neighbor and 2-point correlation. Numerous papers describe how to map tree traversal algorithms to GPUs.

4 Point correlation: A data mining algorithm. Goal: given a set of N points in k dimensions and a point p, find all points within a radius r of p. Naïve approach: compare all N points with p. Better approach: build a kd-tree over the points, traverse the tree for point p, and prune subtrees that are far from p.
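To make the naïve approach concrete, here is a minimal C++ sketch of the brute-force query. The 2-D `Point` type and the names `dist2` and `countWithinNaive` are assumptions chosen for illustration; they do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D point type (the slides allow k dimensions).
struct Point { double x, y; };

// Squared Euclidean distance; comparing squares avoids a square root.
double dist2(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Naive approach: compare all N points with p, so each query costs O(N).
std::size_t countWithinNaive(const std::vector<Point>& pts, const Point& p, double r) {
    assert(r >= 0.0);
    std::size_t count = 0;
    for (const Point& q : pts) {
        if (dist2(q, p) < r * r) ++count;  // strictly inside the radius
    }
    return count;
}
```

Every query touches every point, which is the cost the kd-tree traversal tries to avoid by pruning whole subtrees.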

7-20 Point correlation [Figure: step-by-step animation of a point-correlation traversal over an example kd-tree; the node labels C, D, E, F, H, I, J and K survive in the transcription.]

21 Point correlation

    KDCell root = /* build kdtree */;
    Set<Point> ps;
    double radius;
    foreach Point p in ps {
        recurse(p, root, radius);
    }
    ...
    void recurse(Point p, KDCell node, double r) {
        if (toofar(p, node, r)) return;   // prune: nothing in this subtree can be within r
        if (node.isleaf()) {
            if (dist(node.point, p) < r) p.correlated++;
        } else {
            recurse(p, node.left, r);
            recurse(p, node.right, r);
        }
    }
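The recursive traversal above can be sketched as compilable C++. This is an illustrative version under stated assumptions, not the authors' implementation: the 2-D `Point`, the median-split `build` function, the per-node bounding box and the box-distance test in `tooFar` are all hypothetical choices made so the example is self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Point { double x, y; int correlated = 0; };

struct KDCell {
    double lo[2], hi[2];                  // bounding box of the points in this subtree
    Point point{};                        // meaningful only for leaves
    std::unique_ptr<KDCell> left, right;
    bool isLeaf() const { return !left && !right; }
};

double coord(const Point& p, int axis) { return axis == 0 ? p.x : p.y; }

double dist2(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Build a kd-tree over pts[begin, end): median split, alternating axes (assumed scheme).
std::unique_ptr<KDCell> build(std::vector<Point>& pts, std::size_t begin,
                              std::size_t end, int axis = 0) {
    assert(begin < end);
    auto node = std::make_unique<KDCell>();
    for (int d = 0; d < 2; ++d) {
        node->lo[d] = node->hi[d] = coord(pts[begin], d);
        for (std::size_t i = begin + 1; i < end; ++i) {
            node->lo[d] = std::min(node->lo[d], coord(pts[i], d));
            node->hi[d] = std::max(node->hi[d], coord(pts[i], d));
        }
    }
    if (end - begin == 1) { node->point = pts[begin]; return node; }
    const std::size_t mid = begin + (end - begin) / 2;
    std::nth_element(pts.begin() + begin, pts.begin() + mid, pts.begin() + end,
                     [axis](const Point& a, const Point& b) {
                         return coord(a, axis) < coord(b, axis);
                     });
    node->left = build(pts, begin, mid, 1 - axis);
    node->right = build(pts, mid, end, 1 - axis);
    return node;
}

// True when no point inside node's bounding box can lie strictly within r of p.
bool tooFar(const Point& p, const KDCell& node, double r) {
    double d2 = 0.0;
    for (int d = 0; d < 2; ++d) {
        const double c = coord(p, d);
        const double gap = std::max({node.lo[d] - c, 0.0, c - node.hi[d]});
        d2 += gap * gap;
    }
    return d2 >= r * r;
}

void recurse(Point& p, const KDCell& node, double r) {
    if (tooFar(p, node, r)) return;       // truncate the traversal here
    if (node.isLeaf()) {
        if (dist2(node.point, p) < r * r) ++p.correlated;
    } else {
        recurse(p, *node.left, r);
        recurse(p, *node.right, r);
    }
}
```

A query would build the tree once with `build(pts, 0, pts.size())` and then call `recurse(p, *root, r)` for each point `p`; the result accumulates in `p.correlated`.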

22 Basic pattern

    TreeNode root;                           // tree structure
    Set<Point> ps;
    foreach Point p in ps {                  // repeated traversal: lots of parallelism!
        recurse(p, root, ...);
    }
    ...
    recurse(Point p, TreeNode node, ...) {   // recursive traversal
        if (truncate?(p, node, ...)) { ... }
        recurse(p, node.child1, ...);
        recurse(p, node.child2, ...);
        ...
    }

27 What's the problem? GPUs add high overhead for recursion. GPUs work best when memory accesses are regular and strided, but irregular algorithms have unpredictable memory accesses. Status quo: ad hoc solutions. New algorithm? New GPU techniques! Want generally applicable techniques for mapping irregular applications to GPUs.

29 Contributions: Two general techniques for mapping tree traversals to GPUs: autoropes, which eliminates recursion overhead, and lockstepping, which promotes memory coalescing. A compiler pass to automatically apply the techniques to recursive tree-traversal code. Significant GPU speedups on 5 tree-traversal algorithms.

30 Naïve GPU implementation: Warp-based SIMT (single-instruction, multiple-thread) execution. 32 points are put in a single warp, and the warp traverses the tree. All points in a warp must execute the same instruction; if points diverge, some threads sit idle while other threads execute.

31-48 Naïve GPU implementation [Figure: step-by-step animation of a warp traversing the example tree under the naïve implementation.]

49 Lots of accesses to tree: Many accesses just move up the tree in order to later move down again. There is a lot of function stack manipulation. Trees are very large and cannot be stored in the GPU's fast memory, so we want to minimize accesses to the tree.

50 How to avoid extra accesses to tree? Typical technique: ropes. Ropes are pointers in each tree node that let a traversal jump to the next part of the tree, which effectively linearizes the traversal. [Figure: example tree with ropes installed.]
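As an illustration of manually installed ropes, the following C++ sketch stores one `rope` pointer per node and traverses without a stack or recursion. It is an assumption-laden stand-in rather than any published implementation: the tree is a 1-D interval tree over sorted values instead of a kd-tree, and `Node`, `build`, `installRopes` and `countWithin` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct Node {
    double lo, hi;          // range of the values stored in this subtree
    Node* left = nullptr;
    Node* right = nullptr;
    Node* rope = nullptr;   // next node to visit once this subtree is finished or skipped
};

// Build a balanced tree over sorted values v[begin, end); nodes live in a deque,
// which keeps earlier element addresses valid as it grows.
Node* build(const std::vector<double>& v, std::size_t begin, std::size_t end,
            std::deque<Node>& pool) {
    assert(begin < end);
    pool.push_back(Node{v[begin], v[end - 1]});
    Node* n = &pool.back();
    if (end - begin > 1) {
        const std::size_t mid = begin + (end - begin) / 2;
        n->left = build(v, begin, mid, pool);
        n->right = build(v, mid, end, pool);
    }
    return n;
}

// Preprocessing pass: after the left subtree comes the right sibling; after the
// right subtree comes whatever follows the parent.
void installRopes(Node* n, Node* next) {
    n->rope = next;
    if (n->left) {
        installRopes(n->left, n->right);
        installRopes(n->right, next);
    }
}

// Stackless traversal: count stored values strictly within r of p.
int countWithin(const Node* root, double p, double r) {
    int count = 0;
    const Node* n = root;
    while (n != nullptr) {
        const bool tooFar = (n->lo - p >= r) || (p - n->hi >= r);
        if (tooFar) {
            n = n->rope;            // skip the whole subtree
        } else if (!n->left) {
            ++count;                // a leaf that is not too far is within r
            n = n->rope;
        } else {
            n = n->left;            // descend to the first child
        }
    }
    return count;
}
```

Note that `installRopes` is exactly the kind of preprocessing step, and `countWithin` the kind of rope-aware traversal logic, that has to be written by hand when ropes are stored in the tree.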

54 How to avoid extra accesses to tree? Installing ropes into a tree requires complex, application-specific preprocessing. Using ropes correctly during execution requires complex, application-specific logic.

55 Autoropes: A general technique for linearizing tree traversal for arbitrary traversal algorithms. Achieves generality, simplicity and space efficiency at the cost of some overhead. Key insight: recursive tree algorithms are just depth-first traversals of a tree, so they can be transformed into iterative algorithms.

56-62 Autoropes [Figure: animation of an autoropes traversal of the example tree alongside its rope stack. The rope stack states that survive in the transcription are, in order: F C; F E D; F E; F.]

63 Autoropes: Ropes are stored on a rope stack instead of in the tree, so no application-specific code is needed to use them. Ropes are instantiated dynamically, so no preprocessing is required. Access patterns are the same as with manual ropes. The extra pushes and pops on the rope stack add some overhead.

64 Autoropes: A simple compiler transformation converts recursive code into iterative autoropes code.

    void recurse(Point p, KDCell node, double r) {
        if (toofar(p, node, r)) return;
        if (node.isleaf()) {
            if (dist(node.point, p) < r) p.correlated++;
        } else {
            recurse(p, node.left, r);
            recurse(p, node.right, r);
        }
    }

65 Autoropes

    ropestack.push(root);
    while (!ropestack.isempty()) {
        node = ropestack.pop();
        if (toofar(p, node, r)) continue;
        if (node.isleaf()) {
            if (dist(node.point, p) < r) p.correlated++;
        } else {
            ropestack.push(node.right);   // pushed first so that left is popped first
            ropestack.push(node.left);
        }
    }

See the paper for details of how to transform more complex code.
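The recursive-to-iterative transformation can be checked on a small example. The C++ sketch below places a recursive traversal next to its rope-stack counterpart so the two can be compared. It uses a simplified 1-D interval tree as a stand-in for the kd-tree, and the names `Node`, `build`, `recurseCount` and `autoropesCount` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct Node {
    double lo, hi;          // range of the values stored in this subtree
    Node* left = nullptr;
    Node* right = nullptr;
};

// Build a balanced tree over sorted values v[begin, end).
Node* build(const std::vector<double>& v, std::size_t begin, std::size_t end,
            std::deque<Node>& pool) {
    assert(begin < end);
    pool.push_back(Node{v[begin], v[end - 1]});
    Node* n = &pool.back();
    if (end - begin > 1) {
        const std::size_t mid = begin + (end - begin) / 2;
        n->left = build(v, begin, mid, pool);
        n->right = build(v, mid, end, pool);
    }
    return n;
}

bool tooFar(double p, const Node& n, double r) {
    return n.lo - p >= r || p - n.hi >= r;
}

// Original form: recursive depth-first traversal.
int recurseCount(double p, const Node* n, double r) {
    if (tooFar(p, *n, r)) return 0;
    if (!n->left) return 1;                 // leaf within r
    return recurseCount(p, n->left, r) + recurseCount(p, n->right, r);
}

// Autoropes-style form: the pending subtrees live on an explicit rope stack.
int autoropesCount(double p, const Node* root, double r) {
    int count = 0;
    std::vector<const Node*> ropeStack;
    ropeStack.push_back(root);
    while (!ropeStack.empty()) {
        const Node* n = ropeStack.back();
        ropeStack.pop_back();
        if (tooFar(p, *n, r)) continue;     // formerly `return`
        if (!n->left) { ++count; continue; }
        ropeStack.push_back(n->right);      // formerly recursion; right is pushed
        ropeStack.push_back(n->left);       // first so that left is popped first
    }
    return count;
}
```

Both functions visit the same nodes in the same order; only the bookkeeping moves from the call stack to `ropeStack`.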

66 Unintended consequence: Recursive calls naturally lead to thread divergence: if some threads make recursive calls, other threads wait until the calls return. This does not happen for iterative code, where all threads reconverge at the beginning of the loop.

    ropestack.push(root);
    while (!ropestack.isempty()) {
        node = ropestack.pop();
        if (toofar(p, node, r)) continue;   // formerly return
        if (...) p.correlated++;
        else {
            ropestack.push(node.right);     // formerly recursion
            ropestack.push(node.left);
        }
    }

68-75 Autoropes on GPU [Figure: step-by-step animation of a warp traversing the example tree with autoropes.] Threads no longer diverge in execution, but they do diverge in the tree!

76 Thread divergence vs. memory coalescing: Memory accesses on a GPU are only well behaved if the accesses by all threads in a warp can be coalesced (same memory location or strided access). The bad memory behavior of autoropes outweighs its lack of thread divergence. Goal: the benefits of autoropes while maintaining memory coalescing.

77 Lockstepping: Essentially, force the GPU to let threads diverge. If any thread in a warp wants to visit a node's children, all threads in the warp visit the children. Threads that are dragged along are programmatically masked out. Warp execution takes longer (proportional to the union of the threads' traversals, rather than the longest traversal), but improved memory performance makes up for it. Automatically implemented during the autoropes compiler pass.
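The masking idea can be simulated on the CPU. The sketch below is a sequential C++ model of one warp, not GPU code and not the authors' implementation: the warp shares a single rope stack whose entries carry a bit mask of the threads that still want the subtree. The 1-D interval tree and the names `Entry`, `lockstepWarp` and `tooFar` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct Node {
    double lo, hi;          // range of the values stored in this subtree
    Node* left = nullptr;
    Node* right = nullptr;
};

Node* build(const std::vector<double>& v, std::size_t begin, std::size_t end,
            std::deque<Node>& pool) {
    assert(begin < end);
    pool.push_back(Node{v[begin], v[end - 1]});
    Node* n = &pool.back();
    if (end - begin > 1) {
        const std::size_t mid = begin + (end - begin) / 2;
        n->left = build(v, begin, mid, pool);
        n->right = build(v, mid, end, pool);
    }
    return n;
}

bool tooFar(double p, const Node& n, double r) {
    return n.lo - p >= r || p - n.hi >= r;
}

// One shared rope-stack entry: the node plus the set of threads that want it.
struct Entry { const Node* node; std::uint32_t mask; };

// Simulates one warp (up to 32 points) traversing in lockstep.
// counts[t] receives the number of stored values within r of warpPts[t];
// the return value is the number of nodes the warp visited as a whole.
std::size_t lockstepWarp(const std::vector<double>& warpPts, const Node* root,
                         double r, std::vector<int>& counts) {
    assert(!warpPts.empty() && warpPts.size() <= 32);
    counts.assign(warpPts.size(), 0);
    const std::uint32_t all =
        warpPts.size() == 32 ? 0xFFFFFFFFu : ((1u << warpPts.size()) - 1u);
    std::vector<Entry> ropeStack;
    ropeStack.push_back(Entry{root, all});
    std::size_t visited = 0;
    while (!ropeStack.empty()) {
        const Entry e = ropeStack.back();
        ropeStack.pop_back();
        ++visited;
        std::uint32_t mask = e.mask;
        for (std::size_t t = 0; t < warpPts.size(); ++t) {
            const std::uint32_t bit = 1u << t;
            if (!(mask & bit)) continue;                 // dragged along, masked out
            if (tooFar(warpPts[t], *e.node, r)) mask &= ~bit;   // thread truncates
            else if (!e.node->left) ++counts[t];         // leaf within r
        }
        // If any thread still wants the children, the whole warp visits them.
        if (mask != 0 && e.node->left) {
            ropeStack.push_back(Entry{e.node->right, mask});
            ropeStack.push_back(Entry{e.node->left, mask});
        }
    }
    return visited;
}
```

Because every thread pops the same node at the same time, the returned visit count is the size of the union of the individual traversals, which matches the cost described above.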

78 Dynamic lockstepping: Some algorithms allow different traversal orders: some points visit the left child before the right, and others visit the right before the left. This optimization reduces traversal size, but it inherently leads to bad memory access patterns.

79-83 Dynamic lockstepping [Figure: step-by-step animation of a dynamic lockstepping traversal over the example tree.]

84 Dynamic lockstepping: Dynamic lockstepping lets all points in a warp vote on which traversal order to use. This maintains memory coalescing, but some points do more work than in the original algorithm. The tradeoff can still be worth it!
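A warp-wide vote can be modeled with a simple majority count. The slides do not show how the vote is implemented, so the following C++ sketch is only one plausible reading: each active point prefers the child on its own side of a split value, and ties go to the left child. The function name `voteLeftFirst` and the preference rule are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each active point casts one vote for the order it would choose on its own:
// here, a point prefers the child on its side of the split value.
// Returns true when the warp should visit the left child first (ties go left).
bool voteLeftFirst(const std::vector<double>& warpPts, std::uint32_t mask,
                   double split) {
    assert(warpPts.size() <= 32);
    int left = 0, right = 0;
    for (std::size_t t = 0; t < warpPts.size(); ++t) {
        if (!(mask & (1u << t))) continue;   // masked-out threads do not vote
        if (warpPts[t] < split) ++left; else ++right;
    }
    return left >= right;
}
```

Points that lose the vote traverse in their non-preferred order, which is the extra work the slide refers to.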

85 Engineering details: Transform point data from array-of-structures format to structure-of-arrays format. Use the analysis from [PACT 2013] to prove safety and transform automatically. Copy tree data to the GPU in linearized fashion. Lay out the fields of tree and point data according to use (more commonly accessed fields are placed in shared memory). Interleave the rope stacks for the points in a warp to allow strided access.
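Two of these layout choices are easy to show in isolation. The C++ sketch below converts an array of structures into a structure of arrays and computes the slot of a rope-stack entry when the stacks of one warp are interleaved. The field names, the warp size constant `kWarpSize` and the index formula `depth * kWarpSize + lane` are assumptions for illustration; the slides do not give the actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures layout: the fields of one point are adjacent in memory.
struct PointAoS { float x, y; int correlated; };

// Structure-of-arrays layout: the same field of consecutive points is adjacent,
// so consecutive threads reading field x touch consecutive addresses.
struct PointsSoA {
    std::vector<float> x, y;
    std::vector<int> correlated;
};

PointsSoA toSoA(const std::vector<PointAoS>& pts) {
    PointsSoA out;
    out.x.reserve(pts.size());
    out.y.reserve(pts.size());
    out.correlated.reserve(pts.size());
    for (const PointAoS& p : pts) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.correlated.push_back(p.correlated);
    }
    return out;
}

constexpr std::size_t kWarpSize = 32;

// Interleaved rope stacks: entry `depth` of every lane's stack is stored
// contiguously, so lanes at the same depth access neighboring slots.
std::size_t interleavedSlot(std::size_t depth, std::size_t lane) {
    assert(lane < kWarpSize);
    return depth * kWarpSize + lane;
}
```

With this indexing, lanes 0 through 31 at the same stack depth map to 32 consecutive slots, which is the strided pattern that coalescing favors.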

86 Results: Two platforms. GPU platform: NVIDIA Tesla C2070 (6 GB global memory, 14 SMs). CPU platform: 32-core, 2.3 GHz Opteron. Five benchmarks: Barnes-Hut, point correlation, nearest neighbor, k-nearest neighbor, and vantage point trees. Multiple inputs per benchmark; used sorted and unsorted points.

87 High-level takeaways: Autoropes + lockstepping is always faster than the simple recursive GPU implementation (up to 14x faster). For most benchmarks and inputs, the best GPU implementation is faster than the CPU implementation at up to 16 threads. Speedups are comparable to hand-written implementations.

88-90 [Figures: results plots for Barnes-Hut, point correlation and nearest-neighbor.]

91 Conclusions: Mapping irregular applications to GPUs is very difficult. Developed two general techniques, autoropes and lockstepping, that can achieve significant speedups on the GPU over both baseline GPU code and CPU implementations. The automatic approaches are competitive with previous hand-written implementations.



More information

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology Introduction Chapter 4 Trees for large input, even linear access time may be prohibitive we need data structures that exhibit average running times closer to O(log N) binary search tree 2 Terminology recursive

More information

CS61B Lecture #20. Today: Trees. Readings for Today: Data Structures, Chapter 5. Readings for Next Topic: Data Structures, Chapter 6

CS61B Lecture #20. Today: Trees. Readings for Today: Data Structures, Chapter 5. Readings for Next Topic: Data Structures, Chapter 6 CS61B Lecture #20 Today: Trees Readings for Today: Data Structures, Chapter 5 Readings for Next Topic: Data Structures, Chapter 6 Last modified: Wed Oct 14 03:20:09 2015 CS61B: Lecture #20 1 A Recursive

More information

Lecture 15 Notes Binary Search Trees

Lecture 15 Notes Binary Search Trees Lecture 15 Notes Binary Search Trees 15-122: Principles of Imperative Computation (Spring 2016) Frank Pfenning, André Platzer, Rob Simmons 1 Introduction In this lecture, we will continue considering ways

More information

Balanced Search Trees

Balanced Search Trees Balanced Search Trees Computer Science E-22 Harvard Extension School David G. Sullivan, Ph.D. Review: Balanced Trees A tree is balanced if, for each node, the node s subtrees have the same height or have

More information

Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b

Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b Advanced Algorithm Design and Analysis (Lecture 12) SW5 fall 2005 Simonas Šaltenis E1-215b simas@cs.aau.dk Range Searching in 2D Main goals of the lecture: to understand and to be able to analyze the kd-trees

More information

LPT Codes used with ROAM Splitting Algorithm CMSC451 Honors Project Joseph Brosnihan Mentor: Dave Mount Fall 2015

LPT Codes used with ROAM Splitting Algorithm CMSC451 Honors Project Joseph Brosnihan Mentor: Dave Mount Fall 2015 LPT Codes used with ROAM Splitting Algorithm CMSC451 Honors Project Joseph Brosnihan Mentor: Dave Mount Fall 2015 Abstract Simplicial meshes are commonly used to represent regular triangle grids. Of many

More information

On the Correctness of the SIMT Execution Model of GPUs. Extended version of the author s ESOP 12 paper. Axel Habermaier and Alexander Knapp

On the Correctness of the SIMT Execution Model of GPUs. Extended version of the author s ESOP 12 paper. Axel Habermaier and Alexander Knapp UNIVERSITÄT AUGSBURG On the Correctness of the SIMT Execution Model of GPUs Extended version of the author s ESOP 12 paper Axel Habermaier and Alexander Knapp Report 2012-01 January 2012 INSTITUT FÜR INFORMATIK

More information

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT

More information

Data structures and libraries

Data structures and libraries Data structures and libraries Bjarki Ágúst Guðmundsson Tómas Ken Magnússon Árangursrík forritun og lausn verkefna School of Computer Science Reykjavík University Today we re going to cover Basic data types

More information

Today's Topics. CISC 458 Winter J.R. Cordy

Today's Topics. CISC 458 Winter J.R. Cordy Today's Topics Last Time Semantics - the meaning of program structures Stack model of expression evaluation, the Expression Stack (ES) Stack model of automatic storage, the Run Stack (RS) Today Managing

More information

Trees. (Trees) Data Structures and Programming Spring / 28

Trees. (Trees) Data Structures and Programming Spring / 28 Trees (Trees) Data Structures and Programming Spring 2018 1 / 28 Trees A tree is a collection of nodes, which can be empty (recursive definition) If not empty, a tree consists of a distinguished node r

More information

CSE 591: GPU Programming. Using CUDA in Practice. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591: GPU Programming. Using CUDA in Practice. Klaus Mueller. Computer Science Department Stony Brook University CSE 591: GPU Programming Using CUDA in Practice Klaus Mueller Computer Science Department Stony Brook University Code examples from Shane Cook CUDA Programming Related to: score boarding load and store

More information

Lecture 15 Binary Search Trees

Lecture 15 Binary Search Trees Lecture 15 Binary Search Trees 15-122: Principles of Imperative Computation (Fall 2017) Frank Pfenning, André Platzer, Rob Simmons, Iliano Cervesato In this lecture, we will continue considering ways to

More information

Automatically Optimizing Tree Traversal Algorithms

Automatically Optimizing Tree Traversal Algorithms Purdue University Purdue e-pubs Open Access Dissertations Theses and Dissertations Fall 2013 Automatically Optimizing Tree Traversal Algorithms Youngjoon Jo Purdue University Follow this and additional

More information

CSE373: Data Structures & Algorithms Lecture 4: Dictionaries; Binary Search Trees. Kevin Quinn Fall 2015

CSE373: Data Structures & Algorithms Lecture 4: Dictionaries; Binary Search Trees. Kevin Quinn Fall 2015 CSE373: Data Structures & Algorithms Lecture 4: Dictionaries; Binary Search Trees Kevin Quinn Fall 2015 Where we are Studying the absolutely essential ADTs of computer science and classic data structures

More information

CS61BL Summer 2013 Midterm 2

CS61BL Summer 2013 Midterm 2 CS61BL Summer 2013 Midterm 2 Sample Solutions + Common Mistakes Question 0: Each of the following cost you.5 on this problem: you earned some credit on a problem and did not put your five digit on the

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 30: GP-GPU Programming. Lecturer: Alan Christopher

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 30: GP-GPU Programming. Lecturer: Alan Christopher CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 30: GP-GPU Programming Lecturer: Alan Christopher Overview GP-GPU: What and why OpenCL, CUDA, and programming GPUs GPU Performance

More information

Binary Trees, Binary Search Trees

Binary Trees, Binary Search Trees Binary Trees, Binary Search Trees Trees Linear access time of linked lists is prohibitive Does there exist any simple data structure for which the running time of most operations (search, insert, delete)

More information

Introduction to Trees. D. Thiebaut CSC212 Fall 2014

Introduction to Trees. D. Thiebaut CSC212 Fall 2014 Introduction to Trees D. Thiebaut CSC212 Fall 2014 A bit of History & Data Visualization: The Book of Trees. (Link) We Concentrate on Binary-Trees, Specifically, Binary-Search Trees (BST) How Will Java

More information

Priority Queues Heaps Heapsort

Priority Queues Heaps Heapsort Priority Queues Heaps Heapsort After this lesson, you should be able to apply the binary heap insertion and deletion algorithms by hand implement the binary heap insertion and deletion algorithms explain

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 7 More Memory Management CS 61C L07 More Memory Management (1) 2004-09-15 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Star Wars

More information

Simty: Generalized SIMT execution on RISC-V

Simty: Generalized SIMT execution on RISC-V Simty: Generalized SIMT execution on RISC-V CARRV 2017 Sylvain Collange INRIA Rennes / IRISA sylvain.collange@inria.fr From CPU-GPU to heterogeneous multi-core Yesterday (2000-2010) Homogeneous multi-core

More information

CSCI-401 Examlet #5. Name: Class: Date: True/False Indicate whether the sentence or statement is true or false.

CSCI-401 Examlet #5. Name: Class: Date: True/False Indicate whether the sentence or statement is true or false. Name: Class: Date: CSCI-401 Examlet #5 True/False Indicate whether the sentence or statement is true or false. 1. The root node of the standard binary tree can be drawn anywhere in the tree diagram. 2.

More information

Parallelizing Adaptive Triangular Grids with Refinement Trees and Space Filling Curves

Parallelizing Adaptive Triangular Grids with Refinement Trees and Space Filling Curves Parallelizing Adaptive Triangular Grids with Refinement Trees and Space Filling Curves Daniel Butnaru butnaru@in.tum.de Advisor: Michael Bader bader@in.tum.de JASS 08 Computational Science and Engineering

More information

Multi Agent Navigation on GPU. Avi Bleiweiss

Multi Agent Navigation on GPU. Avi Bleiweiss Multi Agent Navigation on GPU Avi Bleiweiss Reasoning Explicit Implicit Script, storytelling State machine, serial Compute intensive Fits SIMT architecture well Navigation planning Collision avoidance

More information

Some Search Structures. Balanced Search Trees. Binary Search Trees. A Binary Search Tree. Review Binary Search Trees

Some Search Structures. Balanced Search Trees. Binary Search Trees. A Binary Search Tree. Review Binary Search Trees Some Search Structures Balanced Search Trees Lecture 8 CS Fall Sorted Arrays Advantages Search in O(log n) time (binary search) Disadvantages Need to know size in advance Insertion, deletion O(n) need

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 The data of the problem is of 2GB and the hard disk is of 1GB capacity, to solve this problem we should Use better data structures

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 7 C Memory Management 2007-02-06 Hello to Said S. from Columbus, OH CS61C L07 More Memory Management (1) Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia

More information

CS61B Lecture #21. A Recursive Structure. Fundamental Operation: Traversal. Preorder Traversal and Prefix Expressions. (- (- (* x (+ y 3))) z)

CS61B Lecture #21. A Recursive Structure. Fundamental Operation: Traversal. Preorder Traversal and Prefix Expressions. (- (- (* x (+ y 3))) z) CSB Lecture # A Recursive Structure Today: Trees Readings for Today: Data Structures, Chapter Readings for Next Topic: Data Structures, Chapter Trees naturally represent recursively defined, hierarchical

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 7 C Memory Management!!Lecturer SOE Dan Garcia!!!www.cs.berkeley.edu/~ddgarcia CS61C L07 More Memory Management (1)! 2010-02-03! Flexible

More information

Trees, Part 1: Unbalanced Trees

Trees, Part 1: Unbalanced Trees Trees, Part 1: Unbalanced Trees The first part of this chapter takes a look at trees in general and unbalanced binary trees. The second part looks at various schemes to balance trees and/or make them more

More information

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs Algorithms in Systems Engineering ISE 172 Lecture 16 Dr. Ted Ralphs ISE 172 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms

More information

Compiler construction in4303 lecture 8

Compiler construction in4303 lecture 8 Compiler construction in4303 lecture 8 Interpretation Simple code generation Chapter 4 4.2.4 Overview intermediate code program text front-end interpretation code generation code selection register allocation

More information

Understanding Task Scheduling Algorithms. Kenjiro Taura

Understanding Task Scheduling Algorithms. Kenjiro Taura Understanding Task Scheduling Algorithms Kenjiro Taura 1 / 48 Contents 1 Introduction 2 Work stealing scheduler 3 Analyzing execution time of work stealing 4 Analyzing cache misses of work stealing 5 Summary

More information

University of Illinois at Urbana-Champaign Department of Computer Science. Second Examination

University of Illinois at Urbana-Champaign Department of Computer Science. Second Examination University of Illinois at Urbana-Champaign Department of Computer Science Second Examination CS 225 Data Structures and Software Principles Fall 2011 9a-11a, Wednesday, November 2 Name: NetID: Lab Section

More information

» Access elements of a container sequentially without exposing the underlying representation

» Access elements of a container sequentially without exposing the underlying representation Iterator Pattern Behavioural Intent» Access elements of a container sequentially without exposing the underlying representation Iterator-1 Motivation Be able to process all the elements in a container

More information

DDS Dynamic Search Trees

DDS Dynamic Search Trees DDS Dynamic Search Trees 1 Data structures l A data structure models some abstract object. It implements a number of operations on this object, which usually can be classified into l creation and deletion

More information

Topic 14. The BinaryTree ADT

Topic 14. The BinaryTree ADT Topic 14 The BinaryTree ADT Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary tree implementation Examine a binary tree

More information