Tree-Based Density Clustering using Graphics Processors


1 Tree-Based Density Clustering using Graphics Processors
A First Marriage of MRNet and GPUs
Evan Samanas and Ben Welton
Paradyn Project
Paradyn / Dyninst Week, College Park, Maryland, March 26-28, 2012

2 The Tweet Stream
(Figure: map of the tweet stream. Source: Twitter, Map: About.com)

3 Tree-Based Overlay Networks (TBONs)
o Scalable multicast
o Scalable gather
o Scalable data aggregation
(Figure: tree of FE and CP nodes)

4 MRNet: Multicast / Reduction Network
o General-purpose TBON API
  o Network: user-defined topology
  o Stream: logical data channel
    o to a set of back-ends
    o multicast, gather, and custom reduction
  o Packet: collection of data
  o Filter: stream data operator
    o synchronization
    o transformation
o Widely adopted by HPC tools
  o CEPBA toolkit
  o Cray ATP & CCDB
  o Open SpeedShop & CBTF
  o STAT
  o TAU
(Figure: tree of FE and CP nodes applying F(x_1, ..., x_n))
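The filter idea on this slide, F(x_1, ..., x_n) applied at each level of the tree, can be illustrated with a small sketch. This is not the MRNet API (MRNet is a C++ library); the nested-list tree and the name `reduce_tree` are assumptions made only for illustration.

```python
def reduce_tree(node, filter_fn):
    """Apply filter_fn bottom-up over a tree (hypothetical helper, not MRNet).

    A node is either a leaf value (data produced by a back-end) or a list
    of child nodes (an internal process that aggregates its children).
    """
    if not isinstance(node, list):
        return node  # leaf: back-end data passes up unchanged
    # Internal node: reduce the already-reduced outputs of the children.
    return filter_fn([reduce_tree(child, filter_fn) for child in node])


# Front-end with three children; each inner list is one internal process.
tree = [[1, 2], [3, 4], [5]]
total = reduce_tree(tree, sum)
largest = reduce_tree(tree, max)
```

Because each internal node forwards a single reduced value, the output size stays constant as data moves toward the front-end.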

5 TBON Computation
Ideal Characteristics:
o Filter output size constant or decreasing
o Computation rate similar across levels
o Adjustable for load balance
(Figure: timing example with e.g. 40 MB of data to process, 10 MB packets, and ~10 sec per 10 MB; the two cases shown have total times of ~30 sec and ~60 sec)

6 Why GPUs?
o Natural fit
o Increase compute power
o Trade computation for bandwidth
  o Derived summaries
  o Compute and send data
  o Compression algorithms (e.g. LZO, zlib)
(Figure: tree of FE and CP nodes applying F(x_1, ..., x_n))


7 Clustering Example (DBSCAN [1])
o Goal: Find regions that meet minimum density and spatial distance characteristics
o The two parameters that determine if a point is in a cluster are Epsilon (Eps) and MinPts
o If the number of points in Eps is > MinPts, the point is a core point
o For every discovered point, this same calculation is performed until the cluster is fully expanded
(Figure: example points with Eps radius marked, MinPts: 3)
[1] M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise (1996)
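The expansion rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: it uses a brute-force neighbor search, counts a point as its own neighbor, and follows the slide's "> MinPts" threshold (the original DBSCAN paper uses "at least MinPts"). The names `region_query` and `dbscan` are chosen here for illustration.

```python
from math import dist

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force, includes i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) <= min_pts:  # not a core point (slide: core if > MinPts)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        frontier = list(neighbors)
        while frontier:  # expand until the cluster stops growing
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) > min_pts:  # j is also a core point
                frontier.extend(j_neighbors)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these inputs the two squares form separate clusters and the isolated point is labeled noise.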

8 Scaling DBSCAN
o PDBSCAN [2]
  o Quality equivalent to single DBSCAN
  o Linear speedup up to 8 nodes
o DBDC [3]
  o Sacrifices quality
  o ~30x speedup on 15 nodes
o CUDA-Dclust [4]
  o Quality equivalent to DBSCAN
  o ~15x faster on 1 node
[2] X. Xu et al., A fast Parallel Clustering Algorithm for Large Spatial Databases (1999)
[3] E. Januzaj et al., DBDC: Density Based Distributed Clustering (2004)
[4] C. Böhm et al., Density-based clustering using graphics processors (2009)

9 Tree-Based Clustering: Mr. Scan
Algorithm Steps
o SpatialDecomp: FE
o DBSCAN: CPU or GPU
o DrawBoundBox: CPU or GPU
o MergeCluster: CPU (x #levels)
(Figure: tree labeled FE, CP, MergeCluster, and DBSCAN)

10 Spatial Decomposition
1. Start with an input of spatially referenced points
2. Partition the region into equal sized density regions across one dimension
3. Add the shadow region area of one Epsilon to all density regions
(Figure: Partition #1, Partition #2, Partition #3, with Eps marked)
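The three steps above can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: it assumes the split is along the x axis into equal-width strips, and that the shadow region extends one Eps on both sides of each strip. The function name `decompose_1d` is hypothetical.

```python
def decompose_1d(points, num_partitions, eps):
    """Split 2D points into equal-width strips along x, each padded by eps.

    Points that fall inside a strip's shadow region (within eps of its
    edges) are copied into that strip as well, so a point can appear in
    more than one partition.
    """
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / num_partitions or 1.0  # avoid zero width if all x are equal
    partitions = [[] for _ in range(num_partitions)]
    for p in points:
        for k in range(num_partitions):
            left = lo + k * width
            right = left + width
            if left - eps <= p[0] <= right + eps:  # strip plus its shadow
                partitions[k].append(p)
    return partitions


points = [(x, 0) for x in range(10)]
parts = decompose_1d(points, num_partitions=3, eps=1)
```

Duplicating the shadow points is what lets neighboring partitions later recognize that two locally found clusters belong together.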

11 DBSCAN - CPU
o Run on local slice for each
o R* tree
o Start at random point
o Cluster w/ respect to Eps, MinPts
o Complexity: O(n log n)
(Figure: R* Tree Example with entries A, B, C, D, E)

12 GPU DBSCAN Filter
o GPU DBSCAN operates similarly to the CPU version with two exceptions
o Candidate clusters are potential clusters being explored concurrently in the GPU
o Number of cluster candidates is limited by GPU characteristics
o State array stores the current state of points in the search space
o Coarse grain search space
o Chain collisions causing cluster merges
(Figure: Candidate #1, Candidate #2, ..., Candidate #N and the States array. CUDA-DCLUST [09 Böhm])

13 DrawBoundBox
(Figure: CPU and GPU panels)

14 MergeBoundBox - CPU
o Checks for merge if box within shadow
o At least one point MUST be in common
o Iterate through ALL points in right cluster
(Figure: two overlapping cluster boxes, labeled "Match!")
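The merge test described above can be sketched as a two-stage check. This is an assumption-laden illustration rather than the presenters' code: clusters are lists of (x, y) tuples, the shadow region is modeled as an x-range `[shadow_lo, shadow_hi]`, and the helper names are hypothetical.

```python
def bounding_box(cluster):
    """Axis-aligned box (min_x, min_y, max_x, max_y) around a cluster."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    return (min(xs), min(ys), max(xs), max(ys))


def reaches_shadow(box, shadow_lo, shadow_hi):
    """True if the box overlaps the shadow strip along x."""
    return box[0] <= shadow_hi and box[2] >= shadow_lo


def should_merge(left_cluster, right_cluster, shadow_lo, shadow_hi):
    """Merge only if both boxes reach the shadow and share at least one point."""
    # Cheap filter first: a cluster that never touches the shadow cannot merge.
    if not reaches_shadow(bounding_box(left_cluster), shadow_lo, shadow_hi):
        return False
    if not reaches_shadow(bounding_box(right_cluster), shadow_lo, shadow_hi):
        return False
    # Then iterate through all points in the right cluster looking for a match.
    left_points = set(left_cluster)
    return any(p in left_points for p in right_cluster)


left = [(0, 0), (1, 0), (2, 0)]
right = [(2, 0), (3, 0)]
match = should_merge(left, right, shadow_lo=1.5, shadow_hi=2.5)
```

The bounding-box test limits how often the point-by-point comparison runs, but the comparison itself still iterates over the right cluster.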

15 The Tweet Stream
(Figure: map of the tweet stream. Source: Twitter, Map: About.com)

16 Evaluation
o Dataset: 1-3 Tweet Days
o Measuring:
  o Time to completion
  o Quality compared to single-threaded DBSCAN
o Algorithms:
  o Single-Threaded DBSCAN
  o DBDC
  o MRNet w/ DBSCAN filter
  o MRNet w/ DBSCAN GPU filter

17 Results
(Figure: time in hours on the One Day and Three Day datasets for ELKI Single Thread, CPU - Mr. Scan, and GPU - Mr. Scan)

18 Results
(Figure: running time in seconds by topology (x4, 1x16, 1x2x16), broken down into Decomp, GPU Run, and CPU Run)

19 Results

20 Quality
(Figure: Quality of 1 tweet day, plotted from 70% to 100% against # Backend Nodes, for Mr. Scan w/ CPU with 0 and 2 internal nodes, Mr. Scan w/ GPU with 0 and 2 internal nodes, and DBDC)

21 Future Work
o Scaling Issues
  o Spatial Decomposition
  o Merging Algorithm

22 2D Spatial Decomposition
- 1D Spatial Decomposition has some severe limitations
  - Splits can have wildly differing point counts
  - Number of splits limited by Epsilon
- 2D Spatial Decomposition would allow for more splits with more equal point counts
(Figure: example splits containing 7 points and 11 points, with Eps marked)


23 Merging Algorithm
o Bounding Box
  o No quality degradation
  o Limits iteration, but still iterates
o Possible Alternative: Concave Hull
  o DBSCAN on border points
  o Avoids iteration
  o Constant output size
  o O(n^2) on s

24 Use Cases
o Twitter Data
  o Flu Tweets
  o Mood/Topic clustering
  o Riot prediction
o Any Spatial data
  o Currently limited to 2D

25 Conclusion
o DBSCAN performance scales
o Quality able to be maintained
o Goal: Scale O(100,000 nodes)

Data Reduction and Partitioning in an Extreme Scale GPU-Based Clustering Algorithm

Data Reduction and Partitioning in an Extreme Scale GPU-Based Clustering Algorithm Benjamin Welton and Barton Miller Paradyn Project University of Wisconsin - Madison DRBSD-2 Workshop November 17 th 2017