Hypergraph Exploitation for Data Sciences
|
|
- Dustin Bond
- 5 years ago
- Views:
Transcription
1 Photos placed in horizontal position with even amount of white space between photos and header Hypergraph Exploitation for Data Sciences Photos placed in horizontal position with even amount of white space between photos and header Michael Wolf, Alicia Klinvex, and Daniel Dunlavy Graph Exploitation Symposium May, 216 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy s National Nuclear Security Administration under contract DE-AC4-94AL85. SAND NO C.
2 Hypergraph Models for Data Science Explore usage of hypergraphs for modeling complex, multiway (non- pairwise) relational data Goal: Improve data analysis more accurate, faster Questions we are trying to address Why should we use hypergraphs? When are hypergraph models preferable over graph models? Are there qualitative differences? Are there computational differences? What hypergraph model should be used? How should we compute the solution to our hypergraph problems? Specific problem Data clustering: Determine groupings of data objects given sets of relationships between/amongst objects 2
3 What is a hypergraph? Graph Bob Hypergraph Bob Amy Carl Amy Carl Ed Dan Edges connect 2 vertices G(V, E) Ed Hyperedges connect 1 or more vertices Dan G(V, H) Generalization of graph Hyperedges represent multiway relationships between vertices Hyperedge set of 1 or more vertices Key feature: hyperedges can connect more than 2 vertices 3
4 What is a hypergraph? Hyperedges: s Bob Vertices: Users Amy 1 1 Bob 1 Carl 1 1 Dan 1 1 Ed 1 1 Relation data / hypergraph incidence matrix Amy Ed Carl Dan Convenient representation of relational data Each can be represented by hyperedge s/hyperedges consist of subsets of users Relational data is often stored as hypergraph incidence matrix E.g., D4M* tables *Kepner, Jeremy, et al. "Dynamic distributed dimensional data model (D4M) database and computation system." Acoustics, Speech and Signal Processing (ICASSP), 212 IEEE International Conference on. IEEE,
5 Hypergraph to Graph: Clique Expansion Vertices Hyperedges A 1 1 B 1 C 1 1 D 1 1 E 1 1 B C A A Graph Edges A B 1 Vertices C D E B C E g = X e h 2E h d(eh ) 2 E D E D hyperedge cardinality Graph obtained through clique expansion of hypergraph 5
6 Why hypergraphs? Hypergraph Graph Bob Bob Amy 1 1 Bob 1 Carl 1 1 Amy Carl Amy Carl Dan 1 1 Ed 1 1 Ed Dan Ed Dan Multiway relationships can be represented nonambiguously Were Carl, Dan, and Ed involved in same ? Typically graph models lose information Fix: multi- graphs + metadata, changes to algorithms 6
7 Why hypergraphs? Vertices Hypergraph incidence matrix Graph Incidence matrix E g = X e h 2E h Hypergraphs require significantly less storage space than graphs generated using clique expansion Hypergraph incidence matrices require fewer operations for matrix- vector multiplication Hypergraphs have computational advantages d(eh ) 2 7
8 Outline Introduction to Hypergraphs Spectral clustering Software implementation Spectral clustering results Conclusions 8
9 Motivating Problem: Clustering of Relational Data Determine groupings of data objects given sets of relationships amongst those objects Relationships may be represented as graph or hypergraph Focus: spectral clustering Compute the smallest eigenpairs of the graph or hypergraph Laplacian Normalized graph Laplacian: Hypergraph Laplacian (Zhou, et al., 26) Never explicitly form Laplacians L G = I D 1/2 vg (H GHG T D vg )D 1/2 vg L h = I D 1/2 vh H HD 1 eh HT HD 1/2 vh Laplacian operator defined by series of SpMV ops and vector addition Reason 1: Matrix- matrix products are expensive Reason 2: Dynamic graphs easier to change incidence matrix than Laplacian SpMV = Sparse matrix-dense vector multiplication 9
10 Spectral Clustering (hyper)graph incidence matrix Form (hyper)graph Laplacian: L Find eigenvectors of L: V k-means(v) to find clusters vertex cluster assignments Basic spectral clustering procedure of Ng, et al., 22 Compute the smallest eigenpairs of the normalized graph or hypergraph Laplacian L G = I D 1/2 vg (H GHG T D vg )D 1/2 vg L h = I D 1/2 vh H HD 1 eh HT HD 1/2 vh Perform k- means clustering on those eigenvectors (k- means++) Partition data points into clusters such that each data points belongs to cluster with nearest mean
11 Outline Introduction to Hypergraphs Spectral clustering Software implementation Spectral clustering results Conclusions 11
12 Trilinos for Data Analysis C++ Trilinos based software for data analytics problems High performance (target: billions of vertices) Early focus on spectral methods Hypergraphs and graphs Supports several eigensolvers available in Anasazi LOBPCG, TraceMin- Davidson, Riemannian Trust Region, Block Krylov- Schur Very fast convergence for Laplacians (especially hypergraphs) Trilinos/Anasazi supports modern HPC architectures MPI+X Where X = {multicore, GPUs, Xeon Phi } 12
13 Hypergraph Generator Randomly generated hypergraphs Stochastic block- like Can generate multiple random instances for each parameter set Parameterized Generator Clusters Vertices per cluster Vertices # of intra/inter- cluster hyperedges Intra/Inter- cluster hyperedge cardinalities Edges Incidence Matrix: Generate real-world inspired hypergraphs at different scales Generates ground truth 13
14 Outline Introduction to Hypergraphs Spectral clustering Software implementation Spectral clustering results Clustering Quality Runtimes Real Data Conclusions 14
15 Experimental Setup Experiments conducted on workstation with 128 GB of memory using 16 cores Runtime parameters 5 k- means trials per matrix Eigensolver: LOBPCG (available in Trilinos), Tolerance: 1e- 3 Number of computed eigenpairs: same as number of clusters Quality of our clustering measured using the Jaccard index T = true cluster assignments ground truth from generator P = predicted cluster assignments J(T,P)=1 means matches ground truth exactly 15
16 Data Sets P1 P2 P3 Hyperedge Cardinality Number of clusters 5 5 Vertices per cluster*,,,, Intra-cluster hyperedges* 4, 2, 2, 2, Inter-cluster hyperedges* 5, 2, 2, 2, Intra-cluster hyperedge cardinality* Inter-cluster hyperedge cardinality* Generate hypergraph incidence matrices 4 different sets of parameters Different levels of difficulties randomly generated hypergraphs for each parameter set * Designates mean value 16
17 Outline Introduction to Hypergraphs Spectral clustering Software implementation Spectral clustering results Clustering Quality Runtimes Real Data Conclusions 17
18 Clustering Quality: P1, P2, P3 1 Jaccard&Index graph hypergraph.5 P1 P2 P3 Hypergraph based clusters more similar to ground truth clusters than graph based clusters 18
19 Clustering Quality: Hyperedge Cardinality Jaccard index graph JI Jaccard index Hyperedge cardinality Hyperedge cardinality General trend: hypergraph based clusters more similar to ground truth clusters than graph based clusters 19
20 Outline Introduction to Hypergraphs Spectral clustering Software implementation Spectral clustering results Clustering Quality Runtimes Real Data Conclusions 2
21 Runtimes: P1, P2, P3 Runtime((s) P1 P2 P3 graph hypergraph Hypergraph models significantly more computationally efficient than graph models (up to 3x faster) 21
22 Runtimes: Hyperedge Cardinality Unexpected Trend 3 Runtime(Ratio within cluster Hyperedge cardinality 8 5 Runtime ratio = graph runtime / hypergraph runtime Hypergraph models up to 3x faster than graph models 22
23 Eigensolver Iterations: Hyperedge Cardinality Graph Model 5 Hypergraph Model graph LOBPCG iterations LOBPCG'Iterations cardinality within cluster Unexpected Trend Hyperedge cardinality LOBPCG'Iterations within cluster Hyperedge cardinality Hypergraph required fewer LOBPCG iterations than graph Better separation of eigenvalues in hypergraph Laplacian Hypergraph models converged faster (up to 6x) than graph models 23
24 Operator Apply Time: Hyperedge Cardinality 2 Operator(Apply Ratio within cluster Hyperedge cardinality Laplacian operator apply more efficient for hypergraph model than graph model (up to 17x faster) 24
25 Outline Introduction to Hypergraphs Spectral clustering Software implementation Spectral clustering results Runtimes Clustering Quality Real Data Conclusions 25
26 Real Data Hypergraph, 7 Clusters Ground Truth Hypergraph Clustering Hgraph EV mammals birds reptiles fish amphibians insects other invertebrates Hgraph EV mammals birds sea mammals fish reptiles/amphibians insects other invertebrates seal porpoise dolphin seasnake pitviper slowworm tuatara platypus tortoise sealion UCI ML Repository Zoo Data Set* Animals with 16 attributes (# legs, eggs, ) + category Hypergraph Clustering vs. ground truth for 7 clusters Jaccard index:.815 (Graph model.743) Merged reptiles/amphibians, new category: sea mammals * EV=eigenvector 26
27 Real Data Graph, 7 Clusters Ground Truth Graph Clustering Graph EV mammals birds reptiles fish amphibians insects other invertebrates Graph EV Graph Clustering vs. ground truth for 7 clusters Jaccard index:.743 Data harder for kmeans to separate Insects, birds, mammals, other invertebrates,? EV=eigenvector 27
28 Real Data Hypergraph, 3 Clusters Ground Truth Hypergraph Clustering Hgraph EV mam/rep/fis/amp birds invertebrates Hgraph EV mam/rep/fis/amp birds invertebrates UCI ML Repository Zoo Data Set* Merged into 3 clusters: mammals/reptiles/fish/amphibians, birds, invertebrates Hypergraph Clustering vs. ground truth for 3 clusters Jaccard index: 1. (Graph model.865) * EV=eigenvector 28
29 Outline Introduction to Hypergraphs Spectral clustering Software implementation Spectral clustering results Conclusions 29
30 Summary/Conclusions Explored hypergraphs for modeling relational data Focus: Spectral Clustering Hypergraphs have advantages over graphs Less ambiguous representation of relationships Computationally advantageous Software for Linear Algebra Based Data Analytics Built on Trilinos framework (support for modern HPC architectures) High performance (target: billions of vertices) Support for both hypergraph and graph models Compelling computational results Larger Jaccard indices for hypergraph over graph for several problems Graph models challenging to weight properly Dramatic reduction in runtime for hypergraph vs. graph HPC = High performance computing 3
31 Future Work Target larger problems Currently solving problems of O(k) vertices Eventual target: O(1 billion) vertices Performance improvements to software 2D partitioning important for scalability beyond O(k) processors Improved efficiency of parallel hypergraph generator Hypergraph vs. Graph for larger real data sets Extend beyond spectral clustering Theoretical hypergraph models 31
32 Acknowledgements/References Thanks Jon Berry, Rich Lehoucq Spectral Clustering Ng, Andrew Y., Michael I. Jordan, and Yair Weiss. "On spectral clustering: Analysis and an algorithm." Advances in neural information processing systems 2 (22): Hypergraph Laplacian Zhou, Dengyong, Jiayuan Huang, and Bernhard Schölkopf. "Learning with hypergraphs: Clustering, classification, and embedding." Advances in neural information processing systems
Exploiting Scientific Software to Solve Problems in Data Analytics
Photos placed in horizontal position with even amount of white space between photos and header Photos placed in horizontal position with even amount of white space between photos and header Exploiting
More informationPULP: Fast and Simple Complex Network Partitioning
PULP: Fast and Simple Complex Network Partitioning George Slota #,* Kamesh Madduri # Siva Rajamanickam * # The Pennsylvania State University *Sandia National Laboratories Dagstuhl Seminar 14461 November
More informationBeyond Pairwise Classification and Clustering Using Hypergraphs
Max Planck Institut für biologische Kybernetik Max Planck Institute for Biological Cybernetics Technical Report No. TR-143 Beyond Pairwise Classification and Clustering Using Hypergraphs Dengyong Zhou
More informationSparse Matrix Partitioning for Parallel Eigenanalysis of Large Static and Dynamic Graphs
Sparse Matrix Partitioning for Parallel Eigenanalysis of Large Static and Dynamic Graphs Michael M. Wolf and Benjamin A. Miller Lincoln Laboratory Massachusetts Institute of Technology Lexington, MA 02420
More informationSpectral Clustering X I AO ZE N G + E L HA M TA BA S SI CS E CL A S S P R ESENTATION MA RCH 1 6,
Spectral Clustering XIAO ZENG + ELHAM TABASSI CSE 902 CLASS PRESENTATION MARCH 16, 2017 1 Presentation based on 1. Von Luxburg, Ulrike. "A tutorial on spectral clustering." Statistics and computing 17.4
More informationPuLP: Scalable Multi-Objective Multi-Constraint Partitioning for Small-World Networks
PuLP: Scalable Multi-Objective Multi-Constraint Partitioning for Small-World Networks George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 1 Sandia National Laboratories, 2 The Pennsylvania
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationWhat to come. There will be a few more topics we will cover on supervised learning
Summary so far Supervised learning learn to predict Continuous target regression; Categorical target classification Linear Regression Classification Discriminative models Perceptron (linear) Logistic regression
More informationSPECTRAL SPARSIFICATION IN SPECTRAL CLUSTERING
SPECTRAL SPARSIFICATION IN SPECTRAL CLUSTERING Alireza Chakeri, Hamidreza Farhidzadeh, Lawrence O. Hall Department of Computer Science and Engineering College of Engineering University of South Florida
More informationFeature Selection for fmri Classification
Feature Selection for fmri Classification Chuang Wu Program of Computational Biology Carnegie Mellon University Pittsburgh, PA 15213 chuangw@andrew.cmu.edu Abstract The functional Magnetic Resonance Imaging
More informationIlya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker)
Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker) Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver SIAM Conference on Parallel
More informationSimple Parallel Biconnectivity Algorithms for Multicore Platforms
Simple Parallel Biconnectivity Algorithms for Multicore Platforms George M. Slota Kamesh Madduri The Pennsylvania State University HiPC 2014 December 17-20, 2014 Code, presentation available at graphanalysis.info
More informationPortability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures
Photos placed in horizontal position with even amount of white space between photos and header Portability and Scalability of Sparse Tensor Decompositions on CPU/MIC/GPU Architectures Christopher Forster,
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic
More informationThe clustering in general is the task of grouping a set of objects in such a way that objects
Spectral Clustering: A Graph Partitioning Point of View Yangzihao Wang Computer Science Department, University of California, Davis yzhwang@ucdavis.edu Abstract This course project provide the basic theory
More informationSocial Network Analysis
Social Network Analysis Mathematics of Networks Manar Mohaisen Department of EEC Engineering Adjacency matrix Network types Edge list Adjacency list Graph representation 2 Adjacency matrix Adjacency matrix
More informationLoopy Belief Propagation
Loopy Belief Propagation Research Exam Kristin Branson September 29, 2003 Loopy Belief Propagation p.1/73 Problem Formalization Reasoning about any real-world problem requires assumptions about the structure
More informationOptimizing Parallel Sparse Matrix-Vector Multiplication by Corner Partitioning
Optimizing Parallel Sparse Matrix-Vector Multiplication by Corner Partitioning Michael M. Wolf 1,2, Erik G. Boman 2, and Bruce A. Hendrickson 3 1 Dept. of Computer Science, University of Illinois at Urbana-Champaign,
More informationCluster-based 3D Reconstruction of Aerial Video
Cluster-based 3D Reconstruction of Aerial Video Scott Sawyer (scott.sawyer@ll.mit.edu) MIT Lincoln Laboratory HPEC 12 12 September 2012 This work is sponsored by the Assistant Secretary of Defense for
More informationExtreme-scale Graph Analysis on Blue Waters
Extreme-scale Graph Analysis on Blue Waters 2016 Blue Waters Symposium George M. Slota 1,2, Siva Rajamanickam 1, Kamesh Madduri 2, Karen Devine 1 1 Sandia National Laboratories a 2 The Pennsylvania State
More informationSpectral Methods for Network Community Detection and Graph Partitioning
Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection
More informationHarp-DAAL for High Performance Big Data Computing
Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big
More informationClustering in Networks
Clustering in Networks (Spectral Clustering with the Graph Laplacian... a brief introduction) Tom Carter Computer Science CSU Stanislaus http://csustan.csustan.edu/ tom/clustering April 1, 2012 1 Our general
More informationLesson 2 7 Graph Partitioning
Lesson 2 7 Graph Partitioning The Graph Partitioning Problem Look at the problem from a different angle: Let s multiply a sparse matrix A by a vector X. Recall the duality between matrices and graphs:
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationClustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic
Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationPreconditioning Linear Systems Arising from Graph Laplacians of Complex Networks
Preconditioning Linear Systems Arising from Graph Laplacians of Complex Networks Kevin Deweese 1 Erik Boman 2 1 Department of Computer Science University of California, Santa Barbara 2 Scalable Algorithms
More informationAndrew V. Knyazev and Merico E. Argentati (speaker)
1 Andrew V. Knyazev and Merico E. Argentati (speaker) Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver 2 Acknowledgement Supported by Lawrence Livermore
More informationAPPROXIMATE SPECTRAL LEARNING USING NYSTROM METHOD. Aleksandar Trokicić
FACTA UNIVERSITATIS (NIŠ) Ser. Math. Inform. Vol. 31, No 2 (2016), 569 578 APPROXIMATE SPECTRAL LEARNING USING NYSTROM METHOD Aleksandar Trokicić Abstract. Constrained clustering algorithms as an input
More informationApplication of Spectral Clustering Algorithm
1/27 Application of Spectral Clustering Algorithm Danielle Middlebrooks dmiddle1@math.umd.edu Advisor: Kasso Okoudjou kasso@umd.edu Department of Mathematics University of Maryland- College Park Advance
More informationCommunity Detection. Community
Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,
More informationGiovanni De Micheli. Integrated Systems Centre EPF Lausanne
Two-level Logic Synthesis and Optimization Giovanni De Micheli Integrated Systems Centre EPF Lausanne This presentation can be used for non-commercial purposes as long as this note and the copyright footers
More informationMeasurements on (Complete) Graphs: The Power of Wedge and Diamond Sampling
Measurements on (Complete) Graphs: The Power of Wedge and Diamond Sampling Tamara G. Kolda plus Grey Ballard, Todd Plantenga, Ali Pinar, C. Seshadhri Workshop on Incomplete Network Data Sandia National
More informationExploiting Parallelism to Support Scalable Hierarchical Clustering
Exploiting Parallelism to Support Scalable Hierarchical Clustering Rebecca Cathey, Eric Jensen, Steven Beitzel, Ophir Frieder, David Grossman Information Retrieval Laboratory http://ir.iit.edu Background
More informationCS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering
W10.B.0.0 CS435 Introduction to Big Data W10.B.1 FAQs Term project 5:00PM March 29, 2018 PA2 Recitation: Friday PART 1. LARGE SCALE DATA AALYTICS 4. RECOMMEDATIO SYSTEMS 5. EVALUATIO AD VALIDATIO TECHIQUES
More informationSpectral Clustering on Handwritten Digits Database
October 6, 2015 Spectral Clustering on Handwritten Digits Database Danielle dmiddle1@math.umd.edu Advisor: Kasso Okoudjou kasso@umd.edu Department of Mathematics University of Maryland- College Park Advance
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationImprovements in Dynamic Partitioning. Aman Arora Snehal Chitnavis
Improvements in Dynamic Partitioning Aman Arora Snehal Chitnavis Introduction Partitioning - Decomposition & Assignment Break up computation into maximum number of small concurrent computations that can
More informationExtreme-scale Graph Analysis on Blue Waters
Extreme-scale Graph Analysis on Blue Waters 2016 Blue Waters Symposium George M. Slota 1,2, Siva Rajamanickam 1, Kamesh Madduri 2, Karen Devine 1 1 Sandia National Laboratories a 2 The Pennsylvania State
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationBig Data Analytics. Special Topics for Computer Science CSE CSE Feb 11
Big Data Analytics Special Topics for Computer Science CSE 4095-001 CSE 5095-005 Feb 11 Fei Wang Associate Professor Department of Computer Science and Engineering fei_wang@uconn.edu Clustering II Spectral
More informationClustering. Partition unlabeled examples into disjoint subsets of clusters, such that:
Text Clustering 1 Clustering Partition unlabeled examples into disjoint subsets of clusters, such that: Examples within a cluster are very similar Examples in different clusters are very different Discover
More informationParallel Greedy Matching Algorithms
Parallel Greedy Matching Algorithms Fredrik Manne Department of Informatics University of Bergen, Norway Rob Bisseling, University of Utrecht Md. Mostofa Patwary, University of Bergen 1 Outline Background
More informationCS 231A CA Session: Problem Set 4 Review. Kevin Chen May 13, 2016
CS 231A CA Session: Problem Set 4 Review Kevin Chen May 13, 2016 PS4 Outline Problem 1: Viewpoint estimation Problem 2: Segmentation Meanshift segmentation Normalized cut Problem 1: Viewpoint Estimation
More informationTELCOM2125: Network Science and Analysis
School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning
More informationMachine Learning. Unsupervised Learning. Manfred Huber
Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training
More informationBased on Raymond J. Mooney s slides
Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit
More informationMassively Parallel Graph Analytics
Massively Parallel Graph Analytics Manycore graph processing, distributed graph layout, and supercomputing for graph analytics George M. Slota 1,2,3 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 1 Sandia
More informationHierarchical Clustering
Hierarchical Clustering Build a tree-based hierarchical taxonomy (dendrogram) from a set animal of documents. vertebrate invertebrate fish reptile amphib. mammal worm insect crustacean One approach: recursive
More informationPregel. Ali Shah
Pregel Ali Shah s9alshah@stud.uni-saarland.de 2 Outline Introduction Model of Computation Fundamentals of Pregel Program Implementation Applications Experiments Issues with Pregel 3 Outline Costs of Computation
More informationA Spectral-based Clustering Algorithm for Categorical Data Using Data Summaries (SCCADDS)
A Spectral-based Clustering Algorithm for Categorical Data Using Data Summaries (SCCADDS) Eman Abdu eha90@aol.com Graduate Center The City University of New York Douglas Salane dsalane@jjay.cuny.edu Center
More informationSocial-Network Graphs
Social-Network Graphs Mining Social Networks Facebook, Google+, Twitter Email Networks, Collaboration Networks Identify communities Similar to clustering Communities usually overlap Identify similarities
More informationStatistical Physics of Community Detection
Statistical Physics of Community Detection Keegan Go (keegango), Kenji Hata (khata) December 8, 2015 1 Introduction Community detection is a key problem in network science. Identifying communities, defined
More informationGraph Partitioning for Scalable Distributed Graph Computations
Graph Partitioning for Scalable Distributed Graph Computations Aydın Buluç ABuluc@lbl.gov Kamesh Madduri madduri@cse.psu.edu 10 th DIMACS Implementation Challenge, Graph Partitioning and Graph Clustering
More informationCafeGPI. Single-Sided Communication for Scalable Deep Learning
CafeGPI Single-Sided Communication for Scalable Deep Learning Janis Keuper itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Deep Neural Networks
More informationClustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search
Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2
More informationA Platform for Provisioning Integrated Data and Visualization Capabilities Presented to SATURN in May 2016 Gerry Giese, Sandia National Laboratories
Photos placed in horizontal position with even amount of white space between photos and header A Platform for Provisioning Integrated Data and Visualization Capabilities Presented to SATURN in May 2016
More informationk-way Hypergraph Partitioning via n-level Recursive Bisection
k-way Hypergraph Partitioning via n-level Recursive Bisection Sebastian Schlag, Vitali Henne, Tobias Heuer, Henning Meyerhenke Peter Sanders, Christian Schulz January 10th, 2016 @ ALENEX 16 INSTITUTE OF
More informationSpectral Clustering and Community Detection in Labeled Graphs
Spectral Clustering and Community Detection in Labeled Graphs Brandon Fain, Stavros Sintos, Nisarg Raval Machine Learning (CompSci 571D / STA 561D) December 7, 2015 {btfain, nisarg, ssintos} at cs.duke.edu
More informationPLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters
PLB-HeC: A Profile-based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters IEEE CLUSTER 2015 Chicago, IL, USA Luis Sant Ana 1, Daniel Cordeiro 2, Raphael Camargo 1 1 Federal University of ABC,
More informationCluster Analysis (b) Lijun Zhang
Cluster Analysis (b) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Grid-Based and Density-Based Algorithms Graph-Based Algorithms Non-negative Matrix Factorization Cluster Validation Summary
More informationDiffusion and Clustering on Large Graphs
Diffusion and Clustering on Large Graphs Alexander Tsiatas Final Defense 17 May 2012 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of large graphs: The World
More informationUsing a Robust Metadata Management System to Accelerate Scientific Discovery at Extreme Scales
Using a Robust Metadata Management System to Accelerate Scientific Discovery at Extreme Scales Margaret Lawson, Jay Lofstead Sandia National Laboratories is a multimission laboratory managed and operated
More informationMultilevel Algorithms for Multi-Constraint Hypergraph Partitioning
Multilevel Algorithms for Multi-Constraint Hypergraph Partitioning George Karypis University of Minnesota, Department of Computer Science / Army HPC Research Center Minneapolis, MN 55455 Technical Report
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationDistributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability Janis Keuper Itwm.fraunhofer.de/ml Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern,
More informationA Unified Framework to Integrate Supervision and Metric Learning into Clustering
A Unified Framework to Integrate Supervision and Metric Learning into Clustering Xin Li and Dan Roth Department of Computer Science University of Illinois, Urbana, IL 61801 (xli1,danr)@uiuc.edu December
More informationOn Covering a Graph Optimally with Induced Subgraphs
On Covering a Graph Optimally with Induced Subgraphs Shripad Thite April 1, 006 Abstract We consider the problem of covering a graph with a given number of induced subgraphs so that the maximum number
More informationThe Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem
Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran
More informationA Classifica*on of Scien*fic Visualiza*on Algorithms for Massive Threading Kenneth Moreland Berk Geveci Kwan- Liu Ma Robert Maynard
A Classifica*on of Scien*fic Visualiza*on Algorithms for Massive Threading Kenneth Moreland Berk Geveci Kwan- Liu Ma Robert Maynard Sandia Na*onal Laboratories Kitware, Inc. University of California at Davis
More informationClustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2
So far in the course Clustering Subhransu Maji : Machine Learning 2 April 2015 7 April 2015 Supervised learning: learning with a teacher You had training data which was (feature, label) pairs and the goal
More informationScalable Community Detection Benchmark Generation
Scalable Community Detection Benchmark Generation Jonathan Berry 1 Cynthia Phillips 1 Siva Rajamanickam 1 George M. Slota 2 1 Sandia National Labs, 2 Rensselaer Polytechnic Institute jberry@sandia.gov,
More informationWar Stories : Graph Algorithms in GPUs
SAND2014-18323PE War Stories : Graph Algorithms in GPUs Siva Rajamanickam(SNL) George Slota, Kamesh Madduri (PSU) FASTMath Meeting Exceptional service in the national interest is a multi-program laboratory
More informationDecentralized and Distributed Machine Learning Model Training with Actors
Decentralized and Distributed Machine Learning Model Training with Actors Travis Addair Stanford University taddair@stanford.edu Abstract Training a machine learning model with terabytes to petabytes of
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationFinding and Visualizing Graph Clusters Using PageRank Optimization. Fan Chung and Alexander Tsiatas, UCSD WAW 2010
Finding and Visualizing Graph Clusters Using PageRank Optimization Fan Chung and Alexander Tsiatas, UCSD WAW 2010 What is graph clustering? The division of a graph into several partitions. Clusters should
More informationIntegrating Analysis and Computation with Trios Services
October 31, 2012 Integrating Analysis and Computation with Trios Services Approved for Public Release: SAND2012-9323P Ron A. Oldfield Scalable System Software Sandia National Laboratories Albuquerque,
More informationKeywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.
Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering
More informationAccelerated Machine Learning Algorithms in Python
Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals
More informationGraph Classification for the Red Team Event of the Los Alamos Authentication Graph. John M. Conroy IDA Center for Computing Sciences
Graph Classification for the Red Team Event of the Los Alamos Authentication Graph John M. Conroy IDA Center for Computing Sciences Thanks! Niall Adams, Imperial College, London! Nick Heard, Imperial College,
More informationLarge-Scale Face Manifold Learning
Large-Scale Face Manifold Learning Sanjiv Kumar Google Research New York, NY * Joint work with A. Talwalkar, H. Rowley and M. Mohri 1 Face Manifold Learning 50 x 50 pixel faces R 2500 50 x 50 pixel random
More informationSmall Libraries of Protein Fragments through Clustering
Small Libraries of Protein Fragments through Clustering Varun Ganapathi Department of Computer Science Stanford University June 8, 2005 Abstract When trying to extract information from the information
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationDiffusion and Clustering on Large Graphs
Diffusion and Clustering on Large Graphs Alexander Tsiatas Thesis Proposal / Advancement Exam 8 December 2011 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of
More informationExploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Exploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture K. Akbudak a, C.Aykanat
More informationPuLP. Complex Objective Partitioning of Small-World Networks Using Label Propagation. George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1
PuLP Complex Objective Partitioning of Small-World Networks Using Label Propagation George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 1 Sandia National Laboratories, 2 The Pennsylvania State
More informationSpectral Graph Sparsification: overview of theory and practical methods. Yiannis Koutis. University of Puerto Rico - Rio Piedras
Spectral Graph Sparsification: overview of theory and practical methods Yiannis Koutis University of Puerto Rico - Rio Piedras Graph Sparsification or Sketching Compute a smaller graph that preserves some
More informationClustering. Subhransu Maji. CMPSCI 689: Machine Learning. 2 April April 2015
Clustering Subhransu Maji CMPSCI 689: Machine Learning 2 April 2015 7 April 2015 So far in the course Supervised learning: learning with a teacher You had training data which was (feature, label) pairs
More informationGeneralized trace ratio optimization and applications
Generalized trace ratio optimization and applications Mohammed Bellalij, Saïd Hanafi, Rita Macedo and Raca Todosijevic University of Valenciennes, France PGMO Days, 2-4 October 2013 ENSTA ParisTech PGMO
More informationIterative Non-linear Dimensionality Reduction by Manifold Sculpting
Iterative Non-linear Dimensionality Reduction by Manifold Sculpting Mike Gashler, Dan Ventura, and Tony Martinez Brigham Young University Provo, UT 84604 Abstract Many algorithms have been recently developed
More informationMulti-Objective Hypergraph Partitioning Algorithms for Cut and Maximum Subdomain Degree Minimization
IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN, VOL XX, NO. XX, 2005 1 Multi-Objective Hypergraph Partitioning Algorithms for Cut and Maximum Subdomain Degree Minimization Navaratnasothie Selvakkumaran and
More informationMSA220 - Statistical Learning for Big Data
MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups
More informationKartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18
Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation
More informationSimilarity Ranking in Large- Scale Bipartite Graphs
Similarity Ranking in Large- Scale Bipartite Graphs Alessandro Epasto Brown University - 20 th March 2014 1 Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014] 2 AdWords Ads Ads
More informationAnomaly Detection in Very Large Graphs Modeling and Computational Considerations
Anomaly Detection in Very Large Graphs Modeling and Computational Considerations Benjamin A. Miller, Nicholas Arcolano, Edward M. Rutledge and Matthew C. Schmidt MIT Lincoln Laboratory Nadya T. Bliss ASURE
More informationGraph based machine learning with applications to media analytics
Graph based machine learning with applications to media analytics Lei Ding, PhD 9-1-2011 with collaborators at Outline Graph based machine learning Basic structures Algorithms Examples Applications in
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationLecture 15 Clustering. Oct
Lecture 15 Clustering Oct 31 2008 Unsupervised learning and pattern discovery So far, our data has been in this form: x 11,x 21, x 31,, x 1 m y1 x 12 22 2 2 2,x, x 3,, x m y We will be looking at unlabeled
More informationCOMP260 Spring 2014 Notes: February 4th
COMP260 Spring 2014 Notes: February 4th Andrew Winslow In these notes, all graphs are undirected. We consider matching, covering, and packing in bipartite graphs, general graphs, and hypergraphs. We also
More information