Chapters 11 and 13, Graph Data Mining
|
|
- Charleen Pierce
- 6 years ago
- Views:
Transcription
1 CSI 4352, Introduction to Data Mining Chapters 11 and 13, Graph Data Mining Young-Rae Cho Associate Professor Department of Computer Science Balor Universit Graph Representation Graph An ordered pair GV,E with a set of vertices V and a set of edges E Etended Graph Representation Directed vs. undirected graph Whether each edge has a direction Weighted vs. unweighted graph Whether each edge has a weight Labeled vs. unlabeled graph Whether each verte has a label 2-D vs. 3-D graph representation Each verte has angles between two linked edges 1
2 Wh Graph Mining is Important? Data are often represented as a graph Biological networks Chemical compounds Internet WWW Electric circuits Workflows Social networks Graph is a general model for data mining!! Graph Data Mining Topics 1 Single Graph Mining Frequent sub-graph pattern mining Finding sub-graphs that frequentl occur in a graph Graph clustering Verte clustering Partitioning a graph into sub-graphs Verte classification Classifing a verte in a graph 2
3 Graph Data Mining Topics 2 Graph Dataset Mining Frequent sub-graph pattern mining Finding sub-graphs that frequentl occur among graphs Graph data clustering Grouping similar graphs Graph data classification Classifing a new graph id 1 2 graph 3 4 Applications Application of Single Graph Mining Biological network analsis Social network analsis Web communit analsis Application of Graph Dataset Mining Biochemical structure analsis Program control flow analsis XML structure analsis Challenges Finding the complete set satisfing the minimum support threshold Developing efficient and scalable algorithms Incorporating various kinds of user-specific constraints 3
4 CSI 4352, Introduction to Data Mining Chapters 11 and 13, Graph Data Mining General Definitions Graph Clustering Subgraph Pattern Mining Connectivit Degree, degv i The number of links from v i to other vertices Incoming degree and outgoing degree for directed graphs Weighted degree sum of the weights of the edges directl connected for weighted graphs A Set of eighbors, v i A set of vertices directl linked to the verte v i Also called adjacent neighbors or direct neighbors Degree Distribution, Pk Probabilit that a verte has eactl k links The number of vertices whose degree is k over the total number of vertices 4
5 Length & Sie Walk A sequence of vertices such that each is linked to its succeeding one Path A walk such that each verte in the walk is distinct Path Length The number of edges in path p Shortest Path between v i and v j A path with the smallest length out of all paths from v i to v j Characteristic Path Length of G Average length of the shortest paths between each pair of vertices Diameter of G Largest length of the shortest paths between each pair of vertices Densit Densit of GV,E The number of actual edges in G over the number of all possible edges 2 E D G V V 1 Clique A full connected graph also called, complete graph DG = 1 Quasi-Clique Close to clique A densel connected sub-graph DG > θ where θ is a user-specified threshold 5
6 Modularit Clustering Coefficient of v i The densit of a sub-graph G V,E where V is the set of neighbors of v i j v, v, v v j vk k i C vi v v 1 i i Measuring the effectiveness of v i on denseness Average Clustering Coefficient of GV,E Average of the clustering coefficients of all vertices in V Maimum is 1 Measuring the modularit of G Centralit Closeness, C c v i Detects the vertices located in the center of a graph CC vi v j V 1 p v, v s i j where p s v i,v j is the shortest path length between v i and v j Betweenness, C B v i Detects the vertices located between two clusters C v B i svi tv st vi st where σ st is the number of shortest paths between s and t, and σ st v i is the number of shortest paths between s and t, which pass through the verte v i 6
7 CSI 4352, Introduction to Data Mining Chapters 11 and 13, Graph Data Mining General Definitions Graph Clustering Subgraph Pattern Mining Graph Clustering Problem Definition Finding densel connected sub-graphs G V,E from a graph GV,E Finding sub-graphs with dense intra-connections and sparse interconnections modularit Methods Densit-based methods Hierarchical methods Partition-based methods 7
8 CSI 4352, Introduction to Data Mining Chapters 11 and 13, Graph Data Mining General Definitions Graph Clustering Densit-based Methods Partition-based Methods Hierarchical Methods Subgraph Pattern Mining Maimum Clique Find all maimum sied cliques Use antimonotonic propert If a subset of set A is not a clique, then the set A is not a clique G A B C D E F J I K sie 2 cliques: {AB}, {AC}, {AE},.. sie 3 cliques: {ABC}, {ACE},.. sie 4 cliques: {JKLM} L M 8
9 Clique Percolation Definitions Two k-cliques are adjacent if the share k-1 vertices where k is the number of vertices in each clique A k-clique chain is a sub-graph comprising the union of a sequence of adjacent k cliques 1 Find all k-cliques 2 Find all maimal k-clique chains b iterative merging adjacent k-cliques Reference Palla, G., et al., Uncovering the overlapping communit structure of comple networks in nature and societ, ature 2005 k-core Decomposition Definition k-core is a sub-graph b pruning all vertices whose degree is less than k Remove repeatedl all vertices whose degree is less than k A B C D E F J G I K k = 3 L Reference Wuch, S. and Almaas, E., Peeling the east protein network, Proteomics 2005 M 9
10 Seed Growth Main Idea Search for local optimiation Use a modularit or densit function Tpes of seeds Random seeds: selected randoml Core seeds: selected b degree or clustering coefficient 1 Select a verte seed as an initial cluster S 2 Add a neighbor of the seed to S repeatedl to find the maimum modularit or densit 3 Return S if the modularit of S > threshold 4 Repeat 1, 2 and 3 to find a set of clusters Graph Entrop 1 Definitions An eample of seed-growth algorithms General notation Inner links of v in G V,E : edges from v to the vertices in V p i v: probabilit of v having inner links Outer links of v in G V,E : edges from v to the vertices not in V p o v: probabilit of v having outer links Definitions Verte entrop: ev = - p i v log 2 p i v - p o v log 2 p o v Graph entrop : egv,e = Σ vv ev Find the minimum graph entrop during seed growth 10
11 Graph Entrop 2 Eample Graph Entrop 3 1 Select a seed node, and include all neighbors of the seed node into a seed cluster 2 Iterativel remove a neighbor if removal decreases graph entrop 3 Iterativel add a node on the outer boundar of a current cluster if addition decreases graph entrop 4 Output the cluster with the minimal graph entrop 5 Repeat 1, 2, 3, and 4 until no seed node remains Reference Kenle, E.C. and Cho, Y.-R., Detecting protein complees and functional modules from protein interaction networks: a graph entrop approach, Proteomics
12 CSI 4352, Introduction to Data Mining Chapters 11 and 13, Graph Data Mining General Definitions Graph Clustering Densit-based Methods Partition-based Methods Hierarchical Methods Subgraph Pattern Mining Restricted eighborhood Search 1 Main Idea Iterative moves of vertices to find the best global modularit Tpes of moves Global move: moving a random verte to a random cluster Intensification move: moving in the restricted neighborhood vertices on the boundar of partitions 1 Randoml partition the graph into k subgraphs 2 Make an intensification move of a random verte if this move improves modularit 3 Repeat 2 until finding the best modularit 12
13 Restricted eighborhood Search 2 Eample Modularit function: number of interconnecting edges between clusters G B D F I A C E J K L M Reference King, A., et al., Protein comple prediction via cost-based clustering Bioinformatics 2004 CSI 4352, Introduction to Data Mining Chapters 11 and 13, Graph Data Mining General Definitions Graph Clustering Densit-based Methods Partition-based Methods Hierarchical Methods Subgraph Pattern Mining 13
14 Bottom-Up vs. Top-Down Bottom-Up Agglomerative Approaches Start with each verte as a cluster Iterativel merge the closest clusters Require a distance function between two clusters Top-Down Divisive Approaches Start with the whole graph as a cluster Recursivel divide up the clusters Require a cutting algorithm Merging b Shortest Path Length Main Idea Agglomerative approach using single-link distance 1 Select two closest vertices from different clusters based on the shortest path length between them 2 Merge two clusters that include the selected vertices 3 Repeat 1 and 2 until the shortest path length reaches a threshold 14
15 15 Merging b Common eighbors 1 Main Idea Agglomerative approach using the similarit based on common neighbors More common neighbors two vertices share, more similar the are 1 Find the most similar vertices from different clusters based on a similarit function 2 Merge the two clusters if the merged cluster reaches a densit threshold 3 Repeat 1 and 2 until no more clusters can be merged Similarit Functions Jaccard coefficient: Geometric coefficient:, S, 2 S Merging b Common eighbors 2 More Similarit Functions Dice coefficient: Simpson coefficient: Marland bridge coefficient: Reference Brun, C., et al., Clustering proteins from interaction networks for the prediction of cellular functions BMC Bioinformatics , S, min, S 2 1, S
16 Merging b Statistical Significance Statistical Similarit Function Hper-geometric coefficient P-value: V V Z V X Z X Z Y Z P V V X Y V is the total number of vertices, X=, Y=, and Z= for vertices and Repeatedl find the vertices with the smallest P-value, and merge them Reference Samanta, M.P. and Liang, S., Predicting protein functions from redundancies in large-scale protein interaction networks PAS 2003 Minimum Cut Definitions Cut: a set of edges whose removal disconnects the graph Minimum cut: a cut with minimum number of edges Recursivel find the minimum cut B D F G I A Parameter C E J K Minimum densit threshold Minimum sie threshold L M 16
17 Betweenness Cut Betweenness Measurement of vertices or edges located between clusters 1 Iterativel eliminate a verte or an edge with the highest Betweenness value until the graph is separated 2 Recursivel appl 1 into each subgraph 3 Repeat 1 and 2 until all subgraphs reach a densit threshold Reference Dunn, R., et al., The use of edge-betweenness clustering to investigate biological function in protein interaction networks BMC Bioinformatics 2005 Dividing b Common eighbors Main Idea Divisive approach using the dissimilarit based on common neighbors Less common neighbors two vertices share, more dissimilar the are 1 Iterativel eliminate the edge between the most dissimilar vertices based on a similarit function, until the graph is separated 2 Recursivel appl 1 into each subgraph 3 Repeat 1 and 2 until all subgraphs reach a densit threshold Reference Radicchi, F., et al., Defining and identifing communities in networks PAS
18 CSI 4352, Introduction to Data Mining Chapters 11 and 13, Graph Data Mining General Definitions Graph Clustering Subgraph Pattern Mining Properties of Subgraph Patterns from Graph Dataset Properties Anti-monotonic propert Apriori algorithm If a sub-graph G is not frequent, then none of the super-graphs of G are frequent id graph Eample If is infrequent, so do and
19 Properties of Subgraph Patterns from a Single Graph Properties Frequent sub-graph pattern mining in a graph ot anti-monotonic propert! Even if a sub-graph G is not frequent, some of the super-graphs of G might be frequent Eample Suppose minimum support is 10 How man? How man? FSG Algorithm FSG Frequent Sub-graph discover 1 Initiall, find all frequent sie-3 sub-graphs 2 Generate candidate sie-k+1 sub-graphs from frequent sie-k sub-graphs 3 Count support of each candidate sub-graph to select frequent sub-graphs 4 Repeat 2 and 3 until no frequent sub-graph or no candidate is found 19
20 Candidate Sub-graphs Selective Joining Join two sie-k sub-graphs if the share sie-k-1 sub-graphs Ma join the same sie-k sub-graph Produce multiple distinct sie-k+1 sub-graphs + + Summar of FSG Algorithm Strength Apriori pruning Weakness Generates a huge set of candidate sub-graphs Requires multiple scans of database Inefficient for mining large-sied sub-graph patterns eeds efficient finding of isomorphic graphs to count support Reference Kuramochi, M. and Karpis, G., Frequent subgraph discover. In Proceedings of ICDM
21 Structural Isomorphism Definition If two graphs are isomorphic, then the are structurall identical Eamples Canonical Adjacenc Matri Adjacenc Matri Canonical Adjacenc Matri Canonical Code
22 FFSM Algorithm FFSM Fast frequent sub-graph mining Main Idea Use canonical adjacenc matrices for selective joining and support a 1 1 b 1 b counting a 1 b 1 1 b b Reference a a + 1 b 1 b 1 0 b 1 1 b b b a a 1 b 1 b b 1 1 b b b Huan, J., Wang, W. and Prins, J., Efficient mining of frequent subgraph in the presence of isomorphism. In Proceedings of ICDM 2003 DFS Code DFS Tree Trace on depth-first-search DFS Tree in a Leicographic Order DFS tree constructed b the leicographic order of labels DFS Code A sequence of edges of the DFS tree in a leicographic order {,,,,,,,,,,,} 22
23 gspan Algorithm gspan Graph-based sub-structure pattern mining Main Idea Use DFS codes for frequent sub-graph pattern mining Efficient for pattern growth Reference Yan, X. and Han, J., gspan: Graph-based sub-structure pattern mining, In Proceedings of ICDM 2002 Yan, X. and Han, J., CloseGraph: Mining closed frequent graph patterns, In Proceedings of KDD 2003 Questions? Lecture Slides on the Course Website, 23
Data Mining in Bioinformatics Day 5: Graph Mining
Data Mining in Bioinformatics Day 5: Graph Mining Karsten Borgwardt February 25 to March 10 Bioinformatics Group MPIs Tübingen from Borgwardt and Yan, KDD 2008 tutorial Graph Mining and Graph Kernels,
More informationData Mining in Bioinformatics Day 3: Graph Mining
Graph Mining and Graph Kernels Data Mining in Bioinformatics Day 3: Graph Mining Karsten Borgwardt & Chloé-Agathe Azencott February 6 to February 17, 2012 Machine Learning and Computational Biology Research
More informationData Mining in Bioinformatics Day 5: Frequent Subgraph Mining
Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institutes
More informationUnsupervised Learning. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis
7 Supervised learning vs unsupervised learning Unsupervised Learning Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute These patterns are then
More informationgspan: Graph-Based Substructure Pattern Mining
University of Illinois at Urbana-Champaign February 3, 2017 Agenda What motivated the development of gspan? Technical Preliminaries Exploring the gspan algorithm Experimental Performance Evaluation Introduction
More informationChapter 13, Sequence Data Mining
CSI 4352, Introduction to Data Mining Chapter 13, Sequence Data Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University Topics Single Sequence Mining Frequent sequence
More informationCommunity Detection. Community
Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,
More informationLecture 3, Review of Algorithms. What is Algorithm?
BINF 336, Introduction to Computational Biology Lecture 3, Review of Algorithms Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Algorithm? Definition A process
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data: Part I Instructor: Yizhou Sun yzsun@ccs.neu.edu November 12, 2013 Announcement Homework 4 will be out tonight Due on 12/2 Next class will be canceled
More informationData mining, 4 cu Lecture 8:
582364 Data mining, 4 cu Lecture 8: Graph mining Spring 2010 Lecturer: Juho Rousu Teaching assistant: Taru Itäpelto Frequent Subgraph Mining Extend association rule mining to finding frequent subgraphs
More informationWeb Structure Mining Community Detection and Evaluation
Web Structure Mining Community Detection and Evaluation 1 Community Community. It is formed by individuals such that those within a group interact with each other more frequently than with those outside
More informationGraph mining-based Image Indexing
Graph mining-based Image Indeing Gábor Iváncs, Renáta Iváncs and István Vajk Department of Automation and Applied Informatics, Budapest Universit of Technolog and Economics,, Goldmann G. ter 3. Budapest,
More informationClustering fundamentals
Elena Baralis, Tania Cerquitelli Politecnico di Torino What is Cluster Analsis? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from
More informationCommunity detection algorithms survey and overlapping communities. Presented by Sai Ravi Kiran Mallampati
Community detection algorithms survey and overlapping communities Presented by Sai Ravi Kiran Mallampati (sairavi5@vt.edu) 1 Outline Various community detection algorithms: Intuition * Evaluation of the
More informationTan,Steinbach, Kumar Introduction to Data Mining 4/18/ Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter Introduction to Data Mining b Tan, Steinbach, Kumar What is Cluster Analsis? Finding groups of objects such that the
More informationMCL. (and other clustering algorithms) 858L
MCL (and other clustering algorithms) 858L Comparing Clustering Algorithms Brohee and van Helden (2006) compared 4 graph clustering algorithms for the task of finding protein complexes: MCODE RNSC Restricted
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar What is Cluster Analsis? Finding groups of objects such that the
More informationOnline Social Networks and Media. Community detection
Online Social Networks and Media Community detection 1 Notes on Homework 1 1. You should write your own code for generating the graphs. You may use SNAP graph primitives (e.g., add node/edge) 2. For the
More informationPattern Mining in Frequent Dynamic Subgraphs
Pattern Mining in Frequent Dynamic Subgraphs Karsten M. Borgwardt, Hans-Peter Kriegel, Peter Wackersreuther Institute of Computer Science Ludwig-Maximilians-Universität Munich, Germany kb kriegel wackersr@dbs.ifi.lmu.de
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationMining frequent Closed Graph Pattern
Mining frequent Closed Graph Pattern Seminar aus maschninellem Lernen Referent: Yingting Fan 5.November Fachbereich 21 Institut Knowledge Engineering Prof. Fürnkranz 1 Outline Motivation and introduction
More informationClustering Part 2. A Partitional Clustering
Universit of Florida CISE department Gator Engineering Clustering Part Dr. Sanja Ranka Professor Computer and Information Science and Engineering Universit of Florida, Gainesville Universit of Florida
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining /8/ What is Cluster
More informationBrief description of the base clustering algorithms
Brief description of the base clustering algorithms Le Ou-Yang, Dao-Qing Dai, and Xiao-Fei Zhang In this paper, we choose ten state-of-the-art protein complex identification algorithms as base clustering
More information! Introduction. ! Partitioning methods. ! Hierarchical methods. ! Model-based methods. ! Density-based methods. ! Scalability
Preview Lecture Clustering! Introduction! Partitioning methods! Hierarchical methods! Model-based methods! Densit-based methods What is Clustering?! Cluster: a collection of data objects! Similar to one
More informationAnalysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths
Analysis of Biological Networks 1. Clustering 2. Random Walks 3. Finding paths Problem 1: Graph Clustering Finding dense subgraphs Applications Identification of novel pathways, complexes, other modules?
More information9/17/2009. Wenyan Li (Emily Li) Sep. 15, Introduction to Clustering Analysis
Introduction ti to K-means Algorithm Wenan Li (Emil Li) Sep. 5, 9 Outline Introduction to Clustering Analsis K-means Algorithm Description Eample of K-means Algorithm Other Issues of K-means Algorithm
More informationCommunity detection. Leonid E. Zhukov
Community detection Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Network Science Leonid E.
More informationClusters and Communities
Clusters and Communities Lecture 7 CSCI 4974/6971 22 Sep 2016 1 / 14 Today s Biz 1. Reminders 2. Review 3. Communities 4. Betweenness and Graph Partitioning 5. Label Propagation 2 / 14 Today s Biz 1. Reminders
More informationMining Social Network Graphs
Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be
More informationIntroduction to Graph Theory
Introduction to Graph Theory Tandy Warnow January 20, 2017 Graphs Tandy Warnow Graphs A graph G = (V, E) is an object that contains a vertex set V and an edge set E. We also write V (G) to denote the vertex
More informationComparative Study of Subspace Clustering Algorithms
Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that
More informationCMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)
CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification
More informationData Clustering Hierarchical Clustering, Density based clustering Grid based clustering
Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms
More informationApriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke
Apriori Algorithm For a given set of transactions, the main aim of Association Rule Mining is to find rules that will predict the occurrence of an item based on the occurrences of the other items in the
More informationTan,Steinbach, Kumar Introduction to Data Mining 4/18/ Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter Introduction to Data Mining b Tan, Steinbach, Kumar What is Cluster Analsis? Finding groups of objects such that the
More informationClustering in Data Mining
Clustering in Data Mining Classification Vs Clustering When the distribution is based on a single parameter and that parameter is known for each object, it is called classification. E.g. Children, young,
More informationFrequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar
Frequent Pattern Mining Based on: Introduction to Data Mining by Tan, Steinbach, Kumar Item sets A New Type of Data Some notation: All possible items: Database: T is a bag of transactions Transaction transaction
More informationSubdue: Compression-Based Frequent Pattern Discovery in Graph Data
Subdue: Compression-Based Frequent Pattern Discovery in Graph Data Nikhil S. Ketkar University of Texas at Arlington ketkar@cse.uta.edu Lawrence B. Holder University of Texas at Arlington holder@cse.uta.edu
More informationGraph Theory. ICT Theory Excerpt from various sources by Robert Pergl
Graph Theory ICT Theory Excerpt from various sources by Robert Pergl What can graphs model? Cost of wiring electronic components together. Shortest route between two cities. Finding the shortest distance
More informationHow and what do we see? Segmentation and Grouping. Fundamental Problems. Polyhedral objects. Reducing the combinatorics of pose estimation
Segmentation and Grouping Fundamental Problems ' Focus of attention, or grouping ' What subsets of piels do we consider as possible objects? ' All connected subsets? ' Representation ' How do we model
More informationEfficient Subgraph Matching by Postponing Cartesian Products
Efficient Subgraph Matching by Postponing Cartesian Products Computer Science and Engineering Lijun Chang Lijun.Chang@unsw.edu.au The University of New South Wales, Australia Joint work with Fei Bi, Xuemin
More informationClustering Part 4 DBSCAN
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More informationSupervised and Unsupervised Learning (II)
Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised
More informationUnsupervised Learning
Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised
More informationData Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application
Data Structures Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali 2009-2010 Association Rules: Basic Concepts and Application 1. Association rules: Given a set of transactions, find
More informationTight Clustering: a method for extracting stable and tight patterns in expression profiles
Statistical issues in microarra analsis Tight Clustering: a method for etracting stable and tight patterns in epression profiles Eperimental design Image analsis Normalization George C. Tseng Dept. of
More informationand Algorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 8/30/ Introduction to Data Mining 08/06/2006 1
Cluster Analsis: Basic Concepts and Algorithms Dr. Hui Xiong Rutgers Universit Introduction to Data Mining 8//6 Introduction to Data Mining 8/6/6 What is Cluster Analsis? Finding groups of objects such
More informationData Mining Concepts
Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential
More informationGraphs. Tessema M. Mengistu Department of Computer Science Southern Illinois University Carbondale Room - Faner 3131
Graphs Tessema M. Mengistu Department of Computer Science Southern Illinois University Carbondale tessema.mengistu@siu.edu Room - Faner 3131 1 Outline Introduction to Graphs Graph Traversals Finding a
More informationMachine Learning 15/04/2015. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis
// Supervised learning vs unsupervised learning Machine Learning Unsupervised Learning Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute These
More informationClustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation
More informationClustering Lecture 9: Other Topics. Jing Gao SUNY Buffalo
Clustering Lecture 9: Other Topics Jing Gao SUNY Buffalo 1 Basics Outline Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Miture model Spectral methods Advanced topics
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 4
Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of
More information9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology
9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example
More informationSurvey on Graph Query Processing on Graph Database. Presented by FAN Zhe
Survey on Graph Query Processing on Graph Database Presented by FA Zhe utline Introduction of Graph and Graph Database. Background of Subgraph Isomorphism. Background of Subgraph Query Processing. Background
More informationCross-validation for detecting and preventing overfitting
Cross-validation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures.
More informationNon Overlapping Communities
Non Overlapping Communities Davide Mottin, Konstantina Lazaridou HassoPlattner Institute Graph Mining course Winter Semester 2016 Acknowledgements Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides
More informationChapter 1, Introduction
CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from
More informationData Mining for Knowledge Management. Association Rules
1 Data Mining for Knowledge Management Association Rules Themis Palpanas University of Trento http://disi.unitn.eu/~themis 1 Thanks for slides to: Jiawei Han George Kollios Zhenyu Lu Osmar R. Zaïane Mohammad
More informationCLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets
CLOSET+:Searching for the Best Strategies for Mining Frequent Closed Itemsets Jianyong Wang, Jiawei Han, Jian Pei Presentation by: Nasimeh Asgarian Department of Computing Science University of Alberta
More informationFrequent Pattern-Growth Approach for Document Organization
Frequent Pattern-Growth Approach for Document Organization Monika Akbar Department of Computer Science Virginia Tech, Blacksburg, VA 246, USA. amonika@cs.vt.edu Rafal A. Angryk Department of Computer Science
More informationDATA MINING II - 1DL460
DATA MINING II - 1DL460 Spring 2013 " An second class in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
More informationStrength of Co-authorship Ties in Clusters: a Comparative Analysis
Strength of Co-authorship Ties in Clusters: a Comparative Analysis Michele A. Brandão and Mirella M. Moro Universidade Federal de Minas Gerais, Belo Horizonte, Brazil micheleabrandao@dcc.ufmg.br, mirella@dcc.ufmg.br
More informationIntroduction to Mobile Robotics
Introduction to Mobile Robotics Clustering Wolfram Burgard Cyrill Stachniss Giorgio Grisetti Maren Bennewitz Christian Plagemann Clustering (1) Common technique for statistical data analysis (machine learning,
More informationDiscovering Frequent Geometric Subgraphs
Discovering Frequent Geometric Subgraphs Michihiro Kuramochi and George Karypis Department of Computer Science/Army HPC Research Center University of Minnesota 4-192 EE/CS Building, 200 Union St SE Minneapolis,
More informationGraph Theory for Network Science
Graph Theory for Network Science Dr. Natarajan Meghanathan Professor Department of Computer Science Jackson State University, Jackson, MS E-mail: natarajan.meghanathan@jsums.edu Networks or Graphs We typically
More informationExtracting Information from Complex Networks
Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 7. Introduction to Data Mining, 2 nd Edition
Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining, nd Edition b Tan, Steinbach, Karpatne, Kumar What is Cluster Analsis? Finding groups
More informationCAIM: Cerca i Anàlisi d Informació Massiva
1 / 72 CAIM: Cerca i Anàlisi d Informació Massiva FIB, Grau en Enginyeria Informàtica Slides by Marta Arias, José Balcázar, Ricard Gavaldá Department of Computer Science, UPC Fall 2016 http://www.cs.upc.edu/~caim
More informationAssociation Pattern Mining. Lijun Zhang
Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationChapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the
Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule
More informationgprune: A Constraint Pushing Framework for Graph Pattern Mining
gprune: A Constraint Pushing Framework for Graph Pattern Mining Feida Zhu Xifeng Yan Jiawei Han Philip S. Yu Computer Science, UIUC, {feidazhu,xyan,hanj}@cs.uiuc.edu IBM T. J. Watson Research Center, psyu@us.ibm.com
More informationChapter 5, Data Cube Computation
CSI 4352, Introduction to Data Mining Chapter 5, Data Cube Computation Young-Rae Cho Associate Professor Department of Computer Science Baylor University A Roadmap for Data Cube Computation Full Cube Full
More informationOutline. Graphs. Divide and Conquer.
GRAPHS COMP 321 McGill University These slides are mainly compiled from the following resources. - Professor Jaehyun Park slides CS 97SI - Top-coder tutorials. - Programming Challenges books. Outline Graphs.
More informationKeyLabel Algorithms for Keyword Search in Large Graphs
KeyLabel Algorithms for Keyword Search in Large Graphs by Yue Wang B.Sc., Simon Fraser University, 2013 B.Eng., China Agriculture University, 2004 Thesis Submitted in Partial Fulfillment of the Requirements
More informationSocial Data Management Communities
Social Data Management Communities Antoine Amarilli 1, Silviu Maniu 2 January 9th, 2018 1 Télécom ParisTech 2 Université Paris-Sud 1/20 Table of contents Communities in Graphs 2/20 Graph Communities Communities
More informationLesson 3. Prof. Enza Messina
Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical
More informationChapter 4: Association analysis:
Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily
More informationExploratory Analysis: Clustering
Exploratory Analysis: Clustering (some material taken or adapted from slides by Hinrich Schutze) Heejun Kim June 26, 2018 Clustering objective Grouping documents or instances into subsets or clusters Documents
More informationCUT: Community Update and Tracking in Dynamic Social Networks
CUT: Community Update and Tracking in Dynamic Social Networks Hao-Shang Ma National Cheng Kung University No.1, University Rd., East Dist., Tainan City, Taiwan ablove904@gmail.com ABSTRACT Social network
More informationMining Mixed-drove Co-occurrence Patterns For Large Spatio-temporal Data Sets: A Summary of Results
Mining Mixed-drove Co-occurrence Patterns For Large Spatio-temporal Data Sets: A Summary of Results Xiangxiang Cong, Zhanquan Wang(Corresponding Author),Man Kong, Chunhua Gu Computer Science&Engineer Department,
More informationV10 Metabolic networks - Graph connectivity
V10 Metabolic networks - Graph connectivity Graph connectivity is related to analyzing biological networks for - finding cliques - edge betweenness - modular decomposition that have been or will be covered
More informationINF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22
INF4820 Clustering Erik Velldal University of Oslo Nov. 17, 2009 Erik Velldal INF4820 1 / 22 Topics for Today More on unsupervised machine learning for data-driven categorization: clustering. The task
More informationLecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Scalable Clustering Methods: BIRCH and Others Reading: Chapter 10.3 Han, Chapter 9.5 Tan Cengiz Gunay, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei.
More informationCluster analysis. Agnieszka Nowak - Brzezinska
Cluster analysis Agnieszka Nowak - Brzezinska Outline of lecture What is cluster analysis? Clustering algorithms Measures of Cluster Validity What is Cluster Analysis? Finding groups of objects such that
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic
More informationIntroduction to Parallel & Distributed Computing Parallel Graph Algorithms
Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Lecture 16, Spring 2014 Instructor: 罗国杰 gluo@pku.edu.cn In This Lecture Parallel formulations of some important and fundamental
More informationData Mining. Cluster Analysis: Basic Concepts and Algorithms
Data Mining Cluster Analsis: Basic Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining /8/ What is Cluster Analsis? Finding groups of objects such that the objects in a group will
More informationCS224W: Analysis of Networks Jure Leskovec, Stanford University
CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2 Observations Models
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Sequence Data: Sequential Pattern Mining Instructor: Yizhou Sun yzsun@cs.ucla.edu November 27, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification
More informationGraph Connectivity G G G
Graph Connectivity 1 Introduction We have seen that trees are minimally connected graphs, i.e., deleting any edge of the tree gives us a disconnected graph. What makes trees so susceptible to edge deletions?
More informationDiscussion: Clustering Random Curves Under Spatial Dependence
Discussion: Clustering Random Curves Under Spatial Dependence Gareth M. James, Wenguang Sun and Xinghao Qiao Abstract We discuss the advantages and disadvantages of a functional approach to clustering
More informationChapter 4: Mining Frequent Patterns, Associations and Correlations
Chapter 4: Mining Frequent Patterns, Associations and Correlations 4.1 Basic Concepts 4.2 Frequent Itemset Mining Methods 4.3 Which Patterns Are Interesting? Pattern Evaluation Methods 4.4 Summary Frequent
More information