Survey on Graph Query Processing on Graph Database. Presented by FAN Zhe


1 Survey on Graph Query Processing on Graph Database Presented by FAN Zhe

2 Outline Introduction of Graph and Graph Database. Background of Subgraph Isomorphism. Background of Subgraph Query Processing. Background of Similarity Graph Query Processing. Background of Supergraph Query Processing.

3 What is Graph Graph is powerful. Graph is everywhere. Graph is complex, and the size and volume of graph data are increasing. Trade-off: easier to model, harder to analyze. Examples: chemical bonds, Internet, DNA, daily-life objects.

4 Two Scenarios of Graph Database Bio-informatics. Social network. A graph database is a database that contains millions of graphs.

5 Definition of Graph A graph g is defined as a 4-tuple, g = (V, E, L, l), where V is the set of vertices, E is the set of edges, L is the set of labels and l is a labelling function that maps each vertex or edge to a label in L. We define the size of a graph g as size(g) = |E(g)|. We restrict our discussion to undirected, labelled, connected graphs.
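
The definition above can be sketched as a small data structure. This is only an illustration: the class name `LabeledGraph`, its field names and the sample labels are assumptions, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Undirected labelled graph g = (V, E, L, l) (hypothetical sketch)."""
    vertex_labels: dict                              # l on V: vertex id -> label
    edge_labels: dict = field(default_factory=dict)  # l on E: frozenset({u, v}) -> label

    def add_edge(self, u, v, label=None):
        # frozenset makes (u, v) and (v, u) the same undirected edge
        self.edge_labels[frozenset((u, v))] = label

    @property
    def labels(self):
        """L: every label used by a vertex or an edge."""
        return set(self.vertex_labels.values()) | set(self.edge_labels.values())

    def size(self):
        """size(g) = |E(g)|."""
        return len(self.edge_labels)


g = LabeledGraph({1: "A", 2: "B", 3: "C"})
g.add_edge(1, 2, "x")
g.add_edge(1, 3, "y")
g.add_edge(2, 1, "x")  # same undirected edge as (1, 2); size stays 2
```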

6 Graph Query Processing Problem in Current Research Field
1 Subgraph Isomorphism: [Ullmann, J.ACM 76], [Cordella, PAMI 04], [QuickSI, VLDB 08]
2 Subgraph Query
2.1 One large graph: [GraphGrep, ICPR 02], [TALE, ICDE 08], [GADDI, EDBT 09], [SAPPER, VLDB 10]
2.2 Numbers of small graphs: [GraphGrep, ICPR 02], [gIndex, SIGMOD 04], [FG-index, SIGMOD 07], [C-Tree, ICDE 06], [QuickSI, VLDB 08], [GBLENDER, SIGMOD 10], [iGraph, VLDB 10]
3 Similarity Graph Query (an exact subgraph match is not always available)
3.1 One large graph: [GraphGrep, ICPR 02], [TALE, ICDE 08], [GADDI, EDBT 09], [SAPPER, VLDB 10]
3.2 Numbers of small graphs: [C-Tree, ICDE 06], [Grafil, SIGMOD 05]
4 Supergraph Query (containment graph query): [cIndex, VLDB 07], [GPTree, EDBT 09]
5 Reachability Problem
6 Shortest Path Problem
7 Spatial Data Problem

7 Outline Introduction of Graph and Graph Database. Background of Subgraph Isomorphism. Background of Subgraph Query Processing. Background of Similarity Graph Query Processing. Background of Supergraph Query Processing.

8 Definition of Subgraph Isomorphism [Figure: a graph with vertices 1 (A), 2 (B), 3 (C) shown as a subgraph of a graph with vertices 1 (A), 2 (B), 3 (C), 4 (A).]

9 Subgraph Isomorphism Algorithm [Figure: query graph Q with vertices 1 (A), 2 (B), 3 (C); data graph G with vertices 1 (A), 2 (B), 3 (C), 4 (A).] Condition: 1) M[i][j] = 1 means that the i-th vertex in Q corresponds to the j-th vertex in G. 2) Each row in M contains exactly one 1. 3) No column contains more than one 1. Such an M specifies a subgraph isomorphism from Q to G. How to find such a matrix M? Ullmann's algorithm [Ullmann, J.ACM 76].

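10 Subgraph Isomorphism Algorithm (Cont.) [Figure: query graph Q with vertices 1 (A), 2 (B), 3 (C); data graph G with vertices 1 (A), 2 (B), 3 (C), 4 (A).] $M_C = M' (M' M_B)^T$, $\forall i, j: (M_A[i][j] = 1) \Rightarrow (M_C[i][j] = 1)$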

12 Subgraph Isomorphism Algorithm (Cont.) Given two graphs Q and G, their corresponding adjacency matrices are $M_A = [a_{ij}]$ of size $n \times n$ and $M_B = [b_{ij}]$ of size $m \times m$. Goal: 1) find an $n \times m$ matrix $M'$ such that $M_C = M' (M' M_B)^T$ satisfies $\forall i, j: (M_A[i][j] = 1) \Rightarrow (M_C[i][j] = 1)$; 2) or report that no such matrix $M'$ exists.

13 Subgraph Isomorphism Algorithm (Cont.) Step 1. Set up an $n \times m$ matrix M such that M[i][j] = 1 if 1) the i-th vertex in Q has the same label as the j-th vertex in G; and 2) the degree of the i-th vertex in Q is no larger than the degree of the j-th vertex in G. [Figure: query graph Q with vertices 1 (A), 2 (B), 3 (C); data graph G with vertices 1 (A), 2 (B), 3 (C), 4 (A).]

14 Subgraph Isomorphism Algorithm (Cont.) Step 2. Matrices M' are generated by systematically changing to 0 all but one of the 1s in each row of M, subject to the condition that no column of M' may contain more than one 1. (The maximal search depth is $|M_A|$, the number of rows of M.)

15 Subgraph Isomorphism Algorithm (Cont.) Step 3. Verify each matrix M' with the condition $M_C = M' (M' M_B)^T$, $\forall i, j: (M_A[i][j] = 1) \Rightarrow (M_C[i][j] = 1)$. Iterate the above steps to enumerate all possible matrices M'. In the worst case there are $O(|M_B|!)$ possible matrices. (Subgraph isomorphism is a classical NP-hard problem.)
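
Steps 1 to 3 can be sketched in a few lines of Python. This is a brute-force illustration, not Ullmann's full algorithm: it omits his refinement procedure and simply enumerates injective assignments with `itertools.permutations`. A candidate M' is stored as a tuple `assign`, where `assign[i]` is the column holding the single 1 of row i, so that $M_C[i][k] = M_B[\text{assign}[i]][\text{assign}[k]]$. The function names and the edges of the example graphs are assumptions, since the slide figure did not survive extraction.

```python
from itertools import permutations


def initial_matrix(q_labels, q_adj, g_labels, g_adj):
    """Step 1: M[i][j] = 1 if labels match and deg_Q(i) <= deg_G(j)."""
    return [[int(ql == gl and sum(q_adj[i]) <= sum(g_adj[j]))
             for j, gl in enumerate(g_labels)]
            for i, ql in enumerate(q_labels)]


def is_embedding(assign, q_adj, g_adj):
    """Step 3: every edge (i, k) of Q must map to an edge of G."""
    n = len(q_adj)
    return all(g_adj[assign[i]][assign[k]] == 1
               for i in range(n) for k in range(n) if q_adj[i][k] == 1)


def subgraph_isomorphisms(q_labels, q_adj, g_labels, g_adj):
    """Steps 2 and 3: enumerate each M' allowed by M and keep the valid ones."""
    m = initial_matrix(q_labels, q_adj, g_labels, g_adj)
    n = len(q_labels)
    for assign in permutations(range(len(g_labels)), n):  # no column used twice
        if all(m[i][assign[i]] for i in range(n)) and is_embedding(assign, q_adj, g_adj):
            yield assign


# Assumed example: Q is B - A - C; G has edges 0-1, 1-2, 2-3, 1-3.
q_labels = ["A", "B", "C"]
q_adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
g_labels = ["A", "B", "C", "A"]
g_adj = [[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
matches = list(subgraph_isomorphisms(q_labels, q_adj, g_labels, g_adj))
```

In this assumed example the degree test of Step 1 already rules out vertex 0 of G as an image of the query's A vertex, which leaves a single candidate to verify.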

16 Subgraph Isomorphism Algorithm (Cont.) For optimizations of Ullmann's algorithm, please check the original research paper. QuickSI: VLDB 08. A good survey of graph matching algorithms: "Thirty Years of Graph Matching in Pattern Recognition". C++ library for graph isomorphism: VFLib.

17 Outline Introduction of Graph and Graph Database. Background of Subgraph Isomorphism. Background of Subgraph Query Processing. Background of Similarity Graph Query Processing. Background of Supergraph Query Processing.

18 Subgraph Query Problem definition: Given a graph database D and a graph query q, find all graphs g in D s.t. q is a subgraph of g. [Figure: sample database of three molecular graphs (a), (b), (c), and a query graph.] Complexity: NP-complete!

19 Applications of Subgraph Query: protein interaction analysis; motif discovery in 3D protein structures; drug design; schema matching; graph similarity search; correlation discovery in graph databases.

20 Challenges of Subgraph Query Sequential scan is not scalable: disk I/O; subgraph isomorphism testing. An indexing mechanism is needed. Existing work: DayLight (Daylight.com, commercial); GraphGrep (Dennis Shasha et al., PODS'02); gIndex; FG-index; C-Tree; SwiftIndex; iGraph.

21 Representative Works on Subgraph Query Feature-based approach: gIndex, SIGMOD 04; FG-index, SIGMOD 07. Non-feature-based approach: GraphGrep, PODS 02; QuickSI, VLDB 08; C-Tree, ICDE 06; GString, ICDE 07; GCoding, EDBT 08.

22 GraphGrep (Shasha et al. 02) Fingerprinting: to filter the database. A subgraph matching algorithm. Basic idea: use small components of the query graph and the database graphs to filter the database and to do the matching.

23 GraphGrep (Shasha et al. 02) (Cont.)

24 GraphGrep (Shasha et al. 02) (Cont.)

25 gIndex (Yan et al. 04) [Figure: query graph (Q), graph (G) and a shared substructure.] If graph G contains query graph Q, G should contain any substructure of Q. Remarks: index substructures of a query graph to prune graphs that do not contain these substructures.

26 gIndex (Yan et al. 04) (Cont.) Framework: two steps in processing graph queries. Step 1. Index Construction: enumerate structures in the graph database and build an inverted index between structures and graphs. Step 2. Query Processing: enumerate structures in the query graph; calculate the candidate graphs containing these structures; prune the false positive answers by performing a subgraph isomorphism test.

27 gIndex (Yan et al. 04) (Cont.) Two approaches: path-based indexing; subgraph-based indexing.

28 gIndex (Yan et al. 04) (Cont.) Path-Based Approach [Figure: sample database of three molecular graphs (a), (b), (c).] Paths: 0-length: C, O, N, S; 1-length: C-C, C-O, C-N, C-S, N-N, S-O; 2-length: C-C-C, C-O-C, C-N-C, ...; 3-length: ... Build an inverted index between paths and graphs.

29 gIndex (Yan et al. 04) (Cont.) Path-Based Approach (Cont.) [Figure: query graph.] 0-length: $S_C$ = {a, b, c}, $S_N$ = {a, b, c}; 1-length: $S_{C-C}$ = {a, b, c}, $S_{C-N}$ = {a, b, c}; 2-length: $S_{C-N-C}$ = {a, b}, ... Intersecting these sets gives the candidate answers, graph (a) and graph (b), which may contain this query graph.
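
The path-based filter can be sketched as follows. This is a minimal illustration under assumptions: graphs are given as a pair (vertex-label dict, adjacency dict), edge labels are ignored, and the three toy graphs are invented stand-ins for the molecules in the slide figure. The function names are hypothetical.

```python
from collections import defaultdict


def label_paths(labels, adj, max_len):
    """All label sequences of simple paths with at most max_len edges."""
    found = set()

    def walk(path):
        seq = tuple(labels[v] for v in path)
        found.add(min(seq, seq[::-1]))  # canonical form, independent of direction
        if len(path) - 1 < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for v in adj:
        walk([v])
    return found


def build_index(database, max_len):
    """Inverted index: label path -> ids of the graphs that contain it."""
    index = defaultdict(set)
    for gid, (labels, adj) in database.items():
        for p in label_paths(labels, adj, max_len):
            index[p].add(gid)
    return index


def candidates(index, all_ids, query, max_len):
    """Intersect the id sets of every path that occurs in the query."""
    labels, adj = query
    result = set(all_ids)
    for p in label_paths(labels, adj, max_len):
        result &= index.get(p, set())
    return result


db = {
    "a": ({0: "C", 1: "N", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]}),
    "b": ({0: "C", 1: "N", 2: "C", 3: "C"}, {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}),
    "c": ({0: "C", 1: "C", 2: "N"}, {0: [1], 1: [0, 2], 2: [1]}),
}
index = build_index(db, max_len=2)
query = ({0: "C", 1: "N", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})
answer = candidates(index, db.keys(), query, max_len=2)
```

The result is only a candidate set: each surviving graph still has to pass a subgraph isomorphism test.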

30 gIndex (Yan et al. 04) (Cont.) Problem of Path-Based Approach [Figure: sample database of three molecular graphs (a), (b), (c), and a query graph.] Graph (c) contains this query graph. However, if we only index the paths C, C-C, C-C-C, C-C-C-C, we cannot prune graphs (a) and (b).

31 gIndex (Yan et al. 04) (Cont.) Problem of Path-Based Approach: paths are simple, so structural information is lost; there are too many paths. gIndex proposes: use structures instead of paths; use discriminative structures.

32 gIndex (Yan et al. 04) (Cont.) gIndex: Indexing Graphs by Data Mining. Identify frequent structures in the database; frequent structures are subgraphs that appear often in the graph database. Prune redundant frequent structures to maintain a small set of discriminative structures. Create an inverted index between discriminative frequent structures and graphs in the database.

33 gIndex (Yan et al. 04) (Cont.) Frequent Structures [Figure: sample database of three molecular graphs (a), (b), (c), and two frequent structures (a), (b) with support 2.]

34 gIndex (Yan et al. 04) (Cont.) Frequent Structures (Cont.) Efficient frequent graph mining algorithms are available. Apriori: AGM/AcGM, Inokuchi et al. (PKDD 00); FSG, Kuramochi et al. (ICDM 01); Vanetik et al. (ICDM 02). Pattern-growth: MoFa, Borgelt et al. (ICDM 02); gSpan, Yan and Han (ICDM 02).

35 gIndex (Yan et al. 04) (Cont.) Frequent Structures: Threshold Issue. How to set the minimum support threshold? If it is too low, it may generate too many frequent graphs. If it is too high, it may miss important structures. Should we enforce a uniform threshold for structures of different sizes? Answer: a size-increasing support threshold.

36 gIndex (Yan et al. 04) (Cont.) Frequent Structures: Threshold Issue [Figure: support (%) plotted against fragment size (edges), with thresholds θ and Θ marked.] Intuition: large structures with low support will likely be indexed well by their substructures that have similar support. Size-increasing support threshold: the support threshold increases as the indexed structures become larger.
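
One possible shape for such a threshold is sketched below. The linear interpolation from θ at size 1 to Θ at the maximum size is an assumption made for illustration; the slide only states that the threshold grows with fragment size. The function names and default values are hypothetical.

```python
def size_increasing_support(size, theta, big_theta, max_size):
    """Assumed linear threshold: theta at size 1, big_theta at max_size."""
    if max_size <= 1:
        return big_theta
    size = min(max(size, 1), max_size)  # clamp to the valid size range
    return theta + (big_theta - theta) * (size - 1) / (max_size - 1)


def is_frequent(support, size, theta=1, big_theta=20, max_size=10):
    """A fragment is kept if its support reaches the threshold for its size."""
    return support >= size_increasing_support(size, theta, big_theta, max_size)


small_ok = is_frequent(support=5, size=2)   # low threshold for small fragments
large_ok = is_frequent(support=5, size=9)   # same support, larger fragment
```

With these assumed values the same support of 5 is enough for a 2-edge fragment but not for a 9-edge one.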

37 gIndex (Yan et al. 04) (Cont.) Frequent Structures: Volume Issue. The number of frequent structures may exceed the number of graphs in the database when the support is low: 1,000 graphs may generate 1,000,000 frequent structures. It is expensive in time and memory to compute and index all frequent structures. This motivates discriminative structures.

38 gIndex (Yan et al. 04) (Cont.) Redundant Structures [Figure: sample database of three molecular graphs (a), (b), (c).] All graphs contain the structures C, C-C, C-C-C. Why bother indexing these redundant frequent structures? Remove them: only index structures that provide more information than existing structures.

39 gIndex (Yan et al. 04) (Cont.) Discriminative Structures. Pinpoint the most useful frequent structures. Given a set of structures $f_1, f_2, \ldots, f_n$ and a new structure $x$, we measure the extra indexing power provided by $x$: $P(x \mid f_1, f_2, \ldots, f_n)$, $f_i \subset x$. When $P$ is small enough, $x$ is a discriminative structure and should be included in the index. Index discriminative frequent structures only: this reduces the index size by an order of magnitude while achieving good performance.
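
A hedged sketch of this test follows. It assumes that P is estimated as the fraction of graphs containing all indexed substructures $f_i$ that also contain $x$, i.e. $|D_x| / |\bigcap_i D_{f_i}|$; the slide does not spell out the estimator, and the cutoff `p_max` and function names are hypothetical.

```python
def extra_indexing_power(d_x, d_subs):
    """Assumed estimate of P(x | f_1..f_n): |D_x| / |D_f1 ∩ ... ∩ D_fn|.

    d_x:    ids of the graphs that contain x
    d_subs: non-empty list of id sets, one per indexed f_i contained in x
    """
    common = set.intersection(*map(set, d_subs))
    if not common:
        return 1.0  # nothing left to prune, so x adds no information
    return len(d_x) / len(common)


def is_discriminative(d_x, d_subs, p_max=0.5):
    """x is worth indexing when it prunes many graphs its substructures keep."""
    return extra_indexing_power(d_x, d_subs) <= p_max


subs = [{1, 2, 3, 4}, {1, 2, 3}]
rare = is_discriminative({1}, subs)             # x occurs in 1 of 3 graphs
redundant = is_discriminative({1, 2, 3}, subs)  # x occurs wherever the f_i do
```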

40 gIndex (Yan et al. 04) (Cont.) gIndex - Construction. First generate all frequent fragments while removing redundant ones. Translate fragments into sequences and hold them in a prefix tree. Each fragment has an id list: the ids of the graphs containing the fragment. Graph sequentialization (DFS code): a labelled edge is a 5-tuple $(i, j, l_i, l_{(i,j)}, l_j)$ (described in another paper).

41 gIndex (Yan et al. 04) (Cont.) gIndex - Construction: gIndex Tree. Each fragment can be mapped to an edge sequence (DFS code); insert the edge sequences of discriminative fragments into a prefix tree called the gIndex tree.

42 gIndex (Yan et al. 04) (Cont.) gIndex - Search. Pipeline: Query, Filtering, Verification, Answers.

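43 gIndex (Yan et al. 04) (Cont.) Cost Analysis. Query response time: $T_{index} + |C_q| \times (T_{io} + T_{isomorphism\_testing})$, where $T_{index}$ is the query indexing time, $|C_q|$ is the size of the candidate answer set, $T_{io}$ is the disk I/O time and $T_{isomorphism\_testing}$ is the isomorphism testing time. Remark: make $|C_q|$ as small as possible.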

44 gIndex (Yan et al. 04) (Cont.) gIndex - Search Optimization: Apriori pruning. If a fragment is not in the gIndex tree, we need not check its supergraphs.

45 gIndex (Yan et al. 04) (Cont.)

46 gIndex (Yan et al. 04) (Cont.)

47 FG-index (Cheng et al. 07) The first work to propose the concept of verification-free query processing. Basic idea: if the query is a frequent feature, there is no need to verify the candidates; if the query is not a frequent feature, the cost is the same as for gIndex. Problem: if the query graph is large, the probability that it is a frequent feature is low.

48 Closure-Tree (He and Singh 06)

49 Closure-Tree (He and Singh 06) (Cont.)

50 iGraph (VLDB 10) [iGraph, VLDB 10] is a common framework that implements most of the above representative indexes. It uses the same subgraph isomorphism algorithm for all of them and a common storage engine that guarantees real disk I/Os by bypassing the OS file system cache. From its experiments and conclusions, we know that: 1. There is no single winner among the above techniques for subgraph query processing. 2. Feature-based indexes, like gIndex and FG-index, have the best pruning power, which leads to the lowest I/O cost and a small candidate set.

51 Outline Introduction of Graph and Graph Database. Background of Subgraph Isomorphism. Background of Subgraph Query Processing. Background of Similarity Graph Query Processing. Background of Supergraph Query Processing.

52 Precise vs. Approximate Search in Graphs Given a graph database and a query graph Q: find graphs containing Q exactly (precise matching: gIndex, SIGMOD 04); find graphs containing Q approximately (approximate matching: Grafil).

53 Evaluating Graph Similarity 1. Maximal Common Subgraph (MCS): Given two graphs Q and G, assume that S is subgraph-isomorphic to both Q and G. S is called a common subgraph of Q and G. The MCS between Q and G is the common subgraph with the largest number of edges, $|E(S)|$. [Figure: graphs Q and G with vertex labels A, B, C, E, F and their MCS.]

54 Evaluating Graph Similarity (Cont.) 2. Minimal Graph Edit Distance: the minimal edit distance between Q and G is the minimal number of edit operations (insertion, deletion or relabeling) in an optimal alignment that transforms Q into G. [Figure: graphs Q and G with vertex labels A, B, C, E, F.]

55 Solution 1 Compute the similarity between the graphs in the database and the query graph directly (costly): sequential scan; subgraph similarity computation.

56 Solution 2 Form a set of subgraph queries from the original query graph and use exact subgraph search (costly). If we allow 3 edges to be missed in a 20-edge query graph, it may generate 1,140 subgraphs.
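
The figure of 1,140 is the number of ways to choose which 3 of the 20 edges to drop, $\binom{20}{3}$. A short check, counting both by formula and by explicit enumeration of the relaxed edge sets (it considers deleting exactly 3 edges and does not merge isomorphic results):

```python
from itertools import combinations
from math import comb

edges = range(20)

# Each relaxed query keeps 17 of the 20 edges.
relaxed_queries = sum(1 for _ in combinations(edges, 20 - 3))
by_formula = comb(20, 3)
```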

57 Solution 3 Precise search: use frequent patterns as indexing features; select features in the database space based on their selectivity; build the index. Approximate search: it is hard to build indices covering similar subgraphs because of the explosive number of subgraphs in databases. Idea: (1) keep the index structure; (2) select features in the query space.

58 Substructure Similarity Measure Structure-based similarity measure: the largest overlapping part of two graphs. Relaxation: the number of edges that can be relabeled or deleted (relaxation of the query graph). [Figure: graphs G and Q.]

59 Structural Features [Figure: graph database of three molecular graphs (a), (b), (c).] Structural features (small fragments): atom, path, bond, subgraph.

60 Substructure Similarity Measure Feature-based similarity measure: each graph is represented as a feature vector $X = \{x_1, x_2, \ldots, x_n\}$. The similarity is defined by the distance between the corresponding vectors. Easy to index. Very fast. Rough measure.
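
A minimal sketch of the feature-based measure. The choice of the L1 distance, the feature names and the counts are assumptions for illustration; the slide only says that similarity is defined by a distance between feature vectors.

```python
def feature_vector(graph_features, vocabulary):
    """Count vector X = (x_1, ..., x_n) over a fixed feature vocabulary."""
    return [graph_features.get(f, 0) for f in vocabulary]


def l1_distance(x, y):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


vocabulary = ["C-C", "C-N", "C-O"]                       # hypothetical features
q_vec = feature_vector({"C-C": 2, "C-N": 1}, vocabulary)
g_vec = feature_vector({"C-C": 2, "C-O": 1}, vocabulary)
distance = l1_distance(q_vec, g_vec)
```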

61 Substructure Similarity Measure Structure-based similarity: accurate measure, but slow. Feature-based similarity: rough measure, but fast. Can we transform structure-based similarity into feature-based similarity?

62 Grafil (Yan et al. 05) [Figure: query (Q), graph (G1), graph (G2) and a shared substructure.] If graph G contains the major part of a query graph Q, G should share a number of common features with Q. Given a relaxation ratio, calculate the maximal number of features that can be missed! At least one of them should be contained.

63 Grafil (Yan et al. 05) (Cont.) Feature-Graph Matrix: an occurrence table between features and graphs. [Table: rows are features, columns are graphs G1 to G5; cell values not recoverable.] Assume a query graph has 4 features and at most 1 feature may be missed under the relaxation threshold.

64 Grafil (Yan et al. 05) (Cont.) Query Processing Framework: three steps in processing approximate graph queries. Step 1. Index Construction: select small structures as features in a graph database, and build the feature-graph matrix between the features and the graphs in the database.

65 Grafil (Yan et al. 05) (Cont.) Query Processing Framework Step 2. Feature Miss Estimation: determine the indexed features belonging to the query graph; calculate the upper bound of the number of features that can be missed for an approximate matching, denoted by J. This is computed on the query graph, not the graph database.

66 Grafil (Yan et al. 05) (Cont.) Query Processing Framework Step 3. Query Processing: use the feature-graph matrix to calculate the difference in the number of features between graph G and query Q, $F_G - F_Q$. If $F_G - F_Q > J$, discard G. The remaining graphs constitute the candidate answer set.
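
The filtering step can be sketched as below. It reads the "difference in the number of features" as the number of query feature occurrences that a graph fails to match, which is an interpretation rather than a quote from the slides. The function names and the matrix values are hypothetical; the setting of 4 query features with at most 1 miss follows the earlier feature-graph matrix slide.

```python
def feature_misses(query_counts, graph_counts):
    """Query feature occurrences that the graph does not provide."""
    return sum(max(0, need - graph_counts.get(f, 0))
               for f, need in query_counts.items())


def grafil_filter(query_counts, matrix, max_misses):
    """Keep the graphs whose feature misses do not exceed the bound J."""
    return [gid for gid, counts in matrix.items()
            if feature_misses(query_counts, counts) <= max_misses]


query_counts = {"f1": 1, "f2": 1, "f3": 1, "f4": 1}
matrix = {                       # invented feature-graph matrix
    "G1": {"f1": 1, "f2": 1, "f3": 1, "f4": 1},
    "G2": {"f1": 1, "f2": 1, "f3": 1},
    "G3": {"f1": 1, "f4": 1},
}
kept = grafil_filter(query_counts, matrix, max_misses=1)
```

G3 misses two features and is discarded; G1 and G2 remain as candidates for the more expensive structure-based check.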

67 Grafil (Yan et al. 05) (Cont.) Selection of Upper Bound. If we allow k edges to be relaxed, the main idea is to transform k edge misses into m feature misses. This is the classic set k-cover problem, which is NP-complete. k: the number of missing edges in q. m: the maximal number of features covered by k edges.
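
For very small queries the bound m can be computed exactly by trying every choice of k edges. The sketch below does that by brute force; it is exponential in general and is meant only to make the definition concrete, not to reproduce how Grafil estimates the bound. The edge-to-feature mapping and the function name are invented for the example, and each feature is counted once rather than per embedding.

```python
from itertools import combinations


def max_feature_misses(edge_features, k):
    """Largest number of features hit when k query edges are removed.

    edge_features[e] is the set of query features that use edge e.
    """
    best = 0
    for removed in combinations(edge_features, k):
        hit = set().union(*(edge_features[e] for e in removed))
        best = max(best, len(hit))
    return best


edge_features = {"e1": {"f1", "f2"}, "e2": {"f2", "f3"}, "e3": {"f3"}}
m_for_one = max_feature_misses(edge_features, 1)
m_for_two = max_feature_misses(edge_features, 2)
```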

68 Grafil (Yan et al. 05) (Cont.) Usage of the feature misses m: m = 4

69 Outline Introduction of Graph and Graph Database. Background of Subgraph Isomorphism. Background of Subgraph Query Processing. Background of Similarity Graph Query Processing. Background of Supergraph Query Processing.

70 Supergraph Query Processing Counterpart of subgraph query processing. Problem statement: Given a graph database D and a graph query q. Find all graphs g in D s.t. q is a supergraph of g.

71 Challenges Problem complexity: NP-complete, the same as subgraph query. Existing feature-based indexes for subgraph queries are not applicable. Inclusion logic for subgraph query: if $f \subseteq q$ and $f \nsubseteq g$, then $q \nsubseteq g$. Exclusion logic for supergraph query: if $f \nsubseteq q$ and $f \subseteq g$, then $g \nsubseteq q$. Representative work: cIndex (Chen et al. 07), a feature-based approach; GPTree (Zhang et al. 09), a feature-based approach with a fast sub-iso approach.
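
The two pruning rules can be contrasted in a short sketch. It assumes each database graph and the query are summarized by the set of indexed features they contain; the names `contains` and `query_features` and the sample data are hypothetical.

```python
def prune_for_subgraph_query(contains, query_features):
    """Inclusion logic: keep g only if it contains every feature of q."""
    return {g for g, feats in contains.items() if query_features <= feats}


def prune_for_supergraph_query(contains, query_features):
    """Exclusion logic: keep g only if q contains every feature of g."""
    return {g for g, feats in contains.items() if feats <= query_features}


contains = {
    "g1": {"f1"},
    "g2": {"f1", "f2"},
    "g3": {"f1", "f2", "f3"},
}
query_features = {"f1", "f2"}
sub_candidates = prune_for_subgraph_query(contains, query_features)
super_candidates = prune_for_supergraph_query(contains, query_features)
```

The same feature sets give different candidate sets for the two query types, which is why an index tuned for subgraph queries does not carry over directly. Both results are candidates only and still need verification.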

72 References
[Shasha et al., PODS 02] Shasha, D., Wang, J.T.L., Giugno, R.: Algorithmics and applications of tree and graph searching. In: PODS. (2002)
[Yan et al., SIGMOD 04] Yan, X., Yu, P.S., Han, J.: Graph indexing based on discriminative frequent structure analysis. In: SIGMOD. (2004)
[He and Singh, ICDE 06] He, H., Singh, A.K.: Closure-tree: An index structure for graph queries. In: ICDE. (2006)
[Cheng et al., SIGMOD 07] Cheng, J., Ke, Y., Ng, W., Lu, A.: FG-index: towards verification-free query processing on graph databases. In: SIGMOD. (2007)
[Cheng et al., TODS 09] Cheng, J., Ke, Y., Ng, W.: Effective query processing on graph databases. ACM Trans. Database Syst. 34(1) (2009)
[Jiang et al., ICDE 07] Jiang, H., Wang, H., Yu, P.S., Zhou, S.: GString: A novel approach for efficient search in graph databases. In: ICDE. (2007)
[Zhang et al., ICDE 07] Zhang, S., Hu, M., Yang, J.: TreePi: A novel graph indexing method. In: ICDE. (2007)
[Williams et al., ICDE 07] Williams, D.W., Huan, J., Wang, W.: Graph database indexing using structured graph decomposition. In: ICDE. (2007)

73 References
[Zhao et al., VLDB 07] Zhao, P., Yu, J.X., Yu, P.S.: Graph indexing: Tree + delta >= graph. In: VLDB. (2007)
[Zou et al., EDBT 08] Zou, L., Chen, L., Yu, J.X., Lu, Y.: A novel spectral coding in a large graph database. In: EDBT. (2008)
[Shang et al., VLDB 08] Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: An efficient algorithm for testing subgraph isomorphism. In: VLDB. (2008)
[Chen et al., VLDB 07] Chen, C., Yan, X., Yu, P.S., Han, J., Zhang, D.Q., Gu, X.: Towards graph containment search and indexing. In: VLDB. (2007)
[Zhang et al., EDBT 09] Zhang, S., Li, J., Gao, H., Zou, Z.: A novel approach for efficient supergraph query processing on graph databases. In: EDBT. (2009)
[Raymond et al., CJ 02] Raymond, J.W., Gardiner, E.J., Willett, P.: RASCAL: calculation of graph similarity using maximum common edge subgraphs. Comput. J. 45(6) (2002)
[Yan et al., SIGMOD 05] Yan, X., Yu, P.S., Han, J.: Substructure similarity search in graph databases. In: SIGMOD Conference. (2005)
[Faloutsos and Tong, ICDE 09] Faloutsos, C., Tong, H.: Large graph mining: patterns, tools and case studies. In: ICDE. (2009) Tutorial.

74 References
[Shang et al., ICDE 10] Shang, H., Zhu, K., Lin, X., Zhang, Y., Ichise, R.: Similarity search on supergraph containment. In: ICDE. (2010)
[Ke et al., KDD 07] Ke, Y., Cheng, J., Ng, W.: Correlation search in graph databases. In: KDD. (2007)
[Ke et al., SDM 09] Ke, Y., Cheng, J., Yu, J.X.: Top-k correlative graph mining. In: SDM. (2009)
[Ke et al., ICDM 09] Ke, Y., Cheng, J., Yu, J.X.: Efficient discovery of frequent correlated subgraph pairs. In: ICDM. (2009)

75 Thank you!


Chapters 11 and 13, Graph Data Mining CSI 4352, Introduction to Data Mining Chapters 11 and 13, Graph Data Mining Young-Rae Cho Associate Professor Department of Computer Science Balor Universit Graph Representation Graph An ordered pair GV,E

More information

9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining

9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining 9 Graph Mining, Social Network Analysis, and Multirelational Data Mining 9.1 We have studied frequent-itemset mining in Chapter 5 and sequential-pattern mining in Section 3 of Chapter 8. Many scientific

More information
