Extraction of Frequent Subgraph from Graph Database
Sakshi S. Mandke, Sheetal S. Sonawane
Department of Computer Engineering, Pune Institute of Computer Engineering, Pune, India.

Abstract
Graphs are a promising abstraction of complex structured and semi-structured data. Graph mining techniques extract, analyze and summarize significant and useful information from graph databases. Finding frequent subgraphs in a graph database is the essence of graph mining. The mined subgraphs are sometimes so numerous that selecting the significant ones becomes difficult, and not every frequent subgraph is significant from the application perspective. This paper proposes a method that extracts significant subgraphs in two stages. In the first stage, frequent subgraphs are identified using a frequency threshold (ϴ), which is an input parameter. In the second stage, feature vectors of the subgraphs are generated to calculate their statistical significance. The p-value is the measure of statistical significance.

Key terms: Frequent subgraph mining, feature selection, random walk on graph, statistical significance.

I. Introduction
Complex data can be effectively represented as graphs. Many application areas, such as social networking, web links, bioinformatics and chemistry, use graphs to represent complex data. A graph consists of a set of vertices and the edges connecting them. For example, in chemistry a molecule consists of atoms and bonds, which are represented in a graph as vertices and edges respectively. In a web-link graph, users are represented as nodes and the communication links between them as edges. Graph mining can be applied to a single graph or to a series of graphs. A graph database is a collection of many graphs. Let G_B be a graph dataset such that G_B = {G_1, G_2, ..., G_n}. Each graph G_i = (V_i, E_i) consists of a set of vertices and a set of edges connecting them, where V = {v_1, v_2, v_3, ..., v_k} and E ⊆ {(u, v) | u, v ∈ V}.
A graph g is a subgraph of G if there is a subgraph isomorphism from g to G. The support of g is the number of graphs in G_B of which g is a subgraph. A subgraph is said to be frequent if its support is greater than or equal to the user-defined frequency threshold ϴ.

Extraction of frequent substructures from a graph database is required in many applications. For example, in chemistry, frequent subgraph mining is used to analyze large collections of molecules in order to find regularities among the molecules of a specific class. Another application is web log analysis: web log files are analyzed to find sets of activities carried out by users, such as frequently accessed URLs and common group interactions.

Numerous methods have been developed to mine frequent subgraphs from a graph database. However, frequent subgraph mining faces a few challenges. The mined patterns may be very numerous, and not every subgraph is significant. A frequency parameter alone is not always sufficient to categorize graphs effectively; other graph properties may also help. For example, benzene is a common frequent subgraph in chemical molecule datasets, but it is not informative because it does not indicate any biological or chemical activity. The significance of a graph depends on the characteristics of the graph data. Domain-specific or topological features are therefore used as a reference point for judging the significance of graphs. Feature analysis helps to reduce the answer set and to find significant subgraphs. Extracting feature-based frequent subgraphs thus addresses the problem of selecting frequent subgraphs by quality.
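The definitions of support and of a frequent subgraph given at the start of this section can be illustrated with a minimal sketch. It is not the mining algorithm used in this paper: it assumes small vertex-labelled, undirected graphs and checks subgraph isomorphism by brute force over vertex mappings, which is only practical for toy inputs. The names `make_graph`, `is_subgraph`, `support` and `is_frequent`, and the three toy graphs, are hypothetical and chosen for illustration only.

```python
from itertools import permutations


def make_graph(labels, edges):
    """Vertex-labelled undirected graph: vertex i carries labels[i]."""
    return {"labels": list(labels), "edges": {frozenset(e) for e in edges}}


def is_subgraph(g, G):
    """True if an injective, label-preserving vertex mapping sends
    every edge of g onto an edge of G (brute force, toy sizes only)."""
    n, big_n = len(g["labels"]), len(G["labels"])
    if n > big_n:
        return False
    for image in permutations(range(big_n), n):
        if any(g["labels"][i] != G["labels"][image[i]] for i in range(n)):
            continue
        if all(frozenset(image[v] for v in e) in G["edges"] for e in g["edges"]):
            return True
    return False


def support(g, database):
    """Number of database graphs that contain g as a subgraph."""
    return sum(1 for G in database if is_subgraph(g, G))


def is_frequent(g, database, theta):
    """g is frequent if its support reaches the threshold theta."""
    return support(g, database) >= theta


# Hypothetical toy database of three labelled chains.
database = [
    make_graph(["C", "C", "O"], [(0, 1), (1, 2)]),
    make_graph(["C", "O", "N"], [(0, 1), (1, 2)]),
    make_graph(["C", "C", "N"], [(0, 1), (1, 2)]),
]
c_o = make_graph(["C", "O"], [(0, 1)])  # pattern: a C vertex bonded to an O vertex

print(support(c_o, database))         # 2
print(is_frequent(c_o, database, 2))  # True
print(is_frequent(c_o, database, 3))  # False
```

Practical miners avoid this exhaustive check by using canonical forms and candidate pruning; the sketch only makes the support and threshold definitions concrete.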
Figure 1: Overall Approach

Our work filters the answer set of frequent subgraphs by calculating their statistical significance. As shown in Figure 1, frequent subgraphs are first extracted and then analyzed in the feature domain. Statistical significance indicates whether a difference observed between samples is real or exists just by chance. The p-value is the measure of statistical significance. In a graph database the p-value is defined as follows: given a graph g with observed frequency µ0, the p-value of g is the probability that g occurs in a random database with frequency µ ≥ µ0; g is statistically significant if this probability is small [15].

The remainder of the paper is organized as follows. Section II describes related work. Section III presents the design of the proposed work. Datasets are discussed in Section IV and expected results in Section V. Section VI concludes.

II. Related Work
Many frequent subgraph mining (FSM) algorithms have been developed, such as AGM [6], FSG [10], gSpan [21], SUBDUE [4], FFSM [7], MoFa [1] and Gaston [13]. These algorithms are broadly classified into apriori-based algorithms and pattern-growth algorithms. In the apriori-based approach, the set of subgraphs of size k at one level is considered first, before the subgraphs of size k+1 at the next level are generated; the search proceeds breadth first. The pattern-growth approach uses depth-first search to generate subgraph candidates: each subgraph g is extended recursively to find all frequent subgraphs that contain it.

Various FSM algorithms have been developed over the past decades. In recent years, research has focused on optimizing the result set of FSM to improve its quality. In a survey on graph mining, C. Jiang [2] noted some issues related to FSM that are still open. The first is the need to reduce the size of the answer set generated by FSM algorithms.
In many cases the number of subgraphs in the result set is too large to analyze them individually, and large result sets may also contain redundant subgraphs. Approaches such as approximate frequent subgraphs, closed frequent subgraphs, maximal frequent subgraphs and discriminative subgraphs help to reduce the size of the result set. Defining a compact set of subgraphs without losing its value for a specific application is difficult.

He also noted that feature selection can be incorporated into the frequent subgraph mining process, which is useful for achieving better classification with a frequent-subgraph-based classifier. Frequent subgraph mining can be made application specific by applying domain knowledge; in this case, features are used as mining parameters. Because many different features are available, it is difficult to select suitable parameters for a given application.

As a third issue, he suggested that different isomorphism tests can be applied when finding subgraphs. For example, approximate matching can be used instead of exact matching. The SUBDUE [4] algorithm uses a heuristic beam search with domain knowledge to reduce the search space. GREW [9], gApprox [3] and RAM [22] are algorithms that use approximate measures to generate the result set.

The first two issues can be addressed by applying feature analysis to graphs. However, selecting the parameter for significance mining is difficult, and the significance parameter may change with the application. Page ranking, graph classification and frequent subgraph mining are areas in which feature-based analysis is under research. Yan and Han [17] presented pattern-based indexing in gIndex to achieve fast graph search. He and Singh proposed GraphRank [5], which calculates the statistical significance of subgraphs: subgraphs are converted into feature vectors, and their statistical significance is calculated using the p-value. Gang Li [11] proposed a graph classification method based on topological and label attributes.
X. Yan [18] proposed that cluster components can be used as a discriminative property for graph classification. CORK [12] uses the gSpan frequent subgraph mining algorithm to generate binary feature vectors for classification.

Few algorithms exist that mine significant subgraphs. Milto et al. [20] proposed an algorithm that mines motifs as graph patterns in randomized networks; they use a p-value calculation to decide the significance of a pattern. Yan et al. [19] developed a framework for mining significant patterns using structural leap search and frequency-descending mining. The GraphSig [15, 16] method mines statistically significant subgraphs from the subgraphs obtained at a low frequency threshold. Using the concept of random walks on graphs, graphs are converted into feature vectors, and the p-value of each subgraph is calculated to find its statistical significance in the feature space.

In our work we combine techniques from GraphRank [5] and GraphSig [15]. To preserve more structural information in the subgraph feature vector, we apply the random walk technique to subgraphs. The statistical significance of the subgraph feature vector is then calculated using the p-value.

III. Design of proposed work
Figure 2 outlines the proposed idea for finding significant frequent subgraphs. An existing algorithm, such as the Fast Frequent Subgraph Mining (FFSM) algorithm [7], is applied to extract frequent subgraphs.

Figure 2: Block Diagram
Figure 3a: Sample graphs
Figure 3b: Frequent subgraphs with frequency threshold 3

A sample graph database and its frequent subgraphs are illustrated in Figures 3a and 3b respectively. A random walk is applied to the mined subgraphs to convert them into feature vectors. The statistical significance of these feature vectors is then calculated using the p-value. Feature vector generation and the significance calculation are described in the following subsections.

A. Feature Vector Generation
To find the feature vectors of the mined subgraphs, a random walk is applied to each of them. A random walk starts from one node and keeps jumping to other nodes within the graph; each neighbour has an equal probability of being chosen. A random walk of length L on a graph is a sequence of random variables X_1, X_2, X_3, ..., X_L, where X_1 is the root vertex and X_{i+1} is a neighbouring vertex of X_i chosen uniformly at random.
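The random walk just defined can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the example graph is assumed to be a chain C-S-N-O joined by single bonds, the walk length of 6 and the seed are arbitrary, and edge types traversed by the walk are simply tallied as "label-bond-label" strings. The helper names `random_walk` and `edge_type_counts` are hypothetical.

```python
import random
from collections import Counter


def random_walk(adjacency, root, length, rng):
    """Return the visited vertices X_1..X_length: X_1 is the root and each
    next vertex is a uniformly random neighbour of the current one."""
    walk = [root]
    while len(walk) < length and adjacency[walk[-1]]:
        walk.append(rng.choice(adjacency[walk[-1]]))
    return walk


def edge_type_counts(walk, labels, bonds):
    """Count the edge types (source label, bond order, target label)
    traversed along the walk."""
    counts = Counter()
    for u, v in zip(walk, walk[1:]):
        counts[f"{labels[u]}-{bonds[frozenset((u, v))]}-{labels[v]}"] += 1
    return counts


# Assumed toy subgraph: chain C - S - N - O with single bonds.
labels = ["C", "S", "N", "O"]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
bonds = {frozenset((0, 1)): 1, frozenset((1, 2)): 1, frozenset((2, 3)): 1}

rng = random.Random(0)  # fixed seed so that repeated runs give the same walks
for root in adjacency:
    walk = random_walk(adjacency, root, length=6, rng=rng)
    print(labels[root], dict(edge_type_counts(walk, labels, bonds)))
```

Starting one walk from every vertex yields one count vector per starting node, which matches the shape of the per-subgraph feature matrix used in this section.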
During the random walk, features are captured while traversing from one node to a neighbouring node. Features may consist of nodes, edges or small subgraphs; pharmacophoric features can also be considered. Here, the edge type (NNP, node-to-node pair) is considered as the feature [1]. For a subgraph with n nodes, n vectors are generated, one per starting node. Every edge type is a column of the feature vector; if a specific edge type is not present, 0 is inserted in the row. After all NNP types have been counted during the random walk, the frequency of each NNP is calculated and noted in the feature vector as:

Value of NNP = (formula missing in the source)

The value of NNP is truncated to make it more traceable.

Figure 4: Sample subgraph

Table I: Random walk on the graph shown in Figure 4. Columns: Starting Node, C-1-S, S-1-N, N-1-O. Rows: C, S, N, O. (Cell values missing in the source.)

The feature vectors extracted from the subgraphs are analyzed further. Each subgraph is represented by a single feature vector, obtained by taking the floor of the values stored in the feature vector matrix of the subgraph. In this single vector each column represents the frequency count of one NNP type.

Floor of matrix: $\mathrm{Floor}([x_1, x_2, \ldots, x_n], [y_1, y_2, \ldots, y_n], [z_1, z_2, \ldots, z_n], \ldots) = [\min(x_i, y_i, z_i, \ldots)]$ for all $i = 1 \ldots n$.

Example: Floor([2,4,2], [2,3,3], [2,2,4]) = [2,2,2]

Table II: Subgraph feature vector dataset. Columns: C-1-B, C-1-A, B-1-A, A-1-B. Rows: one per subgraph g. (Row labels and cell values missing in the source.)

B. Calculating significance of feature vector
In this section, we explain the p-value calculation on the feature vector of a subgraph. The occurrences of each feature vector in a random graph database are calculated. First, the probability density function of a vector, PDF(x), is computed using the prior probabilities of the features. In the prior probability matrix each row represents one feature component (in our case, an NNP type). The element X_ij of the prior probability matrix is the probability that feature i is found at least j times in the subgraph feature vector dataset.

Table III: Prior-probability Matrix. Rows (NNPs): C-1-B, C-1-A, B-1-A, A-1-B. (Cell values missing in the source.)

The probability of a feature vector in a random vector database can be expressed using the joint probability:

$P(x) = \prod_{i=1}^{n} P(x_i)$

where $P(x_i)$ is the probability that element i occurs at least $y_i$ times.

Example: P(7, 7, 6, 0) = P(C-1-B ≥ 7) · P(C-1-A ≥ 7) · P(B-1-A ≥ 3) · P(A-1-B ≥ 0) (numeric values missing in the source)

The binomial distribution is used to measure the frequency of a feature vector in the database. A random histogram can be viewed as a trial, and x occurring in the histogram as a success. The number of trials for vector x on the database depends on the number of histograms.

$\text{p-value}(x, \mu_0) = \sum_{i=\mu_0}^{m} \mathrm{binomial}(P(x), i)$ [1]

where m is the number of trials. The lower the p-value, the higher the significance.

Algorithm 1: CalSignificance(G, maxPvalue)
Input: G is a subgraph database with the support of each subgraph. maxPvalue is the p-value threshold.
Output: O is the answer set of all significant subgraphs.

    D ← ø
    O ← ø
    for each g ∈ G do
        for each node in g do
            D_g ← D_g + RWR(g)
        X ← X + Vector(floor(D_g))
    for each NNP-type nnp in G do
        for i = 1 to |G| do
            for k = 1 to m do
                P_nnp(k) ← {probability(nnp) | count of NNP at k-th position and Value(G_ik) >= k}
    for each g in G do
        pvalue ← CalculatePvalue(x_g, g_support)
        if pvalue ≤ maxPvalue then
            O ← O + g

IV. Datasets
In chemistry, molecules are represented as graphs and are analyzed using graph mining techniques. Extraction of frequent substructures from chemical databases is required in many applications in the chemistry domain, such as drug discovery.

Figure 5: Cyclohexene (C6H10) compound as a graph. Hydrogens are implicit in the graph.

We test our experiment on chemical graph datasets. Three different datasets are used. The first is the DTP-AIDS Antiviral Screen¹ chemical compound dataset from the National Cancer Institute (NCI/NIH). Its compounds are divided into three categories on the basis of their antiviral activity: compounds that provide at least 50% protection are classified as CM (confirmed moderately active), compounds that provide 100% protection are listed as CA (confirmed active), and all other compounds are listed as CI (confirmed inactive). The second dataset is an anticancer compound dataset from PubChem². Its compounds are classified into two classes, active and inactive. The third dataset is the PTE³ (Predictive Toxicology Evaluation) compound dataset by NIEHS. It contains 340 chemical compounds in total.

¹ http://dtp.nci.nih.gov/docs/aids/aids_data.html

Chemical data are represented in several special formats such as .sdf, .mol, .cml and .smile. Tools like JOELib [8] and OpenBabel [14] are useful for converting between these file formats.

V. Discussion about expected result
We are implementing our algorithm in Java. The experiments will be performed on a 3.2 GHz PC with 8 GB of memory running Fedora 17 Linux. We use the FFSM algorithm [7] to generate frequent subgraphs from the graph database. The p-value threshold often ranges from 0.01 to 0.1. If a subgraph has a p-value less than 0.01, it is very strongly significant.
If a subgraph has a p-value between 0.01 and 0.05, it is strongly significant. A subgraph is also considered significant if its p-value is at most 0.1. The statistical significance calculation will improve the result set: all insignificant subgraphs will be filtered out by calculating the p-value. This filtering is most effective when the number of frequent subgraphs is large. For example, as shown in Figure 6, if the number of frequent subgraphs is (value missing in the source), then the number of significant subgraphs will not be more than (value missing in the source); the result set will be reduced by 10%. Thus, some subgraphs that exist just by chance will be filtered out. The running time also increases linearly with the number of frequent subgraphs.

Figure 6: Frequent subgraphs vs. significant frequent subgraphs (axes: Frequent subgraphs, Significant frequent subgraphs; series: MaxPvalue = 0.01, MaxPvalue = 0.1)

³ /PTE

VI. Conclusion
Not all frequent subgraphs are significant, so one more filtering step is needed. Feature analysis using random walks on graphs preserves more structural information. The p-value calculation provides the statistical significance of the feature vector of a graph. The quality and quantity of the result set will be improved by applying the above experiment. Significant feature vectors can further be given as input to a classifier.

References
[1] Borgelt, C., & Berthold, M. (November 2002). Mining Molecular Fragments: Finding Relevant Substructures of Molecules. IEEE International Conference on Data Mining, (pp ). Maebashi City, Japan.
[2] C. Jiang, F. C. (2004). A Survey of Frequent Subgraph Mining Algorithms. The Knowledge Engineering Review, Cambridge University Press.
[3] C., C., Yan, X. Z., & Han, J. (2007). gApprox: Mining Frequent Approximate Patterns from a Massive Network. 7th IEEE International Conference on Data Mining, (pp ).
[4] Cook, D. J., & Holder, L. B. (1994). Substructure Discovery Using Minimum Description Length and Background Knowledge. Journal of Artificial Intelligence Research, 1:
[5] He, H., & Singh, A. (2006). GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space. 6th International Conference on Data Mining, IEEE Computer Society, (pp ). Washington, DC, US.
[6] Inokuchi, A., Washio, T., & Motoda, H. (2000). An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data. PKDD'00, (pp ).
[7] Huan, J., Wang, W., & Prins, J. (2003). Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. International Conference on Data Mining, (pp ).
[8] JOELib: A Java Based Computational Chemistry Package. (2009). Wilhelm-Schickard-Institute for Computer Science, Tübingen, Germany.
[9] Kuramochi, M., & Karypis, G. (2004). GREW: A Scalable Frequent Subgraph Discovery Algorithm. 4th IEEE International Conference on Data Mining, (pp ).
[10] Kuramochi, M., & Karypis, G. (2001). Frequent Subgraph Discovery. ICDM, (pp ).
[11] Li, G., Semerci, M., Yener, B., & Zaki, M. J. (August 2011). Graph Classification via Topological and Label Attributes. 9th Workshop on Mining and Learning with Graphs. SIGKDD.
[12] M. Thoma, H. C.-P. (October 2010). Discriminative Frequent Subgraph Mining with Optimality Guarantees. Statistical Analysis and Data Mining, (pp. 3(5): ).
[13] Nijssen, S., & Kok, J. N. (2004). The Gaston Tool for Frequent Subgraph Mining. International Workshop on Graph-Based Tools. Amsterdam, the Netherlands: Elsevier.
[14] OpenBabel: An open chemical toolbox.
[15] Ranu, S., & Singh, A. (April 2009). GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Databases. 25th IEEE International Conference on Data Engineering (ICDE).
[16] Ranu, S., & Singh, K. (2009). Mining Statistically Significant Molecular Substructures for Efficient Molecular Classification. Journal of Chemical Information and Modeling, 49,
[17] X. Yan, P. Y. (2004). Graph Indexing: A Frequent Structure-based Approach. ACM SIGMOD, (pp ).
[18] Xifeng Yan, F. Z. (2006). Feature-based Similarity Search in Graph Structures. ACM Transactions on Database Systems, (pp. 31(4): ).
[19] Xifeng Yan, H. C. (2008). Mining Significant Graph Patterns by Leap Search. SIGMOD, (pp ).
[20] Y. Chi, Y. Y. (2003). Indexing and Mining Free Trees. ICDM.
[21] Yan, X., & Han, J. (2002). gSpan: Graph-Based Substructure Pattern Mining. ICDM'02. Washington, DC, USA: IEEE Computer Society.
[22] Zhang, S., & Yang, J. (2008). RAM: Randomized Approximate Graph Mining. 20th International Conference on Scientific and Statistical Database Management, (pp ).
More informationMining Significant Graph Patterns by Leap Search
Mining Significant Graph Patterns by Leap Search Xifeng Yan (IBM T. J. Watson) Hong Cheng, Jiawei Han (UIUC) Philip S. Yu (UIC) Graphs Are Everywhere Magwene et al. Genome Biology 2004 5:R100 Co-expression
More information9.1. Graph Mining, Social Network Analysis, and Multirelational Data Mining. Graph Mining
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining 9.1 We have studied frequent-itemset mining in Chapter 5 and sequential-pattern mining in Section 3 of Chapter 8. Many scientific
More informationSemi-supervised Clustering of Graph Objects: A Subgraph Mining Approach
Semi-supervised Clustering of Graph Objects: A Subgraph Mining Approach Xin Huang 1, Hong Cheng 1, Jiong Yang 2, Jeffery Xu Yu 1, Hongliang Fei 3, and Jun Huan 3 1 The Chinese University of Hong Kong 2
More informationLes Cahiers du GERAD ISSN:
Les Cahiers du GERAD ISSN: 0711 2440 SyGMA: Reducing Symmetry in Graph Mining C. Desrosiers, Ph. Galinier, P. Hansen, A. Hertz G 2007 12 February 2007 Revised: February 2008 Les textes publiés dans la
More informationWeb Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India
Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the
More informationEMPIRICAL COMPARISON OF GRAPH CLASSIFICATION AND REGRESSION ALGORITHMS. By NIKHIL S. KETKAR
EMPIRICAL COMPARISON OF GRAPH CLASSIFICATION AND REGRESSION ALGORITHMS By NIKHIL S. KETKAR A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTORATE OF PHILOSOPHY
More informationParallelization of Graph Isomorphism using OpenMP
Parallelization of Graph Isomorphism using OpenMP Vijaya Balpande Research Scholar GHRCE, Nagpur Priyadarshini J L College of Engineering, Nagpur ABSTRACT Advancement in computer architecture leads to
More informationEdgar: the Embedding-baseD GrAph MineR
Edgar: the Embedding-baseD GrAph MineR Marc Wörlein, 1 Alexander Dreweke, 1 Thorsten Meinl, 2 Ingrid Fischer 2, and Michael Philippsen 1 1 University of Erlangen-Nuremberg, Computer Science Department
More informationEdgar: the Embedding-baseD GrAph MineR
Edgar: the Embedding-baseD GrAph MineR Marc Wörlein, 1 Alexander Dreweke, 1 Thorsten Meinl, 2 Ingrid Fischer 2, and Michael Philippsen 1 1 University of Erlangen-Nuremberg, Computer Science Department
More informationI. INTRODUCTION. Keywords : Spatial Data Mining, Association Mining, FP-Growth Algorithm, Frequent Data Sets
2017 IJSRSET Volume 3 Issue 5 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Emancipation of FP Growth Algorithm using Association Rules on Spatial Data Sets Sudheer
More informationTemporal Weighted Association Rule Mining for Classification
Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider
More informationGraph Pattern Mining
: Lecture VIII Graph Pattern Mining Computer Science Department Data Mining Research Nov 26, 2014 Announcement No Homework Slides available at www.cs.ucsb.edu/~xyan/classes/ns201 Two Quizzes (Dec 3, 10),
More informationInfrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset
Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset M.Hamsathvani 1, D.Rajeswari 2 M.E, R.Kalaiselvi 3 1 PG Scholar(M.E), Angel College of Engineering and Technology, Tiruppur,
More informationMining frequent Closed Graph Pattern
Mining frequent Closed Graph Pattern Seminar aus maschninellem Lernen Referent: Yingting Fan 5.November Fachbereich 21 Institut Knowledge Engineering Prof. Fürnkranz 1 Outline Motivation and introduction
More information2. Department of Electronic Engineering and Computer Science, Case Western Reserve University
Chapter MINING HIGH-DIMENSIONAL DATA Wei Wang 1 and Jiong Yang 2 1. Department of Computer Science, University of North Carolina at Chapel Hill 2. Department of Electronic Engineering and Computer Science,
More informationUpper bound tighter Item caps for fast frequent itemsets mining for uncertain data Implemented using splay trees. Shashikiran V 1, Murali S 2
Volume 117 No. 7 2017, 39-46 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Upper bound tighter Item caps for fast frequent itemsets mining for uncertain
More informationKnowledge Discovery from Transportation Network Data
Knowledge Discovery from Transportation Network Data Paper Review Jiang, W., Vaidya, J., Balaporia, Z., Clifton, C., and Banich, B. Knowledge Discovery from Transportation Network Data. In ICDE, 2005 1
More informationWeb Data mining-a Research area in Web usage mining
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationAn Approach for Finding Frequent Item Set Done By Comparison Based Technique
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
More informationWeb Usage Mining: A Research Area in Web Mining
Web Usage Mining: A Research Area in Web Mining Rajni Pamnani, Pramila Chawan Department of computer technology, VJTI University, Mumbai Abstract Web usage mining is a main research area in Web mining
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining
More informationA Hierarchical Document Clustering Approach with Frequent Itemsets
A Hierarchical Document Clustering Approach with Frequent Itemsets Cheng-Jhe Lee, Chiun-Chieh Hsu, and Da-Ren Chen Abstract In order to effectively retrieve required information from the large amount of
More informationUnderstanding Rule Behavior through Apriori Algorithm over Social Network Data
Global Journal of Computer Science and Technology Volume 12 Issue 10 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: 0975-4172
More informationImproving the Efficiency of Web Usage Mining Using K-Apriori and FP-Growth Algorithm
International Journal of Scientific & Engineering Research Volume 4, Issue3, arch-2013 1 Improving the Efficiency of Web Usage ining Using K-Apriori and FP-Growth Algorithm rs.r.kousalya, s.k.suguna, Dr.V.
More informationSEQUENTIAL PATTERN MINING FROM WEB LOG DATA
SEQUENTIAL PATTERN MINING FROM WEB LOG DATA Rajashree Shettar 1 1 Associate Professor, Department of Computer Science, R. V College of Engineering, Karnataka, India, rajashreeshettar@rvce.edu.in Abstract
More informationPamba Pravallika 1, K. Narendra 2
2018 IJSRSET Volume 4 Issue 1 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section : Engineering and Technology Analysis on Medical Data sets using Apriori Algorithm Based on Association Rules
More informationParallel Popular Crime Pattern Mining in Multidimensional Databases
Parallel Popular Crime Pattern Mining in Multidimensional Databases BVS. Varma #1, V. Valli Kumari *2 # Department of CSE, Sri Venkateswara Institute of Science & Information Technology Tadepalligudem,
More informationAN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE
AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining
More informationThe Transpose Technique to Reduce Number of Transactions of Apriori Algorithm
The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm Narinder Kumar 1, Anshu Sharma 2, Sarabjit Kaur 3 1 Research Scholar, Dept. Of Computer Science & Engineering, CT Institute
More informationBehavior Query Discovery in System-Generated Temporal Graphs
Behavior Query Discovery in System-Generated Temporal Graphs Bo Zong,, Xusheng Xiao, Zhichun Li, Zhenyu Wu, Zhiyun Qian, Xifeng Yan, Ambuj K. Singh, Guofei Jiang UC Santa Barbara NEC Labs, America UC Riverside
More informationMulti-Label Feature Selection for Graph Classification
Multi-Label Feature Selection for Graph Classification Xiangnan Kong Department of Computer Science University of Illinois at Chicago, IL, USA xkong4@uic.edu Philip S. Yu Department of Computer Science
More informationDiscovering Geometric Patterns in Genomic Data
Discovering Geometric Patterns in Genomic Data Wenxuan Gao Department of Computer Science University of Illinois at Chicago wgao5@uic.edu Lijia Ma ljma @uchicago.edu Christopher Brown caseybrown@uchicago.edu
More informationTendency Mining in Dynamic Association Rules Based on SVM Classifier
Send Orders for Reprints to reprints@benthamscienceae The Open Mechanical Engineering Journal, 2014, 8, 303-307 303 Open Access Tendency Mining in Dynamic Association Rules Based on SVM Classifier Zhonglin
More informationA New Technique to Optimize User s Browsing Session using Data Mining
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationOptimization using Ant Colony Algorithm
Optimization using Ant Colony Algorithm Er. Priya Batta 1, Er. Geetika Sharmai 2, Er. Deepshikha 3 1Faculty, Department of Computer Science, Chandigarh University,Gharaun,Mohali,Punjab 2Faculty, Department
More informationGraph mining-based Image Indexing
Graph mining-based Image Indeing Gábor Iváncs, Renáta Iváncs and István Vajk Department of Automation and Applied Informatics, Budapest Universit of Technolog and Economics,, Goldmann G. ter 3. Budapest,
More informationPSM-Flow: Probabilistic Subgraph Mining for Discovering Reusable Fragments in Workflows
PSM-Flow: Probabilistic Subgraph Mining for Discovering Reusable Fragments in Workflows Ken Cheong CS Department HK Baptist University Hong Kong Daniel Garijo Information Sciences Institute U. of Southern
More informationMining Molecular Datasets on Symmetric Multiprocessor Systems
Mining Molecular Datasets on Symmetric Multiprocessor Systems Thorsten Meinl ALTANA Chair for Bioinformatics and Information Mining, University of Konstanz, Germany meinl@inf.uni-konstanz.de Marc Wörlein,
More informationAI Web-Based Agent for Banks
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 10 October, 2014 Page No.8782-8787 AI Web-Based Agent for Banks Nikhila Kamat 1, Michelle D cruz 2,
More informationIJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: [35] [Rana, 3(12): December, 2014] ISSN:
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A Brief Survey on Frequent Patterns Mining of Uncertain Data Purvi Y. Rana*, Prof. Pragna Makwana, Prof. Kishori Shekokar *Student,
More informationAn Approach for Privacy Preserving in Association Rule Mining Using Data Restriction
International Journal of Engineering Science Invention Volume 2 Issue 1 January. 2013 An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction Janakiramaiah Bonam 1, Dr.RamaMohan
More informationFREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN. School of Computing, SASTRA University, Thanjavur , India
Volume 115 No. 7 2017, 105-110 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN Balaji.N 1,
More informationGDClust: A Graph-Based Document Clustering Technique
GDClust: A Graph-Based Document Clustering Technique M. Shahriar Hossain, Rafal A. Angryk Department of Computer Science, Montana State University, Bozeman, MT 59715, USA E-mail: {mshossain, angryk}@cs.montana.edu
More informationWeb page recommendation using a stochastic process model
Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,
More informationPerformance Analysis of Apriori Algorithm with Progressive Approach for Mining Data
Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Shilpa Department of Computer Science & Engineering Haryana College of Technology & Management, Kaithal, Haryana, India
More information