Mining Significant Graph Patterns by Leap Search

1 Mining Significant Graph Patterns by Leap Search
Xifeng Yan (IBM T. J. Watson); Hong Cheng, Jiawei Han (UIUC); Philip S. Yu (UIC)

2 Graphs Are Everywhere
Magwene et al. Genome Biology :R100
Co-expression Network, Program Flow, Social Network, Chemical Compound, Protein Structure

3 Graph Pattern Mining

4 Graph Patterns
Interestingness measures / objective functions
Frequency: frequent graph pattern
Discriminative: information gain, Fisher score
Significance: G-test
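The slide names the measures without defining them. As a minimal sketch, two of them can be computed from a pattern's frequency p in the positive dataset and q in the negative dataset. The formulas below are the standard textbook forms, not taken from the slides; the function names, the parameters `n_pos` and `n_neg`, and the `eps` clamp that avoids log(0) are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(x * math.log2(x) for x in probs if x > 0)

def information_gain(p, q, n_pos, n_neg):
    """Information gain of the binary feature 'graph contains the pattern'.

    p, q: fraction of positive / negative graphs that contain the pattern.
    n_pos, n_neg: class sizes (hypothetical parameter names).
    """
    n = n_pos + n_neg
    hit = p * n_pos + q * n_neg      # graphs that contain the pattern
    miss = n - hit                   # graphs that do not
    h_class = entropy([n_pos / n, n_neg / n])
    h_cond = 0.0
    if hit > 0:
        h_cond += hit / n * entropy([p * n_pos / hit, q * n_neg / hit])
    if miss > 0:
        h_cond += miss / n * entropy(
            [(1 - p) * n_pos / miss, (1 - q) * n_neg / miss]
        )
    return h_class - h_cond

def g_test(p, q, n_pos, eps=1e-6):
    """G-test statistic of the positive frequency p, with q as the null model.

    Frequencies are clamped to [eps, 1 - eps] (an assumption) to avoid log(0).
    """
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return 2 * n_pos * (
        p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    )
```

Both scores are zero when p equals q and grow as the two frequencies move apart, which is why neither behaves like plain frequency under pattern growth.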

5 Frequent Graph Pattern

6 Optimal Graph Pattern (this work)

7 Objective Functions
Challenge: not anti-monotonic

8 Challenge: Non Anti-Monotonic
Anti-monotonic: enumerate subgraphs from small size to large size
Non-monotonic: enumerate all subgraphs, then check their scores?

9 Frequent Pattern Based Mining Framework
Graph Database -> Frequent Patterns -> Optimal Patterns -> exploratory task, graph clustering, graph classification, graph index
(SIGMOD 04, 05) (ISMB 05, 07)
1. Bottleneck: millions, even billions of patterns
2. No guarantee of quality

10 Direct Pattern Mining Framework
Graph Database -> Optimal Patterns (direct) -> exploratory task, graph clustering, graph classification, graph index
How?

11 Upper-Bound
IBM T. J. Watson Research Center

12 Upper-Bound: Anti-Monotonic (cont.)
Rule of thumb: if the difference between a graph pattern's frequency in the positive dataset and its frequency in the negative dataset increases, the pattern becomes more interesting.
We can recycle existing graph mining algorithms to accommodate non-monotonic functions.
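The slides do not spell out the bound itself. One way to turn the rule of thumb into an anti-monotonic upper bound is sketched here, under the assumption that the score is convex in (p, q): a supergraph can only have frequencies p' <= p and q' <= q, so the largest frequency difference it can reach is at (p, 0) or (0, q). The names `score_upper_bound` and `bound_holds_on_grid`, the `eps` clamp, and `n_pos=100` are hypothetical choices for this illustration.

```python
import math

def g_test(p, q, n_pos, eps=1e-6):
    """G-test statistic; frequencies are clamped to avoid log(0) (assumption)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return 2 * n_pos * (
        p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    )

def score(p, q):
    """Score used in this sketch: G-test with 100 positive graphs."""
    return g_test(p, q, n_pos=100)

def score_upper_bound(score_fn, p, q):
    """Bound on score_fn(p2, q2) over all p2 <= p and q2 <= q.

    Valid when score_fn is convex, because a convex function on a rectangle
    takes its maximum at a corner, and the two corners with the largest
    frequency difference are (p, 0) and (0, q).
    """
    return max(score_fn(p, 0.0), score_fn(0.0, q))

def bound_holds_on_grid(score_fn, p, q, steps=20):
    """Brute-force check of the bound on a grid of reachable (p2, q2) pairs."""
    bound = score_upper_bound(score_fn, p, q)
    return all(
        score_fn(p * i / steps, q * j / steps) <= bound + 1e-9
        for i in range(steps + 1)
        for j in range(steps + 1)
    )
```

Because the bound depends only on p and q, and both can only shrink as a pattern grows, the bound itself never increases from a pattern to its supergraphs. That is the property a frequent-pattern miner needs in order to prune.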

13 Vertical Pruning
Large <- small
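Vertical pruning can be illustrated as branch-and-bound along the small-to-large growth path: a branch is abandoned once the upper bound of its score cannot beat the best score found so far. The sketch below uses itemsets as a stand-in for graphs, since subgraph enumeration would not fit in a short example. The toy data, the function names, and the convex-score bound max(F(p, 0), F(0, q)) are assumptions made for illustration, not the authors' implementation.

```python
import math

def g_test(p, q, n_pos, eps=1e-6):
    """G-test statistic; frequencies are clamped to avoid log(0) (assumption)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return 2 * n_pos * (
        p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    )

def frequency(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    return sum(pattern <= t for t in transactions) / len(transactions)

def mine_best(pos, neg, items):
    """Depth-first search, small to large, with vertical pruning."""
    best = {"score": 0.0, "pattern": frozenset(), "visited": 0}
    n_pos = len(pos)

    def grow(pattern, start):
        for i in range(start, len(items)):
            child = pattern | {items[i]}
            p, q = frequency(child, pos), frequency(child, neg)
            if p == 0.0 and q == 0.0:
                continue  # pattern never occurs, and neither can a superset
            best["visited"] += 1
            s = g_test(p, q, n_pos)
            if s > best["score"]:
                best["score"], best["pattern"] = s, child
            # Upper bound for every superset of child (convex-score assumption).
            bound = max(g_test(p, 0.0, n_pos), g_test(0.0, q, n_pos))
            if bound > best["score"]:
                grow(child, i + 1)
            # Otherwise the whole branch below child is pruned.

    grow(frozenset(), 0)
    return best

# Hypothetical toy data: four positive and four negative transactions.
POS = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}]
NEG = [{"a", "c"}, {"c", "d"}, {"a", "d"}, {"b", "d"}]
ITEMS = ["a", "b", "c", "d"]

result = mine_best(POS, NEG, ITEMS)
```

On this toy data the search returns the same best pattern as exhaustive enumeration while scoring fewer candidates, because branches whose bound falls below the current best are never expanded.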

14 Horizontal Pruning: Structural Proximity

15 Structural Proximity: Another Perspective
# of frequent patterns >> # of possible frequency pairs
Many patterns share the same score
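The counting argument can be made concrete. With n+ positive and n- negative examples there are at most (n+ + 1)(n- + 1) distinct pairs of support counts, and any score that depends only on the frequencies is determined by that pair. The sketch below tallies how many patterns land on each pair. It uses itemsets as a stand-in for graphs, and the toy data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def support_pair(pattern, pos, neg):
    """(support count in the positive set, support count in the negative set)."""
    return (sum(pattern <= t for t in pos), sum(pattern <= t for t in neg))

def pair_histogram(pos, neg, items):
    """Count how many occurring patterns share each support pair."""
    hist = Counter()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            pair = support_pair(frozenset(combo), pos, neg)
            if pair != (0, 0):  # ignore patterns that never occur
                hist[pair] += 1
    return hist

# Hypothetical toy data: four positive and four negative transactions.
POS = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}]
NEG = [{"a", "c"}, {"c", "d"}, {"a", "d"}, {"b", "d"}]
ITEMS = ["a", "b", "c", "d"]

hist = pair_histogram(POS, NEG, ITEMS)
max_pairs = (len(POS) + 1) * (len(NEG) + 1)  # ceiling on distinct pairs
```

The number of patterns grows exponentially with the number of items (or edges), while `max_pairs` grows only with the dataset size, so collisions become the norm on real data.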

16 Structural Leap Search

17 Frequency Association
Significant patterns often fall into the high quantile of frequency
Start with the most frequent patterns

18 Descending Leap Mine
1. Structural Leap Search with frequency threshold
2. Frequency-Descending Mining (until F(g*) converges)
3. Structural Leap Search
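The frequency-descending part of these steps can be sketched as a loop that mines at a high frequency threshold, lowers the threshold, and stops once the best score F(g*) no longer improves. This is a sketch under several assumptions: itemsets stand in for graphs, the threshold applies to the positive frequency only, the threshold is halved each round, and a plain support-pruned search replaces structural leap search, whose details are not given on the slides. All function names and the toy data are hypothetical.

```python
import math

def g_test(p, q, n_pos, eps=1e-6):
    """G-test statistic; frequencies are clamped to avoid log(0) (assumption)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return 2 * n_pos * (
        p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    )

def frequency(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    return sum(pattern <= t for t in transactions) / len(transactions)

def best_at_threshold(pos, neg, items, theta):
    """Best-scoring itemset whose positive frequency is at least theta."""
    best_score, best_pattern = 0.0, frozenset()
    stack = [(frozenset(), 0)]
    while stack:
        pattern, start = stack.pop()
        for i in range(start, len(items)):
            child = pattern | {items[i]}
            p = frequency(child, pos)
            if p < theta:
                continue  # frequency is anti-monotonic: supersets fail too
            s = g_test(p, frequency(child, neg), len(pos))
            if s > best_score:
                best_score, best_pattern = s, child
            stack.append((child, i + 1))
    return best_score, best_pattern

def frequency_descending(pos, neg, items, min_theta=0.01):
    """Lower the threshold until the best score stops improving."""
    theta = 1.0
    best_score, best_pattern = best_at_threshold(pos, neg, items, theta)
    while theta > min_theta:
        theta /= 2  # halving schedule is an assumption
        s, pattern = best_at_threshold(pos, neg, items, theta)
        if best_score > 0 and s <= best_score:
            break  # F(g*) has converged
        best_score, best_pattern = s, pattern
    return best_score, best_pattern, theta

# Hypothetical toy data: four positive and four negative transactions.
POS = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}]
NEG = [{"a", "c"}, {"c", "d"}, {"a", "d"}, {"b", "d"}]
ITEMS = ["a", "b", "c", "d"]

score, pattern, theta = frequency_descending(POS, NEG, ITEMS)
```

The score returned by this loop is only a candidate. In the steps listed above, a final search without the frequency threshold follows, and a good candidate score makes bound-based pruning in that final search more effective from the start.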

19 Results: NCI Anti-Cancer Screen Datasets
Chemical compounds: anti-cancer or not
# of vertices: 10 ~ 200
Name      # of Compounds   Tumor Description
MCF-7     27,770           Breast
MOLT-4    39,765           Leukemia
NCI-H23   40,353           Non-Small Cell Lung
OVCAR-8   40,516           Ovarian
P388      41,472           Leukemia
PC-3      27,509           Prostate
SF-295    40,271           Central Nervous System
SN12C     40,004           Renal
SW-620    40,532           Colon
UACC257   39,988           Melanoma
YEAST     79,601           Yeast anti-cancer

20 Efficiency
Vertical Pruning vs. Vertical Pruning + Horizontal Pruning

21 Effectiveness
Frequency descending vs. frequency descending + structural leap search

22 Graph Classification
Table columns: Name, OA Kernel, LEAP, OA Kernel (6x), LEAP (6x); summary row: Average (AUC)
* OA Kernel: Optimal Assignment Kernel; LEAP: LEAP search

23 Scalability Means Something!
OA Kernel: ~200 sec; OA Kernel (6x): ~8000 sec (quadratic)
LEAP: ~20 sec; LEAP (6x): ~100 sec (linear)

24 Beyond Graph Patterns
Pattern-based categorical data classification (ICDE 07)

25 Beyond Graph Patterns (cont.)
1. Direct mining can be applied to itemsets, sequences, and trees
   itemset/sequence/tree Database -> Optimal Patterns (direct) -> exploratory task, clustering, classification, index
2. Existing algorithms can be recycled to mine patterns with sophisticated measures.
3. Pattern-based methods, including indexing and classification, are competitive.

26 Thank You
