Approximate Graph Searching Spatio-Temporal Graphs


1 Approximate Graph Searching Spatio-Temporal Graphs Hrishikesh Terdalkar Prof. Arnab Bhattacharya

2 1. Introduction Terminology and Setting

3 What are Graphs
G = (V, E)
V: vertices
E: edges
A_V: vertex attributes
A_E: edge attributes
A graph whose vertices or edges carry attributes is labelled.

4 What is Graph Searching (Querying, Matching)
Given: a graph database D = { G1, G2, ..., Gn } and a query graph Q
Find: occurrences of Q in D
Example: GraphGrep (Giugno et al. 2002)

5 What is Graph Indexing
Construct an index for graph searching
Index features: paths, structures
Example: gIndex (Yan et al. 2004)

6 Subgraph Isomorphism (NP-Complete)
Given graphs G and H, is there a subgraph $G_0 \subseteq G$ such that $G_0 \cong H$?

7 Frequent Subgraph Mining
Given: a graph database D = { G1, G2, ..., Gn } and a threshold $\delta \in (0, 1)$
Find: $\mathcal{G} = \{\, g : g \text{ occurs as a subgraph in at least } \delta \cdot |D| \text{ graphs in } D \,\}$
Algorithms: AGM (Inokuchi et al.), gSpan (Yan et al. 2002)

8 2. Principle Techniques Evolution

9 Process Overview Filter Matching Search Build Index Query Processing Verify Prune

10 GraphGrep: A Fast and Universal Method for Querying Graphs (Giugno et al.)
Properties of graphs in D
  Nodes: numeric id (id-node), string label (label-node)
  Edges: undirected, unlabelled
Terms
  id-path: list of n id-nodes (with edges between consecutive nodes)
  label-path: list of n label-nodes
  e.g. (C, A) is a label-path in g1 and (3, 1) is the corresponding id-path
[Figure: database containing 3 graphs g1, g2, g3 with labelled, numbered nodes]

11 Basic Steps: GraphGrep (Giugno et al.)
1. Database Construction
  For each graph Gi and each node v, find all paths starting at v of length in [1, lp]
  Path representation: a set of label-paths, where each label-path has a set of id-paths
  Hash table: keys are hash values of label-paths; each row holds the number of id-paths associated with a key, per graph
  Fingerprint(Gi): the column vector of graph Gi in the hash table
Path representation of graph g1:
  A = {(1)}
  AB = {(1,0), (1,2)}
  AC = {(1,3)}
  ABCA = {(1,0,3,1), (1,2,3,1)}
[Figure: graph g1 and the fingerprint table with keys h(CA), h(ABCA), h(ABCB) and columns g1, g2, g3; most cell values are not preserved]
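The path representation of g1 can be reproduced with a short sketch. It assumes g1 has nodes 0 (B), 1 (A), 2 (B), 3 (C) and edges 1-0, 1-2, 1-3, 0-3, 2-3, which is inferred from the id-paths listed on the slide because the figure itself is not preserved. It also assumes that a path may revisit a node but not reuse an edge, as the id-path (1, 0, 3, 1) suggests. The function names are illustrative and are not GraphGrep's own; label-paths are used directly as keys instead of hash values.

```python
from collections import defaultdict

# Assumed reconstruction of graph g1 (inferred from the slide's id-paths).
G1_LABELS = {0: "B", 1: "A", 2: "B", 3: "C"}
G1_EDGES = [(1, 0), (1, 2), (1, 3), (0, 3), (2, 3)]


def path_representation(labels, edges, lp):
    """Map each label-path (up to lp nodes) to its set of id-paths."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    rep = defaultdict(set)

    def extend(path, used_edges):
        rep["".join(labels[n] for n in path)].add(tuple(path))
        if len(path) == lp:
            return
        for nxt in adj[path[-1]]:
            edge = frozenset((path[-1], nxt))
            if edge not in used_edges:  # assumption: an edge is used at most once
                extend(path + [nxt], used_edges | {edge})

    for v in labels:
        extend([v], frozenset())
    return rep


def fingerprint(rep):
    """Number of id-paths per label-path (one column of the hash table)."""
    return {label_path: len(id_paths) for label_path, id_paths in rep.items()}


rep = path_representation(G1_LABELS, G1_EDGES, lp=4)
```

With these assumptions, `rep["ABCA"]` contains exactly the two id-paths shown on the slide.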

12 Basic Steps: GraphGrep (Giugno et al.)
2. Database Filtering
  Parsing the query graph: Glide representation
  Build the fingerprint of the query: hashed set of paths (label-paths of length < lp)
  Filtering: prune graphs that clearly do not contain the query
    Compare the fingerprint of the query with the fingerprints in the database
    Discard a graph if at least one value in its fingerprint is less than the corresponding value in the fingerprint of the query
  Example: if the query has a label-path ABCA, prune out g2 and g3
[Figure: query q and the fingerprint table with keys h(CA), h(ABCA), h(ABCB) and columns g1, g2, g3; most cell values are not preserved]
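The filtering rule can be sketched as a per-key comparison of counts. The fingerprints below are hypothetical values chosen only so that the query path ABCA prunes g2 and g3, as in the slide's example; they are not the values from the original table, and label-paths stand in for hash keys.

```python
def survives_filter(query_fp, graph_fp):
    """Keep a graph only if, for every key of the query fingerprint,
    it has at least as many id-paths as the query."""
    return all(graph_fp.get(key, 0) >= count for key, count in query_fp.items())


# Hypothetical fingerprints: label-path -> number of id-paths.
database = {
    "g1": {"CA": 1, "ABCA": 2},
    "g2": {"CA": 1, "ABCB": 2},
    "g3": {"CA": 1},
}
query = {"CA": 1, "ABCA": 1}

candidates = [name for name, fp in database.items() if survives_filter(query, fp)]
```

Only the surviving candidates go on to the exact matching step.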

13 Basic Steps: GraphGrep (Giugno et al.)
3. Exact Matching
Example: match query q against graph g1 and look for all matching subgraphs
  1. Select the set of paths in g1 with lp = 4
     ABCA = { (1,0,3,1), (1,2,3,1) }
     CB = { (3,0), (3,2) }
  2. Combine any list from ABCA with any list from CB (at matching positions)
     ABCACB = { ((1,0,3,1), (3,0)), ((1,0,3,1), (3,2)), ((1,2,3,1), (3,0)), ((1,2,3,1), (3,2)) }
  3. Remove lists from ABCACB if they contain equal id-nodes in non-overlapping positions
     Matching: ((1,0,3,1), (3,2)), ((1,2,3,1), (3,0))
[Figure: query q and graph g1]

14 gIndex: Graph Indexing: A Frequent Structure-based Approach (Yan et al.)
Drawbacks of GraphGrep:
  Paths are too simple: structural information is lost.
  Too many paths: the set of paths in a graph database is usually huge.
Example: for the sample database and sample query, graphs (a) and (b) cannot be pruned because they contain all paths from the query graph.
[Figure: sample database and sample query]

15 Methodology: gIndex (Yan et al.)
Use graph structures instead of paths as the basic index feature.
Major steps:
  Index construction: enumerate and select features F; for a feature $f \in F$, $D_f$ is the set of graphs containing f
  Query processing:
    Search: enumerate all features in q; the candidate answer set $C_q$ is the set of graphs having all features of q
    Verification
Frequent fragments:
  support(g) = $|D_g|$, the number of graphs in which g appears
  threshold: minsup
  Example (minsup = 2) [figure not preserved]
Choice of minsup
Discriminative fragments
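The search step amounts to intersecting the graph-id sets of the query's indexed features. The sketch below assumes a hypothetical in-memory index from feature names to graph-id sets; feature enumeration and the final verification (a subgraph isomorphism test) are outside its scope.

```python
def candidate_set(query_features, index, all_graphs):
    """C_q: graphs that contain every indexed feature found in the query."""
    candidates = set(all_graphs)
    for feature in query_features:
        if feature in index:  # only indexed features can prune
            candidates &= index[feature]
    return candidates


# Hypothetical index: feature -> D_f (ids of graphs containing the feature).
index = {
    "f1": {"g1", "g2", "g3"},
    "f2": {"g2", "g3"},
    "f3": {"g3", "g4"},
}
all_graphs = {"g1", "g2", "g3", "g4"}

c_q = candidate_set(["f1", "f2", "f3", "f_unindexed"], index, all_graphs)
```

Every graph in `c_q` still has to be verified, since sharing all features with q does not guarantee that q occurs in it.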

16 Features: gIndex (Yan et al.)
Do we need to index every frequent fragment?
Redundant fragments
  Fragment x is redundant with respect to feature set F if $D_x \approx \bigcap_{f \in F,\, f \subseteq x} D_f$
  The subgraphs of x already capture its essence
Discriminative fragments
  Fragment x is discriminative with respect to F if $D_x \ll \bigcap_{f \in F,\, f \subseteq x} D_f$ (contrary to redundant)
gIndex tree
  Each fragment is an edge sequence (DFS code)
  Prefix tree over the codes of discriminative fragments
  Intermediate nodes: redundant fragments; leaf nodes: discriminative fragments
Insert / delete maintenance
  Incremental updates, only for the frequent fragments involved
  Single database scan algorithm for index construction

17 Graph Matching: Variety of Approaches
[Figure: taxonomy of graph matching approaches. Recoverable labels: problem variants (exact match, inexact match, graph edit distance; single database graph vs. database of many graphs); index types (path index, frequent pattern index, neighbor-based, hybrid index); local vs. global; statistical measures; modify graph structure. Methods shown: GraphGrep (Giugno et al.), gIndex (Yan et al.), TALE (Tian et al.), SAPPER (Zhang et al.), Fan et al. (cubic), NeMa (Khan et al.), SIM-T (Kpodjedo et al.), VELSET + NAGA (Dutta et al.)]

18 Approximate Graph Matching
Given: a graph database D = { G1, G2, ..., Gn } and a query graph Q
Find: occurrences of Qi in D, where Qi is similar to Q
Similarity: various notions

19 Graph Similarity Techniques: Overview
Edit Distance
  Generalization of graph isomorphism
  Aim: transform one graph into the other by a number of operations (add, delete, or substitute nodes or edges; reverse edges); each operation has a cost
  Find: the sequence of operations that minimizes the cost of matching the two graphs
Feature Extraction
  Key idea: similar graphs probably share a few properties (degree distribution, diameter, eigenvalues)
  Extract features and apply a similarity measure; there is a plethora of measures
Neighborhood Based ("Love thy neighbour")
  Key idea: two nodes are similar if their neighbors are
  In each iteration, nodes exchange similarity scores; stop at convergence

20 TALE (Neighborhood Index): A Tool for Approximate Large Graph Matching (Tian et al.)
Neighborhood Index (NH-Index)
  Indexing unit: the neighborhood of each node; captures local graph structure
  Neighborhood: the induced subgraph of a node and its adjacent nodes
  Properties of the neighborhood:
    number of neighbors
    how the neighbors connect (number of edges between neighbors)
    labels of the actual neighbors
  NH-Index(v) = (label, degree, nbconnection, nbarray[])
Example: for the dark node (say v), v.degree = 7 and v.nbconnection = 4 [figure not preserved]

21 TALE - Methodology: Matching a Query Node (query node Nq, database node Nd)
Exact matching
  Nq.label == Nd.label
  Nq.degree ≤ Nd.degree
  Nq.nbConn ≤ Nd.nbConn
  neighbors of Nq and Nd should match in the same way
Approximate matching
  Nq.label == Nd.label
  Nq.degree ≤ Nd.degree + nbmiss
  Nq.nbConn ≤ Nd.nbConn + nbcmiss
  Miss(Nd.nbArr[i], Nq.nbArr[i]) ≤ nbmiss
Parameter ρ: the fraction of neighbors of q that may have no corresponding match among the neighbors of d
  number of neighbors allowed to be missing: nbmiss = ρ * Nq.degree
  number of connections allowed to be missing: nbcmiss = nbmiss * (nbmiss - 1)/2 + (Nq.degree - nbmiss) * nbmiss
Node match quality w(Nq, Nd): based on the fraction of nodes missing; a higher w value means a better node match
Neighbor array: deterministic bit array; size of bit array = Lv
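The approximate matching conditions can be sketched as follows. Several details are assumptions: the comparison operators are lost in the transcription and are read here as "at most", nbmiss is rounded down to an integer, and neighbor arrays are modelled as Python integers used as bit masks. The dictionary field names are illustrative.

```python
import math


def allowed_misses(q_degree, rho):
    """nbmiss and nbcmiss for a query node of the given degree."""
    nbmiss = math.floor(rho * q_degree)  # assumption: round down
    nbcmiss = nbmiss * (nbmiss - 1) // 2 + (q_degree - nbmiss) * nbmiss
    return nbmiss, nbcmiss


def approx_match(nq, nd, rho):
    """Check the four approximate-matching conditions for nodes Nq and Nd."""
    nbmiss, nbcmiss = allowed_misses(nq["degree"], rho)
    # Neighbor-array bits set for the query node but not for the database node.
    missing = bin(nq["nb_array"] & ~nd["nb_array"]).count("1")
    return (
        nq["label"] == nd["label"]
        and nq["degree"] <= nd["degree"] + nbmiss
        and nq["nb_conn"] <= nd["nb_conn"] + nbcmiss
        and missing <= nbmiss
    )


nq = {"label": "A", "degree": 4, "nb_conn": 3, "nb_array": 0b1111}
nd = {"label": "A", "degree": 3, "nb_conn": 1, "nb_array": 0b0111}
```

With ρ = 0.25 one neighbor and up to three connections may be missing, so `nd` is accepted; with ρ = 0 the check reduces to the exact conditions and `nd` is rejected.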

22 TALE - Methodology: Index Structure and Matching
Index structure: hybrid index with 2 levels
  First level: B+-tree on (label, degree, nbconn) for fast search (equality on label, range on degree and nbconn)
  Second level: bitmap index for the neighbor array, keyed by node-id
Index probing
  Use label, degree and nbconn to probe the B+-tree index
  Examine the obtained list of bitmaps: count nbmiss and check the conditions for approximate matching
  Prune
Matching
  Match important nodes: Pimp, degree centrality
  Probe the NH-index for the important nodes; node match quality
  Extend the match: match nearby nodes (2-hop)

23 SIGMA: A Set-cover-based Inexact Graph Matching Algorithm (Mongiovi et al.)
Generic steps: pre-processing, filtering, matching
Multiset-Multicover: NP-Hard, but there is a greedy approximation algorithm with a tight bounded error (by the harmonic number $H_n$)
Multiset = (A, m), where the multiplicity is $m : A \to \mathbb{N}$
  $(A, m) \cap (B, n) = (A \cap B, \min(m, n))$
  $(A, m) \uplus (B, n) = (A \cup B, m + n)$
Multiset multicover
  Feature set F: edges of the query
  Collections of such sets whose removal will allow exact matching
  Cover the missing features of the graph with overlapping multisets
Example: exact matching, with features = edges (graphs of size 1) [figure not preserved]

24 SIGMA Example - Multiset-Multicover (Mongiovi et al.)
Example: matching with 2 deletions, for a query Q and a graph G; features = connected subgraphs with exactly 2 edges
G contains a copy of Q with two deletions
Features of Q: FQ = { F1, F2, F3, F4 }, where Fi = features containing edge i
The minimum cover of FQ - FG has cardinality 2, so at least two deletions are needed
{F1, F2} is a possible cover: G is a candidate to match Q with edges 1 and 2 deleted
Greedy Multiset-Multicover (Y, S)
  Ans ← ∅
  while ( Y ≠ ∅ ) do
    X ← argmax_{A ∈ S} |A ∩ Y|
    Y ← Y - X
    Ans ← Ans ∪ { X }
  return Ans
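The greedy multiset-multicover procedure can be sketched with `collections.Counter`, whose `&` operator takes the element-wise minimum of multiplicities, as in the multiset intersection used by SIGMA. This is a generic greedy cover, not SIGMA's full filtering pipeline; the example multisets are hypothetical, and the error raised when nothing overlaps is an added safeguard.

```python
from collections import Counter


def greedy_multicover(Y, S):
    """Greedily pick multisets from S until the multiset Y is covered."""
    remaining = +Counter(Y)  # copy, dropping non-positive counts
    answer = []
    while remaining:
        # X <- argmax over A in S of |A ∩ Y| (multiset intersection size)
        best = max(S, key=lambda A: sum((A & remaining).values()))
        if not (best & remaining):
            raise ValueError("Y cannot be covered by the multisets in S")
        remaining = remaining - best  # Y <- Y - X, negative counts dropped
        answer.append(best)           # Ans <- Ans ∪ {X}
    return answer


# Hypothetical instance: two missing features, each coverable by one set.
Y = Counter({"e1": 1, "e2": 1})
S = [Counter({"e1": 1}), Counter({"e2": 1, "e3": 1}), Counter({"e3": 1})]
cover = greedy_multicover(Y, S)
```

Here the cover has two elements, which mirrors the slide's conclusion that a minimum cover of cardinality 2 implies at least two deletions.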

25 SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs (Zhang et al.)
Edge Edit Distance (EED): the number of edge modifications needed to transform one graph into another
  Used to qualify an occurrence of q
  θ: threshold (maximum EED allowed)
  AI(q, θ) = set of approximate matches
For v in G, $N_i(v, G) = \{\, u \in G : \text{there is a path of } i \text{ edges between } u \text{ and } v \,\}$
Bloom filter (B)
  Space-efficient probabilistic data structure
  L-bit vector with a set of m independent hash functions {f1, ..., fm}, $f_i : X \to [1, L] \subset \mathbb{Z}$
  Used to determine whether an element x is a member of a set X
  Properties: no false negatives; small rate (1%) of false positives
Hybrid Neighborhood Unit Index (HNU)
  ( label(v), degree(v), labels(N1(v, G)), labels(N2(v, G)) )

26 SAPPER - Methodology: Index Construction, Query Processing, Matching (Zhang et al.)
Bloom filter parameters
  The error rate depends on L, |X| and m
  Optimal m = 0.7 * L / |X|
  For a 1% error rate, L / |X| = 9.6
  X = labels in N2(v, G), with |X| on the order of d^2, where d = average degree in G
  WLOG L = 9.6 d^2, m = 7
Index construction
  In HNU(v), the labels of N2(v, G) are collected
  The L-bit Bloom filter is built at index construction time
  Time complexity per vertex: O(d + m * d^2 + L) = O(m d^2); total: O(m d^2 |V_G|)
Query processing and matching
  Vertex matching: find candidate matches of each vertex vq of q among the vertices of G, based on HNU
  Constructing spanning trees of q: randomly generate a set of spanning trees of q
  Generating a matching order of the graphs in AI(q, θ)
    Find matches of the spanning trees based on the vertex matches
    Use the spanning tree matches to match the approximate graphs
    DFS-based order for matching these graphs
  Final graph matching
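The Bloom filter parameters can be checked numerically, and the structure itself is small enough to sketch. The code below is a generic Bloom filter, not SAPPER's implementation: deriving the m hash functions from salted SHA-256 digests is an assumption made for illustration, as are the example labels.

```python
import hashlib
import math


def optimal_hash_count(bits_per_element):
    """m = ln(2) * L / |X|, rounded (the slide uses 0.7 as the constant)."""
    return round(math.log(2) * bits_per_element)


def false_positive_rate(bits_per_element, m):
    """Standard estimate (1 - e^(-m |X| / L))^m."""
    return (1 - math.exp(-m / bits_per_element)) ** m


class BloomFilter:
    def __init__(self, L, m):
        self.L, self.m, self.bits = L, m, 0

    def _positions(self, x):
        # Assumption: m hash functions derived from salted SHA-256 digests.
        for i in range(self.m):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.L

    def add(self, x):
        for p in self._positions(x):
            self.bits |= 1 << p

    def __contains__(self, x):
        return all((self.bits >> p) & 1 for p in self._positions(x))


labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
bloom = BloomFilter(L=96, m=optimal_hash_count(9.6))
for label in labels:
    bloom.add(label)
```

With 9.6 bits per element the formula gives m = 7 and an estimated false positive rate of about 1%, in line with the slide; membership tests never miss an inserted label.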

27 3. Graph Indexing and Searching Advances

28 NeMa: Fast Graph Search with Label Similarity (Khan et al.)
Neighborhood-based matching
Neighborhood vectorization
  Convert the h-hop neighborhood into a multi-dimensional vector
  $R_G(u) = \{ \langle v, w_u(v) \rangle \}$, where $w_u(v) = \alpha^{d(u, v)}$
  Example: h = 2, α = 0.5 gives $R_G(a) = \{ \langle b, 0.5 \rangle, \langle c, 0.5 \rangle, \langle d, 0.25 \rangle \}$ [figure: graph with nodes a, b, c, d, e]
Cost metric
  Measures the quality of the matches
  Sums the costs of matching individual query nodes (cost of matching node labels and neighborhoods within h hops)
  Handles structural and label noise
Minimum cost subgraph matching
  Given: a query graph Q and a data graph G
  Identify: the top-k matches of the query graph with minimum costs in the target graph
  NP-Hard and hard to approximate; an efficient heuristic method is used
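Neighborhood vectorization can be sketched with a breadth-first search that stops after h hops. The adjacency list below is a hypothetical graph chosen to reproduce the slide's example for node a (the original figure is not preserved), and the function name is illustrative.

```python
from collections import deque


def neighborhood_vector(adj, u, h, alpha):
    """R_G(u): weight alpha ** d(u, v) for every node v within h hops of u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == h:  # do not expand beyond h hops
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return {v: alpha ** d for v, d in dist.items() if v != u}


# Hypothetical graph: b and c are adjacent to a, d is two hops away, e three.
adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
vector_a = neighborhood_vector(adj, "a", h=2, alpha=0.5)
```

Node e lies three hops from a, so it does not appear in the vector for h = 2.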

29 NeMa - Methodology: Example of the Iterative Inference Algorithm (Khan et al.)
[Figure: query graph Q (including nodes v2 and v4) and target graph G (including nodes u2, u5, u7, u9, u10), with node labels a to e]
Ui(v, u) = inference cost in iteration i for query node v and candidate node u
Suppose we have already determined the candidate matches of each query node, with h = 1:
  candidates of v2 = {u2, u5, u9}
  candidates of v4 = {u7, u10}
Iteration i = 0:
  U0(v2, u5) = U0(v2, u9) = 0
  U0(v4, u10) < U0(v4, u7)
Iteration i = 1:
  U1(v2, u5) < U1(v2, u9)

30 SIM-T: Using local similarity measures for efficient approximate matching (Kpodjedo et al.)
Local (neighborhood) searches
  Take a potential solution
  Check its immediate neighbors (solutions that are very similar except for a few minor details)
  Hope to find an improved solution
Tabu search
  Metaheuristic search method
  Enhances the performance of local search by relaxing its basic rule
  At each step, worsening moves can be accepted if no improving move is available, e.g. when the search is stuck at a strict local minimum
  Prohibitions (tabu) discourage the search from coming back to previously visited solutions
SIM-T: two-phase algorithm
  Greedy procedure Greedy-Sim, followed by tabu search
  High accuracy when initialized with a potentially good solution
Greedy-Sim
  Step-by-step matching between V1 and V2 (the vertex sets of G1 and G2)
  Iteratively insert a new node match into the configuration
  The choice of the node match to insert follows a greedy criterion
  Greedy score: gr(x1, x2) = δ0(x1, x2) + B S(x1, x2), where
    δ0(x1, x2) is the number of new perfect matches
    B ( B 1 ) is a real number used to weigh the similarity of x1 and x2
    S(x1, x2) is the similarity value of x1 and x2

31 Statistical Measures Preliminaries Null Hypothesis : (H0) General statement that there is no relationship between two phenomena under consideration. Statistical Inference : The process of deducing properties of an underlying distribution by analysis of data. Gamma Distribution : k > 0 (shape), θ > 0 (scale) Two-parameter family of continuous probability distributions. Chi-square Distribution : k degrees of freedom Distribution of a sum of the squares of k independent standard normal random variables. ( a special case of the gamma distribution )

32 Statistical Significance: Measures and Tests
p-value
  Probability that the phenomenon happened by chance
  Probability of occurrence of events at least as extreme as this one
  e.g., in a coin toss experiment, the p-value of observing 8 heads in a series of 10 tosses = probability of getting 8, 9 or 10 heads
z-score
  The z-score (also known as a standard score) indicates how many standard deviations a data point is above the mean
$\chi^2$ (chi-square)
  Category of statistical tests; follows the chi-square distribution
log-likelihood ($G^2$)
  Category of statistical tests; often comparable to $\chi^2$ tests
  Numerous variations; uses the log of the ratio of two likelihoods as the test statistic
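The coin toss example can be worked out directly from the binomial distribution. The sketch assumes a fair coin and a one-sided test (8 or more heads); the function name is illustrative.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# P(8, 9 or 10 heads in 10 tosses) = (45 + 10 + 1) / 1024
p_value = binomial_tail(10, 8)
```

The result is 56/1024, roughly 0.055, so observing 8 heads in 10 tosses is not significant at the usual 5% level.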

33 Motivation: Graph Setting (p-value)
Vertices are represented by the label set pertaining to their neighbors
  v2 in G: { A, B, EA }
  q2 in Q: { A, A, B }
  q5 in Q: { A, B }
Using structural similarity (3 neighbors), v2 can be matched with q2 or q3
Using labels, v2 can be matched with q2 or q5
The best match of v2 is with q2: q2 provides a higher degree of match
Computing the p-value is practically inefficient (exponential number of possibilities)
Graph matching: vertices with higher degrees of match have higher statistical significance and are better candidates for matching

34 Pearson's chi-square ($\chi^2$) goodness-of-fit test
Measures the difference between the expected and observed occurrences of outcome values
Good approximation of the p-value
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

35 Properties: Pearson's Chi-Square Statistic
Follows the chi-square distribution
Degrees of freedom: (number of possible outcomes) - 1
The higher the $\chi^2$, the lower the p-value and the higher the statistical significance
The more similar subgraph is the subgraph with the larger $\chi^2$

36 Example: Pearson's Chi-Square Statistic
Publish fliers in three different colours
H0: no relation between flier colour and whether fliers are taken
Two degrees of freedom: (R - 1) * (C - 1) = (3 - 1) * (2 - 1) = 2

Fliers       Pink   Light Blue   Neon Pink
Taken         32        38          20
Not Taken      8        22          30

$\chi^2$ = (32-24)^2/24 + (38-36)^2/36 + (20-30)^2/30 + (8-16)^2/16 + (22-24)^2/24 + (30-20)^2/20
 = 8^2/24 + 2^2/36 + 10^2/30 + 8^2/16 + 2^2/24 + 10^2/20
 = 64/24 + 4/36 + 100/30 + 64/16 + 4/24 + 100/20
 ≈ 15.28
From the chi-square distribution table, the probability p is very small: highly significant
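The flier example can be recomputed from the observed counts. The counts below are read off the terms of the slide's sum (32, 38, 20 taken and 8, 22, 30 not taken); assigning them to Pink, Light Blue and Neon Pink in that order is an assumption. The closed-form p-value used here is valid only for 2 degrees of freedom.

```python
import math

# (taken, not taken) per flier colour; column assignment is assumed.
observed = {"Pink": (32, 8), "Light Blue": (38, 22), "Neon Pink": (20, 30)}


def chi_square(table):
    """Pearson's chi-square statistic for a colours x {taken, not taken} table."""
    rows = list(table.values())
    col_totals = [sum(row[j] for row in rows) for j in range(2)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        for j, obs in enumerate(row):
            expected = sum(row) * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


stat = chi_square(observed)
# For exactly 2 degrees of freedom the chi-square tail is exp(-x / 2).
p_value = math.exp(-stat / 2)
```

The expected counts come out as 24, 36, 30 (taken) and 16, 24, 20 (not taken), matching the denominators in the sum, and the statistic is about 15.28.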

37 VELSET + NAGA: Neighbor-Aware Search using the Chi-Square Statistics (Dutta et al.)
Subgraph similarity based on statistical significance ($\chi^2$)
VELSET (VErtex Label Similarity on Edge Triplets): accuracy
NAGA (Neighbor-Aware Greedy Algorithm): efficiency
Input: a large database graph G, a query graph q, and the number of matches k
Output: the top-k subgraphs most similar to q
[Figure: example input graph G and query Q]

38 VELSET + NAGA: Framework
1. Creation of different lists (IL, LNL)
  Inverted lists (IL): map vertex labels to vertices
  Label-neighbor-list (LNL): information about neighbors
2. Vertex pair construction (VP)
  For each v in Q, find u in G with
    NAGA: exactly the same labels
    VELSET: similar labels (Jaccard similarity)
3a. Statistical significance (NAGA)
  For u in G: consider LNLu as a set
  LNLu and LNLv are divided in a sliding window of size 2
  Degree of match: s0, s1, s2
3b. Statistical significance (VELSET)
  For u in G: triplets of vertices (x, u, y)
  Example: v3: (A,B,C), (C,B,D), (D,B,A); q5: (A,C,B)
  Triplets are matched for the pairs in VP
  Degree of match: s0, s1, s2
  $\chi^2$ is computed for each VP
Expected values
  Let $p = (1 - 1/L)^d$, where d = degree(u)
  $P(s_0) = p^2$
  $P(s_1) = 2 (1 - p) p$
  $P(s_2) = (1 - p)^2$
4. Generating the approximate match
  Pick the pair (u, v) with the highest $\chi^2$
  Consider the neighbor pairs of u, v
  Repeat
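The expected values plug directly into Pearson's statistic. The sketch below takes L as the number of distinct labels and treats the observed input as counts of windows or triplets with degree of match s0, s1 and s2; both readings are assumptions, as are the function names and the example counts.

```python
def match_probabilities(num_labels, degree):
    """Expected probabilities of the degrees of match s0, s1, s2."""
    p = (1 - 1 / num_labels) ** degree
    return {"s0": p * p, "s1": 2 * (1 - p) * p, "s2": (1 - p) ** 2}


def chi_square(observed, probs):
    """Pearson's statistic of observed match counts against expectation."""
    n = sum(observed.values())
    return sum(
        (observed.get(s, 0) - n * probs[s]) ** 2 / (n * probs[s]) for s in probs
    )


probs = match_probabilities(num_labels=4, degree=2)
all_full_matches = chi_square({"s0": 0, "s1": 0, "s2": 3}, probs)
all_mismatches = chi_square({"s0": 3, "s1": 0, "s2": 0}, probs)
```

In this toy setting three full matches (s2) deviate further from expectation than three mismatches, so the corresponding vertex pair receives the larger statistic.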

39 Performance Comparison: NeMa, SIM-T, VELSET, NAGA (YAGO dataset)
[Table: F1 score and running time (seconds) of NeMa, SIM-T, VELSET and NAGA for the query scenarios Exact Match, Noisy Edges, Noisy Labels, Combined, and Average; numeric values are not preserved]

40 Performance Comparison: NeMa, SIM-T, VELSET, NAGA (IMDB dataset)
[Table: F1 score and running time (seconds) of NeMa, SIM-T, VELSET and NAGA for the query scenarios Exact Match, Noisy Edges, Noisy Labels, Combined, and Average; numeric values are not preserved]

41 Performance Comparison: NeMa, SIM-T, VELSET, NAGA (overall performance)
[Table: F1 (%), running time (s) and indexing time (s) on YAGO and IMDB for VELSET, NAGA, NeMa and SIM-T; only the fragments "~10, ~10,000" in the NeMa row are preserved]

42 iGQ - New Perspective: Indexing Query Graphs to Speed Up Graph Query Processing (Wang et al.)
Properties
  Utilizes knowledge from previously executed queries
  Can be incorporated with existing algorithms
Process (subgraph query g)
  The iGQ index starts off empty
  Query processing is parallelized:
    compute the candidate set CS(g) as usual
    check the relation to previous queries: g is a subgraph or a supergraph of previous query graphs
Example: iGQ query index (SUB), where g is a subgraph of a previous query G
  DB graph index: CS(g) = { g1, g2, g3, g4 }
  Answer(G) = { g1, g2 }
  Subgraph isomorphism test only on CS(g) - Answer(G) = { g3, g4 }, giving Answer
  Answer(g) = Answer ∪ { g1, g2 }

43 iGQ - New Perspective: Indexing Query Graphs to Speed Up Graph Query Processing (Wang et al.)
Properties
  Utilizes knowledge from previously executed queries
  Can be incorporated with existing algorithms
Process (subgraph query g)
  The iGQ index starts off empty
  Query processing is parallelized:
    compute the candidate set CS(g) as usual
    check the relation to previous queries: g is a subgraph or a supergraph of previous query graphs
Example: iGQ query index (SUP), where g is a supergraph of a previous query G
  DB graph index: CS(g) = { g1, g2, g3, g4 }
  Answer(G) = { g1, g5 }
  Subgraph isomorphism test only on CS(g) ∩ Answer(G) = { g1 }, giving Answer(g)
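Both iGQ cases reduce to set operations on the candidate set. The sketch below assumes the relation between the new query g and a previous query G is already known (`"sub"` when g is a subgraph of G, `"super"` when g is a supergraph); the function name and return shape are illustrative, not the paper's interface.

```python
def graphs_to_verify(candidates, prev_answer, relation):
    """Return (graphs needing a subgraph isomorphism test, graphs known to match)."""
    if relation == "sub":
        # g ⊆ G: every graph containing G also contains g.
        return candidates - prev_answer, set(prev_answer)
    if relation == "super":
        # g ⊇ G: only graphs containing G can contain g.
        return candidates & prev_answer, set()
    return set(candidates), set()  # no usable relation


cs = {"g1", "g2", "g3", "g4"}
sub_case = graphs_to_verify(cs, {"g1", "g2"}, "sub")
sup_case = graphs_to_verify(cs, {"g1", "g5"}, "super")
```

With the sets from the two slides, the SUB case leaves g3 and g4 to test and the SUP case leaves only g1.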

44 4. Spatio-Temporal Graphs Future

45 What are Spatio-Temporal Graphs
G = (V, E)
Edges and vertices may appear and disappear with time
Label vectors carry location and time dimensions
  Lv = ( _, _, _, l, ts, te )
  Le = ( _, _, _, ts, te )
Attributes: location, time

46 CHEBY: Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials (Cai et al.)
Spatio-temporal trajectory: discrete, (t1, v1), ..., (tn, vn)
Chebyshev polynomials: $P_m(t) = \cos(m \cos^{-1}(t))$
  $P_0(t) = 1$
  $P_1(t) = t$
  $P_2(t) = 2t^2 - 1$
  $P_3(t) = 4t^3 - 3t$
  $P_4(t) = 8t^4 - 8t^2 + 1$
Chebyshev approximation: almost identical to the minimax polynomial (optimal), and easy to compute
Theorem 1 (orthogonality): the Chebyshev polynomials $P_0(t), \ldots, P_m(t)$ are orthogonal
Approximating any function: by Theorem 1, Chebyshev polynomials can be used as a basis for approximating any function. Given a function f(t), $f(t) = c_0 P_0(t) + \cdots + c_m P_m(t)$
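The closed forms listed for P0 to P4 follow from the standard three-term recurrence P_{m+1}(t) = 2t P_m(t) - P_{m-1}(t), which is not stated on the slide but is a well-known property of Chebyshev polynomials. A minimal sketch that evaluates the polynomials this way and can be compared against the cosine definition:

```python
import math


def chebyshev(m, t):
    """Evaluate P_m(t) with the recurrence P_{k+1} = 2 t P_k - P_{k-1}."""
    prev, cur = 1.0, t  # P_0 and P_1
    if m == 0:
        return prev
    for _ in range(m - 1):
        prev, cur = cur, 2 * t * cur - prev
    return cur


def chebyshev_cos(m, t):
    """The trigonometric definition, valid for t in [-1, 1]."""
    return math.cos(m * math.acos(t))


p4_at_half = chebyshev(4, 0.5)  # 8 t^4 - 8 t^2 + 1 at t = 0.5
```

At t = 0.5 both forms give P4 = -0.5, in agreement with 8(0.5)^4 - 8(0.5)^2 + 1.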

47 CHEBY: Properties of Chebyshev Polynomials (Cai et al.)
Approximation of time series
  Calculation of coefficients: Theorem 2 (Gauss-Chebyshev formula) [formula not preserved]
General assumptions about time series
  i. Same-length: each time series has the same length N
  ii. Power-of-2: N = 2^k for some integer k
  iii. Same-set: each series occurs at the same set of time points {t1, ..., tN} (need not be of uniform width)
  iv. Same-set-uniformly-spaced: uniform width, (ti - ti-1) = (ti+1 - ti)
Indexing a 1-dimensional spatio-temporal trajectory, i.e., a time series (TS)
  Given a collection of TS of length N
  Represent each by a Chebyshev approximation of degree m << N
  Store the n = m + 1 coefficients in a multi-dimensional index
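Since the formula on the slide is not preserved, the sketch below uses the textbook Gauss-Chebyshev rule: sample the function at the N roots of P_N and take weighted sums against each P_i. It assumes the function is defined on [-1, 1] and can be evaluated at those roots; the paper's treatment of discrete time series may differ in detail, and the function name is illustrative.

```python
import math


def chebyshev_coefficients(f, m, n_nodes):
    """Coefficients c_0..c_m of f on [-1, 1] via Gauss-Chebyshev quadrature."""
    # Roots of P_N: t_j = cos((j + 1/2) * pi / N), j = 0..N-1
    nodes = [math.cos((j + 0.5) * math.pi / n_nodes) for j in range(n_nodes)]
    coeffs = []
    for i in range(m + 1):
        total = sum(f(t) * math.cos(i * math.acos(t)) for t in nodes)
        scale = 1 if i == 0 else 2  # c_0 uses 1/N, the others 2/N
        coeffs.append(scale * total / n_nodes)
    return coeffs


def target(t):
    """Test function 3 P_1(t) + P_2(t) = 3t + (2t^2 - 1)."""
    return 3 * t + (2 * t * t - 1)


coeffs = chebyshev_coefficients(target, m=2, n_nodes=8)
```

For this test function the recovered coefficients are (0, 3, 1) up to rounding, so a series that is itself a low-degree polynomial is represented exactly by m + 1 numbers.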

48 Example - CHEBY (Cai et al.)
Time series of the opening stock price of ALCOA, a Fortune 500 company
Time period: Feb 28, 1978 to Oct 24, 2003 (6480 days)
[Figure not preserved]

49 TG-CSA: Compressed Suffix-Array Strategy for Temporal-Graph Indexing (Brisaboa et al.)
Suffix array: sorted array of all suffixes of a string
Compressed Suffix Array (CSA)
  Compressed data structure for pattern matching
  General class of data structures that improve on the suffix array
  Enables quick search for an arbitrary string with a relatively small index
Temporal-Graph CSA (TGCSA)
  The representation consists of a bitmap B and the structures D and Ψ from the iCSA (integer-based CSA)
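A plain (uncompressed) suffix array is easy to sketch and shows the pattern search that the CSA supports in less space. This is the naive construction, quadratic in memory because every suffix is materialized, and it is not the compressed structure used by TGCSA; the function names are illustrative.

```python
from bisect import bisect_left


def suffix_array(s):
    """Start positions of all suffixes of s, in lexicographic order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_all(s, sa, pattern):
    """All start positions of pattern in s, located by binary search."""
    suffixes = [s[i:] for i in sa]  # materialized for clarity, not efficiency
    k = bisect_left(suffixes, pattern)
    hits = []
    while k < len(sa) and suffixes[k].startswith(pattern):
        hits.append(sa[k])
        k += 1
    return sorted(hits)


text = "banana"
sa = suffix_array(text)
occurrences = find_all(text, sa, "ana")
```

All suffixes that start with the pattern are adjacent in the sorted order, which is why one binary search followed by a short scan finds every occurrence.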

50 Example - TG-CSA: Structures Involved in the Creation of TGCSA (Brisaboa et al.)
Example: suppose we have a temporal graph with
  v = 5 vertices, numbered 1, ..., 5
  τ = 8 time instants, numbered 1, ..., 8
  five contacts (u, v, ts, te): (1, 3, 1, 8), (1, 4, 5, 8), (2, 1, 1, 5), (4, 3, 7, 8), and (4, 5, 5, 7)
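The five contacts can be queried with a linear scan, which makes explicit what a temporal-graph index such as TGCSA has to answer quickly. The sketch assumes a contact (u, v, ts, te) is active for ts <= t < te; whether the end instant is inclusive is not stated on the slide. The function names are illustrative.

```python
# Contacts (u, v, ts, te) from the example.
CONTACTS = [(1, 3, 1, 8), (1, 4, 5, 8), (2, 1, 1, 5), (4, 3, 7, 8), (4, 5, 5, 7)]


def active_edges(contacts, t):
    """Edges (u, v) active at time t, assuming the interval [ts, te)."""
    return [(u, v) for u, v, ts, te in contacts if ts <= t < te]


def neighbors_at(contacts, u, t):
    """Direct neighbors of vertex u at time t."""
    return [v for src, v in active_edges(contacts, t) if src == u]


edges_at_5 = active_edges(CONTACTS, 5)
```

Under the half-open assumption, the contact (2, 1, 1, 5) has already ended at time 5 while (1, 4, 5, 8) and (4, 5, 5, 7) have just started.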

51 Problem Approximate Exact Spatio Temporal

52 Thank You! Questions? Hrishikesh Terdalkar

53 References I
Yan, Xifeng, and Jiawei Han. "gSpan: Graph-based substructure pattern mining." Data Mining, ICDM Proceedings, IEEE International Conference on. IEEE, 2002.
Giugno, Rosalba, and Dennis Shasha. "GraphGrep: A fast and universal method for querying graphs." Pattern Recognition, Proceedings. 16th International Conference on. Vol. 2. IEEE, 2002.
Yan, Xifeng, Philip S. Yu, and Jiawei Han. "Graph indexing: a frequent structure-based approach." Proceedings of the 2004 ACM SIGMOD international conference on Management of data. ACM, 2004.
Fan, Wenfei, et al. "Graph pattern matching: from intractable to polynomial time." Proceedings of the VLDB Endowment (2010).
Tian, Yuanyuan, and Jignesh M. Patel. "TALE: A tool for approximate large graph matching." Data Engineering, ICDE, IEEE 24th International Conference on. IEEE.
Mongiovi, Misael, et al. "SIGMA: a set-cover-based inexact graph matching algorithm." Journal of Bioinformatics and Computational Biology 8.02 (2010).
Zhang, Shijie, Jiong Yang, and Wei Jin. "SAPPER: Subgraph indexing and approximate matching in large graphs." Proceedings of the VLDB Endowment (2010).

54 References II
Khan, Arijit, et al. "NeMa: Fast graph search with label similarity." Proceedings of the VLDB Endowment. Vol. 6. No. 3. VLDB Endowment.
Kpodjedo, Segla, Philippe Galinier, and Giulio Antoniol. "Using local similarity measures to efficiently address approximate graph matching." Discrete Applied Mathematics 164 (2014).
Wang, Jing, Nikos Ntarmos, and Peter Triantafillou. "Indexing Query Graphs to Speed Up Graph Query Processing." (2016).
Dutta, Sourav, Pratik Nayek, and Arnab Bhattacharya. "Neighbor-Aware Search for Approximate Labeled Graph Matching using the Chi-Square Statistics." Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee.
Cai, Yuhan, and Raymond Ng. "Indexing spatio-temporal trajectories with Chebyshev polynomials." Proceedings of the 2004 ACM SIGMOD international conference on Management of data. ACM, 2004.
Brisaboa, Nieves R., et al. "A compressed suffix-array strategy for temporal-graph indexing." International Symposium on String Processing and Information Retrieval. Springer International Publishing, 2014.

55 Appendix I

56 Gamma and Chi-square Distributions Gamma Distribution Chi-square Distribution



Survey on Graph Query Processing on Graph Database. Presented by FAN Zhe

Survey on Graph Query Processing on Graph Database. Presented by FAN Zhe Survey on Graph Query Processing on Graph Database Presented by FA Zhe utline Introduction of Graph and Graph Database. Background of Subgraph Isomorphism. Background of Subgraph Query Processing. Background

More information

Mining frequent Closed Graph Pattern

Mining frequent Closed Graph Pattern Mining frequent Closed Graph Pattern Seminar aus maschninellem Lernen Referent: Yingting Fan 5.November Fachbereich 21 Institut Knowledge Engineering Prof. Fürnkranz 1 Outline Motivation and introduction

More information

gspan: Graph-Based Substructure Pattern Mining

gspan: Graph-Based Substructure Pattern Mining University of Illinois at Urbana-Champaign February 3, 2017 Agenda What motivated the development of gspan? Technical Preliminaries Exploring the gspan algorithm Experimental Performance Evaluation Introduction

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Mining Graph/Network Data: Part I Instructor: Yizhou Sun yzsun@ccs.neu.edu November 12, 2013 Announcement Homework 4 will be out tonight Due on 12/2 Next class will be canceled

More information

Efficient homomorphism-free enumeration of conjunctive queries

Efficient homomorphism-free enumeration of conjunctive queries Efficient homomorphism-free enumeration of conjunctive queries Jan Ramon 1, Samrat Roy 1, and Jonny Daenen 2 1 K.U.Leuven, Belgium, Jan.Ramon@cs.kuleuven.be, Samrat.Roy@cs.kuleuven.be 2 University of Hasselt,

More information

Interaction Between Input and Output-Sensitive

Interaction Between Input and Output-Sensitive Interaction Between Input and Output-Sensitive Really? Mamadou M. Kanté Université Blaise Pascal - LIMOS, CNRS Enumeration Algorithms Using Structure, Lorentz Institute, August 26 th, 2015 1 Introduction

More information

CME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh HW#3 Due at the beginning of class Thursday 02/26/15

CME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh HW#3 Due at the beginning of class Thursday 02/26/15 CME 305: Discrete Mathematics and Algorithms Instructor: Reza Zadeh (rezab@stanford.edu) HW#3 Due at the beginning of class Thursday 02/26/15 1. Consider a model of a nonbipartite undirected graph in which

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining

Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institutes

More information

Lecture Note: Computation problems in social. network analysis

Lecture Note: Computation problems in social. network analysis Lecture Note: Computation problems in social network analysis Bang Ye Wu CSIE, Chung Cheng University, Taiwan September 29, 2008 In this lecture note, several computational problems are listed, including

More information

Efficient Subgraph Matching by Postponing Cartesian Products

Efficient Subgraph Matching by Postponing Cartesian Products Efficient Subgraph Matching by Postponing Cartesian Products Computer Science and Engineering Lijun Chang Lijun.Chang@unsw.edu.au The University of New South Wales, Australia Joint work with Fei Bi, Xuemin

More information

GADDI: Distance Index based Subgraph Matching in Biological Networks

GADDI: Distance Index based Subgraph Matching in Biological Networks GADDI: Distance Index based Subgraph Matching in Biological Networks Shijie Zhang, Shirong Li, and Jiong Yang Dept. of Electrical Engineering and Computer Science Case Western Reserve University 10900

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Spatial Outlier Detection

Spatial Outlier Detection Spatial Outlier Detection Chang-Tien Lu Department of Computer Science Northern Virginia Center Virginia Tech Joint work with Dechang Chen, Yufeng Kou, Jiang Zhao 1 Spatial Outlier A spatial data point

More information

Pattern Mining in Frequent Dynamic Subgraphs

Pattern Mining in Frequent Dynamic Subgraphs Pattern Mining in Frequent Dynamic Subgraphs Karsten M. Borgwardt, Hans-Peter Kriegel, Peter Wackersreuther Institute of Computer Science Ludwig-Maximilians-Universität Munich, Germany kb kriegel wackersr@dbs.ifi.lmu.de

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Graph Indexing: A Frequent Structure-based Approach

Graph Indexing: A Frequent Structure-based Approach Graph Indexing: A Frequent Structure-based Approach Xifeng Yan Philip S. Yu Jiawei Han Department of omputer Science University of Illinois at Urbana-hampaign {xyan, hanj}@cs.uiuc.edu IBM T. J. Watson

More information

Searching complex graphs

Searching complex graphs Searching complex graphs complex graph data Big volume: huge number of nodes/links Big variety: complex, heterogeneous schema Big velocity: e.g., frequently updated Noisy, ambiguous attributes and values

More information

EE512 Graphical Models Fall 2009
