Céline Robardet. December 9, Habilitation à diriger des recherches

Size: px
Start display at page:

Download "Céline Robardet. December 9, Habilitation à diriger des recherches"

Transcription

1 Céline Robardet December 9, 2013 Habilitation à diriger des recherches 1 / 51

2 Introduction Habilitation à diriger des recherches Introduction 2 / 51

3 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

4 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

5 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

6 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

7 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

8 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

9 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

10 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

11 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

12 Data mining Principle Searching for interested patterns/models in the data Data mining ingredients: a language of patterns/models some criteria a search strategy Habilitation à diriger des recherches Introduction 4 / 51

13 Data mining Principle Searching for interested patterns/models in the data Data mining ingredients: a language of patterns/models some criteria a search strategy Habilitation à diriger des recherches Introduction 4 / 51

14 Data mining algorithms Descriptive vs Predictive Pattern mining Classification Clustering Regression Apriori Decision tree Hierarchical Linear Regression FP-Growth Neural Networks Partitional Non-Linear Regression Bayesian Density-Based Logical Regression K Nearest Neighbors Support Vectors Machine Genetic Algorithm Habilitation à diriger des recherches Introduction 5 / 51

15 Data mining algorithms Exact vs Heuristic approaches Pattern mining Classification Clustering Regression Apriori Decision tree Hierarchical Linear Regression FP-Growth Neural Networks Partitional Non-Linear Regression Bayesian Density-Based Logical Regression K Nearest Neighbors Support Vectors Machine Genetic Algorithm Habilitation à diriger des recherches Introduction 5 / 51

16 Constraint-based pattern mining Developed for extracting itemsets in transaction databases Can it be used for the analysis of complex systems? Complex systems: refer to a broader class of data (transportation network, internet, gene network, social network) are composed by a large number of highly interconnected dynamical units can be modeled by dynamic attributed relational graphs Habilitation à diriger des recherches Introduction 6 / 51

17 Constraint-based pattern mining Developed for extracting itemsets in transaction databases Can it be used for the analysis of complex systems? Complex systems: refer to a broader class of data (transportation network, internet, gene network, social network) are composed by a large number of highly interconnected dynamical units can be modeled by dynamic attributed relational graphs Habilitation à diriger des recherches Introduction 6 / 51

18 Relational graphs Graphs that are often Dynamic: when vertices and edges appear/disappear through time Attributed: when another relation describes the vertices themselves Relational graph Vertices are uniquely identified through time Habilitation à diriger des recherches Introduction 7 / 51

19 Some inductive queries What are the sub-graphs What are the vertex whose vertex attributes evolve similarly? attributes that strongly co-vary with the graph structure? Co-authors that published at ICDE with a high degree and a low clustering coefficient Habilitation à diriger des recherches Airports whose arrival delays increased over the three weeks following Katrina hurricane Introduction 8 / 51

20 Outline Constraint-based pattern mining General framework Constraint properties Algorithmic principles Relational dynamic graphs Statistical description Simulation Mining attributed dynamic graphs Topological patterns in static attributed graphs Trend dynamic sub-graphs Conclusions and perspectives Habilitation à diriger des recherches Introduction 9 / 51

21 Constraint-based pattern mining Habilitation à diriger des recherches Constraint-based pattern mining 10 / 51

22 Pattern mining A Pattern φ describes a subgroup of the data D observed several times or associated with characteristic properties The pattern shape is fixed: φ L whose cartinality is exponential or infinite in the size of the data C evaluates the adequacy of the pattern to the data C(φ, D) Boolean Pattern mining task: Find all interesting subgroups Th(L, D, C) = {φ L C(φ, D) is true } Exact approach Habilitation à diriger des recherches Constraint-based pattern mining 11 / 51

23 Pattern mining A Pattern φ describes a subgroup of the data D observed several times or associated with characteristic properties The pattern shape is fixed: φ L whose cartinality is exponential or infinite in the size of the data C evaluates the adequacy of the pattern to the data C(φ, D) Boolean Pattern mining task: Find all interesting subgroups Th(L, D, C) = {φ L C(φ, D) is true } Exact approach Habilitation à diriger des recherches Constraint-based pattern mining 11 / 51

24 Pattern constraints Constraints are needed for: only retrieving patterns that describe an interesting subgroup of the data making the extraction feasible Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually They are defined up to the partial order used for listing the patterns Habilitation à diriger des recherches Constraint-based pattern mining 12 / 51

25 Pattern constraints Constraints are needed for: only retrieving patterns that describe an interesting subgroup of the data making the extraction feasible Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually They are defined up to the partial order used for listing the patterns Habilitation à diriger des recherches Constraint-based pattern mining 12 / 51

26 Pattern constraints Constraints are needed for: only retrieving patterns that describe an interesting subgroup of the data making the extraction feasible Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually They are defined up to the partial order used for listing the patterns Habilitation à diriger des recherches Constraint-based pattern mining 12 / 51

27 Constraint properties - 1 Monotone constraint φ 1 φ 2, C(φ 1, D) C(φ 2, D) abcde Anti-monotone constraint φ 1 φ 2, C(φ 2, D) C(φ 1, D) abcde abcd abce abde acde bcde abcd abce abde acde bcde abc abd abe acd ace ade bcd bce bde cde abc abd abe acd ace ade bcd bce bde cde ab ac ad ae bc bd be cd ce de ab ac ad ae bc bd be cd ce de a b c d e a b c d e C(φ, D) b φ c φ C(φ, D) a φ c φ Habilitation à diriger des recherches Constraint-based pattern mining 13 / 51

28 Constraint properties - 2 Convertible constraints Loose AM constraints is extended to the prefix order so C(φ, D) e φ : C(φ \ {e}, D) that φ 1 φ 2, C(φ 2, D) C(φ 1, D) abcde abcde abcd abce abde acde bcde abcd abce abde acde bcde abc abd abe acd ace ade bcd bce bde cde abc abd abe acd ace ade bcd bce bde cde ab ac ad ae bc bd be cd ce de ab ac ad ae bc bd be cd ce de a b c d e a b c d e C(φ, w) avg(w(φ)) > σ w(a) w(b) w(c) w(d) w(e) C(φ, w) var(w(φ)) σ Pei and Han Bonchi and Lucchese Habilitation à diriger des recherches Constraint-based pattern mining 14 / 51

29 Enumeration strategy Binary partition: the element 'a' is enumerated abcde abcd abce abde acde bcde abc abd abe acd ace ade bcd bce bde cde ab ac ad ae bc bd be cd ce de a b c d e Habilitation à diriger des recherches Constraint-based pattern mining 15 / 51

30 Enumeration strategy Binary partition: the element 'a' is enumerated R R a R \ R R {a} R \ {a} R R Habilitation à diriger des recherches Constraint-based pattern mining 15 / 51

31 Constraint evaluation Monotone constraint R C(R, D) is false Anti-monotone constraint R empty empty R R C(R, D) is false Habilitation à diriger des recherches Constraint-based pattern mining 16 / 51

32 Constraint evaluation Monotone constraint R C(R, D) is false Anti-monotone constraint R empty empty R R C(R, D) is false Habilitation à diriger des recherches Constraint-based pattern mining 16 / 51

33 Constraint evaluation Monotone constraint R C(R, D) is false Anti-monotone constraint R empty empty R R C(R, D) is false Habilitation à diriger des recherches Constraint-based pattern mining 16 / 51

34 Constraint evaluation Monotone constraint R C(R, D) is false Anti-monotone constraint R empty empty R R C(R, D) is false Habilitation à diriger des recherches Constraint-based pattern mining 16 / 51

35 A new class of constraints Piecewise monotone and anti-monotone constraints 1 C involves p times the pattern φ: C(φ, D) = f(φ 1, φ p, D) 2 f i,φ (x) = (φ 1, φ i 1, x, φ i+1,, φ p, D) 3 i = 1 p, f i,φ is either monotone or anti-monotone: x y, { fi,φ (x) f i,φ (y) iff f i,φ is monotone f i,φ (y) f i,φ (x) iff f i,φ is anti-monotone L Cerf, J Besson, C Robardet, J-F Boulicaut Habilitation à diriger des recherches Constraint-based pattern mining 17 / 51

36 An example e, w(e) 0 C(φ, w) avg(w(φ)) > σ e φ w(e) φ > σ C(φ, D) is piecewise monotone and anti-monotone with e φ f(φ 1, φ 2, D) = 1 w(e) φ 2 x y, f 1,φ is monotone: f(x, φ 2, D) = f 2,φ is anti-monotone: f(φ 1, y, D) = e φ w(e) 1 y > σ e x w(e) e y φ 2 > σ w(e) φ 2 > σ e φ w(e) 1 x > σ Habilitation à diriger des recherches Constraint-based pattern mining 18 / 51

37 Piecewise constraint exploitation R Evaluation If f(r, R, D) = then R is empty e R w(e) R σ empty Propagation R e R \ R such that f(r \ {e}, R, D) σ, then e is moved in R e R \ R such that f(r, R {e}, D) σ, then e is removed from R Habilitation à diriger des recherches Constraint-based pattern mining 19 / 51

38 Piecewise constraint exploitation R Evaluation If f(r, R, D) = then R is empty e R w(e) R σ empty Propagation R e R \ R such that f(r \ {e}, R, D) σ, then e is moved in R e R \ R such that f(r, R {e}, D) σ, then e is removed from R Habilitation à diriger des recherches Constraint-based pattern mining 19 / 51

39 Algorithmic principles Function Generic_CBPM_enumeration(R, R ) 1: if Check_constraints(R, R ) then 2: (R, R ) Constraint_Propagation(R, R ) 3: if R = R then 4: output R 5: else 6: for all e R \ R do 7: Generic_CBPM_Enumeration(R {e}, R ) 8: Generic_CBPM_Enumeration(R, R \ {e}) 9: end for 10: end if 11: end if Habilitation à diriger des recherches Constraint-based pattern mining 20 / 51

40 Case studies Mining of Formal concepts [IDA journal 05] Fault-tolerant patterns [KDID 05, ICCS 06] Closed patterns in n-ary relations [SDM 08] Parallel episodes [SDM 09] Subspace clustering [SDM 09] Evolution patterns in dynamic graphs [ICDM 09] Topological patterns in static attributed graphs [ICDM 12 (demo), TKDE 13] Trend dynamic sub-graphs [DS 12, PKDD 13] Habilitation à diriger des recherches Constraint-based pattern mining 21 / 51

41 Relational dynamic graphs Habilitation à diriger des recherches Relational dynamic graphs 22 / 51

42 Complex network analysis A large set of tools can be used on any static complex network Such networks exhibit non-trivial topological features Boccaletti, Stefano, Vito Latora, Yamir Moreno, Martin Chavez and D-U Hwang Habilitation à diriger des recherches Relational dynamic graphs 23 / 51

43 Complex network analysis A large set of tools can be used on any static complex network Such networks exhibit non-trivial topological features Assortative Disassortative Boccaletti, Stefano, Vito Latora, Yamir Moreno, Martin Chavez and D-U Hwang Habilitation à diriger des recherches Relational dynamic graphs 23 / 51

44 Complex network analysis A large set of tools can be used on any static complex network Such networks exhibit non-trivial topological features Boccaletti, Stefano, Vito Latora, Yamir Moreno, Martin Chavez and D-U Hwang Habilitation à diriger des recherches Relational dynamic graphs 23 / 51

45 Dynamic complex networks Generic technique Identify the rules that govern the network evolution Generate random dynamic networks Means Statistical analysis of graph properties as functions of time Study of the evolution of some structures A Scherrer, P Borgnat, E Fleury, JL Guillaume, C Robardet Habilitation à diriger des recherches Relational dynamic graphs 24 / 51

46 Sensor mobility networks 41 persons kept a sensor during 3 days: Vertices are human beings Edges stand for wireless communication Habilitation à diriger des recherches Relational dynamic graphs 25 / 51

47 Properties as function of time High correlation coefficient in time between E(t): the number of edges V(t): the number of connected vertices D(t): the average degree N c (t): the number of connected components Edge creation and deletion processes are memory-less and have a low correlation coefficient with other properties Contact and inter-contact periods are distributed as power-laws P[X>x] P[X>x] Contact Duration Inter contact Duration Habilitation à diriger des recherches Relational dynamic graphs 26 / 51

48 Dynamic structural properties Instability of connected components Number of components Total lifetime Total lifetime Number of vertices Triangle creations: 40% of link creations increase the number of triangles 6% of inactive links would create a triangle if activated Edges whose endpoints have a common neighbors are more likely to be activated Habilitation à diriger des recherches Relational dynamic graphs 27 / 51

49 Simulation of dynamic graphs Algorithm ingredients each edge is simulated independently transition probabilities are independent in time the empirical (power law) contact and inter-contact period distributions are imposed the transition probability is weighted by: the distribution of the number of connected components the dynamics of triangles Habilitation à diriger des recherches Relational dynamic graphs 28 / 51

50 Validation Qualitative summaries of the trajectories of vertices among few identified structures of the graph are very similar: Original data Simulated data Habilitation à diriger des recherches Relational dynamic graphs 29 / 51

51 Mining attributed dynamic graphs Habilitation à diriger des recherches Mining attributed dynamic graphs 30 / 51

52 Mining attributed static graphs Composed by two relations Are these two relations linked? Do the attribute values depend on the role played by the vertex in the graph? A new pattern domain that identifies sets of vertex attributes that co-vary with the graph structure the graph structure is captured by topological properties A Prado, M Plantevit, C Robardet, J-F Boulicaut Habilitation à diriger des recherches Mining attributed dynamic graphs 31 / 51

53 Topological patterns Topological properties derived from the edges of the graph Direct neighborhood degree, clustering Connection to all other vertices Centrality measures Topological patterns M: set of all properties M {+, } m s (m, s) M {+, } L = 2 M {+, } h 9 i t B 1 h 0 h 2 i t 7 2 i t A C 0 h 17 P 0 h 2 i t h 1 D h 25 i t 52 9 N 0 73 i t 6 1 i t O 166 h 47 h 32 E i t 0 8 i t 1 12 h h 13 M I F 12 i t 53 0 h 7 i t 25 7 i t H h 8 0 h J 3 L 0 1 h 6 G K 0 i t 4 15 i t 5 3 h i t 45 4 i t 3 5 Habilitation à diriger des recherches Mining attributed dynamic graphs 32 / 51

54 Topological constraint Sets of properties that behave in a similar manner for a large number of vertex pairs Kendall tau rank correlation coefficient based on the ranks of pattern property values counts the number of vertex pairs that are ordered similarly on all pattern properties Age Income Habilitation à diriger des recherches Mining attributed dynamic graphs 33 / 51

55 Constraint properties Let P L, Supp τ (P) = {(u,v) V2 m s P:m(u) sm(v)} ( 2) n s { < when s is + > when s is C topo is anti-monotone and n is the number of vertices C topo (P, D) Supp τ (P) σ Redundant symmetrical patterns with Supp τ (A +, B ) = Supp τ (A, B + ) Support computation quadratic on the number of vertices: An upper bound to avoid, in linear time, some support computation ECLAT mining algorithm with range trees to ease the search of supporting vertices Habilitation à diriger des recherches Mining attributed dynamic graphs 34 / 51

56 Constraint properties Let P L, Supp τ (P) = {(u,v) V2 m s P:m(u) sm(v)} ( 2) n s { < when s is + > when s is C topo is anti-monotone and n is the number of vertices C topo (P, D) Supp τ (P) σ Redundant symmetrical patterns with Supp τ (A +, B ) = Supp τ (A, B + ) Support computation quadratic on the number of vertices: An upper bound to avoid, in linear time, some support computation ECLAT mining algorithm with range trees to ease the search of supporting vertices Habilitation à diriger des recherches Mining attributed dynamic graphs 34 / 51

57 Results on DBLP - 1 Co-author graph (DBLP) vertices edges (a common publication) 29 local attributes (number of publications in Database and data mining conferences 9 topological properties Some Results {SAC +, ECML/PKDD } {SAC +, KDD }, {SAC +, VLDB } {SAC +, PageRank } Is publishing to SAC penalizing? Of course not! SAC has a broader scope than DB and DM Habilitation à diriger des recherches Mining attributed dynamic graphs 35 / 51

58 Results on DBLP - 1 Co-author graph (DBLP) vertices edges (a common publication) 29 local attributes (number of publications in Database and data mining conferences 9 topological properties Some Results {SAC +, ECML/PKDD } {SAC +, KDD }, {SAC +, VLDB } {SAC +, PageRank } Is publishing to SAC penalizing? Of course not! SAC has a broader scope than DB and DM Habilitation à diriger des recherches Mining attributed dynamic graphs 35 / 51

59 Results on DBLP - 1 Co-author graph (DBLP) vertices edges (a common publication) 29 local attributes (number of publications in Database and data mining conferences 9 topological properties Some Results {SAC +, ECML/PKDD } {SAC +, KDD }, {SAC +, VLDB } {SAC +, PageRank } Is publishing to SAC penalizing? Of course not! SAC has a broader scope than DB and DM Habilitation à diriger des recherches Mining attributed dynamic graphs 35 / 51

60 Results on DBLP - 2 Emerging patterns wrt the graph structure Which conferences are more related to Deg+ and Between+? Rank Prk+ Deg+ Publication Factor ECML/PKDD IEEE TKDE PAKDD 221 DASFAA ICDM 195 Prk+ Deg+ ECML/PKDD+ Christos Faloutsos Jiawei Han Philip S Yu Bing Liu C Lee Giles Prk+ Between+ Publication Factor PVLDB EDBT VLDB J SIGMOD 425 ICDE+ 342 Prk+ Between+ PVLDB+ Gerhard Weikum Jiawei Han David Maier Philip S Yu Hector Garcia-Molina PVLDB+, Deg+, Between+ Habilitation à diriger des recherches Mining attributed dynamic graphs 36 / 51

61 Dynamic attributed graph a1 a2 a A B a1 a2 a a1 a2 a A B a1 a2 a a1 a2 a A B a1 a2 a a1 a2 a a1 a2 a3 F C a1 a2 a a1 a2 a3 F C a1 a2 a a1 a2 a3 F C a1 a2 a E D a1 a2 a a1 a2 a E D a1 a2 a a1 a2 a E D a1 a2 a ti 1 ti ti+1 ti+2 ti+3 The data {G t = (V, E t ), t = 1,, t max } and A a set of ordinal attributes: a i : V T D i, with D i the domain of a i Habilitation à diriger des recherches Mining attributed dynamic graphs 37 / 51

62 Trend Dynamic Sub-graphs What are the sub-graphs whose vertex attributes evolve similarly? Mining maximal sub-graphs that satisfy some constraints on the graph topology and on the attribute values The connectivity of the dynamic subgraphs is constrained by a maximum diameter value To be more robust towards intrinsic inter-individual variability, we do not compare raw numerical values, but their trends E Desmier, M Plantevit, C Robardet, J-F Boulicaut Habilitation à diriger des recherches Mining attributed dynamic graphs 38 / 51

63 Language of patterns L = {(U, S, Ω) U V, S = t 1,, t s, Ω A {+, }} Example: ({A, C, D}, t 0, t 1, {a + 1,a 3 }) L a1 0 a1 3 a1 0 a2 3 a2 5 a2 2 a3 5 a3 3 a3 1 A B a1 2 a2 4 a3 5 a1 a2 a3 F C E D a1 2 a2 1 a3 4 a1 1 a1 5 a2 3 a1 2 a2 2 a3 4 a2 0 a3 1 F a3 0 A E a1 2 B C D a1 4 a2 1 a1 3 a2 2 a3 0 a2 2 a3 1 a3 4 a1 1 a1 6 a1 3 a2 3 a2 3 a2 1 a3 2 a3 0 a3 3 F A E a1 2 B C D a1 5 a2 1 a1 5 a2 1 a3 0 a2 4 a3 0 a3 3 t0 t1 t2 Habilitation à diriger des recherches Mining attributed dynamic graphs 39 / 51

64 Definition of constraints A pattern (U, S, Ω) satisfies C diam ((U, S, Ω), D) diameter Gt(U) k, t S 1 Diameter C trend ((U, S, Ω), D) u U, t S, (a, m) Ω { a(u, t) < a(u, t + 1), if m = + a(u, t) > a(u, t + 1), if m = C max ((U, S, Ω), D) U, S and Ω cannot be enlarged without invalidating one or both of the above constraints Habilitation à diriger des recherches Mining attributed dynamic graphs 40 / 51

65 Constraint properties C trend ((U, S, Ω), D) is anti-monotone C diam ((U, S, Ω), D) is piecewise monotone: diameter Gt (U) k max v,w U d G t (U)(v, w) k f(u 1, U 2, D) = max v,w U 1 d Gt(U 2 )(v, w) f 1,U is anti-monotone ie if it is satisfied on U 1, it is also satisfied for any of its subsets; f 2,U is monotone ie if it is satisfied on G t (U 2 ), then, adding some vertices and edges to G t (U 2 ) will not increase its value We can use the propagation mechanisms C max ((U, S, Ω), D) is pushed using specific mechanisms Habilitation à diriger des recherches Mining attributed dynamic graphs 41 / 51

66 Katrina hurricane results Top pattern wrt time specificity (in red) Top pattern wrt trend specificity (Yellow) 71 airports whose arrival delays increase over 3 weeks 5 airports whose number of flights increased over 3 weeks Arrival delays never increased in these airports during another week Substitutions flights were provided from these airports during this period The hurricane strongly influenced the domestic flight punctuality This behavior is rather rare in the rest of the graph Habilitation à diriger des recherches Mining attributed dynamic graphs 42 / 51

67 Conclusions and perspectives Habilitation à diriger des recherches Conclusions and perspectives 43 / 51

68 Conclusion Constraint-based pattern mining framework can be used to analyze dynamic graphs Patterns combine information about the topological structure, the vertex attribute values and their tendencies A wide variety of data can be mined Provide new insights on the techniques that can be used to analyze dynamic graphs Habilitation à diriger des recherches Conclusions and perspectives 44 / 51

69 Challenges and perspectives - 1 Enhancing the quality of the extracted information by capturing changes in the graph structure while allowing changes in trend direction Combining sub-graph and sequence mining Habilitation à diriger des recherches Conclusions and perspectives 45 / 51

70 Challenges and perspectives - 2 Avoiding the known drawbacks of pattern mining approaches How to get rid of redundant patterns? How to fix the threshold values? Returning patterns whose constraint values are surprising with respect to what could be expected from randomized data with respect to what happened in the past (real-time data mining) Trigger alerts when changes in behavior Habilitation à diriger des recherches Conclusions and perspectives 46 / 51

71 Challenges and perspectives - 2 Avoiding the known drawbacks of pattern mining approaches How to get rid of redundant patterns? How to fix the threshold values? Returning patterns whose constraint values are surprising with respect to what could be expected from randomized data with respect to what happened in the past (real-time data mining) Trigger alerts when changes in behavior Habilitation à diriger des recherches Conclusions and perspectives 46 / 51

72 Challenges and perspectives - 3 Pattern without threshold values: Skyline patterns Retrieves the patterns that are not dominated by any other pattern Tid Items t 1 A B C D E F t 2 A B C D E F t 3 A B t 4 D t 5 A C t 6 E Patterns freq length area ABCDEF AB AC A ABCD C A Soulet, C Raissi, M Plantevit, B Cremilleux Habilitation à diriger des recherches Conclusions and perspectives 47 / 51

73 Ongoing research projects Vél'innov - ANR INOV labex IMU project Constraint-based pattern mining of Vélo'V data Characterizing the usage of the bicycle sharing system Modeling the system to simulate the impacts of local changes (in the city or in the system) Pitfall: Vertex attributes are static (INSEE data) or derived from the edges (eg, number of users of a considered category) Searching for relationships between attributes and the graph structure does not make sense Envisioned solutions: Considering the user trajectories and looking for paths that characterize a user group Considering an attributed static graph with temporal usage vectors associated to edges and seeking for sub-graphs homogeneous on vertex and edges attributes Habilitation à diriger des recherches Conclusions and perspectives 48 / 51

74 Upcoming research projects - 1 GRAISearch: enhancing a location-based social media search engine FP7-PEOPLE-2013-IAPP: Industry-Academia Partnerships and Pathways Benefits for DM2L team 40 researcher months Access to data with high added value An industrial cooperation with critical potential benefits for the company Habilitation à diriger des recherches Conclusions and perspectives 49 / 51

75 Upcoming research projects - 2 GRAISearch: Research tasks Geo-localized demographic flow prediction using geo-tagged social media uploaded data Geo-localized event detection algorithm Integration into a geo-located social media recommender system Habilitation à diriger des recherches Conclusions and perspectives 50 / 51

76 Acknowledgments Habilitation à diriger des recherches Conclusions and perspectives 51 / 51

Discret event trace analysis: from an automatical to an interactive process

Discret event trace analysis: from an automatical to an interactive process Discret event trace analysis: from an automatical to an interactive process Céline Robardet Lundi 12 mai 2014 SILEX seminar 1 / 43 Data Mining: a KDD step Making sense of voluminous data Elicit hypothesis

More information

Effectiveness of Freq Pat Mining

Effectiveness of Freq Pat Mining Effectiveness of Freq Pat Mining Too many patterns! A pattern a 1 a 2 a n contains 2 n -1 subpatterns Understanding many patterns is difficult or even impossible for human users Non-focused mining A manager

More information

Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview

More information

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Apriori Algorithm For a given set of transactions, the main aim of Association Rule Mining is to find rules that will predict the occurrence of an item based on the occurrences of the other items in the

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2013 " An second class in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar

Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar Frequent Pattern Mining Based on: Introduction to Data Mining by Tan, Steinbach, Kumar Item sets A New Type of Data Some notation: All possible items: Database: T is a bag of transactions Transaction transaction

More information

Chapter 4: Mining Frequent Patterns, Associations and Correlations

Chapter 4: Mining Frequent Patterns, Associations and Correlations Chapter 4: Mining Frequent Patterns, Associations and Correlations 4.1 Basic Concepts 4.2 Frequent Itemset Mining Methods 4.3 Which Patterns Are Interesting? Pattern Evaluation Methods 4.4 Summary Frequent

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 16: Association Rules Jan-Willem van de Meent (credit: Yijun Zhao, Yi Wang, Tan et al., Leskovec et al.) Apriori: Summary All items Count

More information

Building Roads. Page 2. I = {;, a, b, c, d, e, ab, ac, ad, ae, bc, bd, be, cd, ce, de, abd, abe, acd, ace, bcd, bce, bde}

Building Roads. Page 2. I = {;, a, b, c, d, e, ab, ac, ad, ae, bc, bd, be, cd, ce, de, abd, abe, acd, ace, bcd, bce, bde} Page Building Roads Page 2 2 3 4 I = {;, a, b, c, d, e, ab, ac, ad, ae, bc, bd, be, cd, ce, de, abd, abe, acd, ace, bcd, bce, bde} Building Roads Page 3 2 a d 3 c b e I = {;, a, b, c, d, e, ab, ac, ad,

More information

Mining Association Rules in Large Databases

Mining Association Rules in Large Databases Mining Association Rules in Large Databases Association rules Given a set of transactions D, find rules that will predict the occurrence of an item (or a set of items) based on the occurrences of other

More information

Association Rules. A. Bellaachia Page: 1

Association Rules. A. Bellaachia Page: 1 Association Rules 1. Objectives... 2 2. Definitions... 2 3. Type of Association Rules... 7 4. Frequent Itemset generation... 9 5. Apriori Algorithm: Mining Single-Dimension Boolean AR 13 5.1. Join Step:...

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Advance Association Analysis

Advance Association Analysis Advance Association Analysis 1 Minimum Support Threshold 3 Effect of Support Distribution Many real data sets have skewed support distribution Support distribution of a retail data set 4 Effect of Support

More information

Frequent Pattern Mining

Frequent Pattern Mining Frequent Pattern Mining How Many Words Is a Picture Worth? E. Aiden and J-B Michel: Uncharted. Reverhead Books, 2013 Jian Pei: CMPT 741/459 Frequent Pattern Mining (1) 2 Burnt or Burned? E. Aiden and J-B

More information

Chapter 4: Association analysis:

Chapter 4: Association analysis: Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily

More information

Preference-Based Pattern Mining

Preference-Based Pattern Mining Preference-Based Pattern Mining Marc Plantevit Université Claude Bernard Lyon 1 LIRIS CNRS UMR5205 * Slides from different research school lectures and conference tutorials ebiss 2018 About me marcplantevit@univ-lyon1fr

More information

Frequent Pattern Mining

Frequent Pattern Mining Frequent Pattern Mining...3 Frequent Pattern Mining Frequent Patterns The Apriori Algorithm The FP-growth Algorithm Sequential Pattern Mining Summary 44 / 193 Netflix Prize Frequent Pattern Mining Frequent

More information

BCB 713 Module Spring 2011

BCB 713 Module Spring 2011 Association Rule Mining COMP 790-90 Seminar BCB 713 Module Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline What is association rule mining? Methods for association rule mining Extensions

More information

Association Pattern Mining. Lijun Zhang

Association Pattern Mining. Lijun Zhang Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms

More information

Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB)

Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB) Association rules Marco Saerens (UCL), with Christine Decaestecker (ULB) 1 Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004),

More information

Preference-based Pattern Mining

Preference-based Pattern Mining Preference-based Pattern Mining Bruno Crémilleux, Marc Plantevit, Arnaud Soulet Rennes, France - June 13, 2017 Introduction 2/96 Who are we? Bruno Crémilleux, Professor, Univ. Caen, France. Marc Plantevit,

More information

Performance and Scalability: Apriori Implementa6on

Performance and Scalability: Apriori Implementa6on Performance and Scalability: Apriori Implementa6on Apriori R. Agrawal and R. Srikant. Fast algorithms for mining associa6on rules. VLDB, 487 499, 1994 Reducing Number of Comparisons Candidate coun6ng:

More information

Chapter 7: Frequent Itemsets and Association Rules

Chapter 7: Frequent Itemsets and Association Rules Chapter 7: Frequent Itemsets and Association Rules Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 VII.1&2 1 Motivational Example Assume you run an on-line

More information

Data Mining for Knowledge Management. Association Rules

Data Mining for Knowledge Management. Association Rules 1 Data Mining for Knowledge Management Association Rules Themis Palpanas University of Trento http://disi.unitn.eu/~themis 1 Thanks for slides to: Jiawei Han George Kollios Zhenyu Lu Osmar R. Zaïane Mohammad

More information

Market baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged.

Market baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged. Frequent itemset Association&decision rule mining University of Szeged What frequent itemsets could be used for? Features/observations frequently co-occurring in some database can gain us useful insights

More information

Interestingness Measurements

Interestingness Measurements Interestingness Measurements Objective measures Two popular measurements: support and confidence Subjective measures [Silberschatz & Tuzhilin, KDD95] A rule (pattern) is interesting if it is unexpected

More information

Frequent Pattern Mining S L I D E S B Y : S H R E E J A S W A L

Frequent Pattern Mining S L I D E S B Y : S H R E E J A S W A L Frequent Pattern Mining S L I D E S B Y : S H R E E J A S W A L Topics to be covered Market Basket Analysis, Frequent Itemsets, Closed Itemsets, and Association Rules; Frequent Pattern Mining, Efficient

More information

ANU MLSS 2010: Data Mining. Part 2: Association rule mining

ANU MLSS 2010: Data Mining. Part 2: Association rule mining ANU MLSS 2010: Data Mining Part 2: Association rule mining Lecture outline What is association mining? Market basket analysis and association rule examples Basic concepts and formalism Basic rule measurements

More information

Preference-based Pattern Mining

Preference-based Pattern Mining Preference-based Pattern Mining Bruno Crémilleux, Marc Plantevit, Arnaud Soulet Nancy, France - November 16, 2017 Introduction 2/97 Who are we? Bruno Crémilleux, Professor, Univ. Caen, France. Marc Plantevit,

More information

CHAPTER 8. ITEMSET MINING 226

CHAPTER 8. ITEMSET MINING 226 CHAPTER 8. ITEMSET MINING 226 Chapter 8 Itemset Mining In many applications one is interested in how often two or more objectsofinterest co-occur. For example, consider a popular web site, which logs all

More information

Product presentations can be more intelligently planned

Product presentations can be more intelligently planned Association Rules Lecture /DMBI/IKI8303T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, Objectives Introduction What is Association Mining? Mining Association Rules

More information

Chapter 6: Association Rules

Chapter 6: Association Rules Chapter 6: Association Rules Association rule mining Proposed by Agrawal et al in 1993. It is an important data mining model. Transaction data (no time-dependent) Assume all data are categorical. No good

More information

Data Mining Concepts

Data Mining Concepts Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential

More information

Fast Nearest Neighbor Search on Large Time-Evolving Graphs

Fast Nearest Neighbor Search on Large Time-Evolving Graphs Fast Nearest Neighbor Search on Large Time-Evolving Graphs Leman Akoglu Srinivasan Parthasarathy Rohit Khandekar Vibhore Kumar Deepak Rajan Kun-Lung Wu Graphs are everywhere Leman Akoglu Fast Nearest Neighbor

More information

Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges

Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Abhishek Santra 1 and Sanjukta Bhowmick 2 1 Information Technology Laboratory, CSE Department, University of

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,

More information

Roadmap. PCY Algorithm

Roadmap. PCY Algorithm 1 Roadmap Frequent Patterns A-Priori Algorithm Improvements to A-Priori Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms Compacting Results Data Mining for Knowledge Management 50 PCY

More information

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule

More information

Mining Trusted Information in Medical Science: An Information Network Approach

Mining Trusted Information in Medical Science: An Information Network Approach Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou

More information

Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining

Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institutes

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013-2017 Han, Kamber & Pei. All

More information

Mining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records.

Mining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. DATA STREAMS MINING Mining Data Streams From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. Hammad Haleem Xavier Plantaz APPLICATIONS Sensors

More information

Fundamental Data Mining Algorithms

Fundamental Data Mining Algorithms 2018 EE448, Big Data Mining, Lecture 3 Fundamental Data Mining Algorithms Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html REVIEW What is Data

More information

A Two-Phase Algorithm for Fast Discovery of High Utility Itemsets

A Two-Phase Algorithm for Fast Discovery of High Utility Itemsets A Two-Phase Algorithm for Fast Discovery of High Utility temsets Ying Liu, Wei-keng Liao, and Alok Choudhary Electrical and Computer Engineering Department, Northwestern University, Evanston, L, USA 60208

More information

Clustering from Data Streams

Clustering from Data Streams Clustering from Data Streams João Gama LIAAD-INESC Porto, University of Porto, Portugal jgama@fep.up.pt 1 Introduction 2 Clustering Micro Clustering 3 Clustering Time Series Growing the Structure Adapting

More information

Trajectory Pattern Mining. Figures and charts are from some materials downloaded from the internet.

Trajectory Pattern Mining. Figures and charts are from some materials downloaded from the internet. Trajectory Pattern Mining Figures and charts are from some materials downloaded from the internet. Outline Spatio-temporal data types Mining trajectory patterns Spatio-temporal data types Spatial extension

More information

APPLYING BIT-VECTOR PROJECTION APPROACH FOR EFFICIENT MINING OF N-MOST INTERESTING FREQUENT ITEMSETS

APPLYING BIT-VECTOR PROJECTION APPROACH FOR EFFICIENT MINING OF N-MOST INTERESTING FREQUENT ITEMSETS APPLYIG BIT-VECTOR PROJECTIO APPROACH FOR EFFICIET MIIG OF -MOST ITERESTIG FREQUET ITEMSETS Zahoor Jan, Shariq Bashir, A. Rauf Baig FAST-ational University of Computer and Emerging Sciences, Islamabad

More information

PFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures

PFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures PFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures 1 Introduction Frequent itemset mining is a popular data mining task. It consists of discovering sets of items (itemsets) frequently

More information

Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety

Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety Abhishek

More information

Data Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1

Data Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques Chapter 5 SS Chung April 5, 2013 Data Mining: Concepts and Techniques 1 Chapter 5: Mining Frequent Patterns, Association and Correlations Basic concepts and a road

More information

Knowledge Discovery in Databases II Winter Term 2015/2016. Optional Lecture: Pattern Mining & High-D Data Mining

Knowledge Discovery in Databases II Winter Term 2015/2016. Optional Lecture: Pattern Mining & High-D Data Mining Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases II Winter Term 2015/2016 Optional Lecture: Pattern Mining

More information

DATA MINING RESEARCH: RETROSPECT AND PROSPECT

DATA MINING RESEARCH: RETROSPECT AND PROSPECT DATA MINING RESEARCH: RETROSPECT AND PROSPECT Prof(Dr).V.SARAVANAN & Mr. ABDUL KHADAR JILANI Department of Computer Science College of Computer and Information Sciences Majmaah University Kingdom of Saudi

More information

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús

More information

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

What Is Data Mining? CMPT 354: Database I -- Data Mining 2 Data Mining What Is Data Mining? Mining data mining knowledge Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data CMPT

More information

Association Rule Learning

Association Rule Learning Association Rule Learning 16s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales March 15, 2016 COMP9417 ML & DM (CSE, UNSW) Association

More information

CS Introduction to Data Mining Instructor: Abdullah Mueen

CS Introduction to Data Mining Instructor: Abdullah Mueen CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts

More information

Acyclic Network. Tree Based Clustering. Tree Decomposition Methods

Acyclic Network. Tree Based Clustering. Tree Decomposition Methods Summary s Join Tree Importance of s Solving Topological structure defines key features for a wide class of problems CSP: Inference in acyclic network is extremely efficient (polynomial) Idea: remove cycles

More information

H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases. Paper s goals. H-mine characteristics. Why a new algorithm?

H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases. Paper s goals. H-mine characteristics. Why a new algorithm? H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases Paper s goals Introduce a new data structure: H-struct J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang Int. Conf. on Data Mining

More information

Association Rule Mining

Association Rule Mining Association Rule Mining Generating assoc. rules from frequent itemsets Assume that we have discovered the frequent itemsets and their support How do we generate association rules? Frequent itemsets: {1}

More information

Data Mining: Concepts and Techniques. Chapter Mining sequence patterns in transactional databases

Data Mining: Concepts and Techniques. Chapter Mining sequence patterns in transactional databases Data Mining: Concepts and Techniques Chapter 8 8.3 Mining sequence patterns in transactional databases Jiawei Han and Micheline Kamber Department of Computer Science University of Illinois at Urbana-Champaign

More information

Advanced Data Management

Advanced Data Management Advanced Data Management Medha Atre Office: KD-219 atrem@cse.iitk.ac.in Sept 26, 2016 defined Given a graph G(V, E) with V as the set of nodes and E as the set of edges, a reachability query asks does

More information

Lecture Note: Computation problems in social. network analysis

Lecture Note: Computation problems in social. network analysis Lecture Note: Computation problems in social network analysis Bang Ye Wu CSIE, Chung Cheng University, Taiwan September 29, 2008 In this lecture note, several computational problems are listed, including

More information

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check

More information

Uncovering the Formation of Triadic Closure in Social Networks. Zhanpeng Fang and Jie Tang Tsinghua University

Uncovering the Formation of Triadic Closure in Social Networks. Zhanpeng Fang and Jie Tang Tsinghua University Uncovering the Formation of Triadic Closure in Social Networks Zhanpeng Fang and Jie Tang Tsinghua University 1 Triangle Laws Triangle is one of most basic human groups in social networks Friends of friends

More information

Triggering Patterns of Topology Changes in Dynamic Graphs*

Triggering Patterns of Topology Changes in Dynamic Graphs* Triggering Patterns of Topology Changes in Dynamic Graphs* Mehdi Kaytoue INSA-Lyon, CNRS LIRIS UMR55 F-69621, France mkaytoue@insa-lyon.fr Yoann Pitarch Université de Toulouse CNRS, IRIT UMR555 F-3171,

More information

Graph Cube: On Warehousing and OLAP Multidimensional Networks

Graph Cube: On Warehousing and OLAP Multidimensional Networks Graph Cube: On Warehousing and OLAP Multidimensional Networks Peixiang Zhao, Xiaolei Li, Dong Xin, Jiawei Han Department of Computer Science, UIUC Groupon Inc. Google Cooperation pzhao4@illinois.edu, hanj@cs.illinois.edu

More information

Constraint-based Pattern Mining in Dynamic Graphs

Constraint-based Pattern Mining in Dynamic Graphs 2009 Ninth IEEE International Conference on Data Mining Constraint-based Pattern Mining in Dynamic Graphs Céline Robardet Université de Lyon, INSA-Lyon, CNRS, LIRIS UMR 5205 F-69621 Villeurbanne, France

More information

Chapter 13, Sequence Data Mining

Chapter 13, Sequence Data Mining CSI 4352, Introduction to Data Mining Chapter 13, Sequence Data Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University Topics Single Sequence Mining Frequent sequence

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/19/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank

More information

9. Conclusions. 9.1 Definition KDD

9. Conclusions. 9.1 Definition KDD 9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]

More information

Distance-based Methods: Drawbacks

Distance-based Methods: Drawbacks Distance-based Methods: Drawbacks Hard to find clusters with irregular shapes Hard to specify the number of clusters Heuristic: a cluster must be dense Jian Pei: CMPT 459/741 Clustering (3) 1 How to Find

More information

Pattern Lattice Traversal by Selective Jumps

Pattern Lattice Traversal by Selective Jumps Pattern Lattice Traversal by Selective Jumps Osmar R. Zaïane Mohammad El-Hajj Department of Computing Science, University of Alberta Edmonton, AB, Canada {zaiane, mohammad}@cs.ualberta.ca ABSTRACT Regardless

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING Sequence Data: Sequential Pattern Mining Instructor: Yizhou Sun yzsun@cs.ucla.edu November 27, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification

More information

ECE 3060 VLSI and Advanced Digital Design

ECE 3060 VLSI and Advanced Digital Design ECE 3060 VLSI and Advanced Digital Design Lecture 15 Multiple-Level Logic Minimization Outline Multi-level circuit representations Minimization methods goals: area, delay, power algorithms: algebraic,

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery : Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Mining High Average-Utility Itemsets

Mining High Average-Utility Itemsets Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Mining High Itemsets Tzung-Pei Hong Dept of Computer Science and Information Engineering

More information

Machine Learning: Symbolische Ansätze

Machine Learning: Symbolische Ansätze Machine Learning: Symbolische Ansätze Unsupervised Learning Clustering Association Rules V2.0 WS 10/11 J. Fürnkranz Different Learning Scenarios Supervised Learning A teacher provides the value for the

More information

Summary. Acyclic Networks Join Tree Clustering. Tree Decomposition Methods. Acyclic Network. Tree Based Clustering. Tree Decomposition.

Summary. Acyclic Networks Join Tree Clustering. Tree Decomposition Methods. Acyclic Network. Tree Based Clustering. Tree Decomposition. Summary s Join Tree Importance of s Solving Topological structure denes key features for a wide class of problems CSP: Inference in acyclic network is extremely ecient (polynomial) Idea: remove cycles

More information

Dimension Induced Clustering

Dimension Induced Clustering Dimension Induced Clustering Aris Gionis Alexander Hinneburg Spiros Papadimitriou Panayiotis Tsaparas HIIT, University of Helsinki Martin Luther University, Halle Carnegie Melon University HIIT, University

More information

We had several appearances of the curse of dimensionality:

We had several appearances of the curse of dimensionality: Short Recap We had several appearances of the curse of dimensionality: Concentration of distances => meaningless similarity/distance/neighborhood concept => instability of neighborhoods Growing hypothesis

More information

ERMO DG: Evolving Region Moving Object Dataset Generator. Authors: B. Aydin, R. Angryk, K. G. Pillai Presented by Micheal Schuh May 21, 2014

ERMO DG: Evolving Region Moving Object Dataset Generator. Authors: B. Aydin, R. Angryk, K. G. Pillai Presented by Micheal Schuh May 21, 2014 ERMO DG: Evolving Region Moving Object Dataset Generator Authors: B. Aydin, R. Angryk, K. G. Pillai Presented by Micheal Schuh May 21, 2014 Introduction Spatiotemporal pattern mining algorithms Motivation

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before

More information

Survey: Efficent tree based structure for mining frequent pattern from transactional databases

Survey: Efficent tree based structure for mining frequent pattern from transactional databases IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 9, Issue 5 (Mar. - Apr. 2013), PP 75-81 Survey: Efficent tree based structure for mining frequent pattern from

More information

Induction of Association Rules: Apriori Implementation

Induction of Association Rules: Apriori Implementation 1 Induction of Association Rules: Apriori Implementation Christian Borgelt and Rudolf Kruse Department of Knowledge Processing and Language Engineering School of Computer Science Otto-von-Guericke-University

More information

Basic Concepts: Association Rules. What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations

Basic Concepts: Association Rules. What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and

More information

Hiding Sensitive Frequent Itemsets by a Border-Based Approach

Hiding Sensitive Frequent Itemsets by a Border-Based Approach Hiding Sensitive Frequent Itemsets by a Border-Based Approach Xingzhi Sun IBM China Research Lab, Beijing, China sunxingz@cn.ibm.com Philip S. Yu IBM Watson Research Center, Hawthorne, NY, USA psyu@us.ibm.com

More information

Acyclic Network. Tree Based Clustering. Tree Decomposition Methods

Acyclic Network. Tree Based Clustering. Tree Decomposition Methods Summary s Cluster Tree Elimination Importance of s Solving Topological structure dene key features for a wide class of problems CSP: Inference in acyclic network is extremely ecient (polynomial) Idea:

More information

Graph Exploration: Taking the User into the Loop

Graph Exploration: Taking the User into the Loop Graph Exploration: Taking the User into the Loop Davide Mottin, Anja Jentzsch, Emmanuel Müller Hasso Plattner Institute, Potsdam, Germany 2016/10/24 CIKM2016, Indianapolis, US Where we are Background (5

More information

OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING

OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING ES200 Peterhouse College, Cambridge Frans Coenen, Paul Leng and Graham Goulbourne The Department of Computer Science The University of Liverpool

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights

More information

Association Rules. Charles Sutton Data Mining and Exploration Spring Based on slides by Chris Williams and Amos Storkey. Thursday, 8 March 12

Association Rules. Charles Sutton Data Mining and Exploration Spring Based on slides by Chris Williams and Amos Storkey. Thursday, 8 March 12 Association Rules Charles Sutton Data Mining and Exploration Spring 2012 Based on slides by Chris Williams and Amos Storkey The Goal Find patterns : local regularities that occur more often than you would

More information

MS-FP-Growth: A multi-support Vrsion of FP-Growth Agorithm

MS-FP-Growth: A multi-support Vrsion of FP-Growth Agorithm , pp.55-66 http://dx.doi.org/0.457/ijhit.04.7..6 MS-FP-Growth: A multi-support Vrsion of FP-Growth Agorithm Wiem Taktak and Yahya Slimani Computer Sc. Dept, Higher Institute of Arts MultiMedia (ISAMM),

More information

Data Mining Part 3. Associations Rules

Data Mining Part 3. Associations Rules Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets

More information