Céline Robardet. December 9, Habilitation à diriger des recherches

Size: px

Start display at page:

Download "Céline Robardet. December 9, Habilitation à diriger des recherches"

Byron Ford
6 years ago
Views:

1 Céline Robardet December 9, 2013 Habilitation à diriger des recherches 1 / 51

2 Introduction Habilitation à diriger des recherches Introduction 2 / 51

3 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

4 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

5 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

6 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

7 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

8 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

9 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

10 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

11 KDD: what is it? Making sense of voluminous data Elicit hypothesis Habilitation à diriger des recherches Introduction 3 / 51

12 Data mining Principle Searching for interested patterns/models in the data Data mining ingredients: a language of patterns/models some criteria a search strategy Habilitation à diriger des recherches Introduction 4 / 51

13 Data mining Principle Searching for interested patterns/models in the data Data mining ingredients: a language of patterns/models some criteria a search strategy Habilitation à diriger des recherches Introduction 4 / 51

14 Data mining algorithms Descriptive vs Predictive Pattern mining Classification Clustering Regression Apriori Decision tree Hierarchical Linear Regression FP-Growth Neural Networks Partitional Non-Linear Regression Bayesian Density-Based Logical Regression K Nearest Neighbors Support Vectors Machine Genetic Algorithm Habilitation à diriger des recherches Introduction 5 / 51

15 Data mining algorithms Exact vs Heuristic approaches Pattern mining Classification Clustering Regression Apriori Decision tree Hierarchical Linear Regression FP-Growth Neural Networks Partitional Non-Linear Regression Bayesian Density-Based Logical Regression K Nearest Neighbors Support Vectors Machine Genetic Algorithm Habilitation à diriger des recherches Introduction 5 / 51

16 Constraint-based pattern mining Developed for extracting itemsets in transaction databases Can it be used for the analysis of complex systems? Complex systems: refer to a broader class of data (transportation network, internet, gene network, social network) are composed by a large number of highly interconnected dynamical units can be modeled by dynamic attributed relational graphs Habilitation à diriger des recherches Introduction 6 / 51

17 Constraint-based pattern mining Developed for extracting itemsets in transaction databases Can it be used for the analysis of complex systems? Complex systems: refer to a broader class of data (transportation network, internet, gene network, social network) are composed by a large number of highly interconnected dynamical units can be modeled by dynamic attributed relational graphs Habilitation à diriger des recherches Introduction 6 / 51

18 Relational graphs Graphs that are often Dynamic: when vertices and edges appear/disappear through time Attributed: when another relation describes the vertices themselves Relational graph Vertices are uniquely identified through time Habilitation à diriger des recherches Introduction 7 / 51

Co-authors that published at ICDE with a high degree and a low clustering coeﬃcient Habilitation à

19 Some inductive queries What are the sub-graphs What are the vertex whose vertex attributes evolve similarly? attributes that strongly co-vary with the graph structure? Co-authors that published at ICDE with a high degree and a low clustering coeﬃcient Habilitation à diriger des recherches Airports whose arrival delays increased over the three weeks following Katrina hurricane Introduction 8 / 51

20 Outline Constraint-based pattern mining General framework Constraint properties Algorithmic principles Relational dynamic graphs Statistical description Simulation Mining attributed dynamic graphs Topological patterns in static attributed graphs Trend dynamic sub-graphs Conclusions and perspectives Habilitation à diriger des recherches Introduction 9 / 51

21 Constraint-based pattern mining Habilitation à diriger des recherches Constraint-based pattern mining 10 / 51

22 Pattern mining A Pattern φ describes a subgroup of the data D observed several times or associated with characteristic properties The pattern shape is fixed: φ L whose cartinality is exponential or infinite in the size of the data C evaluates the adequacy of the pattern to the data C(φ, D) Boolean Pattern mining task: Find all interesting subgroups Th(L, D, C) = {φ L C(φ, D) is true } Exact approach Habilitation à diriger des recherches Constraint-based pattern mining 11 / 51

23 Pattern mining A Pattern φ describes a subgroup of the data D observed several times or associated with characteristic properties The pattern shape is fixed: φ L whose cartinality is exponential or infinite in the size of the data C evaluates the adequacy of the pattern to the data C(φ, D) Boolean Pattern mining task: Find all interesting subgroups Th(L, D, C) = {φ L C(φ, D) is true } Exact approach Habilitation à diriger des recherches Constraint-based pattern mining 11 / 51

24 Pattern constraints Constraints are needed for: only retrieving patterns that describe an interesting subgroup of the data making the extraction feasible Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually They are defined up to the partial order used for listing the patterns Habilitation à diriger des recherches Constraint-based pattern mining 12 / 51

25 Pattern constraints Constraints are needed for: only retrieving patterns that describe an interesting subgroup of the data making the extraction feasible Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually They are defined up to the partial order used for listing the patterns Habilitation à diriger des recherches Constraint-based pattern mining 12 / 51

26 Pattern constraints Constraints are needed for: only retrieving patterns that describe an interesting subgroup of the data making the extraction feasible Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually They are defined up to the partial order used for listing the patterns Habilitation à diriger des recherches Constraint-based pattern mining 12 / 51

27 Constraint properties - 1 Monotone constraint φ 1 φ 2, C(φ 1, D) C(φ 2, D) abcde Anti-monotone constraint φ 1 φ 2, C(φ 2, D) C(φ 1, D) abcde abcd abce abde acde bcde abcd abce abde acde bcde abc abd abe acd ace ade bcd bce bde cde abc abd abe acd ace ade bcd bce bde cde ab ac ad ae bc bd be cd ce de ab ac ad ae bc bd be cd ce de a b c d e a b c d e C(φ, D) b φ c φ C(φ, D) a φ c φ Habilitation à diriger des recherches Constraint-based pattern mining 13 / 51

28 Constraint properties - 2 Convertible constraints Loose AM constraints is extended to the prefix order so C(φ, D) e φ : C(φ \ {e}, D) that φ 1 φ 2, C(φ 2, D) C(φ 1, D) abcde abcde abcd abce abde acde bcde abcd abce abde acde bcde abc abd abe acd ace ade bcd bce bde cde abc abd abe acd ace ade bcd bce bde cde ab ac ad ae bc bd be cd ce de ab ac ad ae bc bd be cd ce de a b c d e a b c d e C(φ, w) avg(w(φ)) > σ w(a) w(b) w(c) w(d) w(e) C(φ, w) var(w(φ)) σ Pei and Han Bonchi and Lucchese Habilitation à diriger des recherches Constraint-based pattern mining 14 / 51

29 Enumeration strategy Binary partition: the element 'a' is enumerated abcde abcd abce abde acde bcde abc abd abe acd ace ade bcd bce bde cde ab ac ad ae bc bd be cd ce de a b c d e Habilitation à diriger des recherches Constraint-based pattern mining 15 / 51

30 Enumeration strategy Binary partition: the element 'a' is enumerated R R a R \ R R {a} R \ {a} R R Habilitation à diriger des recherches Constraint-based pattern mining 15 / 51

31 Constraint evaluation Monotone constraint R C(R, D) is false Anti-monotone constraint R empty empty R R C(R, D) is false Habilitation à diriger des recherches Constraint-based pattern mining 16 / 51

32 Constraint evaluation Monotone constraint R C(R, D) is false Anti-monotone constraint R empty empty R R C(R, D) is false Habilitation à diriger des recherches Constraint-based pattern mining 16 / 51

33 Constraint evaluation Monotone constraint R C(R, D) is false Anti-monotone constraint R empty empty R R C(R, D) is false Habilitation à diriger des recherches Constraint-based pattern mining 16 / 51

34 Constraint evaluation Monotone constraint R C(R, D) is false Anti-monotone constraint R empty empty R R C(R, D) is false Habilitation à diriger des recherches Constraint-based pattern mining 16 / 51

35 A new class of constraints Piecewise monotone and anti-monotone constraints 1 C involves p times the pattern φ: C(φ, D) = f(φ 1, φ p, D) 2 f i,φ (x) = (φ 1, φ i 1, x, φ i+1,, φ p, D) 3 i = 1 p, f i,φ is either monotone or anti-monotone: x y, { fi,φ (x) f i,φ (y) iff f i,φ is monotone f i,φ (y) f i,φ (x) iff f i,φ is anti-monotone L Cerf, J Besson, C Robardet, J-F Boulicaut Habilitation à diriger des recherches Constraint-based pattern mining 17 / 51

36 An example e, w(e) 0 C(φ, w) avg(w(φ)) > σ e φ w(e) φ > σ C(φ, D) is piecewise monotone and anti-monotone with e φ f(φ 1, φ 2, D) = 1 w(e) φ 2 x y, f 1,φ is monotone: f(x, φ 2, D) = f 2,φ is anti-monotone: f(φ 1, y, D) = e φ w(e) 1 y > σ e x w(e) e y φ 2 > σ w(e) φ 2 > σ e φ w(e) 1 x > σ Habilitation à diriger des recherches Constraint-based pattern mining 18 / 51

37 Piecewise constraint exploitation R Evaluation If f(r, R, D) = then R is empty e R w(e) R σ empty Propagation R e R \ R such that f(r \ {e}, R, D) σ, then e is moved in R e R \ R such that f(r, R {e}, D) σ, then e is removed from R Habilitation à diriger des recherches Constraint-based pattern mining 19 / 51

38 Piecewise constraint exploitation R Evaluation If f(r, R, D) = then R is empty e R w(e) R σ empty Propagation R e R \ R such that f(r \ {e}, R, D) σ, then e is moved in R e R \ R such that f(r, R {e}, D) σ, then e is removed from R Habilitation à diriger des recherches Constraint-based pattern mining 19 / 51

39 Algorithmic principles Function Generic_CBPM_enumeration(R, R ) 1: if Check_constraints(R, R ) then 2: (R, R ) Constraint_Propagation(R, R ) 3: if R = R then 4: output R 5: else 6: for all e R \ R do 7: Generic_CBPM_Enumeration(R {e}, R ) 8: Generic_CBPM_Enumeration(R, R \ {e}) 9: end for 10: end if 11: end if Habilitation à diriger des recherches Constraint-based pattern mining 20 / 51

40 Case studies Mining of Formal concepts [IDA journal 05] Fault-tolerant patterns [KDID 05, ICCS 06] Closed patterns in n-ary relations [SDM 08] Parallel episodes [SDM 09] Subspace clustering [SDM 09] Evolution patterns in dynamic graphs [ICDM 09] Topological patterns in static attributed graphs [ICDM 12 (demo), TKDE 13] Trend dynamic sub-graphs [DS 12, PKDD 13] Habilitation à diriger des recherches Constraint-based pattern mining 21 / 51

41 Relational dynamic graphs Habilitation à diriger des recherches Relational dynamic graphs 22 / 51

42 Complex network analysis A large set of tools can be used on any static complex network Such networks exhibit non-trivial topological features Boccaletti, Stefano, Vito Latora, Yamir Moreno, Martin Chavez and D-U Hwang Habilitation à diriger des recherches Relational dynamic graphs 23 / 51

Complex network analysis A large set of tools can be used on any static complex network Such networks exhibit non-trivial topological features Assortative

43 Complex network analysis A large set of tools can be used on any static complex network Such networks exhibit non-trivial topological features Assortative Disassortative Boccaletti, Stefano, Vito Latora, Yamir Moreno, Martin Chavez and D-U Hwang Habilitation à diriger des recherches Relational dynamic graphs 23 / 51

44 Complex network analysis A large set of tools can be used on any static complex network Such networks exhibit non-trivial topological features Boccaletti, Stefano, Vito Latora, Yamir Moreno, Martin Chavez and D-U Hwang Habilitation à diriger des recherches Relational dynamic graphs 23 / 51

45 Dynamic complex networks Generic technique Identify the rules that govern the network evolution Generate random dynamic networks Means Statistical analysis of graph properties as functions of time Study of the evolution of some structures A Scherrer, P Borgnat, E Fleury, JL Guillaume, C Robardet Habilitation à diriger des recherches Relational dynamic graphs 24 / 51

46 Sensor mobility networks 41 persons kept a sensor during 3 days: Vertices are human beings Edges stand for wireless communication Habilitation à diriger des recherches Relational dynamic graphs 25 / 51

47 Properties as function of time High correlation coefficient in time between E(t): the number of edges V(t): the number of connected vertices D(t): the average degree N c (t): the number of connected components Edge creation and deletion processes are memory-less and have a low correlation coefficient with other properties Contact and inter-contact periods are distributed as power-laws P[X>x] P[X>x] Contact Duration Inter contact Duration Habilitation à diriger des recherches Relational dynamic graphs 26 / 51

48 Dynamic structural properties Instability of connected components Number of components Total lifetime Total lifetime Number of vertices Triangle creations: 40% of link creations increase the number of triangles 6% of inactive links would create a triangle if activated Edges whose endpoints have a common neighbors are more likely to be activated Habilitation à diriger des recherches Relational dynamic graphs 27 / 51

49 Simulation of dynamic graphs Algorithm ingredients each edge is simulated independently transition probabilities are independent in time the empirical (power law) contact and inter-contact period distributions are imposed the transition probability is weighted by: the distribution of the number of connected components the dynamics of triangles Habilitation à diriger des recherches Relational dynamic graphs 28 / 51

50 Validation Qualitative summaries of the trajectories of vertices among few identified structures of the graph are very similar: Original data Simulated data Habilitation à diriger des recherches Relational dynamic graphs 29 / 51

51 Mining attributed dynamic graphs Habilitation à diriger des recherches Mining attributed dynamic graphs 30 / 51

52 Mining attributed static graphs Composed by two relations Are these two relations linked? Do the attribute values depend on the role played by the vertex in the graph? A new pattern domain that identifies sets of vertex attributes that co-vary with the graph structure the graph structure is captured by topological properties A Prado, M Plantevit, C Robardet, J-F Boulicaut Habilitation à diriger des recherches Mining attributed dynamic graphs 31 / 51

53 Topological patterns Topological properties derived from the edges of the graph Direct neighborhood degree, clustering Connection to all other vertices Centrality measures Topological patterns M: set of all properties M {+, } m s (m, s) M {+, } L = 2 M {+, } h 9 i t B 1 h 0 h 2 i t 7 2 i t A C 0 h 17 P 0 h 2 i t h 1 D h 25 i t 52 9 N 0 73 i t 6 1 i t O 166 h 47 h 32 E i t 0 8 i t 1 12 h h 13 M I F 12 i t 53 0 h 7 i t 25 7 i t H h 8 0 h J 3 L 0 1 h 6 G K 0 i t 4 15 i t 5 3 h i t 45 4 i t 3 5 Habilitation à diriger des recherches Mining attributed dynamic graphs 32 / 51

54 Topological constraint Sets of properties that behave in a similar manner for a large number of vertex pairs Kendall tau rank correlation coefficient based on the ranks of pattern property values counts the number of vertex pairs that are ordered similarly on all pattern properties Age Income Habilitation à diriger des recherches Mining attributed dynamic graphs 33 / 51

55 Constraint properties Let P L, Supp τ (P) = {(u,v) V2 m s P:m(u) sm(v)} ( 2) n s { < when s is + > when s is C topo is anti-monotone and n is the number of vertices C topo (P, D) Supp τ (P) σ Redundant symmetrical patterns with Supp τ (A +, B ) = Supp τ (A, B + ) Support computation quadratic on the number of vertices: An upper bound to avoid, in linear time, some support computation ECLAT mining algorithm with range trees to ease the search of supporting vertices Habilitation à diriger des recherches Mining attributed dynamic graphs 34 / 51

56 Constraint properties Let P L, Supp τ (P) = {(u,v) V2 m s P:m(u) sm(v)} ( 2) n s { < when s is + > when s is C topo is anti-monotone and n is the number of vertices C topo (P, D) Supp τ (P) σ Redundant symmetrical patterns with Supp τ (A +, B ) = Supp τ (A, B + ) Support computation quadratic on the number of vertices: An upper bound to avoid, in linear time, some support computation ECLAT mining algorithm with range trees to ease the search of supporting vertices Habilitation à diriger des recherches Mining attributed dynamic graphs 34 / 51

57 Results on DBLP - 1 Co-author graph (DBLP) vertices edges (a common publication) 29 local attributes (number of publications in Database and data mining conferences 9 topological properties Some Results {SAC +, ECML/PKDD } {SAC +, KDD }, {SAC +, VLDB } {SAC +, PageRank } Is publishing to SAC penalizing? Of course not! SAC has a broader scope than DB and DM Habilitation à diriger des recherches Mining attributed dynamic graphs 35 / 51

58 Results on DBLP - 1 Co-author graph (DBLP) vertices edges (a common publication) 29 local attributes (number of publications in Database and data mining conferences 9 topological properties Some Results {SAC +, ECML/PKDD } {SAC +, KDD }, {SAC +, VLDB } {SAC +, PageRank } Is publishing to SAC penalizing? Of course not! SAC has a broader scope than DB and DM Habilitation à diriger des recherches Mining attributed dynamic graphs 35 / 51

59 Results on DBLP - 1 Co-author graph (DBLP) vertices edges (a common publication) 29 local attributes (number of publications in Database and data mining conferences 9 topological properties Some Results {SAC +, ECML/PKDD } {SAC +, KDD }, {SAC +, VLDB } {SAC +, PageRank } Is publishing to SAC penalizing? Of course not! SAC has a broader scope than DB and DM Habilitation à diriger des recherches Mining attributed dynamic graphs 35 / 51

60 Results on DBLP - 2 Emerging patterns wrt the graph structure Which conferences are more related to Deg+ and Between+? Rank Prk+ Deg+ Publication Factor ECML/PKDD IEEE TKDE PAKDD 221 DASFAA ICDM 195 Prk+ Deg+ ECML/PKDD+ Christos Faloutsos Jiawei Han Philip S Yu Bing Liu C Lee Giles Prk+ Between+ Publication Factor PVLDB EDBT VLDB J SIGMOD 425 ICDE+ 342 Prk+ Between+ PVLDB+ Gerhard Weikum Jiawei Han David Maier Philip S Yu Hector Garcia-Molina PVLDB+, Deg+, Between+ Habilitation à diriger des recherches Mining attributed dynamic graphs 36 / 51

61 Dynamic attributed graph a1 a2 a A B a1 a2 a a1 a2 a A B a1 a2 a a1 a2 a A B a1 a2 a a1 a2 a a1 a2 a3 F C a1 a2 a a1 a2 a3 F C a1 a2 a a1 a2 a3 F C a1 a2 a E D a1 a2 a a1 a2 a E D a1 a2 a a1 a2 a E D a1 a2 a ti 1 ti ti+1 ti+2 ti+3 The data {G t = (V, E t ), t = 1,, t max } and A a set of ordinal attributes: a i : V T D i, with D i the domain of a i Habilitation à diriger des recherches Mining attributed dynamic graphs 37 / 51

62 Trend Dynamic Sub-graphs What are the sub-graphs whose vertex attributes evolve similarly? Mining maximal sub-graphs that satisfy some constraints on the graph topology and on the attribute values The connectivity of the dynamic subgraphs is constrained by a maximum diameter value To be more robust towards intrinsic inter-individual variability, we do not compare raw numerical values, but their trends E Desmier, M Plantevit, C Robardet, J-F Boulicaut Habilitation à diriger des recherches Mining attributed dynamic graphs 38 / 51

63 Language of patterns L = {(U, S, Ω) U V, S = t 1,, t s, Ω A {+, }} Example: ({A, C, D}, t 0, t 1, {a + 1,a 3 }) L a1 0 a1 3 a1 0 a2 3 a2 5 a2 2 a3 5 a3 3 a3 1 A B a1 2 a2 4 a3 5 a1 a2 a3 F C E D a1 2 a2 1 a3 4 a1 1 a1 5 a2 3 a1 2 a2 2 a3 4 a2 0 a3 1 F a3 0 A E a1 2 B C D a1 4 a2 1 a1 3 a2 2 a3 0 a2 2 a3 1 a3 4 a1 1 a1 6 a1 3 a2 3 a2 3 a2 1 a3 2 a3 0 a3 3 F A E a1 2 B C D a1 5 a2 1 a1 5 a2 1 a3 0 a2 4 a3 0 a3 3 t0 t1 t2 Habilitation à diriger des recherches Mining attributed dynamic graphs 39 / 51

64 Definition of constraints A pattern (U, S, Ω) satisfies C diam ((U, S, Ω), D) diameter Gt(U) k, t S 1 Diameter C trend ((U, S, Ω), D) u U, t S, (a, m) Ω { a(u, t) < a(u, t + 1), if m = + a(u, t) > a(u, t + 1), if m = C max ((U, S, Ω), D) U, S and Ω cannot be enlarged without invalidating one or both of the above constraints Habilitation à diriger des recherches Mining attributed dynamic graphs 40 / 51

65 Constraint properties C trend ((U, S, Ω), D) is anti-monotone C diam ((U, S, Ω), D) is piecewise monotone: diameter Gt (U) k max v,w U d G t (U)(v, w) k f(u 1, U 2, D) = max v,w U 1 d Gt(U 2 )(v, w) f 1,U is anti-monotone ie if it is satisfied on U 1, it is also satisfied for any of its subsets; f 2,U is monotone ie if it is satisfied on G t (U 2 ), then, adding some vertices and edges to G t (U 2 ) will not increase its value We can use the propagation mechanisms C max ((U, S, Ω), D) is pushed using specific mechanisms Habilitation à diriger des recherches Mining attributed dynamic graphs 41 / 51

during another week Substitutions ﬂights were provided from these airports during this period The hurricane strongly inﬂuenced the domestic

66 Katrina hurricane results Top pattern wrt time speciﬁcity (in red) Top pattern wrt trend speciﬁcity (Yellow) 71 airports whose arrival delays increase over 3 weeks 5 airports whose number of ﬂights increased over 3 weeks Arrival delays never increased in these airports during another week Substitutions ﬂights were provided from these airports during this period The hurricane strongly inﬂuenced the domestic ﬂight punctuality This behavior is rather rare in the rest of the graph Habilitation à diriger des recherches Mining attributed dynamic graphs 42 / 51

67 Conclusions and perspectives Habilitation à diriger des recherches Conclusions and perspectives 43 / 51

68 Conclusion Constraint-based pattern mining framework can be used to analyze dynamic graphs Patterns combine information about the topological structure, the vertex attribute values and their tendencies A wide variety of data can be mined Provide new insights on the techniques that can be used to analyze dynamic graphs Habilitation à diriger des recherches Conclusions and perspectives 44 / 51

69 Challenges and perspectives - 1 Enhancing the quality of the extracted information by capturing changes in the graph structure while allowing changes in trend direction Combining sub-graph and sequence mining Habilitation à diriger des recherches Conclusions and perspectives 45 / 51

70 Challenges and perspectives - 2 Avoiding the known drawbacks of pattern mining approaches How to get rid of redundant patterns? How to fix the threshold values? Returning patterns whose constraint values are surprising with respect to what could be expected from randomized data with respect to what happened in the past (real-time data mining) Trigger alerts when changes in behavior Habilitation à diriger des recherches Conclusions and perspectives 46 / 51

71 Challenges and perspectives - 2 Avoiding the known drawbacks of pattern mining approaches How to get rid of redundant patterns? How to fix the threshold values? Returning patterns whose constraint values are surprising with respect to what could be expected from randomized data with respect to what happened in the past (real-time data mining) Trigger alerts when changes in behavior Habilitation à diriger des recherches Conclusions and perspectives 46 / 51

t 6 E Patterns freq length area ABCDEF 2 6 12 AB 3 2 6 AC 3 2 6 A 4 1 4 ABCD 2 4 8 C 3 1 3 A Soulet, C

72 Challenges and perspectives - 3 Pattern without threshold values: Skyline patterns Retrieves the patterns that are not dominated by any other pattern Tid Items t 1 A B C D E F t 2 A B C D E F t 3 A B t 4 D t 5 A C t 6 E Patterns freq length area ABCDEF AB AC A ABCD C A Soulet, C Raissi, M Plantevit, B Cremilleux Habilitation à diriger des recherches Conclusions and perspectives 47 / 51

73 Ongoing research projects Vél'innov - ANR INOV labex IMU project Constraint-based pattern mining of Vélo'V data Characterizing the usage of the bicycle sharing system Modeling the system to simulate the impacts of local changes (in the city or in the system) Pitfall: Vertex attributes are static (INSEE data) or derived from the edges (eg, number of users of a considered category) Searching for relationships between attributes and the graph structure does not make sense Envisioned solutions: Considering the user trajectories and looking for paths that characterize a user group Considering an attributed static graph with temporal usage vectors associated to edges and seeking for sub-graphs homogeneous on vertex and edges attributes Habilitation à diriger des recherches Conclusions and perspectives 48 / 51

74 Upcoming research projects - 1 GRAISearch: enhancing a location-based social media search engine FP7-PEOPLE-2013-IAPP: Industry-Academia Partnerships and Pathways Benefits for DM2L team 40 researcher months Access to data with high added value An industrial cooperation with critical potential benefits for the company Habilitation à diriger des recherches Conclusions and perspectives 49 / 51

75 Upcoming research projects - 2 GRAISearch: Research tasks Geo-localized demographic ﬂow prediction using geo-tagged social media uploaded data Geo-localized event detection algorithm Integration into a geo-located social media recommender system Habilitation à diriger des recherches Conclusions and perspectives 50 / 51

76 Acknowledgments Habilitation à diriger des recherches Conclusions and perspectives 51 / 51

Discret event trace analysis: from an automatical to an interactive process

Discret event trace analysis: from an automatical to an interactive process Céline Robardet Lundi 12 mai 2014 SILEX seminar 1 / 43 Data Mining: a KDD step Making sense of voluminous data Elicit hypothesis