Céline Robardet. December 9, 2013. Habilitation à diriger des recherches
1 Céline Robardet December 9, 2013 Habilitation à diriger des recherches 1 / 51
2 Introduction
3 KDD (knowledge discovery in databases): what is it?
- Making sense of voluminous data
- Eliciting hypotheses
12 Data mining
Principle: searching for interesting patterns/models in the data.
Data mining ingredients:
- a language of patterns/models
- some criteria
- a search strategy
14 Data mining algorithms: descriptive vs predictive, exact vs heuristic approaches
- Pattern mining: Apriori, FP-Growth
- Classification: Decision tree, Neural Networks, Bayesian, K Nearest Neighbors, Support Vector Machines, Genetic Algorithm
- Clustering: Hierarchical, Partitional, Density-Based
- Regression: Linear Regression, Non-Linear Regression, Logical Regression
16 Constraint-based pattern mining
- Developed for extracting itemsets in transaction databases
- Can it be used for the analysis of complex systems?
Complex systems:
- refer to a broader class of data (transportation network, internet, gene network, social network)
- are composed of a large number of highly interconnected dynamical units
- can be modeled by dynamic attributed relational graphs
18 Relational graphs
Graphs that are often:
- Dynamic: when vertices and edges appear/disappear through time
- Attributed: when another relation describes the vertices themselves
Relational graph: vertices are uniquely identified through time.
19 Some inductive queries
- What are the sub-graphs whose vertex attributes evolve similarly? Example: airports whose arrival delays increased over the three weeks following hurricane Katrina.
- What are the vertex attributes that strongly co-vary with the graph structure? Example: co-authors that published at ICDE with a high degree and a low clustering coefficient.
20 Outline
- Constraint-based pattern mining: general framework, constraint properties, algorithmic principles
- Relational dynamic graphs: statistical description, simulation
- Mining attributed dynamic graphs: topological patterns in static attributed graphs, trend dynamic sub-graphs
- Conclusions and perspectives
21 Constraint-based pattern mining
22 Pattern mining
- A pattern $\varphi$ describes a subgroup of the data $D$ that is observed several times or associated with characteristic properties
- The pattern shape is fixed: $\varphi \in L$, where the cardinality of $L$ is exponential or infinite in the size of the data
- $C$ evaluates the adequacy of the pattern to the data: $C(\varphi, D) \to \text{Boolean}$
Pattern mining task: find all interesting subgroups, $Th(L, D, C) = \{\varphi \in L \mid C(\varphi, D) \text{ is true}\}$ (exact approach).
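The task definition can be illustrated with a brute-force sketch. It assumes that the pattern language is the set of non-empty itemsets and that the constraint is a minimum frequency; the toy transactions and the names `powerset_language`, `frequency` and `theory` are hypothetical and do not come from the talk.

```python
from itertools import combinations

def powerset_language(items):
    """Yield every non-empty itemset over `items` (the pattern language L)."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def frequency(pattern, data):
    """Number of transactions of `data` that contain `pattern`."""
    return sum(1 for transaction in data if pattern <= transaction)

def theory(language, data, constraint):
    """Th(L, D, C): every pattern of L for which C(pattern, D) is true."""
    return [pattern for pattern in language if constraint(pattern, data)]

# Hypothetical toy data: four transactions over the items a, b, c.
D = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]

def min_freq_2(pattern, data):
    """Example constraint: the pattern occurs in at least two transactions."""
    return frequency(pattern, data) >= 2

result = theory(powerset_language("abc"), D, min_freq_2)
print(sorted("".join(sorted(p)) for p in result))
```

Every pattern of the language is evaluated here, which is only feasible on toy data; this is why constraint properties matter for pruning.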
24 Pattern constraints
Constraints are needed for:
- only retrieving patterns that describe an interesting subgroup of the data
- making the extraction feasible
Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually. They are defined up to the partial order used for listing the patterns.
27 Constraint properties - 1
- Monotone constraint: $\forall \varphi_1 \preceq \varphi_2,\ C(\varphi_1, D) \Rightarrow C(\varphi_2, D)$. Example: a constraint built from the conditions $b \in \varphi$ and $c \in \varphi$.
- Anti-monotone constraint: $\forall \varphi_1 \preceq \varphi_2,\ C(\varphi_2, D) \Rightarrow C(\varphi_1, D)$. Example: a constraint built from the conditions $a \notin \varphi$ and $c \notin \varphi$.
[Figure: lattice of the itemsets over {a, b, c, d, e}, from abcde down to the singletons, drawn once for each constraint]
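Both definitions can be checked by brute force on the small lattice over {a, b, c, d, e}. This is a minimal sketch: the two example constraints are assumptions (one requires b or c, the other forbids both a and c), and the function names are hypothetical.

```python
from itertools import combinations

ITEMS = "abcde"
# Every subset of the items, including the empty set: the whole lattice.
LATTICE = [frozenset(c) for n in range(len(ITEMS) + 1)
           for c in combinations(ITEMS, n)]

def is_monotone(constraint):
    """True if constraint(p1) implies constraint(p2) whenever p1 is a subset of p2."""
    return all(constraint(p2)
               for p1 in LATTICE if constraint(p1)
               for p2 in LATTICE if p1 <= p2)

def is_anti_monotone(constraint):
    """True if constraint(p2) implies constraint(p1) whenever p1 is a subset of p2."""
    return all(constraint(p1)
               for p2 in LATTICE if constraint(p2)
               for p1 in LATTICE if p1 <= p2)

def contains_b_or_c(pattern):
    """Assumed example of a monotone constraint."""
    return "b" in pattern or "c" in pattern

def avoids_a_and_c(pattern):
    """Assumed example of an anti-monotone constraint."""
    return "a" not in pattern and "c" not in pattern

print(is_monotone(contains_b_or_c), is_anti_monotone(contains_b_or_c))
print(is_monotone(avoids_a_and_c), is_anti_monotone(avoids_a_and_c))
```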
28 Constraint properties - 2
- Convertible constraints (Pei and Han): the order $\preceq$ is extended to the prefix order so that $\forall \varphi_1 \preceq \varphi_2,\ C(\varphi_2, D) \Rightarrow C(\varphi_1, D)$. Example: $C(\varphi, w) \equiv \mathrm{avg}(w(\varphi)) > \sigma$, with the items ordered by decreasing weight, $w(a) \geq w(b) \geq w(c) \geq w(d) \geq w(e)$.
- Loose AM constraints (Bonchi and Lucchese): $C(\varphi, D) \Rightarrow \exists e \in \varphi : C(\varphi \setminus \{e\}, D)$. Example: a threshold $\sigma$ on $\mathrm{var}(w(\varphi))$.
[Figure: lattice of the itemsets over {a, b, c, d, e}, drawn once for each constraint]
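A small experiment illustrates the convertible case: the average-weight constraint is not anti-monotone for set inclusion, but it becomes anti-monotone for the prefix order once the items are sorted by decreasing weight. The weights, the threshold and the names below are hypothetical.

```python
from itertools import combinations

# Hypothetical item weights and threshold.
W = {"a": 9, "b": 7, "c": 4, "d": 2, "e": 1}
SIGMA = 5
# Items listed by decreasing weight; this fixes the prefix order.
ORDER = sorted(W, key=W.get, reverse=True)

def avg_above(pattern):
    """C(pattern, w): the average weight of the pattern exceeds SIGMA."""
    return sum(W[e] for e in pattern) / len(pattern) > SIGMA

def prefixes(pattern):
    """Non-empty proper prefixes of `pattern` when written in ORDER."""
    seq = [e for e in ORDER if e in pattern]
    return [tuple(seq[:k]) for k in range(1, len(seq))]

PATTERNS = [c for n in range(1, len(ORDER) + 1)
            for c in combinations(ORDER, n)]

# Anti-monotonicity with respect to the prefix order.
prefix_am = all(avg_above(p)
                for q in PATTERNS if avg_above(q)
                for p in prefixes(q))

# Anti-monotonicity with respect to plain set inclusion.
subset_am = all(avg_above(p)
                for q in PATTERNS if avg_above(q)
                for n in range(1, len(q))
                for p in combinations(q, n))

print(prefix_am, subset_am)
```

With these weights, the pattern ac satisfies the constraint while its subset c does not, which is what breaks anti-monotonicity under set inclusion.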
29 Enumeration strategy
Binary partition: the element 'a' is enumerated. The search space is bounded below by $R^{\wedge}$ and above by $R^{\vee}$, with $R^{\wedge} \subseteq R^{\vee}$ and $a \in R^{\vee} \setminus R^{\wedge}$; it is split into $(R^{\wedge} \cup \{a\}, R^{\vee})$ and $(R^{\wedge}, R^{\vee} \setminus \{a\})$.
[Figure: lattice of the itemsets over {a, b, c, d, e}]
31 Constraint evaluation
($R^{\wedge}$ and $R^{\vee}$ denote the smallest and the largest pattern of the current search space.)
- Monotone constraint: if $C(R^{\vee}, D)$ is false, then the search space is empty
- Anti-monotone constraint: if $C(R^{\wedge}, D)$ is false, then the search space is empty
35 A new class of constraints
Piecewise monotone and anti-monotone constraints:
1. $C$ involves $p$ times the pattern $\varphi$: $C(\varphi, D) = f(\varphi_1, \dots, \varphi_p, D)$
2. $f_{i,\varphi}(x) = f(\varphi_1, \dots, \varphi_{i-1}, x, \varphi_{i+1}, \dots, \varphi_p, D)$
3. $\forall i = 1 \dots p$, $f_{i,\varphi}$ is either monotone or anti-monotone: $\forall x \preceq y$, $f_{i,\varphi}(x) \Rightarrow f_{i,\varphi}(y)$ iff $f_{i,\varphi}$ is monotone, and $f_{i,\varphi}(y) \Rightarrow f_{i,\varphi}(x)$ iff $f_{i,\varphi}$ is anti-monotone
L. Cerf, J. Besson, C. Robardet, J.-F. Boulicaut
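36 An example
With $\forall e,\ w(e) \geq 0$:
$C(\varphi, w) \equiv \mathrm{avg}(w(\varphi)) > \sigma \equiv \frac{\sum_{e \in \varphi} w(e)}{|\varphi|} > \sigma$
$C(\varphi, D)$ is piecewise monotone and anti-monotone with $f(\varphi_1, \varphi_2, D) = \frac{\sum_{e \in \varphi_1} w(e)}{|\varphi_2|}$. For all $x \preceq y$:
- $f_{1,\varphi}$ is monotone: $f(x, \varphi_2, D) = \frac{\sum_{e \in x} w(e)}{|\varphi_2|} > \sigma \Rightarrow \frac{\sum_{e \in y} w(e)}{|\varphi_2|} > \sigma$
- $f_{2,\varphi}$ is anti-monotone: $f(\varphi_1, y, D) = \frac{\sum_{e \in \varphi_1} w(e)}{|y|} > \sigma \Rightarrow \frac{\sum_{e \in \varphi_1} w(e)}{|x|} > \sigma$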
37 Piecewise constraint exploitation
Evaluation:
- If $f(R^{\vee}, R^{\wedge}, D) = \frac{\sum_{e \in R^{\vee}} w(e)}{|R^{\wedge}|} \leq \sigma$, then the search space is empty
Propagation:
- If $\exists e \in R^{\vee} \setminus R^{\wedge}$ such that $f(R^{\vee} \setminus \{e\}, R^{\wedge}, D) \leq \sigma$, then $e$ is moved into $R^{\wedge}$
- If $\exists e \in R^{\vee} \setminus R^{\wedge}$ such that $f(R^{\vee}, R^{\wedge} \cup \{e\}, D) \leq \sigma$, then $e$ is removed from $R^{\vee}$
39 Algorithmic principles
Function Generic_CBPM_Enumeration($R^{\wedge}$, $R^{\vee}$)
  if Check_constraints($R^{\wedge}$, $R^{\vee}$) then
    ($R^{\wedge}$, $R^{\vee}$) ← Constraint_Propagation($R^{\wedge}$, $R^{\vee}$)
    if $R^{\wedge} = R^{\vee}$ then
      output $R^{\wedge}$
    else
      for all $e \in R^{\vee} \setminus R^{\wedge}$ do
        Generic_CBPM_Enumeration($R^{\wedge} \cup \{e\}$, $R^{\vee}$)
        Generic_CBPM_Enumeration($R^{\wedge}$, $R^{\vee} \setminus \{e\}$)
      end for
    end if
  end if
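The enumeration scheme can be sketched in Python for the average-weight constraint $\mathrm{avg}(w(\varphi)) > \sigma$ with non-negative weights. This is an illustrative sketch, not the author's implementation: the names, the toy weights and the guard against an empty lower bound are assumptions, and the sketch picks a single element per call (a binary partition) rather than looping over all candidates.

```python
from itertools import combinations

def bound(upper, lower, w):
    """f(upper, lower): sum of the weights of `upper` over the size of `lower`.
    With non-negative weights this bounds avg(w) from above for every
    non-empty pattern between `lower` and `upper`."""
    return sum(w[e] for e in upper) / max(len(lower), 1)

def propagate(lower, upper, w, sigma):
    """Constraint propagation: tighten both bounds, one candidate at a time."""
    for e in sorted(upper - lower):
        if bound(upper - {e}, lower, w) <= sigma:
            lower = lower | {e}   # no valid pattern leaves e out
        elif bound(upper, lower | {e}, w) <= sigma:
            upper = upper - {e}   # no valid pattern contains e
    return lower, upper

def enumerate_patterns(lower, upper, w, sigma, out):
    """Append to `out` every pattern between the bounds with avg(w) > sigma."""
    if bound(upper, lower, w) <= sigma:   # evaluation: empty search space
        return
    lower, upper = propagate(lower, upper, w, sigma)
    if lower == upper:
        # Leaf: re-check the exact constraint, since the bounds have moved.
        if lower and sum(w[e] for e in lower) / len(lower) > sigma:
            out.append(lower)
        return
    e = min(upper - lower)                # element to enumerate
    enumerate_patterns(lower | {e}, upper, w, sigma, out)
    enumerate_patterns(lower, upper - {e}, w, sigma, out)

def brute_force(w, sigma):
    """Reference result: test every non-empty itemset individually."""
    return {frozenset(c) for n in range(1, len(w) + 1)
            for c in combinations(w, n)
            if sum(w[e] for e in c) / n > sigma}

# Hypothetical weights and threshold.
W = {"a": 9, "b": 7, "c": 4, "d": 2, "e": 1}
SIGMA = 5
found = []
enumerate_patterns(frozenset(), frozenset(W), W, SIGMA, found)
print(len(found), set(found) == brute_force(W, SIGMA))
```

Comparing against `brute_force` is only a sanity check on toy data; the point of the bounds is to discard whole sub-spaces without visiting their patterns.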
40 Case studies
Mining of:
- Formal concepts [IDA journal 05]
- Fault-tolerant patterns [KDID 05, ICCS 06]
- Closed patterns in n-ary relations [SDM 08]
- Parallel episodes [SDM 09]
- Subspace clustering [SDM 09]
- Evolution patterns in dynamic graphs [ICDM 09]
- Topological patterns in static attributed graphs [ICDM 12 (demo), TKDE 13]
- Trend dynamic sub-graphs [DS 12, PKDD 13]
41 Relational dynamic graphs
42 Complex network analysis
- A large set of tools can be used on any static complex network
- Such networks exhibit non-trivial topological features
[Figure panels: Assortative, Disassortative]
S. Boccaletti, V. Latora, Y. Moreno, M. Chavez and D.-U. Hwang
45 Dynamic complex networks
Generic technique:
- Identify the rules that govern the network evolution
- Generate random dynamic networks
Means:
- Statistical analysis of graph properties as functions of time
- Study of the evolution of some structures
A. Scherrer, P. Borgnat, E. Fleury, J.-L. Guillaume, C. Robardet
46 Sensor mobility networks
41 people carried a sensor for 3 days:
- Vertices are human beings
- Edges stand for wireless communication
47 Properties as functions of time
- High correlation coefficient in time between:
  - $E(t)$: the number of edges
  - $V(t)$: the number of connected vertices
  - $D(t)$: the average degree
  - $N_c(t)$: the number of connected components
- Edge creation and deletion processes are memory-less and have a low correlation coefficient with other properties
- Contact and inter-contact periods are distributed as power laws
[Figure: distributions $P[X > x]$ of the contact duration and of the inter-contact duration]
48 Dynamic structural properties
- Instability of connected components
[Figure: two plots with axes labelled number of components, total lifetime and number of vertices]
- Triangle creations:
  - 40% of link creations increase the number of triangles
  - 6% of inactive links would create a triangle if activated
  - Edges whose endpoints have a common neighbor are more likely to be activated
49 Simulation of dynamic graphs
Algorithm ingredients:
- each edge is simulated independently
- transition probabilities are independent in time
- the empirical (power law) contact and inter-contact period distributions are imposed
- the transition probability is weighted by:
  - the distribution of the number of connected components
  - the dynamics of triangles
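The first ingredients can be sketched as follows. This is a deliberately reduced illustration: each edge alternates independently between inter-contact and contact periods whose durations are drawn from Pareto (power-law) distributions. The shape parameters are hypothetical rather than fitted to the sensor data, and the weighting by connected components and triangles is left out.

```python
import random
from itertools import combinations

def simulate_edge(t_max, alpha_on, alpha_off, rng):
    """Alternate inter-contact (off) and contact (on) periods until t_max.
    Returns the list of contact intervals (start, end), end excluded."""
    intervals, t, active = [], 0, False
    while t < t_max:
        alpha = alpha_on if active else alpha_off
        # Power-law duration, truncated to an integer number of time steps.
        duration = max(1, int(rng.paretovariate(alpha)))
        end = min(t + duration, t_max)
        if active:
            intervals.append((t, end))
        t, active = end, not active
    return intervals

def simulate_graph(n_vertices, t_max, alpha_on=2.0, alpha_off=1.2, seed=0):
    """Simulate every possible edge independently of the others."""
    rng = random.Random(seed)
    return {edge: simulate_edge(t_max, alpha_on, alpha_off, rng)
            for edge in combinations(range(n_vertices), 2)}

def edges_at(contacts, t):
    """Edges that are active at time step t."""
    return {edge for edge, intervals in contacts.items()
            if any(start <= t < end for start, end in intervals)}

contacts = simulate_graph(n_vertices=41, t_max=1000)
print(len(contacts), len(edges_at(contacts, 500)))
```

Comparing such a baseline with the original trace is one way to see what the additional weighting terms contribute.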
50 Validation
Qualitative summaries of the trajectories of vertices among a few identified structures of the graph are very similar.
[Figure: original data vs simulated data]
51 Mining attributed dynamic graphs
52 Mining attributed static graphs
Composed of two relations:
- Are these two relations linked?
- Do the attribute values depend on the role played by the vertex in the graph?
A new pattern domain that identifies sets of vertex attributes that co-vary with the graph structure; the graph structure is captured by topological properties.
A. Prado, M. Plantevit, C. Robardet, J.-F. Boulicaut
53 Topological patterns
Topological properties derived from the edges of the graph:
- Direct neighborhood: degree, clustering
- Connection to all other vertices: centrality measures
Topological patterns:
- $M$: set of all properties
- a signed property $m^s$ with $(m, s) \in M \times \{+, -\}$
- $L = 2^{M \times \{+, -\}}$
[Figure: example attributed graph with vertices A to P]
54 Topological constraint
- Sets of properties that behave in a similar manner for a large number of vertex pairs
- Kendall tau rank correlation coefficient:
  - based on the ranks of pattern property values
  - counts the number of vertex pairs that are ordered similarly on all pattern properties
[Figure: example with Age and Income]
55 Constraint properties
Let $P \in L$:
$Supp_\tau(P) = \frac{|\{(u, v) \in V^2 \mid \forall m^s \in P : m(u) \triangleright_s m(v)\}|}{\binom{n}{2}}$
where $\triangleright_s$ is $<$ when $s$ is $+$ and $>$ when $s$ is $-$, and $n$ is the number of vertices.
- $C_{topo}(P, D) \equiv Supp_\tau(P) \geq \sigma$
- $C_{topo}$ is anti-monotone
- Redundant symmetrical patterns, with $Supp_\tau(A^+, B^-) = Supp_\tau(A^-, B^+)$
- Support computation is quadratic in the number of vertices:
  - an upper bound avoids, in linear time, some support computations
  - ECLAT mining algorithm with range trees to ease the search for supporting vertices
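The support measure can be computed naively in quadratic time, as sketched below. The vertex values and the name `kendall_support` are hypothetical; the sketch assumes a non-empty pattern and omits the upper bound and the range-tree optimisation.

```python
from itertools import combinations

def kendall_support(values, pattern):
    """Fraction of vertex pairs ordered consistently by every signed property.
    `values[v][m]` is the value of property m on vertex v; `pattern` is a
    non-empty set of (property, sign) pairs with sign '+' or '-'."""
    vertices = sorted(values)
    n = len(vertices)

    def ordered(u, v):
        # (m, '+') requires m(u) < m(v); (m, '-') requires m(u) > m(v).
        return all(values[u][m] < values[v][m] if s == "+"
                   else values[u][m] > values[v][m]
                   for m, s in pattern)

    # The inequalities are strict, so at most one orientation of each
    # unordered pair can support the pattern.
    supporting = sum(1 for u, v in combinations(vertices, 2)
                     if ordered(u, v) or ordered(v, u))
    return supporting / (n * (n - 1) / 2)

# Hypothetical data: a publication count and a degree for four authors.
VALUES = {
    "u1": {"SAC": 1, "deg": 5},
    "u2": {"SAC": 2, "deg": 3},
    "u3": {"SAC": 3, "deg": 4},
    "u4": {"SAC": 4, "deg": 1},
}
supp = kendall_support(VALUES, {("SAC", "+"), ("deg", "-")})
supp_sym = kendall_support(VALUES, {("SAC", "-"), ("deg", "+")})
supp_sub = kendall_support(VALUES, {("SAC", "+")})
print(supp, supp_sym, supp_sub)
```

On this toy data the symmetrical pattern has the same support, and the sub-pattern has a support at least as large, in line with the anti-monotonicity stated above.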
57 Results on DBLP - 1
Co-author graph (DBLP):
- vertices and edges (an edge stands for a common publication)
- 29 local attributes (number of publications in database and data mining conferences)
- 9 topological properties
Some results:
- $\{SAC^+, ECML/PKDD^-\}$
- $\{SAC^+, KDD^-\}$, $\{SAC^+, VLDB^-\}$
- $\{SAC^+, PageRank^-\}$
Is publishing at SAC penalizing? Of course not! SAC has a broader scope than DB and DM.
60 Results on DBLP - 2
Emerging patterns with respect to the graph structure: which conferences are more related to Deg+ and Between+?
- Prk+ Deg+, publications ranked by factor: ECML/PKDD, IEEE TKDE, PAKDD 221, DASFAA, ICDM 195
- Prk+ Deg+ ECML/PKDD+, authors: Christos Faloutsos, Jiawei Han, Philip S. Yu, Bing Liu, C. Lee Giles
- Prk+ Between+, publications ranked by factor: PVLDB, EDBT, VLDB J, SIGMOD 425, ICDE+ 342
- Prk+ Between+ PVLDB+, authors: Gerhard Weikum, Jiawei Han, David Maier, Philip S. Yu, Hector Garcia-Molina
- PVLDB+, Deg+, Between+
61 Dynamic attributed graph
[Figure: a graph over vertices A to F, each carrying attributes a1, a2, a3, shown at timestamps $t_{i-1}, t_i, t_{i+1}, t_{i+2}, t_{i+3}$]
The data: $\{G_t = (V, E_t),\ t = 1, \dots, t_{max}\}$ and $A$, a set of ordinal attributes $a_i : V \times T \to D_i$, with $D_i$ the domain of $a_i$.
62 Trend Dynamic Sub-graphs
What are the sub-graphs whose vertex attributes evolve similarly?
- Mining maximal sub-graphs that satisfy some constraints on the graph topology and on the attribute values
- The connectivity of the dynamic sub-graphs is constrained by a maximum diameter value
- To be more robust towards intrinsic inter-individual variability, we do not compare raw numerical values, but their trends
E. Desmier, M. Plantevit, C. Robardet, J.-F. Boulicaut
63 Language of patterns
$L = \{(U, S, \Omega) \mid U \subseteq V,\ S = \langle t_1, \dots, t_s \rangle,\ \Omega \subseteq A \times \{+, -\}\}$
Example: $(\{A, C, D\}, \langle t_0, t_1 \rangle, \{a_1^+, a_3^-\}) \in L$
[Figure: a graph over vertices A to F with the values of attributes a1, a2, a3 at timestamps t0, t1, t2]
64 Definition of constraints
A pattern $(U, S, \Omega)$ satisfies:
- $C_{diam}((U, S, \Omega), D) \equiv \mathrm{diameter}_{G_t}(U) \leq k,\ \forall t \in S$
- $C_{trend}((U, S, \Omega), D) \equiv \forall u \in U,\ \forall t \in S,\ \forall (a, m) \in \Omega$: $a(u, t) < a(u, t + 1)$ if $m = +$, and $a(u, t) > a(u, t + 1)$ if $m = -$
- $C_{max}((U, S, \Omega), D) \equiv$ $U$, $S$ and $\Omega$ cannot be enlarged without invalidating one or both of the above constraints
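The diameter and trend constraints can be checked directly on a candidate pattern, as in the sketch below. The toy graph, the attribute values and the function names are hypothetical, and the maximality constraint is not covered.

```python
from collections import deque

def diameter(vertices, edges):
    """Diameter of the subgraph induced by `vertices`; inf if disconnected."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    worst = 0
    for source in vertices:
        # Breadth-first search inside the induced subgraph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) < len(vertices):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst

def c_diam(U, S, k, edges_at):
    """The induced subgraph has diameter at most k at every timestamp of S."""
    return all(diameter(U, edges_at[t]) <= k for t in S)

def c_trend(U, S, omega, attr):
    """Every signed attribute of omega strictly increases ('+') or decreases
    ('-') between t and t + 1, for every vertex of U and timestamp of S."""
    for a, m in omega:
        for u in U:
            for t in S:
                now, nxt = attr[a][u][t], attr[a][u][t + 1]
                if not (now < nxt if m == "+" else now > nxt):
                    return False
    return True

# Hypothetical toy data: two graph snapshots, three attribute snapshots.
EDGES_AT = {0: [("A", "C"), ("C", "D")],
            1: [("A", "C"), ("A", "D"), ("B", "C")]}
ATTR = {
    "a1": {"A": [0, 2, 5], "B": [3, 3, 1], "C": [1, 4, 6], "D": [2, 3, 4]},
    "a3": {"A": [5, 3, 1], "B": [1, 2, 2], "C": [4, 2, 0], "D": [6, 5, 2]},
}
U, S, OMEGA = {"A", "C", "D"}, [0, 1], {("a1", "+"), ("a3", "-")}
print(c_trend(U, S, OMEGA, ATTR), c_diam(U, S, 2, EDGES_AT))
```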
65 Constraint properties
- $C_{trend}((U, S, \Omega), D)$ is anti-monotone
- $C_{diam}((U, S, \Omega), D)$ is piecewise monotone: $\mathrm{diameter}_{G_t}(U) \leq k \equiv \max_{v, w \in U} d_{G_t(U)}(v, w) \leq k$, with $f(U_1, U_2, D) = \max_{v, w \in U_1} d_{G_t(U_2)}(v, w)$
  - $f_{1,U}$ is anti-monotone, i.e. if it is satisfied on $U_1$, it is also satisfied for any of its subsets
  - $f_{2,U}$ is monotone, i.e. if it is satisfied on $G_t(U_2)$, then adding some vertices and edges to $G_t(U_2)$ will not increase its value
  - We can use the propagation mechanisms
- $C_{max}((U, S, \Omega), D)$ is pushed using specific mechanisms
66 Hurricane Katrina results
Top pattern with respect to time specificity (in red):
- 71 airports whose arrival delays increased over 3 weeks
- Arrival delays never increased in these airports during another week
- The hurricane strongly influenced the domestic flight punctuality
Top pattern with respect to trend specificity (in yellow):
- 5 airports whose number of flights increased over 3 weeks
- Substitution flights were provided from these airports during this period
- This behavior is rather rare in the rest of the graph
67 Conclusions and perspectives
68 Conclusion
- The constraint-based pattern mining framework can be used to analyze dynamic graphs
- Patterns combine information about the topological structure, the vertex attribute values and their tendencies
- A wide variety of data can be mined
- It provides new insights into the techniques that can be used to analyze dynamic graphs
69 Challenges and perspectives - 1
Enhancing the quality of the extracted information by:
- capturing changes in the graph structure while allowing changes in trend direction
- combining sub-graph and sequence mining
70 Challenges and perspectives - 2
Avoiding the known drawbacks of pattern mining approaches:
- How to get rid of redundant patterns?
- How to fix the threshold values?
Returning patterns whose constraint values are surprising:
- with respect to what could be expected from randomized data
- with respect to what happened in the past (real-time data mining)
- triggering alerts when the behavior changes
72 Challenges and perspectives - 3
Patterns without threshold values: skyline patterns. Retrieve the patterns that are not dominated by any other pattern.
Transactions (Tid: Items):
- t1: A B C D E F
- t2: A B C D E F
- t3: A B
- t4: D
- t5: A C
- t6: E
Patterns (described by freq, length and area): ABCDEF, AB, AC, A, ABCD, C
A. Soulet, C. Raissi, M. Plantevit, B. Crémilleux
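The skyline idea can be tried on the six transactions of the slide. The sketch below assumes that dominance is evaluated on the pair (freq, area), with area defined as freq times length; this choice of measures and the function names are assumptions, and the exhaustive enumeration is only meant for toy data.

```python
from itertools import combinations

# The six transactions listed on the slide.
TRANSACTIONS = {"t1": "ABCDEF", "t2": "ABCDEF", "t3": "AB",
                "t4": "D", "t5": "AC", "t6": "E"}
ITEMS = "ABCDEF"

def measures(pattern):
    """Return (freq, area), where area = freq * length of the pattern."""
    freq = sum(1 for items in TRANSACTIONS.values()
               if set(pattern) <= set(items))
    return freq, freq * len(pattern)

def dominates(m1, m2):
    """m1 dominates m2: at least as good everywhere, and not identical."""
    return all(x >= y for x, y in zip(m1, m2)) and m1 != m2

def skyline(patterns):
    """Keep the patterns whose measures are dominated by no other pattern."""
    scored = {p: measures(p) for p in patterns}
    return {p: m for p, m in scored.items()
            if not any(dominates(other, m) for other in scored.values())}

ALL_PATTERNS = ["".join(c) for n in range(1, len(ITEMS) + 1)
                for c in combinations(ITEMS, n)]
sky = skyline(ALL_PATTERNS)
for pattern in sorted(sky, key=len):
    print(pattern, sky[pattern])
```

No frequency or area threshold is needed: the user only states which measures matter, and dominance does the filtering.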
73 Ongoing research projects
Vél'innov - ANR INOV, labex IMU project: constraint-based pattern mining of Vélo'V data
- Characterizing the usage of the bicycle sharing system
- Modeling the system to simulate the impacts of local changes (in the city or in the system)
Pitfall:
- Vertex attributes are static (INSEE data) or derived from the edges (e.g., number of users of a considered category)
- Searching for relationships between attributes and the graph structure does not make sense
Envisioned solutions:
- Considering the user trajectories and looking for paths that characterize a user group
- Considering an attributed static graph with temporal usage vectors associated with edges, and seeking sub-graphs that are homogeneous on vertex and edge attributes
74 Upcoming research projects - 1
GRAISearch: enhancing a location-based social media search engine
FP7-PEOPLE-2013-IAPP: Industry-Academia Partnerships and Pathways
Benefits for the DM2L team:
- 40 researcher months
- Access to data with high added value
- An industrial cooperation with critical potential benefits for the company
75 Upcoming research projects - 2
GRAISearch research tasks:
- Geo-localized demographic flow prediction using geo-tagged data uploaded to social media
- Geo-localized event detection algorithm
- Integration into a geo-located social media recommender system
76 Acknowledgments
Preference-based Pattern Mining Bruno Crémilleux, Marc Plantevit, Arnaud Soulet Nancy, France - November 16, 2017 Introduction 2/97 Who are we? Bruno Crémilleux, Professor, Univ. Caen, France. Marc Plantevit,
More informationCHAPTER 8. ITEMSET MINING 226
CHAPTER 8. ITEMSET MINING 226 Chapter 8 Itemset Mining In many applications one is interested in how often two or more objectsofinterest co-occur. For example, consider a popular web site, which logs all
More informationProduct presentations can be more intelligently planned
Association Rules Lecture /DMBI/IKI8303T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, Objectives Introduction What is Association Mining? Mining Association Rules
More informationChapter 6: Association Rules
Chapter 6: Association Rules Association rule mining Proposed by Agrawal et al in 1993. It is an important data mining model. Transaction data (no time-dependent) Assume all data are categorical. No good
More informationData Mining Concepts
Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential
More informationFast Nearest Neighbor Search on Large Time-Evolving Graphs
Fast Nearest Neighbor Search on Large Time-Evolving Graphs Leman Akoglu Srinivasan Parthasarathy Rohit Khandekar Vibhore Kumar Deepak Rajan Kun-Lung Wu Graphs are everywhere Leman Akoglu Fast Nearest Neighbor
More informationHolistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges
Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Abhishek Santra 1 and Sanjukta Bhowmick 2 1 Information Technology Laboratory, CSE Department, University of
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationCS570 Introduction to Data Mining
CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,
More informationRoadmap. PCY Algorithm
1 Roadmap Frequent Patterns A-Priori Algorithm Improvements to A-Priori Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms Compacting Results Data Mining for Knowledge Management 50 PCY
More informationChapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the
Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule
More informationMining Trusted Information in Medical Science: An Information Network Approach
Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou
More informationData Mining in Bioinformatics Day 5: Frequent Subgraph Mining
Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institutes
More informationDatabase and Knowledge-Base Systems: Data Mining. Martin Ester
Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 6
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013-2017 Han, Kamber & Pei. All
More informationMining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records.
DATA STREAMS MINING Mining Data Streams From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. Hammad Haleem Xavier Plantaz APPLICATIONS Sensors
More informationFundamental Data Mining Algorithms
2018 EE448, Big Data Mining, Lecture 3 Fundamental Data Mining Algorithms Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html REVIEW What is Data
More informationA Two-Phase Algorithm for Fast Discovery of High Utility Itemsets
A Two-Phase Algorithm for Fast Discovery of High Utility temsets Ying Liu, Wei-keng Liao, and Alok Choudhary Electrical and Computer Engineering Department, Northwestern University, Evanston, L, USA 60208
More informationClustering from Data Streams
Clustering from Data Streams João Gama LIAAD-INESC Porto, University of Porto, Portugal jgama@fep.up.pt 1 Introduction 2 Clustering Micro Clustering 3 Clustering Time Series Growing the Structure Adapting
More informationTrajectory Pattern Mining. Figures and charts are from some materials downloaded from the internet.
Trajectory Pattern Mining Figures and charts are from some materials downloaded from the internet. Outline Spatio-temporal data types Mining trajectory patterns Spatio-temporal data types Spatial extension
More informationAPPLYING BIT-VECTOR PROJECTION APPROACH FOR EFFICIENT MINING OF N-MOST INTERESTING FREQUENT ITEMSETS
APPLYIG BIT-VECTOR PROJECTIO APPROACH FOR EFFICIET MIIG OF -MOST ITERESTIG FREQUET ITEMSETS Zahoor Jan, Shariq Bashir, A. Rauf Baig FAST-ational University of Computer and Emerging Sciences, Islamabad
More informationPFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures
PFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures 1 Introduction Frequent itemset mining is a popular data mining task. It consists of discovering sets of items (itemsets) frequently
More informationBig Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety
Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety Abhishek
More informationData Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1
Data Mining: Concepts and Techniques Chapter 5 SS Chung April 5, 2013 Data Mining: Concepts and Techniques 1 Chapter 5: Mining Frequent Patterns, Association and Correlations Basic concepts and a road
More informationKnowledge Discovery in Databases II Winter Term 2015/2016. Optional Lecture: Pattern Mining & High-D Data Mining
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases II Winter Term 2015/2016 Optional Lecture: Pattern Mining
More informationDATA MINING RESEARCH: RETROSPECT AND PROSPECT
DATA MINING RESEARCH: RETROSPECT AND PROSPECT Prof(Dr).V.SARAVANAN & Mr. ABDUL KHADAR JILANI Department of Computer Science College of Computer and Information Sciences Majmaah University Kingdom of Saudi
More informationINTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá
INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús
More informationWhat Is Data Mining? CMPT 354: Database I -- Data Mining 2
Data Mining What Is Data Mining? Mining data mining knowledge Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data CMPT
More informationAssociation Rule Learning
Association Rule Learning 16s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales March 15, 2016 COMP9417 ML & DM (CSE, UNSW) Association
More informationCS Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts
More informationAcyclic Network. Tree Based Clustering. Tree Decomposition Methods
Summary s Join Tree Importance of s Solving Topological structure defines key features for a wide class of problems CSP: Inference in acyclic network is extremely efficient (polynomial) Idea: remove cycles
More informationH-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases. Paper s goals. H-mine characteristics. Why a new algorithm?
H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases Paper s goals Introduce a new data structure: H-struct J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang Int. Conf. on Data Mining
More informationAssociation Rule Mining
Association Rule Mining Generating assoc. rules from frequent itemsets Assume that we have discovered the frequent itemsets and their support How do we generate association rules? Frequent itemsets: {1}
More informationData Mining: Concepts and Techniques. Chapter Mining sequence patterns in transactional databases
Data Mining: Concepts and Techniques Chapter 8 8.3 Mining sequence patterns in transactional databases Jiawei Han and Micheline Kamber Department of Computer Science University of Illinois at Urbana-Champaign
More informationAdvanced Data Management
Advanced Data Management Medha Atre Office: KD-219 atrem@cse.iitk.ac.in Sept 26, 2016 defined Given a graph G(V, E) with V as the set of nodes and E as the set of edges, a reachability query asks does
More informationLecture Note: Computation problems in social. network analysis
Lecture Note: Computation problems in social network analysis Bang Ye Wu CSIE, Chung Cheng University, Taiwan September 29, 2008 In this lecture note, several computational problems are listed, including
More informationData Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems
Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check
More informationUncovering the Formation of Triadic Closure in Social Networks. Zhanpeng Fang and Jie Tang Tsinghua University
Uncovering the Formation of Triadic Closure in Social Networks Zhanpeng Fang and Jie Tang Tsinghua University 1 Triangle Laws Triangle is one of most basic human groups in social networks Friends of friends
More informationTriggering Patterns of Topology Changes in Dynamic Graphs*
Triggering Patterns of Topology Changes in Dynamic Graphs* Mehdi Kaytoue INSA-Lyon, CNRS LIRIS UMR55 F-69621, France mkaytoue@insa-lyon.fr Yoann Pitarch Université de Toulouse CNRS, IRIT UMR555 F-3171,
More informationGraph Cube: On Warehousing and OLAP Multidimensional Networks
Graph Cube: On Warehousing and OLAP Multidimensional Networks Peixiang Zhao, Xiaolei Li, Dong Xin, Jiawei Han Department of Computer Science, UIUC Groupon Inc. Google Cooperation pzhao4@illinois.edu, hanj@cs.illinois.edu
More informationConstraint-based Pattern Mining in Dynamic Graphs
2009 Ninth IEEE International Conference on Data Mining Constraint-based Pattern Mining in Dynamic Graphs Céline Robardet Université de Lyon, INSA-Lyon, CNRS, LIRIS UMR 5205 F-69621 Villeurbanne, France
More informationChapter 13, Sequence Data Mining
CSI 4352, Introduction to Data Mining Chapter 13, Sequence Data Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University Topics Single Sequence Mining Frequent sequence
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/19/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank
More information9. Conclusions. 9.1 Definition KDD
9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]
More informationDistance-based Methods: Drawbacks
Distance-based Methods: Drawbacks Hard to find clusters with irregular shapes Hard to specify the number of clusters Heuristic: a cluster must be dense Jian Pei: CMPT 459/741 Clustering (3) 1 How to Find
More informationPattern Lattice Traversal by Selective Jumps
Pattern Lattice Traversal by Selective Jumps Osmar R. Zaïane Mohammad El-Hajj Department of Computing Science, University of Alberta Edmonton, AB, Canada {zaiane, mohammad}@cs.ualberta.ca ABSTRACT Regardless
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Sequence Data: Sequential Pattern Mining Instructor: Yizhou Sun yzsun@cs.ucla.edu November 27, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification
More informationECE 3060 VLSI and Advanced Digital Design
ECE 3060 VLSI and Advanced Digital Design Lecture 15 Multiple-Level Logic Minimization Outline Multi-level circuit representations Minimization methods goals: area, delay, power algorithms: algebraic,
More informationChapter 28. Outline. Definitions of Data Mining. Data Mining Concepts
Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms
More informationAC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery
: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationMining High Average-Utility Itemsets
Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Mining High Itemsets Tzung-Pei Hong Dept of Computer Science and Information Engineering
More informationMachine Learning: Symbolische Ansätze
Machine Learning: Symbolische Ansätze Unsupervised Learning Clustering Association Rules V2.0 WS 10/11 J. Fürnkranz Different Learning Scenarios Supervised Learning A teacher provides the value for the
More informationSummary. Acyclic Networks Join Tree Clustering. Tree Decomposition Methods. Acyclic Network. Tree Based Clustering. Tree Decomposition.
Summary s Join Tree Importance of s Solving Topological structure denes key features for a wide class of problems CSP: Inference in acyclic network is extremely ecient (polynomial) Idea: remove cycles
More informationDimension Induced Clustering
Dimension Induced Clustering Aris Gionis Alexander Hinneburg Spiros Papadimitriou Panayiotis Tsaparas HIIT, University of Helsinki Martin Luther University, Halle Carnegie Melon University HIIT, University
More informationWe had several appearances of the curse of dimensionality:
Short Recap We had several appearances of the curse of dimensionality: Concentration of distances => meaningless similarity/distance/neighborhood concept => instability of neighborhoods Growing hypothesis
More informationERMO DG: Evolving Region Moving Object Dataset Generator. Authors: B. Aydin, R. Angryk, K. G. Pillai Presented by Micheal Schuh May 21, 2014
ERMO DG: Evolving Region Moving Object Dataset Generator Authors: B. Aydin, R. Angryk, K. G. Pillai Presented by Micheal Schuh May 21, 2014 Introduction Spatiotemporal pattern mining algorithms Motivation
More informationAn Improved Apriori Algorithm for Association Rules
Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan
More informationCse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University
Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before
More informationSurvey: Efficent tree based structure for mining frequent pattern from transactional databases
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 9, Issue 5 (Mar. - Apr. 2013), PP 75-81 Survey: Efficent tree based structure for mining frequent pattern from
More informationInduction of Association Rules: Apriori Implementation
1 Induction of Association Rules: Apriori Implementation Christian Borgelt and Rudolf Kruse Department of Knowledge Processing and Language Engineering School of Computer Science Otto-von-Guericke-University
More informationBasic Concepts: Association Rules. What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations
What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and
More informationHiding Sensitive Frequent Itemsets by a Border-Based Approach
Hiding Sensitive Frequent Itemsets by a Border-Based Approach Xingzhi Sun IBM China Research Lab, Beijing, China sunxingz@cn.ibm.com Philip S. Yu IBM Watson Research Center, Hawthorne, NY, USA psyu@us.ibm.com
More informationAcyclic Network. Tree Based Clustering. Tree Decomposition Methods
Summary s Cluster Tree Elimination Importance of s Solving Topological structure dene key features for a wide class of problems CSP: Inference in acyclic network is extremely ecient (polynomial) Idea:
More informationGraph Exploration: Taking the User into the Loop
Graph Exploration: Taking the User into the Loop Davide Mottin, Anja Jentzsch, Emmanuel Müller Hasso Plattner Institute, Potsdam, Germany 2016/10/24 CIKM2016, Indianapolis, US Where we are Background (5
More informationOPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING
OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING ES200 Peterhouse College, Cambridge Frans Coenen, Paul Leng and Graham Goulbourne The Department of Computer Science The University of Liverpool
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 6
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights
More informationAssociation Rules. Charles Sutton Data Mining and Exploration Spring Based on slides by Chris Williams and Amos Storkey. Thursday, 8 March 12
Association Rules Charles Sutton Data Mining and Exploration Spring 2012 Based on slides by Chris Williams and Amos Storkey The Goal Find patterns : local regularities that occur more often than you would
More informationMS-FP-Growth: A multi-support Vrsion of FP-Growth Agorithm
, pp.55-66 http://dx.doi.org/0.457/ijhit.04.7..6 MS-FP-Growth: A multi-support Vrsion of FP-Growth Agorithm Wiem Taktak and Yahya Slimani Computer Sc. Dept, Higher Institute of Arts MultiMedia (ISAMM),
More informationData Mining Part 3. Associations Rules
Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets
More information