Discovering Roles and Anomalies in Graphs: Theory and Applications

Size: px
Start display at page:

Download "Discovering Roles and Anomalies in Graphs: Theory and Applications"

Transcription

1 Discovering Roles and Anomalies in Graphs: Theory and Applications Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos Faloutsos (CMU) SDM'12 Tutorial

2 Overview Features Roles Anomalies = rare roles Patterns SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 2

3 Overview Features Roles Anomalies = rare roles Patterns SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 3

4 Roadmap What are roles Roles and communities Roles and equivalences (from sociology) Roles (from data mining) Summary SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 4

5 What are roles? Functions of nodes in the network Similar to functional roles of species in ecosystems Measured by structural behaviors Examples centers of stars members of cliques peripheral nodes SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 5

6 Example of Roles centers of stars members of cliques peripheral nodes SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 6

7 Why are roles important? Role Discovery ü Automated discovery ü Behavioral roles ü Roles generalize Task Role query Role outliers Role dynamics Use Case Identify individuals with similar behavior to a known target Identify individuals with unusual behavior Identify unusual changes in behavior Identity resolution Identify known individuals in a new network Role transfer Network comparison Use knowledge of one network to make predictions in another Determine network compatibility for knowledge transfer

8 Roadmap What are roles Roles and communities Roles and equivalences (from sociology) Roles (from data mining) Summary SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 8

9 Roles and Communities Roles group nodes with similar structural properties Communities group nodes that are wellconnected to each other Roles and communities are complementary SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 9

10 Roles and Communities RolX * Fast Modularity * Henderson, et al. 2012; Clauset, et al SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 10

11 Roles and Communities Consider the social network of a CS dept Roles Faculty Staff Students Communities AI lab Database lab Architecture lab SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 11

12 Roadmap What are roles Roles and communities Roles and equivalences (from sociology) Roles (from data mining) Summary SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 12

13 Equivalences Regular Deterministic Automorphic Equivalences Structural Probabilistic Stochastic SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 13

14 Deterministic Equivalences Regular Automorphic Structural SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 14

15 Structural Equivalence [Lorrain & White, 1971] Two nodes u and v are structurally equivalent if they have the same relationships to all other nodes Hypothesis: Structurally equivalent nodes are likely to be similar in other ways i.e., you are your friend Weights & timing issues are not considered Rarely appears in real-world networks a b c u d v e SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 15

16 Structural Equivalence: Algorithms CONCOR (CONvergence of iterated CORrelations) [Breiger et al. 1975] A hierarchical divisive approach 1. Starting with the adjacency matrix, repeatedly calculate Pearson correlations between rows until the resultant correlation matrix consists of +1 and -1 entries 2. Split the last correlation matrix into two structurally equivalent submatrices (a.k.a. blocks): one with +1 entries, another with -1 entries Successive split can be applied to submatrices in order to produce a hierarchy (where every node has a unique position) SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 16

17 Structural Equivalence: Algorithms STRUCUTRE [Burt 1976] A hierarchical agglomerative approach 1. For each node i, create its ID vector by concatenating its row and column vectors from the adjacency matrix 2. For every pair of nodes i, j, measure the square root of sum of squared differences between the corresponding entries in their ID vectors 3. Merge entries in hierarchical fashion as long as their difference is less than some threshold α SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 17

18 Structural Equivalences: Algorithms Combinatorial optimization approaches Numerical optimization with tabu search [UCINET] Local optimization [Pajek] Partition the sociomatrices into blocks based on a cost function that minimizes the sum of within block variances Basically, minimize the sum of code cost within each block SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 18

19 Deterministic Equivalences Regular Automorphic Structural SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 19

20 Automorphic Equivalence [Borgatti, et al. 1992; Sparrow 1993] Two nodes u and v are automorphically equivalent if all the nodes can be relabeled to form an isomorphic graph with the labels of u and v interchanged Swapping u and v (possibly along with their neighbors) does not change graph distances Two nodes that are automorphically equivalent share exactly the same label-independent properties u v SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 20

21 Automorphic Equivalence: Algorithms Sparrow (1993) proposed an algorithm that scales linearly to the number of edges Use numerical signatures on degree sequences of neighborhoods Numerical signatures use a unique transcendental number like π, which is independent of any permutation of nodes Suppose node i has the following degree sequence: 1, 1, 5, 6, and 9. Then its signature is S i,1 = (1 + π)(1 + π) (5 + π) (6 + π) (9 + π) The signature for node i at k+1 hops is S i,(k+1) = Π(S i,k + π) To find automorphic equivalence, simply compare numerical signatures of nodes SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 21

22 Deterministic Equivalences Regular Automorphic Structural SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 22

23 Regular Equivalence [Everett & Borgatti, 1992] Two nodes u and v are regularly equivalent if they are equally related to equivalent others President Motes Faculty Graduate Students Hanneman, Robert A. and Mark Riddle Introduction to social network methods. Riverside, CA: University of California, Riverside ( published in digital form at ) SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 23

24 Regular Equivalence (continued) Basic roles of nodes source repeater sink isolate SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 24

25 Regular Equivalence (continued) Based solely on the social roles of neighbors Interested in Which nodes fall in which social roles? How do social roles relate to each other? Hard partitioning of the graph into social roles A given graph can have more than one valid regular equivalence set Exact regular equivalences can be rare in large graphs SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 25

26 Regular Equivalence: Algorithms Many algorithms exist here Basic notion Profile each node s neighborhood by the presence of nodes of other "types" Nodes are regularly equivalent to the extent that they have similar "types" of other nodes at similar distances in their neighborhoods SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 26

27 Equivalences Regular Deterministic Automorphic Equivalences Structural Probabilistic Stochastic SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 27

28 Stochastic Equivalence [Holland, et al. 1983; Wasserman & Anderson, 1987] Two nodes are stochastically equivalent if they are exchangeable w.r.t. a probability distribution u p (a,u) a p (a,u) = p (a,v) p (u,b) = p (v,b) p (a,v) v Similar to structural equivalence but probabilistic p (u,b) b p (v,b) SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 28

29 Stochastic Equivalence: Algorithms Many algorithms exist here Most recent approaches are generative [Airoldi, et al 2008] Some choice points Single [Kemp, et al 2006] vs. mixed-membership [Koutsourelakis & Eliassi-Rad, 2008] equivalences (a.k.a. positions ) Parametric vs. non-parametric models SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 29

30 Roadmap What are roles Roles and communities Roles and equivalences (from sociology) Roles (from data mining) Summary SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 30

31 RolX: Role extraction Introduced by Henderson, et al. 2011b Automatically extracts the underlying roles in a network No prior knowledge required Assigns a mixed-membership of roles to each node Scales linearly on the number of edges SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 31

32 RolX: Flowchart Node Node Matrix Recursive Feature Extraction Node Feature Matrix Role Extraction Node Role Matrix Role Feature Matrix SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 32

33 RolX: Flowchart Node Node Matrix Recursive Feature Extraction Node Feature Matrix Example: degree, avg weight, # of edges in egonet, mean clustering coefficient of neighbors, etc Role Extraction Node Role Matrix Role Feature Matrix SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 33

34 Recursive Feature Extraction ReFeX [Henderson, et al. 2011a] turns network connectivity into recursive structural features Regional Neighborhood ReFeX Nodes Local Egonet Recursive 1411# 0# 1# 2# 1# 0# 0# 0# 1# 1# 0# 1# 0# 0# 1# 1# 2# 2# 1410# 0# 1# 1# 1# 0# 1# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1# 338# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 1# 0# 0# 0# 339# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0# 1415# 0# 1# 1# 2# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 1# 1# 1# 941# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 1414# 0# 1# 1# 1# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1# 942# 0# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1413# 0# 1# 1# 1# 0# 1# 1# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1# 1412# 0# 0# 0# 0# 0# 0# 0# 1# 2# 0# 1# 1# 0# 0# 1# 2# 0# 940# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 1# 0# 1# 1# 1# 1419# 0# 0# 1# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 1# 945# 0# 1# 4# 3# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 1# 3# 1# 332# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 1418# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 2# 0# 1# 0# 1# 946# 0# 1# 1# 0# 0# 1# 0# 1# 0# 0# 0# 1# 4# 0# 1# 1# 2# 333# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 1417# 0# 1# 1# 1# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1# 943# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0# 330# 1# 3# 2# 0# 1# 2# 2# 0# 2# 2# 2# 0# 3# 1# 0# 2# 5# 1416# 0# 1# 1# 1# 1# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 1# 1# 944# 0# 1# 4# 2# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 3# 1# 331# 0# 3# 2# 1# 0# 1# 0# 0# 2# 0# 2# 0# 2# 0# 1# 2# 5# 949# 0# 0# 0# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 0# 0# 0# 336# 0# 0# 0# 0# 2# 0# 0# 1# 1# 1# 1# 1# 0# 0# 0# 1# 0# 337# 1# 1# 1# 0# 0# 1# 2# 0# 1# 1# 1# 0# 1# 1# 1# 1# 1# 947# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0# 334# 0# 0# 0# 1# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 948# 0# 0# 0# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 0# 335# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0# 531# 1# 0# 0# 0# 1# 0# 2# 0# 0# 2# 0# 0# 0# 2# 0# 0# 0# Neighborhood features: What is your connectivity pattern? Recursive Features: To what kinds of nodes are you connected? SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 34

35 Propositionalisation (PROP) [Knobbe, et al. 2001; Neville, et al. 2003; Krogel, et al. 2003] From multi-relational data mining with roots in Inductive Logic Programming (ILP) Summarizes a multi-relational dataset (stored in multiple tables) into a propositional dataset (stored in a single target table) Derived attribute-value features describe properties of individuals Related more to recursive structural features than structural roles SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 35

36 Role Extraction Features 1411# 0# 1# 2# 1# 0# 0# 0# 1# 1# 0# 1# 0# 0# 1# 1# 2# 2# Recursively extract roles Nodes 1410# 0# 1# 1# 1# 0# 1# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1# 338# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 1# 0# 0# 0# 339# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0# 1415# 0# 1# 1# 2# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 1# 1# 1# 941# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 1414# 0# 1# 1# 1# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1# 942# 0# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1413# 0# 1# 1# 1# 0# 1# 1# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1# 1412# 0# 0# 0# 0# 0# 0# 0# 1# 2# 0# 1# 1# 0# 0# 1# 2# 0# 940# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 1# 0# 1# 1# 1# 1419# 0# 0# 1# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 1# 945# 0# 1# 4# 3# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 1# 3# 1# 332# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 1418# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 2# 0# 1# 0# 1# 946# 0# 1# 1# 0# 0# 1# 0# 1# 0# 0# 0# 1# 4# 0# 1# 1# 2# 333# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 1417# 0# 1# 1# 1# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1# 943# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0# 330# 1# 3# 2# 0# 1# 2# 2# 0# 2# 2# 2# 0# 3# 1# 0# 2# 5# 1416# 0# 1# 1# 1# 1# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 1# 1# 944# 0# 1# 4# 2# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 3# 1# 331# 0# 3# 2# 1# 0# 1# 0# 0# 2# 0# 2# 0# 2# 0# 1# 2# 5# 949# 0# 0# 0# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 0# 0# 0# 336# 0# 0# 0# 0# 2# 0# 0# 1# 1# 1# 1# 1# 0# 0# 0# 1# 0# 337# 1# 1# 1# 0# 0# 1# 2# 0# 1# 1# 1# 0# 1# 1# 1# 1# 1# 947# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0# 334# 0# 0# 0# 1# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 948# 0# 0# 0# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 0# 335# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0# 531# 1# 0# 0# 0# 1# 0# 2# 0# 0# 2# 0# 0# 0# 2# 0# 0# 0# Automatically factorize roles SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 36

37 Automatically Discovered Roles Network Science Co-authorship Graph [Newman 2006] SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 37

38 Making Sense of Roles Degree Avg. Weight Clustering Coeff. Eccentricity PageRank GateKeeper GateKeeper-Local Pivot Structural Hole Role 1 Role 2 Role 3 Role 4 NodeSense NeighborSense Role 1 Role 2 Role 3 Role 4 Default cliquey bridge periphery isolated SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 38

39 Mixed-Membership over Roles liberal conservative neutral Bright red nodes are locally central nodes Bright blue nodes are peripheral nodes Amazon Political Books Co-purchasing Network [V. Krebs 2000] SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 39

40 Role Query Node Similarity for J. Rinzel (isolate) Node Similarity for M.E.J. Newman (bridge) Node Similarity for F. Robert (cliquey) SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 40

41 Roles Generalize across Disjoint Networks Classification Accuracy RolX Feat Default A1-B A2-B A3-B TrainSet-TestSet A4-B A-B SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 41

42 Roles Generalize across Networks Discovery Stage Tue Network Wed Network V: (node feature) Thu matrix Network G: (node role) matrix E.g., degree, avg wgt, etc Feature Discovery Feature Extraction F: (role feature) matrix Feature L: List of feature names Extraction C: Class labels V 1 L V 2 M: model V 3 C 1 RolX Regression Regression G 1 F G 2 G 3 Learning Inference Inference M SDM'12 Tutorial C 2 42 C 3

43 Roles Generalize across Networks Tue Network Wed Network Thu Network E.g., degree, avg wgt, etc Feature Discovery Feature Extraction Feature Extraction V: (node feature) matrix V 1 L V 2 V 3 G: (node role) matrix F: (role feature) matrix C 1 RolX Regression Regression L: List of feature names C: Class labels G 1 F G 2 G 3 M: model Learning Inference Inference M Discovery Stage C 2 C 3 Application Stage 43

44 Roles: Regular Equivalence vs. RolX Mixed-membership over roles Fully automatic Uses structural features RolX Regular Equivalence Uses structure Generalizable across disjoint networks Scalable (linear on # of edges)? SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 44

45 Roadmap What are roles Roles and communities Roles and equivalences (from sociology) Roles (from data mining) Summary SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 45

46 Roles Summary Structural behavior ( function ) of nodes Complementary to communities Previous work mostly in sociology under equivalences Recent graph mining work produces mixedmembership roles, is fully automatic and scalable Can be used for many tasks: transfer learning, reidentification, node dynamics, etc SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 46

47 Acknowledgement LLNL: Brian Gallagher, Keith Henderson IBM Watson: Hanghang Tong Google: Sugato Basu CMU: Leman Akoglu, Danai Koutra, Lei Li Thanks to: LLNL, NSF, IARPA. SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 47

48 References Deterministic Equivalences S. Boorman, H.C. White: Social Structure from Multiple Networks: II. Role Structures. American Journal of Sociology, 81: , S.P. Borgatti, M.G. Everett: Notions of Positions in Social Network Analysis. In P. V. Marsden (Ed.): Sociological Methodology, 1992:1 35. S.P. Borgatti, M.G. Everett, L. Freeman: UCINET IV, S.P. Borgatti, M.G. Everett, Regular Blockmodels of Multiway, Multimode Matrices. Social Networks, 14:91-120, R. Breiger, S. Boorman, P. Arabie: An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling. Journal of Mathematical Psychology, 12: , R.S. Burt: Positions in Networks. Social Forces, 55:93-122, SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 48

49 References P. DiMaggio: Structural Analysis of Organizational Fields: A Blockmodel Approach. Research in Organizational Behavior, 8:335-70, P. Doreian, V. Batagelj, A. Ferligoj: Generalized Blockmodeling. Cambridge University Press, M.G. Everett, S. P. Borgatti: Regular Equivalence: General Theory. Journal of Mathematical Sociology, 19(1):29-52, K. Faust, A.K. Romney: Does Structure Find Structure? A critique of Burt's Use of Distance as a Measure of Structural Equivalence. Social Networks, 7:77-103, K. Faust, S. Wasserman: Blockmodels: Interpretation and Evaluation. Social Networks, 14: R.A. Hanneman, M. Riddle: Introduction to Social Network Methods. University of California, Riverside, SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 49

50 References F. Lorrain, H.C. White: Structural Equivalence of Individuals in Social Networks. Journal of Mathematical Sociology, 1:49-80, L.D. Sailer: Structural Equivalence: Meaning and Definition, Computation, and Applications. Social Networks, 1:73-90, M.K. Sparrow: A Linear Algorithm for Computing Automorphic Equivalence Classes: The Numerical Signatures Approach. Social Networks, 15: , S. Wasserman, K. Faust: Social Network Analysis: Methods and Applications. Cambridge University Press, H.C. White, S. A. Boorman, R. L. Breiger: Social Structure from Multiple Networks I. Blockmodels of Roles and Positions. American Journal of Sociology, 81: , D.R. White, K. Reitz: Graph and Semi-Group Homomorphism on Networks and Relations. Social Networks, 5: , SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 50

51 References Stochastic blockmodels E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing: Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9: , P.W. Holland, K.B. Laskey, S. Leinhardt: Stochastic Blockmodels: Some First Steps. Social Networks, 5: , C. Kemp, J.B. Tenenbaum, T.L. Griffiths, T. Yamada, N. Ueda: Learning Systems of Concepts with an Infinite Relational Model. AAAI P.S. Koutsourelakis, T. Eliassi-Rad: Finding Mixed-Memberships in Social Networks. AAAI Spring Symposium on Social Information Processing, Stanford, CA, K. Nowicki,T. Snijders: Estimation and Prediction for Stochastic Blockstructures, Journal of the American Statistical Association, 96: , Z. Xu, V. Tresp, K. Yu, H.-P. Kriegel: Infinite Hidden Relational Models. UAI S. Wasserman, C. Anderson: Stochastic a Posteriori Blockmodels: Construction and Assessment, Social Networks, 9:1-36, SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 51

52 References Role Discovery K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, C. Faloutsos: It's Who Your Know: Graph Mining Using Recursive Structural Features. KDD 2011: K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, L. Li, C. Faloutsos: RolX: Structural Role Extraction & Mining in Large Graphs. Technical Report, Lawrence Livermore National Laboratory, Livermore, CA, Ruoming Jin, Victor E. Lee, Hui Hong: Axiomatic ranking of network role similarity. KDD 2011: SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 52

53 References Community Discovery A. Clauset, M.E.J. Newman, C. Moore: Finding Community Structure in Very Large Networks. Phys. Rev. E., 70:066111, M.E.J. Newman: Finding Community Structure in Networks Using the Eigenvectors of Matrices. Phys. Rev. E., 74:036104, Propositionalisation A.J. Knobbe, M. de Haas, A. Siebes: Propositionalisation and Aggregates. PKDD 2001: M.-A. Krogel, S. Rawles, F. Zelezny, P.A. Flach, N. Lavrac, S. Wrobel: Comparative Evaluation of Approaches to Propositionalization. ILP 2003: J. Neville, D. Jensen, B. Gallagher: Simple Estimators for Relational Bayesian Classifiers. ICDM 2003: SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 53

54 Back to Overview Features Roles Anomalies = rare roles Patterns SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 54

Generalized blockmodeling of sparse networks

Generalized blockmodeling of sparse networks Metodološki zvezki, Vol. 10, No. 2, 2013, 99-119 Generalized blockmodeling of sparse networks Aleš Žiberna 1 Abstract The paper starts with an observation that the blockmodeling of relatively sparse binary

More information

Node similarity and classification

Node similarity and classification Node similarity and classification Davide Mottin, Konstantina Lazaridou HassoPlattner Institute Graph Mining course Winter Semester 2016 Acknowledgements Some part of this lecture is taken from: http://web.eecs.umich.edu/~dkoutra/tut/icdm14.html

More information

Computing Regular Equivalence: Practical and Theoretical Issues

Computing Regular Equivalence: Practical and Theoretical Issues Developments in Statistics Andrej Mrvar and Anuška Ferligoj (Editors) Metodološki zvezki, 17, Ljubljana: FDV, 2002 Computing Regular Equivalence: Practical and Theoretical Issues Martin Everett 1 and Steve

More information

Definition: Implications: Analysis:

Definition: Implications: Analysis: Analyzing Two mode Network Data Definition: One mode networks detail the relationship between one type of entity, for example between people. In contrast, two mode networks are composed of two types of

More information

Node Similarity. Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California

Node Similarity. Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California Node Similarity Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California rgera@nps.edu Motivation We talked about global properties Average degree, average clustering, ave

More information

A Higher-order Latent Space Network Model

A Higher-order Latent Space Network Model A Higher-order Latent Space Network Model Nesreen K. Ahmed Intel Labs nesreen.k.ahmed@intel.com Ryan A. Rossi Palo Alto Research Center rrossi@parc.com Theodore L. Willke Intel Labs ted.willke@intel.com

More information

Two mode Network. PAD 637, Lab 8 Spring 2013 Yoonie Lee

Two mode Network. PAD 637, Lab 8 Spring 2013 Yoonie Lee Two mode Network PAD 637, Lab 8 Spring 2013 Yoonie Lee Lab exercises Two- mode Davis QAP correlation Multiple regression QAP Types of network data One-mode network (M M ) Two-mode network (M N ) M1 M2

More information

A Scalable Approach to Size-Independent Network Similarity

A Scalable Approach to Size-Independent Network Similarity A Scalable Approach to Size-Independent Network Similarity Michele Berlingerio Danai Koutra Tina Eliassi-Rad Christos Faloutsos IBM Research Dublin Carnegie Mellon University Rutgers University mberling@ie.ibm.com

More information

Blockmodels/Positional Analysis : Fundamentals. PAD 637, Lab 10 Spring 2013 Yoonie Lee

Blockmodels/Positional Analysis : Fundamentals. PAD 637, Lab 10 Spring 2013 Yoonie Lee Blockmodels/Positional Analysis : Fundamentals PAD 637, Lab 10 Spring 2013 Yoonie Lee Review of PS5 Main question was To compare blockmodeling results using PROFILE and CONCOR, respectively. Data: Krackhardt

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

Graph Exploration: Taking the User into the Loop

Graph Exploration: Taking the User into the Loop Graph Exploration: Taking the User into the Loop Davide Mottin, Anja Jentzsch, Emmanuel Müller Hasso Plattner Institute, Potsdam, Germany 2016/10/24 CIKM2016, Indianapolis, US Where we are Background (5

More information

Fast Nearest Neighbor Search on Large Time-Evolving Graphs

Fast Nearest Neighbor Search on Large Time-Evolving Graphs Fast Nearest Neighbor Search on Large Time-Evolving Graphs Leman Akoglu Srinivasan Parthasarathy Rohit Khandekar Vibhore Kumar Deepak Rajan Kun-Lung Wu Graphs are everywhere Leman Akoglu Fast Nearest Neighbor

More information

Section 7.12: Similarity. By: Ralucca Gera, NPS

Section 7.12: Similarity. By: Ralucca Gera, NPS Section 7.12: Similarity By: Ralucca Gera, NPS Motivation We talked about global properties Average degree, average clustering, ave path length We talked about local properties: Some node centralities

More information

Modularity CMSC 858L

Modularity CMSC 858L Modularity CMSC 858L Module-detection for Function Prediction Biological networks generally modular (Hartwell+, 1999) We can try to find the modules within a network. Once we find modules, we can look

More information

arxiv: v1 [stat.ml] 4 Oct 2016

arxiv: v1 [stat.ml] 4 Oct 2016 Revisiting Role Discovery in Networks: From Node to Edge Roles Nesreen K. Ahmed Ryan A. Rossi Intel Labs Palo Alto Research Center nesreen.k.ahmed@intel.com rrossi@parc.com Rong Zhou Palo Alto Research

More information

Spectral Methods for Network Community Detection and Graph Partitioning

Spectral Methods for Network Community Detection and Graph Partitioning Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection

More information

Stochastic propositionalization of relational data using aggregates

Stochastic propositionalization of relational data using aggregates Stochastic propositionalization of relational data using aggregates Valentin Gjorgjioski and Sašo Dzeroski Jožef Stefan Institute Abstract. The fact that data is already stored in relational databases

More information

CS Introduction to Data Mining Instructor: Abdullah Mueen

CS Introduction to Data Mining Instructor: Abdullah Mueen CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts

More information

Role-Dynamics: Fast Mining of Large Dynamic Networks

Role-Dynamics: Fast Mining of Large Dynamic Networks Role-Dynamics: Fast Mining of Large Dynamic Networks Ryan Rossi Jennifer Neville Purdue University {rrossi, neville}@purdue.edu Brian Gallagher Keith Henderson Lawrence Livermore Lab {bgallagher, keith}@llnl.gov

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Analysis of Networks Jure Leskovec, Stanford University CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2 Observations Models

More information

Epsilon Equitable Partitions based Approach for Role & Positional Analysis of Social Networks

Epsilon Equitable Partitions based Approach for Role & Positional Analysis of Social Networks Epsilon Equitable Partitions based Approach for Role & Positional Analysis of Social Networks A THESIS submitted by PRATIK VINAY GUPTE for the award of the degree of MASTER OF SCIENCE (by Research) DEPARTMENT

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Study of Data Mining Algorithm in Social Network Analysis

Study of Data Mining Algorithm in Social Network Analysis 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Study of Data Mining Algorithm in Social Network Analysis Chang Zhang 1,a, Yanfeng Jin 1,b, Wei Jin 1,c, Yu Liu 1,d 1

More information

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science Data Mining Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Computer Science 06 07 Department of CS - DM - UHD Road map Cluster Analysis: Basic

More information

Modeling Dynamic Behavior in Large Evolving Graphs

Modeling Dynamic Behavior in Large Evolving Graphs Modeling Dynamic Behavior in Large Evolving Graphs R. Rossi, J. Neville, B. Gallagher, and K. Henderson Presented by: Doaa Altarawy 1 Outline - Motivation - Proposed Model - Definitions - Modeling dynamic

More information

Transitivity and Triads

Transitivity and Triads 1 / 32 Tom A.B. Snijders University of Oxford May 14, 2012 2 / 32 Outline 1 Local Structure Transitivity 2 3 / 32 Local Structure in Social Networks From the standpoint of structural individualism, one

More information

Social-Network Graphs

Social-Network Graphs Social-Network Graphs Mining Social Networks Facebook, Google+, Twitter Email Networks, Collaboration Networks Identify communities Similar to clustering Communities usually overlap Identify similarities

More information

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks SOMSN: An Effective Self Organizing Map for Clustering of Social Networks Fatemeh Ghaemmaghami Research Scholar, CSE and IT Dept. Shiraz University, Shiraz, Iran Reza Manouchehri Sarhadi Research Scholar,

More information

Web Structure Mining Community Detection and Evaluation

Web Structure Mining Community Detection and Evaluation Web Structure Mining Community Detection and Evaluation 1 Community Community. It is formed by individuals such that those within a group interact with each other more frequently than with those outside

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Reflexive Regular Equivalence for Bipartite Data

Reflexive Regular Equivalence for Bipartite Data Reflexive Regular Equivalence for Bipartite Data Aaron Gerow 1, Mingyang Zhou 2, Stan Matwin 1, and Feng Shi 3 1 Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada 2 Department of Computer

More information

Tie strength, social capital, betweenness and homophily. Rik Sarkar

Tie strength, social capital, betweenness and homophily. Rik Sarkar Tie strength, social capital, betweenness and homophily Rik Sarkar Networks Position of a node in a network determines its role/importance Structure of a network determines its properties 2 Today Notion

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology 9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Evaluation of Direct and Indirect Blockmodeling of Regular Equivalence in Valued Networks by Simulations

Evaluation of Direct and Indirect Blockmodeling of Regular Equivalence in Valued Networks by Simulations Metodološki zvezki, Vol. 6, No. 2, 2009, 99-134 Evaluation of Direct and Indirect Blockmodeling of Regular Equivalence in Valued Networks by Simulations Aleš Žiberna 1 Abstract The aim of the paper is

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany

More information

Introduction Types of Social Network Analysis Social Networks in the Online Age Data Mining for Social Network Analysis Applications Conclusion

Introduction Types of Social Network Analysis Social Networks in the Online Age Data Mining for Social Network Analysis Applications Conclusion Introduction Types of Social Network Analysis Social Networks in the Online Age Data Mining for Social Network Analysis Applications Conclusion References Social Network Social Network Analysis Sociocentric

More information

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey

More information

Based on Raymond J. Mooney s slides

Based on Raymond J. Mooney s slides Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit

More information

Clustering Algorithms In Data Mining

Clustering Algorithms In Data Mining 2017 5th International Conference on Computer, Automation and Power Electronics (CAPE 2017) Clustering Algorithms In Data Mining Xiaosong Chen 1, a 1 Deparment of Computer Science, University of Vermont,

More information

Evaluation Strategies for Network Classification

Evaluation Strategies for Network Classification Evaluation Strategies for Network Classification Jennifer Neville Departments of Computer Science and Statistics Purdue University (joint work with Tao Wang, Brian Gallagher, and Tina Eliassi-Rad) 1 Given

More information

Covert and Anomalous Network Discovery and Detection (CANDiD)

Covert and Anomalous Network Discovery and Detection (CANDiD) Covert and Anomalous Network Discovery and Detection (CANDiD) Rajmonda S. Caceres, Edo Airoldi, Garrett Bernstein, Edward Kao, Benjamin A. Miller, Raj Rao Nadakuditi, Matthew C. Schmidt, Kenneth Senne,

More information

Temporal Graphs KRISHNAN PANAMALAI MURALI

Temporal Graphs KRISHNAN PANAMALAI MURALI Temporal Graphs KRISHNAN PANAMALAI MURALI METRICFORENSICS: A Multi-Level Approach for Mining Volatile Graphs Authors: Henderson, Eliassi-Rad, Faloutsos, Akoglu, Li, Maruhashi, Prakash and Tong. Published:

More information

Sampling Algorithms for Pure Network Topologies

Sampling Algorithms for Pure Network Topologies Sampling Algorithms for Pure Network Topologies A Study on the Stability and the Separability of Metric Embeddings ABSTRACT Edoardo M. Airoldi School of Computer Science Carnegie Mellon University Pittsburgh,

More information

Demystifying movie ratings 224W Project Report. Amritha Raghunath Vignesh Ganapathi Subramanian

Demystifying movie ratings 224W Project Report. Amritha Raghunath Vignesh Ganapathi Subramanian Demystifying movie ratings 224W Project Report Amritha Raghunath (amrithar@stanford.edu) Vignesh Ganapathi Subramanian (vigansub@stanford.edu) 9 December, 2014 Introduction The past decade or so has seen

More information

Sampling algorithms of pure network topologies: Stability and separability of metric embeddings

Sampling algorithms of pure network topologies: Stability and separability of metric embeddings Sampling algorithms of pure network topologies: Stability and separability of metric embeddings Edoardo M. Airoldi May 2005 CMU-ISRI-05-111 School of Computer Science Carnegie Mellon University Pittsburgh,

More information

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms

More information

Slides based on those in:

Slides based on those in: Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org A 3.3 B 38.4 C 34.3 D 3.9 E 8.1 F 3.9 1.6 1.6 1.6 1.6 1.6 2 y 0.8 ½+0.2 ⅓ M 1/2 1/2 0 0.8 1/2 0 0 + 0.2 0 1/2 1 [1/N]

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

More information

Jing Gao 1, Feng Liang 1, Wei Fan 2, Chi Wang 1, Yizhou Sun 1, Jiawei i Han 1 University of Illinois, IBM TJ Watson.

Jing Gao 1, Feng Liang 1, Wei Fan 2, Chi Wang 1, Yizhou Sun 1, Jiawei i Han 1 University of Illinois, IBM TJ Watson. Jing Gao 1, Feng Liang 1, Wei Fan 2, Chi Wang 1, Yizhou Sun 1, Jiawei i Han 1 University of Illinois, IBM TJ Watson Debapriya Basu Determine outliers in information networks Compare various algorithms

More information

Dimension reduction : PCA and Clustering

Dimension reduction : PCA and Clustering Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental

More information

SGN (4 cr) Chapter 11

SGN (4 cr) Chapter 11 SGN-41006 (4 cr) Chapter 11 Clustering Jussi Tohka & Jari Niemi Department of Signal Processing Tampere University of Technology February 25, 2014 J. Tohka & J. Niemi (TUT-SGN) SGN-41006 (4 cr) Chapter

More information

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity

More information

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010 INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,

More information

CSE 158. Web Mining and Recommender Systems. Midterm recap

CSE 158. Web Mining and Recommender Systems. Midterm recap CSE 158 Web Mining and Recommender Systems Midterm recap Midterm on Wednesday! 5:10 pm 6:10 pm Closed book but I ll provide a similar level of basic info as in the last page of previous midterms CSE 158

More information

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

Generative Models for Complex Network Structure

Generative Models for Complex Network Structure Generative Models for Complex Network Structure Aaron Clauset @aaronclauset Computer Science Dept. & BioFrontiers Institute University of Colorado, Boulder External Faculty, Santa Fe Institute 4 June 2013

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

Social Data Management Communities

Social Data Management Communities Social Data Management Communities Antoine Amarilli 1, Silviu Maniu 2 January 9th, 2018 1 Télécom ParisTech 2 Université Paris-Sud 1/20 Table of contents Communities in Graphs 2/20 Graph Communities Communities

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li

DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Big Data Clustering Prof. Yanhua Li Time: 6:00pm 8:50pm Thu Location: AK 232 Fall 2016 High Dimensional Data v Given a cloud of data points we want to understand

More information

COMS 4771 Clustering. Nakul Verma

COMS 4771 Clustering. Nakul Verma COMS 4771 Clustering Nakul Verma Supervised Learning Data: Supervised learning Assumption: there is a (relatively simple) function such that for most i Learning task: given n examples from the data, find

More information

Structural and Syntactic Pattern Recognition

Structural and Syntactic Pattern Recognition Structural and Syntactic Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent

More information

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

How to organize the Web?

How to organize the Web? How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second try: Web Search Information Retrieval attempts to find relevant docs in a small and trusted set Newspaper

More information

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the

More information

9. Conclusions. 9.1 Definition KDD

9. Conclusions. 9.1 Definition KDD 9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]

More information

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017)

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017) 1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should

More information

Graph Classification in Heterogeneous

Graph Classification in Heterogeneous Title: Graph Classification in Heterogeneous Networks Name: Xiangnan Kong 1, Philip S. Yu 1 Affil./Addr.: Department of Computer Science University of Illinois at Chicago Chicago, IL, USA E-mail: {xkong4,

More information

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI

More information

CS570: Introduction to Data Mining

CS570: Introduction to Data Mining CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.

More information

A Tutorial on the Probabilistic Framework for Centrality Analysis, Community Detection and Clustering

A Tutorial on the Probabilistic Framework for Centrality Analysis, Community Detection and Clustering A Tutorial on the Probabilistic Framework for Centrality Analysis, Community Detection and Clustering 1 Cheng-Shang Chang All rights reserved Abstract Analysis of networked data has received a lot of attention

More information

Community Structure Detection. Amar Chandole Ameya Kabre Atishay Aggarwal

Community Structure Detection. Amar Chandole Ameya Kabre Atishay Aggarwal Community Structure Detection Amar Chandole Ameya Kabre Atishay Aggarwal What is a network? Group or system of interconnected people or things Ways to represent a network: Matrices Sets Sequences Time

More information

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org J.

More information

Clustering Part 4 DBSCAN

Clustering Part 4 DBSCAN Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

DATA WAREHOUING UNIT I

DATA WAREHOUING UNIT I BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009

More information

Randomized Response Technique in Data Mining

Randomized Response Technique in Data Mining Randomized Response Technique in Data Mining Monika Soni Arya College of Engineering and IT, Jaipur(Raj.) 12.monika@gmail.com Vishal Shrivastva Arya College of Engineering and IT, Jaipur(Raj.) vishal500371@yahoo.co.in

More information

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties

More information

Methods for Intelligent Systems

Methods for Intelligent Systems Methods for Intelligent Systems Lecture Notes on Clustering (II) Davide Eynard eynard@elet.polimi.it Department of Electronics and Information Politecnico di Milano Davide Eynard - Lecture Notes on Clustering

More information

Analyzing Outlier Detection Techniques with Hybrid Method

Analyzing Outlier Detection Techniques with Hybrid Method Analyzing Outlier Detection Techniques with Hybrid Method Shruti Aggarwal Assistant Professor Department of Computer Science and Engineering Sri Guru Granth Sahib World University. (SGGSWU) Fatehgarh Sahib,

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018)

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018) 1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be

More information

Bruno Martins. 1 st Semester 2012/2013

Bruno Martins. 1 st Semester 2012/2013 Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection

CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection CSE 158 Lecture 6 Web Mining and Recommender Systems Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING

INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING CS 7265 BIG DATA ANALYTICS INTRODUCTION TO BIG DATA, DATA MINING, AND MACHINE LEARNING * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, PhD Computer Science,

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information