Scalable Mining of Massive Networks: Distance-based Centrality, Similarity, and Influence. Edith Cohen Tel Aviv University

Size: px
Start display at page:

Download "Scalable Mining of Massive Networks: Distance-based Centrality, Similarity, and Influence. Edith Cohen Tel Aviv University"

Transcription

1 Scalable Mining of Massive Networks: Distance-based Centrality, Similarity, and Influence Edith Cohen Tel Aviv University

2 Graph Datasets: Represent relations between things Bowtie structure of the Web Broder et. al. 200 Dolphin interactions

3 Graph Datasets Hyperlinks (the Web) Social graphs (Facebook, Twitter, LinkedIn, ): friend, follow, like logs, phone call logs, messages Commerce transactions (Amazon,eBay) Road networks Communication networks Protein interactions

4 Analytics on Graphs Centralities/Influence The power/importance/coverage of a node or a set of nodes Applications: ranking, viral marketing, Similarities/Communities How tightly related are 2 or more nodes Applications: Friend/Product Recommendations, attribute completion, Advertising, Prediction

5 Challenges Modeling: Models must capture reality Be flexible to allow fitting to problem (tunable parameters) Scalability: 0 9+ interactions (edges), 0 8+ entities (nodes)

6 Centrality, Similarity, Influence Structural measures (based on the set of interactions) Local measures: only depend on the set of neighbors (paths of length ) Degree centrality, Influence as cardinality of union of neighbors Similarity by relation of neighbor sets (Jaccard, Adamic/Adar) Advantages: Gets the first order bit Scalable Disadvantages: Limited recall Spammable

7 Centrality, Similarity, Influence Structural measures (based on the set of interactions) Global Measures: depend on the paths ensemble Higher recall but much less scalable Random walk based (Geometric/Heat Kernel): Centrality, similarity, influence thru (personalized) page rank Katz (Sum of paths, discounted by length) Resistance: Similarity by effective resistance Distance/reachability based Measures: centrality, similarity, influence

8 Distance/Reachability based measures of Centrality, Influence, Similarity Closeness centrality: Ability of a node to reach other nodes [Bavelas 950]+++: Influence: Ability of a set of nodes to reach others [GLM200,KKT2003,] [Gomez- Rodriguez+ ICML 20], [Du+ NIPS 203]. [C DPW 204] +++: Closeness similarity: Relates two nodes based on the similarity of their reach [C DFGGW203], (local: [AA 2005,LK2007]):

9 This talk: Unified treatment of distance/reachability-based measures Models: Basic definition of distance-based Different ways of using distance/reachability Argue that models capture intuitive properties Scalability: Sketching and estimation

10 SP distances: Distance vectors

11 SP distances: Distance vectors

12 SP distances d ij : Distance vectors

13 SP distances d ij : Distance-based measures Relate entities based on their vectors. Closeness centrality C( ) Closeness Similarity Sim(, ) Influence Inf(,,, )

14 SP distances d ij : Distance-based Models Inbound/outbound/undirected distance Distance/Reachability/Dijkstra (NN) rank Topic filter β v 0: weighted relevance of entity Distances vs. decayed distances α d 0 Robustness by randomizing lengths/presence of edges (REL), or using multiple example instances

15 SP distances d ij : Distance-based Models Inbound/outbound/undirected distance Distance/Reachability/Dijkstra (NN) rank Topic filter β v 0: weighted relevance of entity Distances vs. decayed distances α d 0 Robustness by randomizing lengths/presence of edges (REL), or using multiple example instances

16 Topic Filter β(i) Weigh entities (nodes or entries of vectors) according to Topic, interests, education level, age, community, geography, language, product type, Applications: focus measure on topic. Useful for recommendations, attribute completion, targeted ads Similar to the role of start state in PPR Evil β:

17 SP distances d ij : Distance-based Models Inbound/outbound/undirected distance Distance/Reachability/Dijkstra (NN) rank Topic filter β v 0: weighted relevance of entity Distances vs. decayed distances α d 0 Robustness by randomizing lengths/presence of edges (REL), or using multiple example instances

18 Farness-penalty vs. Closeness-reward To emphasize penalty for weak coverage of far nodes, we use distances as is: The norm of the distances vector of a node is dominated by far nodes. As in classic closeness centrality, facility location, k-medians/means To emphasize reward for better coverage of close nodes, we apply a decay ( kernel ) function α d to the distance: The norm is dominated by closer nodes. As in influence models and local measures. Exact computation: both requires full distance vectors. Scalable approximation: different techniques

19 Quantifying Closeness Reward: Distance Decay (Kernel) Functions Specify how importance/relevance decays with distance α d = exp λd (exponential), d (geomertic), exp λd 2 (Gaussian) d <? : 0 (reachability) d < T? : 0 (threshold) Similar to the role of kernel distribution in PPR

20 Closeness Vectors SP distance vectors: Closeness vectors: α d ij = +d ij β

21 Closeness Centrality Classic (farness penalty) [Bavelas/Sabidussi/Beuchamp 950, 965, 966] C i = (n )/ d ij β(j) Distance-decay (closeness reward) [C Kaplan 2004, Opsahl et. al. 200, Dangalchev 2006, Boldi Vigna 203, ] C i = α(d ij ) β(j) j j

22 Closeness Matrix (distance decay+topic filter) c ij = α(d ij ) β(j) c i (row i ) is the closeness vector of node i Centrality: Influence: of a set of nodes C i = c ij Inf S = max j j i S c ij

23 Example: (dist-decay) Closeness Centrality SP distance vectors: Closeness vectors: α d ij = +d ij β C 2.05 C 2.

24 Example + topic filter: Closeness to Evil α d ij = +d ij 0. β evilness C 2.05 total C 2. total.24 evil part 0.29 evil part

25 Example: Influence from Vectors SP distance vectors: Closeness vectors: α d ij = +d ij β Inf, 3.64 Sum of max (Submodular) C 2. C 2.05

26 Closeness Similarity: Local Measures Local: sim(i, j) depends only on immediate neighbors of i, j Jaccard: Γ i Γ j Adamic Adar: Γ i Γ j x Γ i Γ(j) log Γ x J(, )= 2 9 J(, )= 3 5 AA(, )= log 5 AA(, )= 3 log 4 Work well for d ij 2, but no recall for d ij > 2

27 Closeness Similarity: Global Measures Based on full closeness vectors c ij = α(d ij ) β(j) weighted Jaccard: Cosine: L p Distance: h sim i, j = β h min c ih,c jh sim i, j = c i c j. h β h max c ih,c jh L p -dist i, j = c i c j p

28 Example: (global) Closeness Similarity SP distance vectors: Closeness vectors: α d ij = +d ij β Weighted Jaccard: x min C ix,c jx 0.2 x max C ix,c jx Cosine similarity: x C ix C jx C i 2 C j L Distance: C ix C jx x 2.8

29 So far: Definitions and some motivation of distance-based measures Next: Scalability thru Sketches and estimators

30 Scalable Closeness Centrality Closeness centrality of a node v can be computed exactly using a SSSP computation from v. But this takes near-linear time per node: Does not scale when we want centralities of many or all nodes We want to estimate all centralities within a small relative error using near-linear computation Classic (farness penalty): [C DPW : COSN 204] Previously additive error: [EW SODA 200, OCL 2008] Distance-decay (closeness reward): All-distances sketches [C 94], [C Kaplan SIGMOD 2004] [C : PODS 204].

31 Applications General Tools Scaling Up (closeness reward) All-distances and reachability node sketches (ADS) [C 94], [C Kaplan 2004 [C : PODS 204]. Estimators applicable with ADS: [C K PODC 2007, VLDB 2008, SIGMETRICS 2008], [C PODC 204,KDD 204] Distance-decay closeness centrality [C 94], [C Kaplan 2004] [Boldi Vigna 203] [C : PODS 204]. Closeness similarity [CDFGGW: COSN 203] Influence computation and maximization [CWY KDD 2009] [C Delling Pajor Werneck: CIKM 204] [Du Song Gomez-Rodruiguez Zha NIPS 203] [C Delling Pajor Werneck 205]

32 All-Distances Sketches (ADS) [C 94]+ per-node summary structures of Un/Directed, Un/Weighted networks For a node i n : ADS(i) samples the distance vector of i m edges, n nodes, parameter k trades-off sketch size and information

33 All-Distances Sketches: Definition ADS(v) is a list of pairs of the form (i, d vi ) Draw a random permutation of the nodes: r n [n] i ADS v r(i) < k th smallest rank amongst nodes that are closer to v than i This is a bottom-k ADS, there are other ADS flavors

34 All-Distances Sketches: Definition ADS(v) is a list of pairs of the form (i, d vi ) Draw a random permutation of the nodes: r n [n] i ADS v r(i) < k th smallest rank amongst nodes that are closer to v than i n j= Expected size of ADS(i) is min, k < k ln n j The j th closest node to i is included with probability min {, k j }

35 SP distances: ADS example Random permutation of nodes

36 ADS example k = All nodes sorted by SP distance from k = :

37 ADS example k = 2 Sorted by SP distances from k = 2:

38 Sketch Coordination We use the same permutation to obtain the ADS of all nodes, as a result: ADS Sketches of different nodes are coordinated: related in a way that is useful for queries that involve multiple nodes (similarities, influence, distance) [Brewer, Early, Joyce 972] Generalize MinHash sketches by adding a time/distance dimension

39 Computing ADSs Efficiently Pruned Dijkstra s Algorithm: Build ADS entries by increasing permutation rank. Local (Dynamic programming) Algorithm: Build ADS entries by increasing distance. Either way, we scan outgoing edges of a node only after its ADS is modified number of edge traversals is bounded by ADS size: O(km ln n)

40 Computing ADSs Efficiently Perform pruned Dijkstra from nodes by increasing permutation rank: k =

41 Estimation from sketches

42 Estimation with All-Distances Sketches Closeness centrality of node i, from ADS(i) with NRMSE O( ) (+concentration) [C 94, C Kaplan 04, C 4] k Influence of S, from ADS(i) i S, with NRMSE O( ) k [C 94, Chen WY KDD 09, Du SGZ NIPS 3, C DPW 204 ]+++ Closeness similarity sim(i, j) (W/Jaccard, cosine), from ADS(i) and ADS(j) w. RMSE O( ) [C DFGGW COSN 3] k L p distance, from ADS(i) and ADS(j) with RMSE O( ) times max norm using [C KDD 204] k Side note: ADSs can also be used as distance oracles (estimate pairwise distances), spanners, and more

43 Historic Inverse Probability (HIP) probability & estimator [C 204] The HIP inclusion probability p vi of i ADS(v) is conditioned on fixed rank values of other nodes. To estimate centrality of v, we estimate the presence of i with respect to v: I v i (= if v i, 0 otherwise) using inverse-probability [HT52] a vi = p. This is unbiased (when p > 0). For bottom-k ADS and r U[0,] p vi = k th {r(u) u ADS v d vu < d vi }

44 Example: HIP estimates Bottom-2 ADS of p: a = : p p:2 nd smallest r value among closer nodes

45 HIP cardinality estimate Bottom-2 ADS of distance: a: Query: n 6 v = { i d vi 6 } = I dvi 6 i n 6 (v) = a vi = = 3.59 i,d vi ADS v d vi 6

46 Quality of HIP cardinality Estimate Lemma: The HIP neighborhood cardinality estimator n d (v) = a vi i,d vi ADS v d vi d Has Coefficient of Variation σ μ 2k 2 This is 2 improvement over MinHash estimators [C 94]. Also optimally uses sketch.

47 HIP estimates of Centrality C v = α(d vi )β(i) i α non increasing; β some filter C v = a vi α(d vi ) β(i) i ADS(v)

48 HIP estimates: closeness to good/evil Bottom-2 ADS of distance: a: β: Filter: β 0, measures goodness Distance-decay kernel : e d vi C v = a vi β(i)e d vi = e e e 0 + i ADS(v)

49 Similarity/Influence Estimation We work with HIP inclusions and sum estimators (estimate separately contribution of each node) We use monotone estimation formulation [C K Random 204; C PODC 204] and the L* estimator (unique optimal monotone unbiased nonnegative sum estimator) Influence: can be estimated by merging ADS sketches and applying centrality estimator, but L* is tighter Similarity: inverse-probability on joint inclusion may not even apply, L* gets around the problem

50 Estimating Jaccard Closeness Similarity SP-distance; using kernel α d ij = +d ij : x sim i, j = min C ix,c jx = x max C ix,c jx X x > 0 or N x > 0 only when x ADS i ADS(j) easy to compute sim i, j using only ADS i ADS(j) We use L* for X x. We show N x (inverse probability). Get relative error on denominator, additive error of same order on numerator. k x min max d ix +, d jx + d ix +, d jx + Sum estimator: For all x obtain unbiased estimates N x of min d ix +, d jx + x sim i, j = N x x ; X x of max x X x d ix +, d jx + ;

51 Estimating Jaccard Closeness Similarity Estimate N x for min d ix +, d jx + : For x ADS i ADS(j) We compute HIP probability p for inclusion in intersection An inverse probability estimate: N x = p min, d ix + d jx + Otherwise, estimate is N x = 0 This is the best we can do because if x is not in intersection, data is consistent with 0 value any nonzero unbiased estimator must return 0

52 Estimating Jaccard Closeness Similarity Estimate X x for max d ix +, d jx + For x ADS i ADS(j) We can never determine X x if x is only reachable from one of i, j. We can only lower bound X x. The L* estimator [C 204] is an optimal extension of inverse probability to such a setting.

53 NN-rank based closeness similarity sim(i, j) = x x min max, π ix π jx, π ix π jx π ix : Dijkstra (NN) rank of x with respect to i Lemma: sim i, j is approximated well by proof: Pr[ x ADS i ADS j ] k min Pr[ x ADS i ADS j ] k max ADS i ADS j ADS i ADS j, π ix, π ix ADSs can be stored very compactly: no distances, hash of nodes π jx π jx

54 Next: Enhancing distance/reachability-based models with REL REL: randomizing lengths/presence of edges (Implicit) The Independent Cascade Diffusion model [Kempe Klienberg Tardos KDD 2003] Closeness similarity [C DFGGW 203] Timed Diffusion [Gomez-Rodriguez BS 20, Du SGZ 203, C DPW 204,.]

55 Strength of relation between two entities Basic Intuitions for dependence on path ensemble: Increase with shorter paths Increase with more (redundant paths)

56 Boosting distance by Randomizing Edge Lengths (REL) We expect the strength of a relation to Increase with the multiplicity of paths Decrease with the length of paths Eigenvector measures: Rooted PageRank, RWR, Hitting Time, Commute Time, Eff. Resistance, Katz Satisfy this strictly. SP distances: weakly

57 Randomizing Edge Lengths (REL) Replace edge lengths (or presence) with independent random variables Look at the expected measure

58 Randomizing Edge Lengths (REL) Expected distance is lower with multiple paths: expectation of the minimum < minimum of expectations

59 Randomizing Edge Lengths (REL) Scalability: use Monte Carlo simulations of model to generate fixed instances (graphs) Build sketches for multiple instances

60 Benefits of REL in network analysis Measure does reward for paths multiplicity Robustness: Measure is not sensitive to small changes in edge weights (edge presence for reachability)

61 Next: Some (preliminary) experimental results

62 Scalability: Timed Influence with REL Combined ADS sketches for all nodes, obtained from l = 64 instances generated by assigning independent Exp distributed edge lengths. Sketch parameter k = 64 Centrality ( seed) and Influence (50 seeds) queries Queries α x = exp( 0x) preproc seed 50 seeds #nodes #edges [h:m] time μs %err time μs %err Slashdot 77K 828K : % % Gowalla 97K.9M 3: % % Twitter F 456K 5M 9: % % [C Delling Pajor Werneck 204]

63 Closeness Similarity evaluation [C Delling Fuchs Goldberg Goldszmid Werneck COSN 203] Data Sets Nodes x 0 6 Edges x 0 6 ADS label size arxiv DBLP twitter smallworld

64 Similarity Evaluation Spearman coefficient: correlation between meta-data ranking and similarity measure ranking( = perfect match, 0= random rankings). On pairs selected uniformly by ground-truth similarity. arxiv DBLP Twitter SmallWord Adamic-Adar hops SP distance RWR RWR RWR RWR Closeness Closeness REL

65 Similarity Evaluation No clear winner (networks too different) Local measures are limited RWR has very good recall (with tuning) but computationally expensive Closeness (+REL) : robust, good recall, fast queries (microseconds) arxiv DBLP Twitter SmallWord Adamic-Adar hops SP distance RWR RWR RWR RWR Closeness Closeness REL

66 Conclusion Distance/reachability based measures of centrality, influence, and similarity: global measures that are flexible and highly scalable Presented a unified treatment Scalability through: All-Distances and reachability sketches Estimators applicable to sketches Future: Model fitting framework, Evaluations, Algorithms Engineering

67

68 Components of my work are joint work with (subsets of): Daniel Delling, Fabian Fuchs, Andrew Goldberg, Moises Goldszmidt, Haim Kaplan, Thomas Pajor, and Renato Werneck

Computing Classic Closeness Centrality, at Scale

Computing Classic Closeness Centrality, at Scale Computing Classic Closeness Centrality, at Scale Edith Cohen Joint with: Thomas Pajor, Daniel Delling, Renato Werneck Very Large Graphs Model relations and interactions (edges) between entities (nodes)

More information

MinHash Sketches: A Brief Survey. June 14, 2016

MinHash Sketches: A Brief Survey. June 14, 2016 MinHash Sketches: A Brief Survey Edith Cohen edith@cohenwang.com June 14, 2016 1 Definition Sketches are a very powerful tool in massive data analysis. Operations and queries that are specified with respect

More information

Similarity Ranking in Large- Scale Bipartite Graphs

Similarity Ranking in Large- Scale Bipartite Graphs Similarity Ranking in Large- Scale Bipartite Graphs Alessandro Epasto Brown University - 20 th March 2014 1 Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014] 2 AdWords Ads Ads

More information

Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs

Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs Alessandro Epasto J. Feldman*, S. Lattanzi*, S. Leonardi, V. Mirrokni*. *Google Research Sapienza U. Rome Motivation Recommendation

More information

Section 7.12: Similarity. By: Ralucca Gera, NPS

Section 7.12: Similarity. By: Ralucca Gera, NPS Section 7.12: Similarity By: Ralucca Gera, NPS Motivation We talked about global properties Average degree, average clustering, ave path length We talked about local properties: Some node centralities

More information

The link prediction problem for social networks

The link prediction problem for social networks The link prediction problem for social networks Alexandra Chouldechova STATS 319, February 1, 2011 Motivation Recommending new friends in in online social networks. Suggesting interactions between the

More information

Centrality in Large Networks

Centrality in Large Networks Centrality in Large Networks Mostafa H. Chehreghani May 14, 2017 Table of contents Centrality notions Exact algorithm Approximate algorithms Conclusion Centrality notions Exact algorithm Approximate algorithms

More information

Link Prediction in Graph Streams

Link Prediction in Graph Streams Peixiang Zhao, Charu C. Aggarwal, and Gewen He Florida State University IBM T J Watson Research Center Link Prediction in Graph Streams ICDE Conference, 2016 Graph Streams Graph Streams arise in a wide

More information

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Effective Latent Space Graph-based Re-ranking Model with Global Consistency Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case

More information

Reach for A : an Efficient Point-to-Point Shortest Path Algorithm

Reach for A : an Efficient Point-to-Point Shortest Path Algorithm Reach for A : an Efficient Point-to-Point Shortest Path Algorithm Andrew V. Goldberg Microsoft Research Silicon Valley www.research.microsoft.com/ goldberg/ Joint with Haim Kaplan and Renato Werneck Einstein

More information

Diffusion and Clustering on Large Graphs

Diffusion and Clustering on Large Graphs Diffusion and Clustering on Large Graphs Alexander Tsiatas Final Defense 17 May 2012 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of large graphs: The World

More information

Node Similarity. Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California

Node Similarity. Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California Node Similarity Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California rgera@nps.edu Motivation We talked about global properties Average degree, average clustering, ave

More information

Sketch-based Influence Maximization and Computation: Scaling up with Guarantees

Sketch-based Influence Maximization and Computation: Scaling up with Guarantees Sketch-based Influence Maximization and Computation: Scaling up with Guarantees EDITH COHEN Microsoft Research editco@microsoft.com DANIEL DELLING Microsoft Research dadellin@microsoft.com THOMAS PAJOR

More information

Scalable Influence Maximization in Social Networks under the Linear Threshold Model

Scalable Influence Maximization in Social Networks under the Linear Threshold Model Scalable Influence Maximization in Social Networks under the Linear Threshold Model Wei Chen Microsoft Research Asia Yifei Yuan Li Zhang In collaboration with University of Pennsylvania Microsoft Research

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu SPAM FARMING 2/11/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 2/11/2013 Jure Leskovec, Stanford

More information

Link Prediction for Social Network

Link Prediction for Social Network Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue

More information

Problem 1: Complexity of Update Rules for Logistic Regression

Problem 1: Complexity of Update Rules for Logistic Regression Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1

More information

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge Centralities (4) By: Ralucca Gera, NPS Excellence Through Knowledge Some slide from last week that we didn t talk about in class: 2 PageRank algorithm Eigenvector centrality: i s Rank score is the sum

More information

Viral Marketing and Outbreak Detection. Fang Jin Yao Zhang

Viral Marketing and Outbreak Detection. Fang Jin Yao Zhang Viral Marketing and Outbreak Detection Fang Jin Yao Zhang Paper 1: Maximizing the Spread of Influence through a Social Network Authors: David Kempe, Jon Kleinberg, Éva Tardos KDD 2003 Outline Problem Statement

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank

More information

Highway Dimension and Provably Efficient Shortest Paths Algorithms

Highway Dimension and Provably Efficient Shortest Paths Algorithms Highway Dimension and Provably Efficient Shortest Paths Algorithms Andrew V. Goldberg Microsoft Research Silicon Valley www.research.microsoft.com/ goldberg/ Joint with Ittai Abraham, Amos Fiat, and Renato

More information

Lecture 11: Clustering and the Spectral Partitioning Algorithm A note on randomized algorithm, Unbiased estimates

Lecture 11: Clustering and the Spectral Partitioning Algorithm A note on randomized algorithm, Unbiased estimates CSE 51: Design and Analysis of Algorithms I Spring 016 Lecture 11: Clustering and the Spectral Partitioning Algorithm Lecturer: Shayan Oveis Gharan May nd Scribe: Yueqi Sheng Disclaimer: These notes have

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Analysis of Large Graphs: TrustRank and WebSpam

Analysis of Large Graphs: TrustRank and WebSpam Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Extracting Information from Complex Networks

Extracting Information from Complex Networks Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform

More information

Randomized Composable Core-sets for Distributed Optimization Vahab Mirrokni

Randomized Composable Core-sets for Distributed Optimization Vahab Mirrokni Randomized Composable Core-sets for Distributed Optimization Vahab Mirrokni Mainly based on joint work with: Algorithms Research Group, Google Research, New York Hossein Bateni, Aditya Bhaskara, Hossein

More information

Maximizing the Spread of Influence through a Social Network

Maximizing the Spread of Influence through a Social Network Maximizing the Spread of Influence through a Social Network By David Kempe, Jon Kleinberg, Eva Tardos Report by Joe Abrams Social Networks Infectious disease networks Viral Marketing Viral Marketing Example:

More information

IRIE: Scalable and Robust Influence Maximization in Social Networks

IRIE: Scalable and Robust Influence Maximization in Social Networks IRIE: Scalable and Robust Influence Maximization in Social Networks Kyomin Jung KAIST Republic of Korea kyomin@kaist.edu Wooram Heo KAIST Republic of Korea modesty83@kaist.ac.kr Wei Chen Microsoft Research

More information

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK232 Fall 2016 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

Non-exhaustive, Overlapping k-means

Non-exhaustive, Overlapping k-means Non-exhaustive, Overlapping k-means J. J. Whang, I. S. Dhilon, and D. F. Gleich Teresa Lebair University of Maryland, Baltimore County October 29th, 2015 Teresa Lebair UMBC 1/38 Outline Introduction NEO-K-Means

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

Scaling Link-Based Similarity Search

Scaling Link-Based Similarity Search Scaling Link-Based Similarity Search Dániel Fogaras Budapest University of Technology and Economics Budapest, Hungary, H-1521 fd@cs.bme.hu Balázs Rácz Computer and Automation Research Institute of the

More information

Community detection algorithms survey and overlapping communities. Presented by Sai Ravi Kiran Mallampati

Community detection algorithms survey and overlapping communities. Presented by Sai Ravi Kiran Mallampati Community detection algorithms survey and overlapping communities Presented by Sai Ravi Kiran Mallampati (sairavi5@vt.edu) 1 Outline Various community detection algorithms: Intuition * Evaluation of the

More information

Similarity Joins in MapReduce

Similarity Joins in MapReduce Similarity Joins in MapReduce Benjamin Coors, Kristian Hunt, and Alain Kaeslin KTH Royal Institute of Technology {coors,khunt,kaeslin}@kth.se Abstract. This paper studies how similarity joins can be implemented

More information

A Computational Theory of Clustering

A Computational Theory of Clustering A Computational Theory of Clustering Avrim Blum Carnegie Mellon University Based on work joint with Nina Balcan, Anupam Gupta, and Santosh Vempala Point of this talk A new way to theoretically analyze

More information

Query Independent Scholarly Article Ranking

Query Independent Scholarly Article Ranking Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs Yishi Lin, Xiaowei Chen, John C.S. Lui The Chinese University of Hong Kong 9/4/15 EXACT DISTANCE QUERIES ON DYNAMIC

More information

Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction

Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction Takuya Akiba U Tokyo Takanori Hayashi U Tokyo Nozomi Nori Kyoto

More information

Association Pattern Mining. Lijun Zhang

Association Pattern Mining. Lijun Zhang Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms

More information

CS 6604: Data Mining Large Networks and Time-Series

CS 6604: Data Mining Large Networks and Time-Series CS 6604: Data Mining Large Networks and Time-Series Soumya Vundekode Lecture #12: Centrality Metrics Prof. B Aditya Prakash Agenda Link Analysis and Web Search Searching the Web: The Problem of Ranking

More information

Fast Nearest Neighbor Search on Large Time-Evolving Graphs

Fast Nearest Neighbor Search on Large Time-Evolving Graphs Fast Nearest Neighbor Search on Large Time-Evolving Graphs Leman Akoglu Srinivasan Parthasarathy Rohit Khandekar Vibhore Kumar Deepak Rajan Kun-Lung Wu Graphs are everywhere Leman Akoglu Fast Nearest Neighbor

More information

Compressing Social Networks

Compressing Social Networks Compressing Social Networks The Minimum Logarithmic Arrangement Problem Chad Waters School of Computing Clemson University cgwater@clemson.edu March 4, 2013 Motivation Determine the extent to which social

More information

Information Retrieval. Lecture 9 - Web search basics

Information Retrieval. Lecture 9 - Web search basics Information Retrieval Lecture 9 - Web search basics Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 30 Introduction Up to now: techniques for general

More information

GIVEN a large graph and a query node, finding its k-

GIVEN a large graph and a query node, finding its k- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL., NO., 2016 1 Efficient and Exact Local Search for Random Walk Based Top-K Proximity Query in Large Graphs Yubao Wu, Ruoming Jin, and Xiang Zhang

More information

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery : Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Coverage Approximation Algorithms

Coverage Approximation Algorithms DATA MINING LECTURE 12 Coverage Approximation Algorithms Example Promotion campaign on a social network We have a social network as a graph. People are more likely to buy a product if they have a friend

More information

Online Social Networks and Media

Online Social Networks and Media Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ

More information

Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety

Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety Abhishek

More information

Link Analysis Ranking Algorithms, Theory, and Experiments

Link Analysis Ranking Algorithms, Theory, and Experiments Link Analysis Ranking Algorithms, Theory, and Experiments Allan Borodin Gareth O. Roberts Jeffrey S. Rosenthal Panayiotis Tsaparas June 28, 2004 Abstract The explosive growth and the widespread accessibility

More information

Absorbing Random walks Coverage

Absorbing Random walks Coverage DATA MINING LECTURE 3 Absorbing Random walks Coverage Random Walks on Graphs Random walk: Start from a node chosen uniformly at random with probability. n Pick one of the outgoing edges uniformly at random

More information

Absorbing Random walks Coverage

Absorbing Random walks Coverage DATA MINING LECTURE 3 Absorbing Random walks Coverage Random Walks on Graphs Random walk: Start from a node chosen uniformly at random with probability. n Pick one of the outgoing edges uniformly at random

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Locality- Sensitive Hashing Random Projections for NN Search

Locality- Sensitive Hashing Random Projections for NN Search Case Study 2: Document Retrieval Locality- Sensitive Hashing Random Projections for NN Search Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 18, 2017 Sham Kakade

More information

Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges

Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Abhishek Santra 1 and Sanjukta Bhowmick 2 1 Information Technology Laboratory, CSE Department, University of

More information

Part I Part II Part III Part IV Part V. Influence Maximization

Part I Part II Part III Part IV Part V. Influence Maximization Part I Part II Part III Part IV Part V Influence Maximization 1 Word-of-mouth (WoM) effect in social networks xphone is good xphone is good xphone is good xphone is good xphone is good xphone is good xphone

More information

Fast Contextual Preference Scoring of Database Tuples

Fast Contextual Preference Scoring of Database Tuples Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation

More information

Topic: Duplicate Detection and Similarity Computing

Topic: Duplicate Detection and Similarity Computing Table of Content Topic: Duplicate Detection and Similarity Computing Motivation Shingling for duplicate comparison Minhashing LSH UCSB 290N, 2013 Tao Yang Some of slides are from text book [CMS] and Rajaraman/Ullman

More information

arxiv: v3 [cs.db] 1 Dec 2017

arxiv: v3 [cs.db] 1 Dec 2017 Efficient Error-tolerant Search on Knowledge Graphs Zhaoyang Shao University of Alberta zhaoyang@ualberta.ca Davood Rafiei University of Alberta drafiei@ualberta.ca Themis Palpanas Paris Descartes University

More information

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University  Infinite data. Filtering data streams /9/7 Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them

More information

COMP 465: Data Mining Still More on Clustering

COMP 465: Data Mining Still More on Clustering 3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following

More information

Exact Computation of Influence Spread by Binary Decision Diagrams

Exact Computation of Influence Spread by Binary Decision Diagrams Exact Computation of Influence Spread by Binary Decision Diagrams Takanori Maehara 1), Hirofumi Suzuki 2), Masakazu Ishihata 2) 1) Riken Center for Advanced Intelligence Project 2) Hokkaido University

More information

Scribe from 2014/2015: Jessica Su, Hieu Pham Date: October 6, 2016 Editor: Jimmy Wu

Scribe from 2014/2015: Jessica Su, Hieu Pham Date: October 6, 2016 Editor: Jimmy Wu CS 267 Lecture 3 Shortest paths, graph diameter Scribe from 2014/2015: Jessica Su, Hieu Pham Date: October 6, 2016 Editor: Jimmy Wu Today we will talk about algorithms for finding shortest paths in a graph.

More information

Diffusion and Clustering on Large Graphs

Diffusion and Clustering on Large Graphs Diffusion and Clustering on Large Graphs Alexander Tsiatas Thesis Proposal / Advancement Exam 8 December 2011 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of

More information

Point-to-Point Shortest Path Algorithms with Preprocessing

Point-to-Point Shortest Path Algorithms with Preprocessing Point-to-Point Shortest Path Algorithms with Preprocessing Andrew V. Goldberg Microsoft Research Silicon Valley www.research.microsoft.com/ goldberg/ Joint work with Chris Harrelson, Haim Kaplan, and Retato

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu

More information

Centrality Measures. Computing Closeness and Betweennes. Andrea Marino. Pisa, February PhD Course on Graph Mining Algorithms, Università di Pisa

Centrality Measures. Computing Closeness and Betweennes. Andrea Marino. Pisa, February PhD Course on Graph Mining Algorithms, Università di Pisa Computing Closeness and Betweennes PhD Course on Graph Mining Algorithms, Università di Pisa Pisa, February 2018 Centrality measures The problem of identifying the most central nodes in a network is a

More information

Approximation Algorithms

Approximation Algorithms Chapter 8 Approximation Algorithms Algorithm Theory WS 2016/17 Fabian Kuhn Approximation Algorithms Optimization appears everywhere in computer science We have seen many examples, e.g.: scheduling jobs

More information

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003

CS 664 Slides #11 Image Segmentation. Prof. Dan Huttenlocher Fall 2003 CS 664 Slides #11 Image Segmentation Prof. Dan Huttenlocher Fall 2003 Image Segmentation Find regions of image that are coherent Dual of edge detection Regions vs. boundaries Related to clustering problems

More information

A Class of Submodular Functions for Document Summarization

A Class of Submodular Functions for Document Summarization A Class of Submodular Functions for Document Summarization Hui Lin, Jeff Bilmes University of Washington, Seattle Dept. of Electrical Engineering June 20, 2011 Lin and Bilmes Submodular Summarization June

More information

Information Propagation in Interaction Networks

Information Propagation in Interaction Networks Information Propagation in Interaction Networks Rohit Kumar Department of Computer and Decision Engineering Université Libre de Bruxelles, Belgium Department of Service and Information System Engineering

More information

Proteins, Particles, and Pseudo-Max- Marginals: A Submodular Approach Jason Pacheco Erik Sudderth

Proteins, Particles, and Pseudo-Max- Marginals: A Submodular Approach Jason Pacheco Erik Sudderth Proteins, Particles, and Pseudo-Max- Marginals: A Submodular Approach Jason Pacheco Erik Sudderth Department of Computer Science Brown University, Providence RI Protein Side Chain Prediction Estimate side

More information

Nearest Neighbor with KD Trees

Nearest Neighbor with KD Trees Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest

More information

Reach for A : Efficient Point-to-Point Shortest Path Algorithms

Reach for A : Efficient Point-to-Point Shortest Path Algorithms Reach for A : Efficient Point-to-Point Shortest Path Algorithms Andrew V. Goldberg 1 Haim Kaplan 2 Renato F. Werneck 3 October 2005 Technical Report MSR-TR-2005-132 We study the point-to-point shortest

More information

Link Prediction and Anomoly Detection

Link Prediction and Anomoly Detection Graphs and Networks Lecture 23 Link Prediction and Anomoly Detection Daniel A. Spielman November 19, 2013 23.1 Disclaimer These notes are not necessarily an accurate representation of what happened in

More information

Some Interesting Applications of Theory. PageRank Minhashing Locality-Sensitive Hashing

Some Interesting Applications of Theory. PageRank Minhashing Locality-Sensitive Hashing Some Interesting Applications of Theory PageRank Minhashing Locality-Sensitive Hashing 1 PageRank The thing that makes Google work. Intuition: solve the recursive equation: a page is important if important

More information

Efficient Processing of Multiple DTW Queries in Time Series Databases

Efficient Processing of Multiple DTW Queries in Time Series Databases Efficient Processing of Multiple DTW Queries in Time Series Databases Hardy Kremer 1 Stephan Günnemann 1 Anca-Maria Ivanescu 1 Ira Assent 2 Thomas Seidl 1 1 RWTH Aachen University, Germany 2 Aarhus University,

More information

University of Maryland. Tuesday, March 2, 2010

University of Maryland. Tuesday, March 2, 2010 Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman http://www.mmds.org Overview of Recommender Systems Content-based Systems Collaborative Filtering J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive

More information

Task Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval

Task Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval Case Study 2: Document Retrieval Task Description: Finding Similar Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 11, 2017 Sham Kakade 2017 1 Document

More information

Uncertain Data Management Non-relational models: Graphs

Uncertain Data Management Non-relational models: Graphs Uncertain Data Management Non-relational models: Graphs Antoine Amarilli 1, Silviu Maniu 2 1 Télécom ParisTech 2 Université Paris-Sud January 16th, 2017 1/21 Credits M. Potamias, F. Bonchi, A. Gionis,

More information

CPSC 340: Machine Learning and Data Mining. Finding Similar Items Fall 2017

CPSC 340: Machine Learning and Data Mining. Finding Similar Items Fall 2017 CPSC 340: Machine Learning and Data Mining Finding Similar Items Fall 2017 Assignment 1 is due tonight. Admin 1 late day to hand in Monday, 2 late days for Wednesday. Assignment 2 will be up soon. Start

More information

Latent Distance Models and. Graph Feature Models Dr. Asja Fischer, Prof. Jens Lehmann

Latent Distance Models and. Graph Feature Models Dr. Asja Fischer, Prof. Jens Lehmann Latent Distance Models and Graph Feature Models 2016-02-09 Dr. Asja Fischer, Prof. Jens Lehmann Overview Last lecture we saw how neural networks/multi-layer perceptrons (MLPs) can be applied to SRL. After

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

More information

Latest on Linear Sketches for Large Graphs: Lots of Problems, Little Space, and Loads of Handwaving. Andrew McGregor University of Massachusetts

Latest on Linear Sketches for Large Graphs: Lots of Problems, Little Space, and Loads of Handwaving. Andrew McGregor University of Massachusetts Latest on Linear Sketches for Large Graphs: Lots of Problems, Little Space, and Loads of Handwaving Andrew McGregor University of Massachusetts Latest on Linear Sketches for Large Graphs: Lots of Problems,

More information

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Analysis of Networks Jure Leskovec, Stanford University CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 11/13/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2 Observations Models

More information

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos Maximizing the Spread of Influence through a Social Network David Kempe, Jon Kleinberg and Eva Tardos Group 9 Lauren Thomas, Ryan Lieblein, Joshua Hammock and Mary Hanvey Introduction In a social network,

More information

Link Analysis in the Cloud

Link Analysis in the Cloud Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)

More information

arxiv: v1 [cs.ma] 8 May 2018

arxiv: v1 [cs.ma] 8 May 2018 Ordinal Approximation for Social Choice, Matching, and Facility Location Problems given Candidate Positions Elliot Anshelevich and Wennan Zhu arxiv:1805.03103v1 [cs.ma] 8 May 2018 May 9, 2018 Abstract

More information

Instance-Based Learning: Nearest neighbor and kernel regression and classificiation

Instance-Based Learning: Nearest neighbor and kernel regression and classificiation Instance-Based Learning: Nearest neighbor and kernel regression and classificiation Emily Fox University of Washington February 3, 2017 Simplest approach: Nearest neighbor regression 1 Fit locally to each

More information

A Tutorial on the Probabilistic Framework for Centrality Analysis, Community Detection and Clustering

A Tutorial on the Probabilistic Framework for Centrality Analysis, Community Detection and Clustering A Tutorial on the Probabilistic Framework for Centrality Analysis, Community Detection and Clustering 1 Cheng-Shang Chang All rights reserved Abstract Analysis of networked data has received a lot of attention

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

Clustering: Classic Methods and Modern Views

Clustering: Classic Methods and Modern Views Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mining Massive Datasets Jure Leskovec, Stanford University http://cs46.stanford.edu /7/ Jure Leskovec, Stanford C46: Mining Massive Datasets Many real-world problems Web Search and Text Mining Billions

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information