Uncertain Data Management Non-relational models: Graphs

1 Uncertain Data Management
Non-relational models: Graphs
Antoine Amarilli (Télécom ParisTech), Silviu Maniu (Université Paris-Sud)
January 16th 1/21

2 Credits
- M. Potamias, F. Bonchi, A. Gionis, G. Kollios. k-nearest Neighbors in Uncertain Graphs. PVLDB 3(1). (number of samples, median measure, figure in slide 17, algorithm in slide 20)
- M. Ball. Computational Complexity of Network Reliability Analysis: An Overview. IEEE Trans. Reliab. R-35(3).
- L. Valiant. The Complexity of Enumeration and Reliability Problems. SIAM J. Comput. 8(3). (complexity of reliability/reachability)
PDFs of the slides available at 2/21

4 Uncertain Graphs
Graphs: a natural way to represent data in various domains
- transport data: road and air links between locations
- social networks: relationships between humans, citation networks
- interactions between proteins: contacts due to biochemical processes
For all the above examples, the links are not exact. (Why?) 3/21

5 (Deterministic) Graphs
[Figure: example graph on the vertices a, b, c, d]
A graph $G = (V, E)$ is formed of
- a set $V$ of vertices (nodes)
- a set $E \subseteq V \times V$ of edges 4/21

6 Uncertain Graphs
[Figure: example uncertain graph on the vertices a, b, c, d; edge labels include 0.2, 0.5 and 0.7]
An uncertain graph $\mathcal{G} = (V, E, p)$ is formed of
- a set $V$ of vertices (nodes)
- a set $E \subseteq V \times V$ of edges
- a function $p : E \to [0, 1]$, where $p_e$ is the probability that the edge $e \in E$ exists
What are the possible worlds and their probability for this model? 5/21

8 Uncertain Graphs: Possible Worlds
A possible world of $\mathcal{G}$, denoted $G \sqsubseteq \mathcal{G}$, is a deterministic graph $G = (V, E_G)$ where each $e \in E_G$ is chosen from $E$.
The probability of $G$ is:
$$\Pr[G] = \prod_{e \in E_G} p_e \prod_{e \in E \setminus E_G} (1 - p_e)$$
How many possible worlds are there? 6/21
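
The product formula above can be checked on a toy instance. The following is a minimal sketch, assuming independent edges stored in a dictionary; the function name `possible_worlds` and the three-edge graph with its probabilities are illustrative choices, not taken from the slides. It enumerates every subset of the edge set, so it is only usable for very small graphs.

```python
from itertools import product

def possible_worlds(edges):
    """Yield (world, probability) for every subset of the uncertain edges.

    `edges` maps an edge (u, v) to its existence probability p_e.
    """
    edge_list = list(edges)
    for keep in product([False, True], repeat=len(edge_list)):
        world = frozenset(e for e, k in zip(edge_list, keep) if k)
        prob = 1.0
        for e, k in zip(edge_list, keep):
            # p_e if the edge is kept in this world, (1 - p_e) otherwise
            prob *= edges[e] if k else 1.0 - edges[e]
        yield world, prob

# Hypothetical three-edge graph; labels and probabilities are illustrative.
edges = {("a", "b"): 0.2, ("b", "c"): 0.5, ("c", "d"): 0.7}
worlds = list(possible_worlds(edges))
print(len(worlds))                 # number of possible worlds
print(sum(p for _, p in worlds))   # total probability mass
```

Each edge is either present or absent, so the enumeration visits one world per subset of $E$, and the probabilities of all worlds add up to 1 (up to floating-point rounding).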

9 Uncertain Graphs: Other models
Other models are possible:
- each edge is replaced by a distribution of weights: instead of choosing if the edge exists or not, a possible world is an instantiation of weights
- each edge has a formula of events, capturing correlations
- probabilities can also be on nodes: this is equivalent to the edge model (Why?) 7/21

12 Queries on Uncertain Graphs
Generally, the queries we want to answer are distance queries:
- the reachability or reliability query: get the probability that two nodes s and t are connected
- queries on the distance distribution:
  $$p_{s,t}(d) = \sum_{G \,\mid\, d_G(s,t) = d} \Pr[G]$$
Multiple uses of distance queries: link prediction, social search, travel estimation 8/21
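
To make the distance distribution concrete, here is a brute-force sketch that sums $\Pr[G]$ over all possible worlds, grouped by the hop distance between s and t in each world. It assumes undirected, independent edges; the helper names and the three-edge example are hypothetical. Mass at `math.inf` corresponds to worlds where s and t are disconnected, so the reachability probability is one minus that mass.

```python
import math
from collections import defaultdict, deque
from itertools import product

def hop_distance(world_edges, s, t):
    """BFS hop distance from s to t over undirected edges; math.inf if unreachable."""
    adj = defaultdict(set)
    for u, v in world_edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf

def distance_distribution(edges, s, t):
    """Exact p_{s,t}(d), obtained by enumerating all 2^|E| possible worlds."""
    edge_list = list(edges)
    dist_prob = defaultdict(float)
    for keep in product([False, True], repeat=len(edge_list)):
        prob = 1.0
        world = []
        for e, k in zip(edge_list, keep):
            prob *= edges[e] if k else 1.0 - edges[e]
            if k:
                world.append(e)
        dist_prob[hop_distance(world, s, t)] += prob
    return dict(dist_prob)

# Hypothetical example: a direct edge b-a and a two-hop path b-c-a.
edges = {("b", "a"): 0.3, ("b", "c"): 0.5, ("c", "a"): 0.5}
dist = distance_distribution(edges, "b", "a")
reachability = 1.0 - dist.get(math.inf, 0.0)
print(dist, reachability)
```

The exponential enumeration is only there to mirror the definition; it is not a practical evaluation strategy.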

16 Queries on Uncertain Graphs
[Figure: deterministic example graph on the vertices a, b, c, d]
What is the distance (in hops) between b and a?
- a BFS search (or Dijkstra's algorithm) finds the edge b → a
- the cost is O(|E|) (linear in the size of the graph) 10/21

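19 Queries on Uncertain Graphs
[Figure: example uncertain graph on the vertices a, b, c, d; edge labels include 0.3]
What is the distance (in hops) between b and a?
- the edge b → a does not appear in all possible worlds: $p_{b,a}(1) = p(b \to a)$
- there are two other possible paths, of distance 2 (b → c → a) and 3 (b → d → c → a): $p_{b,a}(2) = (1 - p_{b,a}(1)) \, p(b \to c \to a)$, where $p(b \to c \to a)$ is the probability that both edges of the path exist 11/21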

21 Queries on Uncertain Graphs
[Figure: example graph on the vertices a, b, c, d]
What is the distance (in hops) between b and a?
- the number of paths is exponential in the size of the graph
- specifically, there are 3! paths 12/21

23 Queries on Uncertain Graphs
- Distance query answering in uncertain graphs is at least as hard as in relational databases (logical formulas of paths, the number of which can be exponential)
- Computing the reachability probability (i.e., the probability that there is a path between a source and a target) is known to be #P-hard [Valiant, SIAM J. Comput., 1979] 13/21

26 Computing Answers to Distance Queries on Probabilistic Graphs
Distance estimations in uncertain graphs can be approximated via Monte Carlo sampling:
1. generate sampled graphs for r rounds (is this the optimal way for an s, t distance estimation?)
2. compute the desired measure (reachability probability, distance distributions) by averaging results
Same issue: how many rounds? 14/21
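
The two steps above can be sketched as follows. This is a minimal illustration, assuming undirected, independent edges and hop distances; the function names, the fixed seed and the three-edge example are assumptions made for the sketch. Every round generates a full sampled graph, which is exactly the naive strategy the slide questions.

```python
import math
import random
from collections import Counter, defaultdict, deque

def sample_world(edges, rng):
    """Step 1: keep each edge independently with its probability."""
    return [e for e, p in edges.items() if rng.random() < p]

def hop_distance(world_edges, s, t):
    """BFS hop distance from s to t; math.inf if t is unreachable."""
    adj = defaultdict(set)
    for u, v in world_edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf

def monte_carlo_distances(edges, s, t, rounds, seed=0):
    """Step 2: estimate p_{s,t}(d) as the fraction of sampled worlds with distance d."""
    rng = random.Random(seed)
    counts = Counter(
        hop_distance(sample_world(edges, rng), s, t) for _ in range(rounds)
    )
    return {d: c / rounds for d, c in counts.items()}

# Hypothetical example graph; the probabilities are illustrative.
edges = {("b", "a"): 0.3, ("b", "c"): 0.5, ("c", "a"): 0.5}
est = monte_carlo_distances(edges, "b", "a", rounds=20000)
print(est)                                # estimated distance distribution
print(1.0 - est.get(math.inf, 0.0))       # estimated reachability probability
```

For this small graph the exact values are 0.3 for distance 1, 0.175 for distance 2 and 0.525 for "disconnected", so the estimates can be compared against them directly.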

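28 Number of Samples: Median Distance
Median distance:
$$d_M(s,t) = \arg\max_{D} \left\{ \sum_{d=0}^{D} p_{s,t}(d) \le \frac{1}{2} \right\}$$
Let $\mu$ be the real median, and $\alpha$ and $\beta$ values $\pm\epsilon n$ away from $\mu$. Then for
$$r > \frac{c}{\epsilon^2} \log\left(\frac{2}{\delta}\right)$$
and a good choice of $c$:
$$\Pr(\hat{\mu} \in [\alpha, \beta]) > 1 - \delta$$ 15/21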
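
As a rough feel for the bound, the sketch below evaluates $r > \frac{c}{\epsilon^2} \log(2/\delta)$ numerically. The constant $c$ is not given on the slide, so `c=1.0` is purely a placeholder, and reading $\log$ as the natural logarithm is also an assumption; the function name is hypothetical.

```python
import math

def samples_for_median(eps, delta, c=1.0):
    """Smallest integer r with r > (c / eps^2) * ln(2 / delta).

    c is a placeholder constant and the natural logarithm is an assumption.
    """
    bound = c / eps ** 2 * math.log(2.0 / delta)
    return math.floor(bound) + 1

print(samples_for_median(eps=0.1, delta=0.05))
print(samples_for_median(eps=0.05, delta=0.05))
```

The bound does not depend on the size of the graph, and halving $\epsilon$ roughly quadruples the number of rounds.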

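30 Number of Samples: Expected Distance
Expected reliable distance (generalization of reliability):
$$d_{ER}(s,t) = \sum_{d \,\mid\, d < \infty} d \cdot \frac{p_{s,t}(d)}{1 - p_{s,t}(\infty)}$$
By estimating the connectivity $\rho$, we need to sample at least
$$r \ge \max\left\{ \frac{3}{\epsilon^2}, \frac{(n-1)^2}{2\rho\epsilon^2} \right\} \log\left(\frac{2}{\delta}\right)$$
for an $(\epsilon, \delta)$ approximation. 16/21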
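
The expected reliable distance is the mean distance conditioned on s and t being connected. A minimal sketch, assuming the distance distribution is given as a dictionary with the disconnected mass stored under `math.inf` (a representation chosen for this example, not prescribed by the slides):

```python
import math

def expected_reliable_distance(dist):
    """Mean of the finite distances, renormalised by the connection probability."""
    connected = 1.0 - dist.get(math.inf, 0.0)
    if connected <= 0.0:
        raise ValueError("s and t are never connected; d_ER is undefined")
    finite_sum = sum(d * p for d, p in dist.items() if d != math.inf)
    return finite_sum / connected

# Hypothetical distribution: distance 1 w.p. 0.3, distance 2 w.p. 0.175, disconnected otherwise.
dist = {1: 0.3, 2: 0.175, math.inf: 0.525}
d_er = expected_reliable_distance(dist)
print(d_er)
```

The same function works on an estimated distribution obtained by sampling, in which case the denominator is the estimated connectivity.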

31 Number of Samples In Reality
The number of needed samples can be surprisingly low (but it depends on the actual probabilities)
[Figure 5: MSE vs. worlds, for BIOMINE and FLICKR. Axes: number of worlds vs. mean squared error. Series: Median, Majority, ExpectedRel, Reliability. 200 worlds are enough.] 17/21

33 Sampling Graphs
Generating the entirety of the graph $G_i$ for each round $i < r$ is not optimal:
- we do not need to generate the entire graph $G_i$
- we can start from s and do a BFS or Dijkstra search, sampling only the outgoing edges of the nodes we visit
- based on the generated outgoing edges, we repeat the computation for each newly reached node, until we find t 18/21
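
The lazy strategy above can be sketched for hop distances as a BFS that flips the coin of an edge only when the search reaches it. The code assumes undirected, independent edges; the function names, seed and example graph are illustrative. Because BFS never needs an edge that leads to an already-reached node, each edge is sampled at most once per round, and the part of the world beyond t is never generated.

```python
import math
import random
from collections import Counter, defaultdict, deque

def build_adjacency(edges):
    """Undirected adjacency lists: node -> list of (neighbour, edge probability)."""
    adj = defaultdict(list)
    for (u, v), p in edges.items():
        adj[u].append((v, p))
        adj[v].append((u, p))
    return adj

def lazy_sampled_distance(adj, s, t, rng):
    """One sampling round: BFS from s that samples an edge only when it is reached."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]  # stop early: the rest of the world is never generated
        for v, p in adj[u]:
            # Edges towards already-reached nodes cannot shorten a BFS distance,
            # so they are skipped and every edge is sampled at most once.
            if v not in dist and rng.random() < p:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf

# Hypothetical example graph; the probabilities are illustrative.
edges = {("b", "a"): 0.3, ("b", "c"): 0.5, ("c", "a"): 0.5}
adj = build_adjacency(edges)
rng = random.Random(0)
rounds = 20000
counts = Counter(lazy_sampled_distance(adj, "b", "a", rng) for _ in range(rounds))
estimate = {d: c / rounds for d, c in counts.items()}
print(estimate)
```

With weighted edges the same idea applies to Dijkstra's algorithm, sampling the outgoing edges of a node when it is settled.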

35 Example: Median Distance k-nn
k-nn (k nearest neighbours): finding the k nodes closest to s by some measure
Let us consider the median distance (reminder: it is the highest distance D such that the mass of the distribution up to D is less than or equal to 0.5)
We only care about the top-k nodes, and not their values, and we do not want to evaluate the whole graph if possible
- we can evaluate a truncated distribution up to a distance D:
$$p_{D,s,t}(d) = \begin{cases} p_{s,t}(d) & \text{if } d < D \\ \sum_{x=D}^{\infty} p_{s,t}(x) & \text{if } d = D \\ 0 & \text{if } d > D \end{cases}$$
- for any two nodes $t_1, t_2$: $d_{D,M}(s, t_1) < d_{D,M}(s, t_2)$ implies $d_M(s, t_1) < d_M(s, t_2)$ 19/21
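
A small sketch of the truncation and of the ordering property follows. The dictionary representation, the two example distributions and the function names are hypothetical. The `median` helper uses the common convention "smallest distance at which the cumulative mass reaches 1/2", which is an assumption: the arg-max formulation on the slides may differ from it at the boundary.

```python
import math

def truncate(dist, D):
    """Truncated distribution p_{D,s,t}: all mass at distances >= D is moved onto D."""
    out = {}
    for d, p in dist.items():
        key = d if d < D else D
        out[key] = out.get(key, 0.0) + p
    return out

def median(dist):
    """Smallest distance whose cumulative mass reaches 1/2 (assumed convention)."""
    cumulative = 0.0
    for d in sorted(dist):
        cumulative += dist[d]
        if cumulative >= 0.5:
            return d
    return math.inf

# Hypothetical distance distributions from s to two nodes t1 and t2.
p_t1 = {1: 0.6, 2: 0.3, math.inf: 0.1}
p_t2 = {2: 0.2, 4: 0.5, math.inf: 0.3}
D = 3
m1, m2 = median(truncate(p_t1, D)), median(truncate(p_t2, D))
print(m1, m2)                      # medians of the truncated distributions
print(median(p_t1), median(p_t2))  # medians of the full distributions
```

In this example the truncated medians already order t1 before t2, and the full medians agree with that order, even though the truncated distribution of t2 never looks beyond distance 3.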

36 Example: Median Distance k-nn
- start from a small distance D
- decide whether there are nodes to add to the k-nn set
- increase the distance, and resume the search on each sampled graph from the new distance

Excerpt from Potamias et al. shown on the slide:

Increase D as you go, and perform all r repetitions of the Dijkstra algorithm in parallel. The algorithm proceeds in rounds, starting from distance D = 0 and increasing the distance by γ. In each round, we resume all r executions of Dijkstra from where they left off in the previous round, and pause them when they have reached all nodes with distance at most D. If the distribution $p_{D,s,t}$ of a node t reaches 50% of its mass, then t is added to the k-nn solution. All other nodes that will be added in later steps will have greater or equal median distances. The algorithm terminates once the solution set contains at least k nodes. This scheme works for any order statistic other than the median.

Algorithm 1 Median-Distance k-nn
Input: probabilistic graph G = (V, E, P, W), node s ∈ V, number of samples r, number k, distance increment γ
Output: T_k, a result set of k nodes for the k-nn query
    T_k ← ∅; D ← 0
    Initiate r executions of Dijkstra from s
    while |T_k| < k do
        D ← D + γ
        for i ← 1 : r do
            Continue visiting nodes in the i-th execution of Dijkstra until reaching distance D
            For each node t ∈ V visited, update the distribution p_{D,s,t}
            {create the distribution p_{D,s,t} if t has never been visited before}
        end for
        for all nodes t ∉ T_k for which p_{D,s,t} exists do
            if median(p_{D,s,t}) < D then
                T_k ← T_k ∪ {t}
            end if
        end for
    end while

4.4 Majority-distance k-nn pruning
The k-nn algorithm for Majority-Distance is similar to the one for Median-Distance. There are two main differences:
- In the case of the median, the distance of a node t from s is determined once the truncated distribution $p_{D,s,t}$ reaches 50% of its mass.
- In the case of the majority, let $d_1$ be the current majority value in $p_{D,s,t}$, and let $r_t$ be all Dijkstra executions in which a node t has been visited. The condition for ensuring that $d_1$ will be the exact majority distance is $p_{D,s,t}(d_1) \ge \frac{r - r_t}{r}$. The above conditions take care of the (worst) case that a node will appear with the same [...] 20/21
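
The round-based scheme can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes unit edge weights (so each Dijkstra execution reduces to a level-by-level BFS and γ = 1), undirected independent edges, and lazy edge sampling; the function name, parameters and example graphs are hypothetical. A node is added once at least half of the r paused executions have reached it at a distance below the current D, which is how the condition median < D is read here. The loop also stops when no execution can progress, a guard that the pseudocode does not need to spell out.

```python
import random
from collections import defaultdict

def median_knn(edges, s, k, r=200, seed=0):
    """Median-distance k-nn over r sampled worlds, explored one BFS level per round."""
    adj = defaultdict(list)
    for (u, v), p in edges.items():
        adj[u].append((v, p))
        adj[v].append((u, p))
    rng = random.Random(seed)
    # One paused search per sampled world: settled hop distances and current frontier.
    dists = [{s: 0} for _ in range(r)]
    frontiers = [[s] for _ in range(r)]
    result = []
    D = 0
    while len(result) < k and any(frontiers):
        D += 1
        below_D = defaultdict(int)  # node -> number of worlds where its distance is < D
        for i in range(r):
            for t in dists[i]:      # everything settled so far has distance <= D - 1
                below_D[t] += 1
            # Resume execution i up to distance D, sampling edges only when reached.
            next_frontier = []
            for u in frontiers[i]:
                for v, p in adj[u]:
                    if v not in dists[i] and rng.random() < p:
                        dists[i][v] = D
                        next_frontier.append(v)
            frontiers[i] = next_frontier
        for t, count in below_D.items():
            # At least half of the truncated distribution's mass lies below D.
            if t != s and t not in result and 2 * count >= r:
                result.append(t)
    return result

# Hypothetical certain path s - a - b - c: the two nearest neighbours of s are a and b.
certain_path = {("s", "a"): 1.0, ("a", "b"): 1.0, ("b", "c"): 1.0}
print(median_knn(certain_path, "s", k=2, r=10))

# Hypothetical uncertain star: a is usually adjacent to s, b rarely is.
star = {("s", "a"): 0.9, ("s", "b"): 0.1}
print(median_knn(star, "s", k=1, r=200))
```

As in the pseudocode, the result can contain more than k nodes when several nodes cross the 50% threshold in the same round, and the search never touches nodes that lie beyond the distance at which the k-th neighbour is found.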

37 Example: Median Distance k-nn
The algorithm does not need to visit all nodes
[Figure: Median Pruning (200 worlds). Visited nodes for DBLP, BIOMINE, FLICKR] 21/21


More information

A Improving Classification Quality in Uncertain Graphs

A Improving Classification Quality in Uncertain Graphs Page of 22 Journal of Data and Information Quality A Improving Classification Quality in Uncertain Graphs Michele Dallachiesa, University of Trento Charu C. Aggarwal, IBM T.J. Watson Research Center Themis

More information

SOCIAL MEDIA MINING. Data Mining Essentials

SOCIAL MEDIA MINING. Data Mining Essentials SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

More information

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please) Virginia Tech. Computer Science CS 5614 (Big) Data Management Systems Fall 2014, Prakash Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in

More information

The Cross-Entropy Method

The Cross-Entropy Method The Cross-Entropy Method Guy Weichenberg 7 September 2003 Introduction This report is a summary of the theory underlying the Cross-Entropy (CE) method, as discussed in the tutorial by de Boer, Kroese,

More information

Component Based Performance Modelling of the Wireless Routing Protocols

Component Based Performance Modelling of the Wireless Routing Protocols The Institute for Systems Research ISR Technical Report 28-27 Component Based Performance Modelling of the Wireless Routing Protocols Vahid Tabatabaee, John S. Baras, Punyaslok Purkayastha, Kiran Somasundaram

More information

Discussion. What problems stretch the limits of computation? Compare 4 Algorithms. What is Brilliance? 11/11/11

Discussion. What problems stretch the limits of computation? Compare 4 Algorithms. What is Brilliance? 11/11/11 11/11/11 UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 0: Introduction to Computation Discussion Professor Andrea Arpaci-Dusseau Is there an inherent difference between What problems

More information

Clustering. Unsupervised Learning

Clustering. Unsupervised Learning Clustering. Unsupervised Learning Maria-Florina Balcan 04/06/2015 Reading: Chapter 14.3: Hastie, Tibshirani, Friedman. Additional resources: Center Based Clustering: A Foundational Perspective. Awasthi,

More information

Modern Methods of Data Analysis - WS 07/08

Modern Methods of Data Analysis - WS 07/08 Modern Methods of Data Analysis Lecture XV (04.02.08) Contents: Function Minimization (see E. Lohrmann & V. Blobel) Optimization Problem Set of n independent variables Sometimes in addition some constraints

More information

Locality- Sensitive Hashing Random Projections for NN Search

Locality- Sensitive Hashing Random Projections for NN Search Case Study 2: Document Retrieval Locality- Sensitive Hashing Random Projections for NN Search Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 18, 2017 Sham Kakade

More information

CPSC 340: Machine Learning and Data Mining. Hierarchical Clustering Fall 2016

CPSC 340: Machine Learning and Data Mining. Hierarchical Clustering Fall 2016 CPSC 340: Machine Learning and Data Mining Hierarchical Clustering Fall 2016 Admin Assignment 1 : 3 late days to hand it in before Friday. 0 after that. Assignment 2 is out: Due Friday of next week, but

More information

Data mining. Classification k-nn Classifier. Piotr Paszek. (Piotr Paszek) Data mining k-nn 1 / 20

Data mining. Classification k-nn Classifier. Piotr Paszek. (Piotr Paszek) Data mining k-nn 1 / 20 Data mining Piotr Paszek Classification k-nn Classifier (Piotr Paszek) Data mining k-nn 1 / 20 Plan of the lecture 1 Lazy Learner 2 k-nearest Neighbor Classifier 1 Distance (metric) 2 How to Determine

More information

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2

Clustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2 So far in the course Clustering Subhransu Maji : Machine Learning 2 April 2015 7 April 2015 Supervised learning: learning with a teacher You had training data which was (feature, label) pairs and the goal

More information

Node Similarity. Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California

Node Similarity. Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California Node Similarity Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California rgera@nps.edu Motivation We talked about global properties Average degree, average clustering, ave

More information

Linear Regression and K-Nearest Neighbors 3/28/18

Linear Regression and K-Nearest Neighbors 3/28/18 Linear Regression and K-Nearest Neighbors 3/28/18 Linear Regression Hypothesis Space Supervised learning For every input in the data set, we know the output Regression Outputs are continuous A number,

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Bayes Nets: Inference Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

Fast Reliability Search in Uncertain Graphs

Fast Reliability Search in Uncertain Graphs Fast Reliability Search in Uncertain Graphs Arijit Khan, Francesco Bonchi, Aristides Gionis, Francesco Gullo Systems Group, ETH Zurich Yahoo Labs, Spain Aalto University, Finland ABSTRACT Uncertain, or

More information

Nearest Neighbors Classifiers

Nearest Neighbors Classifiers Nearest Neighbors Classifiers Raúl Rojas Freie Universität Berlin July 2014 In pattern recognition we want to analyze data sets of many different types (pictures, vectors of health symptoms, audio streams,

More information

Warm-up as you walk in

Warm-up as you walk in arm-up as you walk in Given these N=10 observations of the world: hat is the approximate value for P c a, +b? A. 1/10 B. 5/10. 1/4 D. 1/5 E. I m not sure a, b, +c +a, b, +c a, b, +c a, +b, +c +a, b, +c

More information

Bidirectional search and Goal-directed Dijkstra

Bidirectional search and Goal-directed Dijkstra Bidirectional search and Goal-directed Dijkstra Ferienakademie im Sarntal Course 2 Distance Problems: Theory and Praxis Kozyntsev A.N. Fakultät für Informatik TU München 26. September 2010 Kozyntsev A.N.:

More information

CS 268: Computer Networking. Taking Advantage of Broadcast

CS 268: Computer Networking. Taking Advantage of Broadcast CS 268: Computer Networking L-12 Wireless Broadcast Taking Advantage of Broadcast Opportunistic forwarding Network coding Assigned reading XORs In The Air: Practical Wireless Network Coding ExOR: Opportunistic

More information

CS264: Homework #4. Due by midnight on Wednesday, October 22, 2014

CS264: Homework #4. Due by midnight on Wednesday, October 22, 2014 CS264: Homework #4 Due by midnight on Wednesday, October 22, 2014 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. (2) Turn in your solutions

More information

PointMap: A real-time memory-based learning system with on-line and post-training pruning

PointMap: A real-time memory-based learning system with on-line and post-training pruning PointMap: A real-time memory-based learning system with on-line and post-training pruning Norbert Kopčo and Gail A. Carpenter Department of Cognitive and Neural Systems, Boston University Boston, Massachusetts

More information

Uninformed Search Strategies

Uninformed Search Strategies Uninformed Search Strategies Alan Mackworth UBC CS 322 Search 2 January 11, 2013 Textbook 3.5 1 Today s Lecture Lecture 4 (2-Search1) Recap Uninformed search + criteria to compare search algorithms - Depth

More information

Computational complexity

Computational complexity Computational complexity Heuristic Algorithms Giovanni Righini University of Milan Department of Computer Science (Crema) Definitions: problems and instances A problem is a general question expressed in

More information

Fast Nearest-neighbor Search in Disk-resident Graphs. February 2010 CMU-ML

Fast Nearest-neighbor Search in Disk-resident Graphs. February 2010 CMU-ML Fast Nearest-neighbor Search in Disk-resident Graphs Purnamrita Sarkar Andrew W. Moore February 2010 CMU-ML-10-100 Fast Nearest-neighbor Search in Disk-resident Graphs Purnamrita Sarkar February 5, 2010

More information

Outline. Data Association Scenarios. Data Association Scenarios. Data Association Scenarios

Outline. Data Association Scenarios. Data Association Scenarios. Data Association Scenarios Outline Data Association Scenarios Track Filtering and Gating Global Nearest Neighbor (GNN) Review: Linear Assignment Problem Murthy s k-best Assignments Algorithm Probabilistic Data Association (PDAF)

More information

Reach for A : Efficient Point-to-Point Shortest Path Algorithms

Reach for A : Efficient Point-to-Point Shortest Path Algorithms Reach for A : Efficient Point-to-Point Shortest Path Algorithms Andrew V. Goldberg 1 Haim Kaplan 2 Renato F. Werneck 3 October 2005 Technical Report MSR-TR-2005-132 We study the point-to-point shortest

More information

Betweness Centrality ENDRIAS KAHSSAY

Betweness Centrality ENDRIAS KAHSSAY Betweness Centrality ENDRIAS KAHSSAY What is it? The centrality of a vertex is the fraction of the shortest paths that go through. A measure how important a vertex is in a Graph. An undirected graph colored

More information

Graph Algorithms. Revised based on the slides by Ruoming Kent State

Graph Algorithms. Revised based on the slides by Ruoming Kent State Graph Algorithms Adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/

More information

Computational Methods. Randomness and Monte Carlo Methods

Computational Methods. Randomness and Monte Carlo Methods Computational Methods Randomness and Monte Carlo Methods Manfred Huber 2010 1 Randomness and Monte Carlo Methods Introducing randomness in an algorithm can lead to improved efficiencies Random sampling

More information

Instance-Based Learning.

Instance-Based Learning. Instance-Based Learning www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts k-nn classification k-nn regression edited nearest neighbor k-d trees for nearest

More information

A Genetic Algorithm Framework

A Genetic Algorithm Framework Fast, good, cheap. Pick any two. The Project Triangle 3 A Genetic Algorithm Framework In this chapter, we develop a genetic algorithm based framework to address the problem of designing optimal networks

More information

A Dijkstra-type algorithm for dynamic games

A Dijkstra-type algorithm for dynamic games A Dijkstra-type algorithm for dynamic games Martino Bardi 1 Juan Pablo Maldonado Lopez 2 1 Dipartimento di Matematica, Università di Padova, Italy 2 formerly at Combinatoire et Optimisation, UPMC, France

More information