Uncertain Data Management Non-relational models: Graphs
Antoine Amarilli (Télécom ParisTech), Silviu Maniu (Université Paris-Sud)
January 16th

Credits (2/21)
- M. Potamias, F. Bonchi, A. Gionis, G. Kollios. k-Nearest Neighbors in Uncertain Graphs. PVLDB 3(1). (number of samples, median measure, figure in slide 17, algorithm in slide 20)
- M. Ball. Computational Complexity of Network Reliability Analysis: An Overview. IEEE Trans. Reliab. R-35(3).
- L. Valiant. The Complexity of Enumeration and Reliability Problems. SIAM J. Comput. 8(3), 1979. (complexity of reliability/reachability)
PDFs of the slides available at …

Uncertain Graphs (3/21)
Graphs: a natural way to represent data in various domains
- transport data: road and air links between locations
- social networks: relationships between humans, citation networks
- interactions between proteins: contacts due to biochemical processes
For all the above examples, the links are not exact. (Why?)

(Deterministic) Graphs (4/21)
A graph $G = (V, E)$ is formed of
- a set $V$ of vertices (nodes)
- a set $E \subseteq V \times V$ of edges
[Figure: example graph on the nodes a, b, c, d]

Uncertain Graphs (5/21)
An uncertain graph $\mathcal{G} = (V, E, p)$ is formed of
- a set $V$ of vertices (nodes)
- a set $E \subseteq V \times V$ of edges
- a function $p : E \to [0, 1]$, where $p_e$ is the probability that the edge $e \in E$ exists
[Figure: example uncertain graph on the nodes a, b, c, d, with edge probabilities 0.2, 0.5, 0.7]
What are the possible worlds and their probability for this model?

Uncertain Graphs: Possible Worlds (6/21)
A possible world of $\mathcal{G}$, denoted $G \sqsubseteq \mathcal{G}$, is a deterministic graph $G = (V, E_G)$ where each $e \in E_G$ is chosen from $E$ (that is, $E_G \subseteq E$).
The probability of $G$ is:
$\Pr[G] = \prod_{e \in E_G} p_e \prod_{e \in E \setminus E_G} (1 - p_e)$
How many possible worlds are there?
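
The possible-world semantics above can be sketched in a few lines of Python. This is only an illustration: the function name `possible_worlds` and the edge endpoints in `example` are assumptions (the slide's figure only gives the three probabilities 0.2, 0.5 and 0.7). Since each edge is independently present or absent, the enumeration produces $2^{|E|}$ worlds.

```python
from itertools import product

def possible_worlds(edge_probs):
    """Yield (edge_set, probability) for every possible world.

    edge_probs maps each edge (u, v) to its existence probability p_e.
    Edges are assumed independent, as in the model on the slide.
    """
    edges = list(edge_probs)
    for mask in product([False, True], repeat=len(edges)):
        prob = 1.0
        present = set()
        for edge, keep in zip(edges, mask):
            if keep:
                present.add(edge)
                prob *= edge_probs[edge]        # factor p_e
            else:
                prob *= 1.0 - edge_probs[edge]  # factor (1 - p_e)
        yield frozenset(present), prob

# Hypothetical example: the edge endpoints are assumptions; only the
# probabilities 0.2, 0.5, 0.7 appear in the slide's figure.
example = {("a", "b"): 0.2, ("b", "c"): 0.5, ("c", "d"): 0.7}
worlds = list(possible_worlds(example))
```

The probabilities of all worlds sum to 1, which is a quick sanity check on the product formula.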

Uncertain Graphs: Other models (7/21)
Other models are possible:
- each edge is replaced by a distribution of weights: instead of choosing whether the edge exists or not, a possible world is an instantiation of weights
- each edge has a formula of events, capturing correlations
- probabilities can be on nodes also: equivalent to the edge model (Why?)

Queries on Uncertain Graphs (8/21)
Generally, the queries we want to answer are distance queries:
- the reachability or reliability query: get the probability that two nodes s and t are connected
- queries on the distance distribution: $p_{s,t}(d) = \sum_{G \,:\, d_G(s,t) = d} \Pr[G]$
Multiple uses of distance queries: link prediction, social search, travel estimation

Queries on Uncertain Graphs (10/21)
What is the distance (in hops) between b and a?
[Figure: deterministic graph on the nodes a, b, c, d]
- BFS (or Dijkstra's algorithm) finds the edge $b \to a$
- the cost is $O(|V| + |E|)$ (linear in the size of the graph)
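
As a reminder of the deterministic case, here is a minimal BFS sketch for hop distance. The adjacency list `adj` is an assumption: its edges are inferred from the paths named on the following slides ($b \to a$, $b \to c \to a$, $b \to d \to c \to a$), not read off the figure.

```python
from collections import deque

def hop_distance(adj, s, t):
    """Return the number of hops on a shortest path from s to t, or None."""
    if s == t:
        return 0
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                if v == t:
                    return dist[v]
                queue.append(v)
    return None                        # t is not reachable from s

# Assumed edges (hypothetical reconstruction of the slide's example).
adj = {"b": ["a", "c", "d"], "c": ["a"], "d": ["c"]}
distance_b_a = hop_distance(adj, "b", "a")
```

Every node and edge is examined at most once, which is where the linear cost comes from.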

Queries on Uncertain Graphs (11/21)
What is the distance (in hops) between b and a?
[Figure: uncertain graph on the nodes a, b, c, d; one edge label, 0.3, is legible]
- the edge $b \to a$ does not appear in all possible worlds: $p_{b,a}(1) = p(b \to a)$
- there are two other possible paths, of length 2 ($b \to c \to a$) and 3 ($b \to d \to c \to a$)
- $p_{b,a}(2) = (1 - p_{b,a}(1)) \cdot p(b \to c \to a)$
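
The two formulas above can be checked by brute force on a small instance: enumerate every possible world, compute the hop distance in each, and add up the probabilities. This is a sketch only. The edge set and all probabilities in `example` are assumptions chosen for illustration, and `hops` and `distance_distribution` are hypothetical helper names.

```python
import math
from collections import deque
from itertools import product

def hops(edges, s, t):
    """Hop distance from s to t in a deterministic edge list (inf if unreachable)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf

def distance_distribution(edge_probs, s, t):
    """Exact p_{s,t}(d), obtained by summing Pr[G] over all possible worlds G."""
    edges = list(edge_probs)
    dist = {}
    for mask in product([False, True], repeat=len(edges)):
        prob = 1.0
        world = []
        for e, keep in zip(edges, mask):
            prob *= edge_probs[e] if keep else 1.0 - edge_probs[e]
            if keep:
                world.append(e)
        d = hops(world, s, t)
        dist[d] = dist.get(d, 0.0) + prob
    return dist

# Assumed example graph and probabilities (not taken from the slide).
example = {("b", "a"): 0.3, ("b", "c"): 0.5, ("c", "a"): 0.5,
           ("b", "d"): 0.5, ("d", "c"): 0.5}
p = distance_distribution(example, "b", "a")
```

With these assumed values, `p[1]` equals $p(b \to a) = 0.3$ and `p[2]` equals $(1 - 0.3) \cdot 0.5 \cdot 0.5 = 0.175$, matching the formulas on the slide. The enumeration itself takes $2^{|E|}$ steps, so it is only usable on toy graphs.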

Queries on Uncertain Graphs (12/21)
What is the distance (in hops) between b and a?
[Figure: denser graph on the nodes a, b, c, d]
- the number of paths is exponential in the size of the graph
- specifically, there are 3! paths

Queries on Uncertain Graphs (13/21)
- Distance query answering in uncertain graphs is at least as hard as in relational databases (logical formulas over paths, whose number can be exponential)
- Computing the reachability probability (i.e., the probability that there is a path between a source and a target) is known to be #P-hard [Valiant, SIAM J. Comput., 1979]

Computing Answers to Distance Queries on Probabilistic Graphs (14/21)
Distance estimations in uncertain graphs can be approximated via Monte Carlo sampling:
1. generate sampled graphs for r rounds (is this the optimal way for an s, t distance estimation?)
2. compute the desired measure (reachability probability, distance distributions) by averaging results
Same issue: how many rounds?
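
The two steps above can be sketched as follows. This is a naive version that materializes the whole sampled graph in every round; the names `sample_world`, `hops` and `estimate_distribution`, the fixed `seed`, and the `example` graph with its probabilities are all assumptions made for illustration.

```python
import math
import random
from collections import deque

def sample_world(edge_probs, rng):
    """Step 1: keep each edge independently with its probability."""
    return [e for e, p in edge_probs.items() if rng.random() < p]

def hops(edges, s, t):
    """Hop distance from s to t in a deterministic edge list (inf if unreachable)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf

def estimate_distribution(edge_probs, s, t, r, seed=0):
    """Step 2: average over r rounds to estimate p_{s,t}(d)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(r):
        d = hops(sample_world(edge_probs, rng), s, t)
        counts[d] = counts.get(d, 0) + 1
    return {d: c / r for d, c in counts.items()}

# Assumed example graph and probabilities (not taken from the slides).
example = {("b", "a"): 0.3, ("b", "c"): 0.5, ("c", "a"): 0.5,
           ("b", "d"): 0.5, ("d", "c"): 0.5}
estimate = estimate_distribution(example, "b", "a", r=20000, seed=1)
```

The reachability probability is then estimated as one minus the mass at `math.inf`.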

Number of Samples: Median Distance (15/21)
Median distance:
$d_M(s,t) = \arg\max_D \left\{ \sum_{d=0}^{D} p_{s,t}(d) \le \frac{1}{2} \right\}$
Let $\mu$ be the real median, and $\alpha$ and $\beta$ values $\pm\epsilon n$ away from $\mu$. Then for
$r > \frac{c}{\epsilon^2} \log\frac{2}{\delta}$
and a good choice of $c$:
$\Pr(\hat{\mu} \in [\alpha, \beta]) > 1 - \delta$
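
To get a feel for the bound, the required number of rounds can be computed directly. This is a small arithmetic sketch: the slide leaves the constant $c$ unspecified, so it is a required argument here, and reading $\log$ as the natural logarithm is an assumption. The function name `samples_for_median` is hypothetical.

```python
import math

def samples_for_median(eps, delta, c):
    """Smallest integer r with r > (c / eps**2) * log(2 / delta).

    c is the unspecified constant from the bound; natural log is assumed.
    """
    bound = c / eps ** 2 * math.log(2.0 / delta)
    return math.floor(bound) + 1

# Example with illustrative values (c = 1 is an arbitrary choice).
r_needed = samples_for_median(eps=0.1, delta=0.05, c=1.0)
```

Note that the bound does not depend on the size of the graph, only on $\epsilon$, $\delta$ and $c$.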

Number of Samples: Expected Distance (16/21)
Expected reliable distance (generalization of reliability):
$d_{ER}(s,t) = \sum_{d \mid d < \infty} d \cdot \frac{p_{s,t}(d)}{1 - p_{s,t}(\infty)}$
By estimating the connectivity $\rho$, we need to sample at least
$r \ge \max\left\{ \frac{3}{\epsilon^2 \rho}, \frac{(n-1)^2}{2\epsilon^2} \right\} \cdot \log\frac{2}{\delta}$
rounds for an $(\epsilon, \delta)$ approximation.
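
The expected reliable distance is the mean distance conditioned on s and t being connected. A minimal sketch, assuming the distance distribution is given as a dictionary from distances to probabilities with `math.inf` as the key for "not connected" (this representation and the function name are assumptions):

```python
import math

def expected_reliable_distance(dist):
    """d_ER: expected distance over the worlds in which t is reachable from s.

    dist maps each distance d (math.inf for unreachable) to p_{s,t}(d).
    """
    reach = 1.0 - dist.get(math.inf, 0.0)   # reliability, 1 - p(inf)
    if reach <= 0.0:
        raise ValueError("s and t are never connected; d_ER is undefined")
    finite_mass = sum(d * p for d, p in dist.items() if d != math.inf)
    return finite_mass / reach

# Illustrative distribution (values are assumptions, not from the slides).
d_er = expected_reliable_distance({1: 0.25, 3: 0.25, math.inf: 0.5})
```

Here the finite part contributes $1 \cdot 0.25 + 3 \cdot 0.25 = 1$, and dividing by the reliability $0.5$ gives a value of 2.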

Number of Samples in Reality (17/21)
The number of needed samples can be surprisingly low (but it depends on the actual probabilities).
[Figure 4: distribution of (a) edge probabilities, (b) distances]
[Figure 5: MSE vs. worlds, panels (a) BIOMINE and (b) FLICKR; x-axis: number of worlds; y-axis: mean squared error; series: Median, Majority, ExpectedRel, Reliability. 200 worlds are enough.]

Sampling Graphs (18/21)
Generating the entirety of the graph $G_i$ for each round $i < r$ is not optimal:
- we do not need to generate the entire graph $G_i$
- we can start from s and do a BFS or Dijkstra search, sampling only the outgoing edges of the nodes we visit
- based on the sampled outgoing edges, we repeat the computation from each newly reached node, until we find t
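
One round of this lazy sampling can be sketched as a BFS that flips the coin for an edge only when its source node is expanded. The function name, the adjacency format (each node maps to a list of `(target, probability)` pairs) and the returned edge counter are assumptions made for illustration.

```python
import math
import random
from collections import deque

def lazy_sampled_distance(out_edges, s, t, rng):
    """One Monte Carlo round: BFS from s, sampling edges on demand.

    Returns (hop distance or inf, number of edges actually sampled).
    Each node is expanded at most once, so each edge is sampled at most once.
    """
    dist = {s: 0}
    queue = deque([s])
    sampled = 0
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u], sampled      # stop as soon as t is dequeued
        for v, p in out_edges.get(u, ()):
            sampled += 1
            if rng.random() < p and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf, sampled

# Assumed toy graph: the edge x -> y is never touched when searching from b.
graph = {"b": [("a", 1.0), ("c", 1.0)], "c": [("a", 1.0)], "x": [("y", 0.5)]}
result = lazy_sampled_distance(graph, "b", "a", random.Random(0))
```

Because edges are independent, deciding them only when needed yields the same distribution of distances as generating the full world first, while touching only the part of the graph that the search reaches.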

Example: Median Distance k-nn (19/21)
- k-nn (k nearest neighbours): finding the k nodes closest to s by some measure
- let us consider the median distance (reminder: it is the largest distance $D$ such that the cumulative mass $\sum_{d \le D} p_{s,t}(d)$ is at most 0.5)
We only care about which nodes are in the top-k, not their exact distance values, and we want to avoid evaluating the whole graph if possible.
- we can evaluate a truncated distribution up to a distance $D$:
  $p_{D,s,t}(d) = p_{s,t}(d)$ if $d < D$
  $p_{D,s,t}(d) = \sum_{x=D}^{\infty} p_{s,t}(x)$ if $d = D$
  $p_{D,s,t}(d) = 0$ if $d > D$
- for any two nodes $t_1$, $t_2$: $d_{D,M}(s, t_1) < d_{D,M}(s, t_2)$ implies $d_M(s, t_1) < d_M(s, t_2)$

Example: Median Distance k-nn (20/21)
(From Potamias et al.) The idea is to increase $D$ as you go and to perform all r repetitions of the Dijkstra algorithm in parallel. The algorithm proceeds in rounds, starting from distance $D = 0$ and increasing the distance by $\gamma$. In each round, we resume all r executions of Dijkstra from where they left off in the previous round, and pause them when they have reached all nodes with distance at most $D$. If the distribution $p_{D,s,t}$ of a node t reaches 50% of its mass, then t is added to the k-nn solution. All other nodes that will be added in later steps will have greater or equal median distances. The algorithm terminates once the solution set contains at least k nodes. This scheme works for any order statistic other than the median.

Algorithm 1: Median-Distance k-nn
Input: probabilistic graph $\mathcal{G} = (V, E, P, W)$, node $s \in V$, number of samples r, number k, distance increment $\gamma$
Output: $T_k$, a result set of k nodes for the k-nn query
  $T_k \leftarrow \emptyset$; $D \leftarrow 0$
  initiate r executions of Dijkstra from s
  while $|T_k| < k$ do
    $D \leftarrow D + \gamma$
    for $i \leftarrow 1 : r$ do
      continue visiting nodes in the i-th execution of Dijkstra until reaching distance $D$
      for each node $t \in V$ visited, update the distribution $p_{D,s,t}$ (create the distribution $p_{D,s,t}$ if t has never been visited before)
    end for
    for all nodes $t \notin T_k$ for which $p_{D,s,t}$ exists do
      if median($p_{D,s,t}$) $< D$ then
        $T_k \leftarrow T_k \cup \{t\}$
      end if
    end for
  end while

4.4 Majority-distance k-nn pruning
The k-nn algorithm for Majority-Distance is similar to the one for Median-Distance. There are two main differences. In the case of the median, the distance of a node t from s is determined once the truncated distribution $p_{D,s,t}$ reaches 50% of its mass. In the case of the majority, let $d_1$ be the current majority value in $p_{D,s,t}$, and let $r_t$ be the number of Dijkstra executions in which the node t has been visited. The condition for ensuring that $d_1$ will be the exact majority distance is $p_{D,s,t}(d_1) \ge \frac{r - r_t}{r}$. The above conditions take care of the (worst) case that a node will appear with the same …

- start from a small distance $D$
- decide whether there are nodes to add to the k-nn set
- increase the distance, and re-start each sampled graph from the new distance
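
The round-based scheme can be sketched in Python under several simplifying assumptions, none of which come from the slides: edges have unit weight, so each execution is a level-by-level BFS instead of Dijkstra; $\gamma$ is a positive integer; a node counts as having "reached 50% of its mass" once at least half of the r executions have found it at a distance strictly below $D$; and the loop also stops when no execution can make progress. The names `expand_level` and `median_distance_knn` are hypothetical.

```python
import random

def expand_level(out_edges, dist, frontier, rng):
    """Advance one BFS level, sampling each outgoing edge when first used."""
    nxt = []
    for u in frontier:
        for v, p in out_edges.get(u, ()):
            if rng.random() < p and v not in dist:
                dist[v] = dist[u] + 1
                nxt.append(v)
    return nxt

def median_distance_knn(out_edges, s, k, r, gamma=1, seed=0):
    """Return at least k nodes in order of increasing (sampled) median distance.

    Fewer than k nodes are returned if the executions run out of nodes.
    """
    rng = random.Random(seed)
    dists = [{s: 0} for _ in range(r)]       # per-execution BFS distances
    frontiers = [[s] for _ in range(r)]      # per-execution paused frontier
    hits = {}                                # node -> distances observed so far
    result = []
    D = 0
    while len(result) < k and any(frontiers):
        D += gamma
        for i in range(r):
            # Resume execution i until all nodes within distance D are visited.
            for _ in range(gamma):
                frontiers[i] = expand_level(out_edges, dists[i], frontiers[i], rng)
                for v in frontiers[i]:
                    hits.setdefault(v, []).append(dists[i][v])
        for t, ds in hits.items():
            # Mass strictly below D is final; add t once it reaches one half.
            if t not in result and 2 * sum(d < D for d in ds) >= r:
                result.append(t)
    return result

# Assumed toy chain with certain edges: s -> a -> b -> c.
chain = {"s": [("a", 1.0)], "a": [("b", 1.0)], "b": [("c", 1.0)]}
top2 = median_distance_knn(chain, "s", k=2, r=5)
```

On the deterministic chain the two nearest nodes are found after three rounds, without the executions ever expanding beyond `c`.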

Example: Median Distance k-nn (21/21)
The algorithm does not need to visit all nodes.
[Figure: Median pruning (200 worlds); visited nodes for DBLP, BIOMINE, FLICKR]
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Bayes Nets: Inference Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationFast Reliability Search in Uncertain Graphs
Fast Reliability Search in Uncertain Graphs Arijit Khan, Francesco Bonchi, Aristides Gionis, Francesco Gullo Systems Group, ETH Zurich Yahoo Labs, Spain Aalto University, Finland ABSTRACT Uncertain, or
More informationNearest Neighbors Classifiers
Nearest Neighbors Classifiers Raúl Rojas Freie Universität Berlin July 2014 In pattern recognition we want to analyze data sets of many different types (pictures, vectors of health symptoms, audio streams,
More informationWarm-up as you walk in
arm-up as you walk in Given these N=10 observations of the world: hat is the approximate value for P c a, +b? A. 1/10 B. 5/10. 1/4 D. 1/5 E. I m not sure a, b, +c +a, b, +c a, b, +c a, +b, +c +a, b, +c
More informationBidirectional search and Goal-directed Dijkstra
Bidirectional search and Goal-directed Dijkstra Ferienakademie im Sarntal Course 2 Distance Problems: Theory and Praxis Kozyntsev A.N. Fakultät für Informatik TU München 26. September 2010 Kozyntsev A.N.:
More informationCS 268: Computer Networking. Taking Advantage of Broadcast
CS 268: Computer Networking L-12 Wireless Broadcast Taking Advantage of Broadcast Opportunistic forwarding Network coding Assigned reading XORs In The Air: Practical Wireless Network Coding ExOR: Opportunistic
More informationCS264: Homework #4. Due by midnight on Wednesday, October 22, 2014
CS264: Homework #4 Due by midnight on Wednesday, October 22, 2014 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. (2) Turn in your solutions
More informationPointMap: A real-time memory-based learning system with on-line and post-training pruning
PointMap: A real-time memory-based learning system with on-line and post-training pruning Norbert Kopčo and Gail A. Carpenter Department of Cognitive and Neural Systems, Boston University Boston, Massachusetts
More informationUninformed Search Strategies
Uninformed Search Strategies Alan Mackworth UBC CS 322 Search 2 January 11, 2013 Textbook 3.5 1 Today s Lecture Lecture 4 (2-Search1) Recap Uninformed search + criteria to compare search algorithms - Depth
More informationComputational complexity
Computational complexity Heuristic Algorithms Giovanni Righini University of Milan Department of Computer Science (Crema) Definitions: problems and instances A problem is a general question expressed in
More informationFast Nearest-neighbor Search in Disk-resident Graphs. February 2010 CMU-ML
Fast Nearest-neighbor Search in Disk-resident Graphs Purnamrita Sarkar Andrew W. Moore February 2010 CMU-ML-10-100 Fast Nearest-neighbor Search in Disk-resident Graphs Purnamrita Sarkar February 5, 2010
More informationOutline. Data Association Scenarios. Data Association Scenarios. Data Association Scenarios
Outline Data Association Scenarios Track Filtering and Gating Global Nearest Neighbor (GNN) Review: Linear Assignment Problem Murthy s k-best Assignments Algorithm Probabilistic Data Association (PDAF)
More informationReach for A : Efficient Point-to-Point Shortest Path Algorithms
Reach for A : Efficient Point-to-Point Shortest Path Algorithms Andrew V. Goldberg 1 Haim Kaplan 2 Renato F. Werneck 3 October 2005 Technical Report MSR-TR-2005-132 We study the point-to-point shortest
More informationBetweness Centrality ENDRIAS KAHSSAY
Betweness Centrality ENDRIAS KAHSSAY What is it? The centrality of a vertex is the fraction of the shortest paths that go through. A measure how important a vertex is in a Graph. An undirected graph colored
More informationGraph Algorithms. Revised based on the slides by Ruoming Kent State
Graph Algorithms Adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/
More informationComputational Methods. Randomness and Monte Carlo Methods
Computational Methods Randomness and Monte Carlo Methods Manfred Huber 2010 1 Randomness and Monte Carlo Methods Introducing randomness in an algorithm can lead to improved efficiencies Random sampling
More informationInstance-Based Learning.
Instance-Based Learning www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts k-nn classification k-nn regression edited nearest neighbor k-d trees for nearest
More informationA Genetic Algorithm Framework
Fast, good, cheap. Pick any two. The Project Triangle 3 A Genetic Algorithm Framework In this chapter, we develop a genetic algorithm based framework to address the problem of designing optimal networks
More informationA Dijkstra-type algorithm for dynamic games
A Dijkstra-type algorithm for dynamic games Martino Bardi 1 Juan Pablo Maldonado Lopez 2 1 Dipartimento di Matematica, Università di Padova, Italy 2 formerly at Combinatoire et Optimisation, UPMC, France
More information