Covert and Anomalous Network Discovery and Detection (CANDiD)
Transcription
1 Covert and Anomalous Network Discovery and Detection (CANDiD)
Rajmonda S. Caceres, Edo Airoldi, Garrett Bernstein, Edward Kao, Benjamin A. Miller, Raj Rao Nadakuditi, Matthew C. Schmidt, Kenneth Senne, Steven T. Smith, Leah Weiner
GraphEx 2014
This work is sponsored by the Assistant Secretary of Defense for Research & Engineering under Air Force Contract #FA C. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.
2 Program Motivation
Network Detection Applications: ISR, Cyber, Social
Network Detection Techniques: Signal Processing for Graphs, Space-Time Threat Propagation
Challenge: What are the performance bounds for graph detection techniques when used on application datasets?
3 Network Detection Performance: State of the Art
[Figure: evaluation approaches plotted by fidelity (vertical axis) against number of cases (horizontal axis, up to 1000s), each marked empirical or analytical. Region labeled "These Do Not Exist": real-world full covert networks (empirical); closed-form analysis, complex model (analytical). Region labeled "These Exist": single experiment (empirical); Monte Carlo simulations (empirical); closed-form analysis, simple model (analytical).]
Empirical performance: real-world examples, single datasets, Monte Carlo simulations
Analytical performance: currently limited to simple models
4 Outline: Motivation and Background; Detection on Graphs; Random Matrix Theoretic Bounds; Inference and Estimation on Graphs; Optimum Bayesian Network Detection; Model Selection; Future Work
5 Spectral Graph Detection Bounds
Using Random Matrix Theory to Define Bounds
Table columns: Graph Model; Graph Matrix as Low-Rank Matrix plus Random Matrix; Closed-Form Expression of Graph Spectrum; Detection Bound
- Uniform Edge Probability (Erdős-Rényi)
  Graph matrix: $A = k(1-p)\,uu^T + \text{random}$
  Spectrum (plot labels): Clique, ~Wigner
  Detection bound: $P(\text{detect}) = 1$ if $k^2/n > p/(1-p)$
- Arbitrary Expected Degrees (Chung-Lu)
  Graph matrix: $A = dd^T/V + \text{random}$
  Spectrum (plot labels): Degrees, ~Wigner
  Detection bound: N/A
- Community Structure (Stochastic Block)
  Graph matrix: $A = \tfrac{1}{2}(c_{in}+c_{out})\,\mathbf{1}\mathbf{1}^T + \tfrac{1}{2}(c_{in}-c_{out})\,uu^T + \text{random}$
  Spectrum (plot labels): E-R, Community, ~Wigner
Prior work with RMT has produced graph spectra and detection bounds for only simple models of background graphs
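The Erdős-Rényi row can be checked numerically. The sketch below is illustrative only and is not taken from the slides: it assumes NumPy, plants a clique of size k in the first k vertices, subtracts the expected background p, and compares the largest eigenvalue of the residual with the approximate Wigner bulk edge $2\sqrt{np(1-p)}$. The function names are hypothetical.

```python
import numpy as np

def er_with_clique(n, p, k, rng):
    """Erdős-Rényi graph G(n, p) with a clique planted on vertices 0..k-1."""
    upper = np.triu(rng.random((n, n)) < p, 1)   # independent edges above the diagonal
    A = (upper | upper.T).astype(float)           # symmetrize
    if k > 0:
        A[:k, :k] = 1.0                           # plant the clique
        np.fill_diagonal(A, 0.0)                  # no self-loops
    return A

def top_residual_eigenvalue(A, p):
    """Largest eigenvalue of A minus its expected value under G(n, p)."""
    n = A.shape[0]
    expected = p * (np.ones((n, n)) - np.eye(n))
    return np.linalg.eigvalsh(A - expected)[-1]

def clique_detectable(n, p, k):
    """Slide 5 condition: P(detect) = 1 if k^2 / n > p / (1 - p)."""
    return k**2 / n > p / (1 - p)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 400, 0.1
    bulk_edge = 2 * np.sqrt(n * p * (1 - p))      # approximate edge of the Wigner bulk
    for k in (0, 5, 30):
        lam = top_residual_eigenvalue(er_with_clique(n, p, k, rng), p)
        print(k, clique_detectable(n, p, k), round(lam, 2), round(bulk_edge, 2))
```

For a clique well above the threshold, the top eigenvalue separates from the bulk; with no clique it stays near the bulk edge.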
6 Expanding Random Matrix Theory to More Complex Graph Models
Complex Graph Model
- Model: community structure with arbitrary expected degree [1]
Spectrum with Blockmodel and Arbitrary Degrees
- Random matrix theory used to derive closed-form expression of graph model spectrum [2]
Detection bound (ongoing)
- $P(\text{detect}) = 1$ if $N_{fg}\,p_{fg} > [\tfrac{1}{2}(c_{in}+c_{out})]$
- Background model: community structure with arbitrary expected degree
- Foreground model: clique
[Plot: $P_D$ versus $N_{fg}\,p_{fg}$; curve labels include $N_{fg} = 24$]
Academic collaborators have expanded theory to define spectrum and bounds for more expressive graph models
[1] Brian Karrer, Mark E. J. Newman: Stochastic Block Models and community structure in networks, Phys. Rev. E (2011).
[2] Xiao Zhang, Raj Rao Nadakuditi, Mark E. J. Newman: Spectra of random graphs with community structure and arbitrary degrees. Submitted to Phys. Rev. E (2014).
7 Outline: Motivation and Background; Detection on Graphs; Random Matrix Theoretic Bounds; Inference and Estimation on Graphs; Optimum Bayesian Network Detection; Model Selection; Future Work
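8 Hybrid Mixed-Membership Block Model
Poisson interaction rate:
$$\lambda_{ij} = \frac{\lambda_i \lambda_j}{\sum_k \lambda_k}\;\pi_i^T B\,\pi_j\;I_{ij}$$
- Chung-Lu model: $\lambda^{CL}_{ij} = \lambda_i \lambda_j / \sum_k \lambda_k$
- Mixed-membership block model: $\pi_i^T B\,\pi_j$
- Erdős-Rényi model: $I_{ij}$
Observed interaction data:
$$m_{ij} \sim \mathrm{Pois}(\lambda_{ij} T)$$
where $m_{ij}$ is the number of interactions between i and j
Log-likelihood function:
$$\ell_M(\pi_i) \propto \sum_{j:\,I_{ij}=1} \left[\, m_{ij} \log\!\left(\pi_i^T B\,\pi_j\right) - \lambda^{CL}_{ij}\,T\,\pi_i^T B\,\pi_j \,\right]$$
Intuition: Setting the first derivative to zero shows that the log-likelihood is maximized when the observed interaction equals the expected interaction
(Plot labels: $m_{ij}$, $\lambda_{ij}$)
Hybrid Mixed-Membership Block model captures important properties of real-world network data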
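A minimal numerical sketch of the rate and likelihood on this slide, assuming NumPy. The function names and the toy parameter values are hypothetical, and the log-likelihood drops the $\log m_{ij}!$ term, which does not depend on the model parameters. It illustrates the stated intuition: when the observed counts equal the expected counts, rescaling the rates in either direction lowers the likelihood.

```python
import numpy as np

def hybrid_rates(lam, Pi, B, I):
    """lambda_ij = (lam_i * lam_j / sum_k lam_k) * (pi_i^T B pi_j) * I_ij."""
    chung_lu = np.outer(lam, lam) / lam.sum()   # Chung-Lu factor
    block = Pi @ B @ Pi.T                       # mixed-membership block factor
    return chung_lu * block * I                 # I is the 0/1 Erdős-Rényi indicator

def poisson_loglik(m, rates, T):
    """Poisson log-likelihood of counts m with means rates * T (constant term dropped)."""
    mu = rates * T
    mask = mu > 0                               # pairs with zero rate contribute nothing
    return float(np.sum(m[mask] * np.log(mu[mask]) - mu[mask]))

if __name__ == "__main__":
    lam = np.array([2.0, 3.0, 4.0])                       # toy per-node activity levels
    Pi = np.array([[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]])   # toy membership vectors
    B = np.array([[1.0, 0.2], [0.2, 1.0]])                # toy block matrix
    I = np.ones((3, 3)) - np.eye(3)                       # all pairs may interact
    T = 5.0
    rates = hybrid_rates(lam, Pi, B, I)
    m = rates * T                                         # observed equals expected
    for scale in (0.8, 1.0, 1.25):
        print(scale, poisson_loglik(m, scale * rates, T))
```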
9 Estimation Bounds for the Hybrid Mixed-Membership Block Model
Fisher Information Derivation; Cramér-Rao Bounds
- The inverse of the Fisher information (i.e., the Cramér-Rao bound) gives a lower bound on the variance of any unbiased estimate of a membership vector
Cramér-Rao bounds on membership estimation lead to theoretical bounds for the detection problem
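The slide's Fisher information derivation for membership vectors is not reproduced in the transcription. As a much simpler stand-in, the sketch below (assuming NumPy; function names are hypothetical) works the scalar case of a single Poisson rate: for $m \sim \mathrm{Pois}(\lambda T)$ the Fisher information about $\lambda$ is $T/\lambda$, so the Cramér-Rao bound is $\lambda/T$, and the unbiased estimate $m/T$ attains it.

```python
import numpy as np

def poisson_rate_crb(lam, T):
    """Cramér-Rao bound for an unbiased estimate of lam from m ~ Pois(lam * T)."""
    fisher_information = T / lam
    return 1.0 / fisher_information

def empirical_variance_of_mle(lam, T, trials, rng):
    """Monte Carlo variance of the estimate m / T over repeated observations."""
    counts = rng.poisson(lam * T, size=trials)
    return float(np.var(counts / T))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    lam, T = 3.0, 10.0
    print("bound:    ", poisson_rate_crb(lam, T))
    print("empirical:", empirical_variance_of_mle(lam, T, 200_000, rng))
```

The vector-valued membership case replaces the scalar information with a Fisher information matrix and the bound with its inverse.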
10 Outline: Motivation and Background; Detection on Graphs; Random Matrix Theoretic Bounds; Inference and Estimation on Graphs; Optimum Bayesian Network Detection; Model Selection; Future Work
11 Optimal Bayesian Network Detection
Panels: Stochastic Realization; Threat Propagation; Optimal Detector
Notation: threat at vertex v; $z_u$ is the observation of threat at vertex u
- Neyman-Pearson optimal detector defined using likelihood-ratio test
- Threat exists at a node if it propagated through the node to the observation of the threat
- The probability of threat at a vertex is averaged over all random walks
- Threat at a vertex is a weighted sum of its neighbors' threat
- Efficient to solve for unknown threat given observed threat
- This is equal to the probability of threat given observations
- The harmonic threat propagation values provide an optimal detection test for a given CFAR
- Harmonic threat propagation yields N independent tests
Stochastic realization used to show the optimality of the Harmonic Threat Propagation detection method for given diffusion models
Smith et al., Bayesian Discovery of Threat Networks, IEEE Transactions on Signal Processing, 2013
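The "weighted sum of its neighbors' threat" property can be sketched as a harmonic interpolation problem. The code below is an illustrative simplification, not the method of Smith et al.: it assumes NumPy, an unweighted undirected graph, and fixed threat values at the observed vertices, and it solves the graph Laplace equation for the remaining vertices so that each unobserved value equals the average of its neighbors' values. The function name is hypothetical.

```python
import numpy as np

def harmonic_threat(A, observed):
    """Solve for threat at unobserved vertices given values at observed ones.

    A        : symmetric adjacency matrix (n x n)
    observed : dict {vertex index: observed threat value in [0, 1]}
    Every unobserved vertex must be connected to at least one observed vertex.
    """
    n = A.shape[0]
    b = sorted(observed)                                 # observed (boundary) vertices
    u = [v for v in range(n) if v not in observed]       # unobserved (interior) vertices
    L = np.diag(A.sum(axis=1)) - A                       # graph Laplacian
    theta = np.zeros(n)
    theta[b] = [observed[v] for v in b]
    # Harmonic condition on the interior: L_uu theta_u = -L_ub theta_b
    theta[u] = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, b)] @ theta[b])
    return theta

if __name__ == "__main__":
    # Path graph 0-1-2-3-4 with threat observed at vertex 0 and none at vertex 4.
    A = np.zeros((5, 5))
    for i in range(4):
        A[i, i + 1] = A[i + 1, i] = 1.0
    print(harmonic_threat(A, {0: 1.0, 4: 0.0}))
```

On the path graph the solution falls off linearly from the observed threat, which is the expected harmonic behavior; thresholding these values gives one test per vertex.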
12 Outline: Motivation and Background; Detection on Graphs; Random Matrix Theoretic Bounds; Optimum Bayesian Network Detection; Inference and Estimation on Graphs; Model Selection; Future Work
13 Model Selection for Graphs
Research Question
- Given an observed graph and a collection of generative graph models, which model is the observed graph closest to?
Technical Approach
- Construct graph ensembles using given generative models
- Embed each graph into feature space
- Build classifier
- Apply classifier to the observed graph
[Figure: graph instances from models $M_1$, $M_2$, $M_3$ and the observed graph $G_{obs}$ plotted in a graph feature space with axes $f_1$ and $f_2$]
Discriminate generative models in topological feature space
14 Model Selection for Graphs
Realistic Models
- Community structure
- Power law
- Sparsity
- Large n
Topological Measures
- Density, triad count, etc.
Random Forest
- Robust to noise
- Computationally feasible
Pipeline: GRAPH MODELS -> GRAPH ENSEMBLE GENERATION -> GRAPH INSTANCES -> FEATURE EMBEDDING -> FEATURE DATA -> GRAPH CLASSIFICATION MODEL -> GRAPH CLASSIFIER
Output: theoretical detectability bound
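The pipeline on the two model-selection slides can be sketched end to end on a toy problem. This is a hedged illustration, not the authors' implementation: it assumes NumPy only, uses two hypothetical generative models (flat versus heavy-tailed expected degrees, both sampled Chung-Lu style), embeds each graph with three topological features (density, triangle count, degree standard deviation), and substitutes a nearest-centroid classifier for the random forest named on the slide to keep the example dependency-free.

```python
import numpy as np

def sample_graph(weights, rng):
    """Undirected graph with edge probability w_i * w_j / sum(w), clipped to 1."""
    P = np.minimum(np.outer(weights, weights) / weights.sum(), 1.0)
    upper = np.triu(rng.random(P.shape) < P, 1)
    return (upper | upper.T).astype(float)

def features(A):
    """Topological embedding: density, triangle count, degree standard deviation."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    density = deg.sum() / (n * (n - 1))
    triangles = np.trace(A @ A @ A) / 6.0
    return np.array([density, triangles, deg.std()])

def fit(ensembles):
    """Standardize features over all ensembles and store one centroid per model."""
    X = {name: np.array([features(A) for A in graphs]) for name, graphs in ensembles.items()}
    stacked = np.vstack(list(X.values()))
    mu, sd = stacked.mean(axis=0), stacked.std(axis=0) + 1e-12
    centroids = {name: ((x - mu) / sd).mean(axis=0) for name, x in X.items()}
    return centroids, mu, sd

def classify(A, model):
    """Return the name of the model whose centroid is nearest in feature space."""
    centroids, mu, sd = model
    z = (features(A) - mu) / sd
    return min(centroids, key=lambda name: np.linalg.norm(z - centroids[name]))

def demo(seed=0, n=200, mean_degree=10.0, per_model=20):
    rng = np.random.default_rng(seed)
    flat = np.full(n, mean_degree)                       # Erdős-Rényi-like degrees
    heavy = np.arange(1, n + 1, dtype=float) ** -0.5     # heavy-tailed degrees
    heavy = heavy * mean_degree * n / heavy.sum()        # match the mean degree
    weights = {"erdos_renyi": flat, "heavy_tailed": heavy}
    ensembles = {k: [sample_graph(w, rng) for _ in range(per_model)] for k, w in weights.items()}
    model = fit(ensembles)
    # Classify one fresh "observed" graph drawn from each model.
    return {k: classify(sample_graph(w, rng), model) for k, w in weights.items()}

if __name__ == "__main__":
    print(demo())
```

A random forest would replace `fit` and `classify` and would cope better with many noisy features; the ensemble generation and feature embedding steps would stay the same.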
15 Outline: Motivation and Background; Detection on Graphs; Future Work
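16 Operational Network Detection
[Figure: CANDiD and Operational CANDiD positioned on two axes. Topological realism runs from cliques to community structure; data realism runs from single source with perfect knowledge to multi-source with data uncertainty. Application areas shown: Commercial, HADR, DoD/IC, Cyber.]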
17 Multisource and Uncertainty: Preliminary Results
Multi-INT: Detection
- Threat network observed through multiple sources: MOVINT track, GEOINT, TXT, SIGINT
- Bayesian model: multi-INT threat at v is determined by neighboring threat
- Threat probabilities: $P(o)$, $P(2 \mid o)$, $P(3 \mid o)$, $P(4 \mid o)$, $P(5 \mid o)$, $P(v \mid o)$
[Figure: threat network with observation o, vertex v, nodes labeled P1-P5 and D1-D7, weights $\psi_1$ to $\psi_4$, and a symmetric adjacency matrix with positive entries]
Multi-INT: Aggregation
- Detect communities based on various observed types of interactions
- Boosting approach enables community detection via aggregation
- Performance meets the theoretical bound for the underlying graph
Uncertainty
- Cluster detection in Web of Science: fusion significantly reduces false alarms
- Uncertainty models: missing edges, false edges, missing vertices, metadata confusion
- Hypothesis: data fusion recovers performance
[Plot: probability of detection versus probability of false alarm. Table: edge e and its probability P(e); the entry (1,2) has P(e) = 1/2.]
Recent efforts demonstrate potential of multi-observation fusion
Operational CANDiD will lead to relevant performance bounds
Smith et al., IEEE Trans. SP, 2014; Collins and Smith, Proc. FUSION, 2014; Caceres et al., Proc. SDM Workshop Mining Large Networks and Graphs, 2014; Miller and Arcolano, Proc. ICASSP, 2014
18 Papers
- Xiao Zhang, Raj Rao Nadakuditi, Mark E. J. Newman, Spectra of random graphs with community structure and arbitrary degrees, Phys. Rev. E (2014).
- Rajmonda S. Caceres, Kevin C. Carter, Jeremy Kun, A Boosting Approach to Learning Graph Representations, SDM Workshop: Mining Large Networks and Graphs (2014).
- Benjamin A. Miller, Nicholas Arcolano, Spectral subgraph detection with corrupt observations, Proc. ICASSP (2014).
- Steven T. Smith, Edward K. Kao, Kenneth D. Senne, Garrett Bernstein, Scott Philips, Bayesian Discovery of Threat Networks, IEEE Transactions on Signal Processing (2013).
- Benjamin A. Miller, Nadya T. Bliss, Patrick J. Wolfe, and Michelle S. Beard, Detection Theory for Graphs, Lincoln Laboratory Journal (2013).
THANK YOU
19 BACKUP
CANDiD - GraphEx 2014, RSC, 08/21/2014
20 CANDiD Overview
Columns: Current Evaluation Method; Uncertainty; CANDiD Focus Areas
Closed-Form Bounds
- Current evaluation method: run detection method on observed data; validate any detections
- Uncertainty: could another detection method have done better?
- CANDiD focus areas: Random Matrix Theory; Optimal Network Detection; Membership Estimation Bounds
Model Selection
- Current evaluation method: run detection method on many simulated graphs; evaluate performance across simulations
- Uncertainty: does detection performance transfer to observed data?
- CANDiD focus area: Topological Classification
Establishing closed-form bounds and a framework for model selection address uncertainties of current evaluation methods
21 Space-Time Threat Propagation*
Space-Time Graph: $G_T = (V_T, E_T)$
[Figure: space-time graph (axes: space, time) with nodes marked non-threat, inferred threat, and observed threat; Bayesian prior indicated]
Optimum Neyman-Pearson network detection
- Maximizes probability of detection
[Plot: $P_D$ (%) versus $N_{FA}$ for STTP, BFS, and Random]
Algorithm
- For each node, propagate threat belief to neighboring nodes
- Harmonic analysis on graph
- Comparable to Google's PageRank algorithm
$$P(\text{site } s_i \text{ red}) = \alpha_s \frac{\lambda}{|N(s_i)|} \sum_{v \in N(s_i)} P(v \text{ red}) + (1-\lambda) \max_{v \in N(s_i)} P(v \text{ red})$$
Space-Time graph analytics infers threat at network sites based on Bayesian propagation model
*Smith et al., Lincoln Laboratory Journal (2012)
Anomaly Detection in Very Large Graphs Modeling and Computational Considerations
Anomaly Detection in Very Large Graphs Modeling and Computational Considerations Benjamin A. Miller, Nicholas Arcolano, Edward M. Rutledge and Matthew C. Schmidt MIT Lincoln Laboratory Nadya T. Bliss ASURE
More informationSPECTRAL SUBGRAPH DETECTION WITH CORRUPT OBSERVATIONS. Benjamin A. Miller and Nicholas Arcolano
204 IEEE International Conference on Acoustic Speech and Signal Processing (ICASSP) SPECTRAL SUBGRAPH DETECTION WITH CORRUPT OBSERVATIONS Benjamin A. Miller and Nicholas Arcolano Lincoln Laboratory Massachusetts
More informationScott Philips, Edward Kao, Michael Yee and Christian Anderson. Graph Exploitation Symposium August 9 th 2011
Activity-Based Community Detection Scott Philips, Edward Kao, Michael Yee and Christian Anderson Graph Exploitation Symposium August 9 th 2011 23-1 This work is sponsored by the Office of Naval Research
More informationDetection Theory for Graphs
Detection Theory for Graphs Benjamin A. Miller, Nadya T. Bliss, Patrick J. Wolfe, and Michelle S. Beard Graphs are fast emerging as a common data structure used in many scientific and engineering fields.
More informationEigenspace Analysis for Threat Detection in Social Networks
14th International Conference on Information Fusion Chicago, Illinois, USA, July 5-8, 011 Eigenspace Analysis for Threat Detection in Social Networks Benjamin A. Miller, Michelle S. Beard and Nadya T.
More informationPerfect Power Law Graphs: Generation, Sampling, Construction, and Fitting
Perfect Power Law Graphs: Generation, Sampling, Construction, and Fitting Jeremy Kepner SIAM Annual Meeting, Minneapolis, July 9, 2012 This work is sponsored by the Department of the Air Force under Air
More informationSub-Graph Detection Theory
Sub-Graph Detection Theory Jeremy Kepner, Nadya Bliss, and Eric Robinson This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions,
More informationAnalysis and Mapping of Sparse Matrix Computations
Analysis and Mapping of Sparse Matrix Computations Nadya Bliss & Sanjeev Mohindra Varun Aggarwal & Una-May O Reilly MIT Computer Science and AI Laboratory September 19th, 2007 HPEC2007-1 This work is sponsored
More informationSpectral Methods for Network Community Detection and Graph Partitioning
Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection
More informationSampling Large Graphs for Anticipatory Analysis
Sampling Large Graphs for Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance Extreme Computing Conference September 16, 2015
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationSocial Behavior Prediction Through Reality Mining
Social Behavior Prediction Through Reality Mining Charlie Dagli, William Campbell, Clifford Weinstein Human Language Technology Group MIT Lincoln Laboratory This work was sponsored by the DDR&E / RRTO
More informationGraph Exploitation Testbed
Graph Exploitation Testbed Peter Jones and Eric Robinson Graph Exploitation Symposium April 18, 2012 This work was sponsored by the Office of Naval Research under Air Force Contract FA8721-05-C-0002. Opinions,
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationStreaming Graph Challenge: Stochastic Block Partition - draft - Steven Smith, Edward Kao, Jeremy Kepner, Michael Hurley, Sanjeev Mohindra
Streaming Graph Challenge: Stochastic Block Partition - draft - Steven Smith, Edward Kao, Jeremy Kepner, Michael Hurley, Sanjeev Mohindra http://graphchallenge.org Outline Introduction Data Sets Graph
More informationSampling informative/complex a priori probability distributions using Gibbs sampling assisted by sequential simulation
Sampling informative/complex a priori probability distributions using Gibbs sampling assisted by sequential simulation Thomas Mejer Hansen, Klaus Mosegaard, and Knud Skou Cordua 1 1 Center for Energy Resources
More informationSTATISTICS (STAT) Statistics (STAT) 1
Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).
More informationIntroduction to network metrics
Universitat Politècnica de Catalunya Version 0.5 Complex and Social Networks (2018-2019) Master in Innovation and Research in Informatics (MIRI) Instructors Argimiro Arratia, argimiro@cs.upc.edu, http://www.cs.upc.edu/~argimiro/
More informationLevel-set MCMC Curve Sampling and Geometric Conditional Simulation
Level-set MCMC Curve Sampling and Geometric Conditional Simulation Ayres Fan John W. Fisher III Alan S. Willsky February 16, 2007 Outline 1. Overview 2. Curve evolution 3. Markov chain Monte Carlo 4. Curve
More informationLesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008
Lesson 4 Random graphs Sergio Barbarossa Graph models 1. Uncorrelated random graph (Erdős, Rényi) N nodes are connected through n edges which are chosen randomly from the possible configurations 2. Binomial
More informationScalable Clustering of Signed Networks Using Balance Normalized Cut
Scalable Clustering of Signed Networks Using Balance Normalized Cut Kai-Yang Chiang,, Inderjit S. Dhillon The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Oct.
More informationCS281 Section 9: Graph Models and Practical MCMC
CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs
More informationDiffusion and Clustering on Large Graphs
Diffusion and Clustering on Large Graphs Alexander Tsiatas Thesis Proposal / Advancement Exam 8 December 2011 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of
More informationSeparating Objects and Clutter in Indoor Scenes
Separating Objects and Clutter in Indoor Scenes Salman H. Khan School of Computer Science & Software Engineering, The University of Western Australia Co-authors: Xuming He, Mohammed Bennamoun, Ferdous
More informationLLMORE: Mapping and Optimization Framework
LORE: Mapping and Optimization Framework Michael Wolf, MIT Lincoln Laboratory 11 September 2012 This work is sponsored by Defense Advanced Research Projects Agency (DARPA) under Air Force contract FA8721-05-C-0002.
More informationRecap: Gaussian (or Normal) Distribution. Recap: Minimizing the Expected Loss. Topics of This Lecture. Recap: Maximum Likelihood Approach
Truth Course Outline Machine Learning Lecture 3 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Probability Density Estimation II 2.04.205 Discriminative Approaches (5 weeks)
More informationData fusion and multi-cue data matching using diffusion maps
Data fusion and multi-cue data matching using diffusion maps Stéphane Lafon Collaborators: Raphy Coifman, Andreas Glaser, Yosi Keller, Steven Zucker (Yale University) Part of this work was supported by
More informationAn Empirical Analysis of Communities in Real-World Networks
An Empirical Analysis of Communities in Real-World Networks Chuan Sheng Foo Computer Science Department Stanford University csfoo@cs.stanford.edu ABSTRACT Little work has been done on the characterization
More informationEmpirical risk minimization (ERM) A first model of learning. The excess risk. Getting a uniform guarantee
A first model of learning Let s restrict our attention to binary classification our labels belong to (or ) Empirical risk minimization (ERM) Recall the definitions of risk/empirical risk We observe the
More informationBasics of Network Analysis
Basics of Network Analysis Hiroki Sayama sayama@binghamton.edu Graph = Network G(V, E): graph (network) V: vertices (nodes), E: edges (links) 1 Nodes = 1, 2, 3, 4, 5 2 3 Links = 12, 13, 15, 23,
More informationCluster-based 3D Reconstruction of Aerial Video
Cluster-based 3D Reconstruction of Aerial Video Scott Sawyer (scott.sawyer@ll.mit.edu) MIT Lincoln Laboratory HPEC 12 12 September 2012 This work is sponsored by the Assistant Secretary of Defense for
More informationDistributed Detection in Sensor Networks: Connectivity Graph and Small World Networks
Distributed Detection in Sensor Networks: Connectivity Graph and Small World Networks SaeedA.AldosariandJoséM.F.Moura Electrical and Computer Engineering Department Carnegie Mellon University 5000 Forbes
More informationChapter 11. Network Community Detection
Chapter 11. Network Community Detection Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Outline
More informationPath Length. 2) Verification of the Algorithm and Code
Path Length ) Introduction In calculating the average path length, we must find the shortest path from a source node to all other nodes contained within the graph. Previously, we found that by using an
More informationDS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK232 Fall 2016 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna,
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationClustering Sequences with Hidden. Markov Models. Padhraic Smyth CA Abstract
Clustering Sequences with Hidden Markov Models Padhraic Smyth Information and Computer Science University of California, Irvine CA 92697-3425 smyth@ics.uci.edu Abstract This paper discusses a probabilistic
More informationTable Of Contents: xix Foreword to Second Edition
Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data
More informationModeling Movie Character Networks with Random Graphs
Modeling Movie Character Networks with Random Graphs Andy Chen December 2017 Introduction For a novel or film, a character network is a graph whose nodes represent individual characters and whose edges
More informationTuring Workshop on Statistics of Network Analysis
Turing Workshop on Statistics of Network Analysis Day 1: 29 May 9:30-10:00 Registration & Coffee 10:00-10:45 Eric Kolaczyk Title: On the Propagation of Uncertainty in Network Summaries Abstract: While
More informationSignal Processing on Databases
Signal Processing on Databases Jeremy Kepner Lecture 0: Introduction 3 October 2012 This work is sponsored by the Department of the Air Force under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations,
More informationCycles in Random Graphs
Cycles in Random Graphs Valery Van Kerrebroeck Enzo Marinari, Guilhem Semerjian [Phys. Rev. E 75, 066708 (2007)] [J. Phys. Conf. Series 95, 012014 (2008)] Outline Introduction Statistical Mechanics Approach
More informationThis paper describes an analytical approach to the parametric analysis of target/decoy
Parametric analysis of target/decoy performance1 John P. Kerekes Lincoln Laboratory, Massachusetts Institute of Technology 244 Wood Street Lexington, Massachusetts 02173 ABSTRACT As infrared sensing technology
More informationDataSToRM: Data Science and Technology Research Environment
The Future of Advanced (Secure) Computing DataSToRM: Data Science and Technology Research Environment This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering
More informationCollective classification in network data
1 / 50 Collective classification in network data Seminar on graphs, UCSB 2009 Outline 2 / 50 1 Problem 2 Methods Local methods Global methods 3 Experiments Outline 3 / 50 1 Problem 2 Methods Local methods
More informationPart I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a
Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationThe Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing
UW Biostatistics Working Paper Series 9-6-2005 The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing John D. Storey University of Washington, jstorey@u.washington.edu Suggested
More informationnode2vec: Scalable Feature Learning for Networks
node2vec: Scalable Feature Learning for Networks A paper by Aditya Grover and Jure Leskovec, presented at Knowledge Discovery and Data Mining 16. 11/27/2018 Presented by: Dharvi Verma CS 848: Graph Database
More informationClassification. 1 o Semestre 2007/2008
Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class
More informationIntro to Random Graphs and Exponential Random Graph Models
Intro to Random Graphs and Exponential Random Graph Models Danielle Larcomb University of Denver Danielle Larcomb Random Graphs 1/26 Necessity of Random Graphs The study of complex networks plays an increasingly
More informationMultivariate Data Analysis and Machine Learning in High Energy Physics (V)
Multivariate Data Analysis and Machine Learning in High Energy Physics (V) Helge Voss (MPI K, Heidelberg) Graduierten-Kolleg, Freiburg, 11.5-15.5, 2009 Outline last lecture Rule Fitting Support Vector
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationPassive Differential Matched-field Depth Estimation of Moving Acoustic Sources
Lincoln Laboratory ASAP-2001 Workshop Passive Differential Matched-field Depth Estimation of Moving Acoustic Sources Shawn Kraut and Jeffrey Krolik Duke University Department of Electrical and Computer
More informationKernels for Structured Data
T-122.102 Special Course in Information Science VI: Co-occurence methods in analysis of discrete data Kernels for Structured Data Based on article: A Survey of Kernels for Structured Data by Thomas Gärtner
More informationTopological Classification of Data Sets without an Explicit Metric
Topological Classification of Data Sets without an Explicit Metric Tim Harrington, Andrew Tausz and Guillaume Troianowski December 10, 2008 A contemporary problem in data analysis is understanding the
More informationSD 372 Pattern Recognition
SD 372 Pattern Recognition Lab 2: Model Estimation and Discriminant Functions 1 Purpose This lab examines the areas of statistical model estimation and classifier aggregation. Model estimation will be
More informationSparse Matrix Partitioning for Parallel Eigenanalysis of Large Static and Dynamic Graphs
Sparse Matrix Partitioning for Parallel Eigenanalysis of Large Static and Dynamic Graphs Michael M. Wolf and Benjamin A. Miller Lincoln Laboratory Massachusetts Institute of Technology Lexington, MA 02420
More informationStatistics 202: Data Mining. c Jonathan Taylor. Outliers Based in part on slides from textbook, slides of Susan Holmes.
Outliers Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Concepts What is an outlier? The set of data points that are considerably different than the remainder of the
More informationContents. Foreword to Second Edition. Acknowledgments About the Authors
Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1
More informationWorkshop report 1. Daniels report is on website 2. Don t expect to write it based on listening to one project (we had 6 only 2 was sufficient
Workshop report 1. Daniels report is on website 2. Don t expect to write it based on listening to one project (we had 6 only 2 was sufficient quality) 3. I suggest writing it on one presentation. 4. Include
More informationLoopy Belief Propagation
Loopy Belief Propagation Research Exam Kristin Branson September 29, 2003 Loopy Belief Propagation p.1/73 Problem Formalization Reasoning about any real-world problem requires assumptions about the structure
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationGT "Calcul Ensembliste"
GT "Calcul Ensembliste" Beyond the bounded error framework for non linear state estimation Fahed Abdallah Université de Technologie de Compiègne 9 Décembre 2010 Fahed Abdallah GT "Calcul Ensembliste" 9
More informationMulti-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection
Multi-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection Dr. Pin-Yu Chen 1 Dr. Sutanay Choudhury 2 Prof. Alfred Hero 1 1 Department of Electrical Engineering and
More informationStatistical Physics of Community Detection
Statistical Physics of Community Detection Keegan Go (keegango), Kenji Hata (khata) December 8, 2015 1 Introduction Community detection is a key problem in network science. Identifying communities, defined
More informationStochastic Blockmodels as an unsupervised approach to detect botnet infected clusters in networked data
Stochastic Blockmodels as an unsupervised approach to detect botnet infected clusters in networked data Mark Patrick Roeling & Geoff Nicholls Department of Statistics University of Oxford Data Science
More informationGraph similarity. Laura Zager and George Verghese EECS, MIT. March 2005
Graph similarity Laura Zager and George Verghese EECS, MIT March 2005 Words you won t hear today impedance matching thyristor oxide layer VARs Some quick definitions GV (, E) a graph G V the set of vertices