Covert and Anomalous Network Discovery and Detection (CANDiD)


1 Covert and Anomalous Network Discovery and Detection (CANDiD)
Rajmonda S. Caceres, Edo Airoldi, Garrett Bernstein, Edward Kao, Benjamin A. Miller, Raj Rao Nadakuditi, Matthew C. Schmidt, Kenneth Senne, Steven T. Smith, Leah Weiner
GraphEx 2014
This work is sponsored by the Assistant Secretary of Defense for Research & Engineering under Air Force Contract #FA C. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

2 Program Motivation
Network Detection Applications: ISR, Cyber, Social
Network Detection Techniques: Signal Processing for Graphs, Space-Time Threat Propagation
Challenge: What are the performance bounds for graph detection techniques when used on application datasets?

3 Network Detection Performance: State of the Art
[Figure: evaluation approaches placed on two axes, Fidelity and Number of Cases (ticks up to 1000s)]
These do not exist: real-world full covert networks (empirical); closed-form analysis of a complex model (analytical)
These exist: single experiments (empirical); Monte Carlo simulations (empirical); closed-form analysis of a simple model (analytical)
Empirical performance: real-world examples, single datasets, Monte Carlo simulations
Analytical performance: currently limited to simple models

4 Outline: Motivation and Background; Detection on Graphs; Random Matrix Theoretic Bounds; Inference and Estimation on Graphs; Optimum Bayesian Network Detection; Model Selection; Future Work

5 Spectral Graph Detection Bounds
Using Random Matrix Theory to Define Bounds
For each graph model: the graph matrix as a low-rank matrix plus a random matrix, the closed-form expression of the graph spectrum, and the detection bound.
Uniform Edge Probability (Erdős-Rényi)
- Graph matrix: $A = k(1-p)\,uu^T + \text{random}$ (low-rank term: clique)
- Spectrum: ~Wigner
- Detection bound: $P(\text{detect}) = 1$ if $k^2/n > p/(1-p)$
Arbitrary Expected Degrees (Chung-Lu)
- Graph matrix: $A = dd^T/V + \text{random}$ (low-rank term: degrees)
- Spectrum: ~Wigner
- Detection bound: N/A
Community Structure (Stochastic Block)
- Graph matrix: $A = \tfrac{1}{2}(c_{in} + c_{out})\,\mathbf{1}\mathbf{1}^T + \tfrac{1}{2}(c_{in} - c_{out})\,uu^T + \text{random}$ (low-rank terms: E-R, community)
- Spectrum: ~Wigner
Prior work with RMT has produced graph spectra and detection bounds for only simple models of background graphs.
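The Erdős-Rényi clique bound on this slide can be checked numerically. The sketch below is an illustration only, not the authors' code: it assumes NumPy, plants a clique on the first `k` vertices of a sampled G(n, p) graph, subtracts the expected background, and compares the largest eigenvalue of the residual with the edge of the Wigner bulk, $2\sqrt{np(1-p)}$. The function names and the parameter values are hypothetical.

```python
import numpy as np

def clique_detectable(k, n, p):
    """Asymptotic bound from the slide: detectable if k^2 / n > p / (1 - p)."""
    return k * k / n > p / (1.0 - p)

def planted_clique_residual(n, p, k, seed=0):
    """Sample G(n, p), plant a clique on vertices 0..k-1, and subtract the
    expected background so the clique shows up as a low-rank perturbation."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, 1)     # independent edges above the diagonal
    adj = (upper | upper.T).astype(float)          # symmetric 0/1 adjacency matrix
    adj[:k, :k] = 1.0                              # plant the clique
    np.fill_diagonal(adj, 0.0)                     # no self-loops
    expected = p * (np.ones((n, n)) - np.eye(n))   # mean of the background model
    return adj - expected

def top_eigenvalue(matrix):
    """Largest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return np.linalg.eigvalsh(matrix)[-1]

n, p, k = 400, 0.1, 30                             # illustrative values only
bulk_edge = 2.0 * np.sqrt(n * p * (1.0 - p))       # edge of the Wigner semicircle bulk
print("bound satisfied:", clique_detectable(k, n, p))
print("top eigenvalue:", top_eigenvalue(planted_clique_residual(n, p, k)))
print("bulk edge:", bulk_edge)
```

When the bound holds with some margin, the top eigenvalue of the residual separates from the bulk; close to the bound, finite-size effects make the separation unreliable.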

6 Expanding Random Matrix Theory to More Complex Graph Models
Complex Graph Model
- Model: community structure with arbitrary expected degree [1]
Spectrum with Blockmodel and Arbitrary Degrees
- Random matrix theory used to derive closed-form expression of graph model spectrum [2]
Detection bound (ongoing)
- $P(\text{detect}) = 1$ if $N_{fg}\, p_{fg} > [\tfrac{1}{2}(c_{in} + c_{out})]$
- Background Model: community structure with arbitrary expected degree
- Foreground Model: clique
[Figure: PD versus $N_{fg}\, p_{fg}$, with curves labeled by $N_{fg}$ (including $N_{fg} = 24$)]
Academic collaborators have expanded theory to define spectrum and bounds for more expressive graph models.
[1] Brian Karrer, Mark E. J. Newman: Stochastic Block Models and community structure in networks, Phys. Rev. E (2011).
[2] Xiao Zhang, Raj Rao Nadakuditi, Mark E. J. Newman: Spectra of random graphs with community structure and arbitrary degrees. Submitted to Phys. Rev. E (2014).

7 Outline: Motivation and Background; Detection on Graphs; Random Matrix Theoretic Bounds; Inference and Estimation on Graphs; Optimum Bayesian Network Detection; Model Selection; Future Work

8 Hybrid Mixed-Membership Block Model
Poisson interaction rate:
$\lambda_{ij} = \dfrac{\lambda_i \lambda_j}{\sum_k \lambda_k} \cdot \pi_i^T B \pi_j \cdot I_{ij}$
The three factors are, in order, the Chung-Lu model ($\lambda^{CL}_{ij}$), the mixed-membership block model, and the Erdős-Rényi model.
Observed interaction data: $m_{ij} \sim \mathrm{Pois}(\lambda_{ij} T)$, where $m_{ij}$ is the number of interactions between $i$ and $j$.
Log-likelihood function:
$\ell_M(\pi_i) \propto \sum_{j:\, I_{ij} = 1} \left[ m_{ij} \log(\pi_i^T B \pi_j) - \lambda^{CL}_{ij}\, T\, \pi_i^T B \pi_j \right]$
Intuition: Setting the first derivative to zero shows that the log-likelihood is maximized when the observed interaction equals the expected interaction.
[Figure labels: $m_{ij}$, $\lambda_{ij}$]
The Hybrid Mixed-Membership Block model captures important properties of real-world network data.
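The intuition on this slide (the likelihood peaks where observed and expected interactions agree) can be seen in a few lines. The sketch below is not from the presentation. It assumes the rate is the product of a Chung-Lu factor $\lambda_i \lambda_j / \sum_k \lambda_k$, a block factor $\pi_i^T B \pi_j$, and an Erdős-Rényi indicator, which is one reading of the slide's garbled formula. The function names and the numbers in the usage lines are hypothetical.

```python
import math

def interaction_rate(lam_i, lam_j, lam_total, pi_i, pi_j, B, linked=1):
    """Assumed hybrid rate: Chung-Lu factor * block factor * link indicator."""
    chung_lu = lam_i * lam_j / lam_total
    block = sum(pi_i[a] * B[a][b] * pi_j[b]
                for a in range(len(pi_i)) for b in range(len(pi_j)))
    return chung_lu * block * linked

def poisson_loglik(m, rate, T):
    """Log-likelihood of observing m interactions when m ~ Pois(rate * T)."""
    mean = rate * T
    return m * math.log(mean) - mean - math.lgamma(m + 1)

# Hypothetical example: two blocks, vertex i entirely in block 0, vertex j split evenly.
B = [[0.8, 0.1], [0.1, 0.8]]
rate = interaction_rate(2.0, 3.0, 10.0, [1.0, 0.0], [0.5, 0.5], B)
print("rate:", rate)

# With m = 6 interactions over T = 3, the likelihood peaks at rate = m / T = 2.
for candidate in (1.5, 2.0, 2.5):
    print(candidate, poisson_loglik(6, candidate, 3.0))
```

The loop shows the log-likelihood rising toward the candidate rate equal to m / T and falling on either side, which is the first-derivative condition stated on the slide.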

9 Estimation Bounds for the Hybrid Mixed-Membership Block Model
Fisher Information Derivation
Cramér-Rao Bounds: The inverse of the Fisher information (i.e., the Cramér-Rao bound) gives the minimal variance of any unbiased estimate of a membership vector.
Cramér-Rao bounds on membership estimation lead to theoretical bounds for the detection problem.
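As a much simpler analogue of the membership-vector bound (a scalar illustration chosen here, not the derivation from the slides), consider a single Poisson count $m \sim \mathrm{Pois}(\lambda T)$ with unknown rate $\lambda$ and known observation time $T$:

```latex
\begin{align*}
\ell(\lambda) &= m \log(\lambda T) - \lambda T - \log m! \\
\frac{\partial \ell}{\partial \lambda} &= \frac{m}{\lambda} - T,
\qquad
\frac{\partial^2 \ell}{\partial \lambda^2} = -\frac{m}{\lambda^2} \\
I(\lambda) &= -\mathbb{E}\!\left[\frac{\partial^2 \ell}{\partial \lambda^2}\right]
            = \frac{\mathbb{E}[m]}{\lambda^2} = \frac{T}{\lambda} \\
\operatorname{Var}(\hat{\lambda}) &\ge I(\lambda)^{-1} = \frac{\lambda}{T}
\quad \text{for any unbiased } \hat{\lambda}
\end{align*}
```

The estimator $\hat{\lambda} = m/T$ is unbiased with variance $\lambda T / T^2 = \lambda / T$, so it attains the bound in this scalar case. The vector-valued membership case replaces $I(\lambda)$ with a Fisher information matrix.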

10 Outline: Motivation and Background; Detection on Graphs; Random Matrix Theoretic Bounds; Inference and Estimation on Graphs; Optimum Bayesian Network Detection; Model Selection; Future Work

11 Optimal Bayesian Network Detection
Stochastic Realization
- Notation: threat at vertex $v$; $z_u$, the observation of threat at vertex $u$
- Threat exists at a node if it propagated through the node to the observation of the threat
- The probability of threat at a vertex is averaged over all random walks
Threat Propagation
- Threat at a vertex is a weighted sum of its neighbors' threat
- Efficient to solve for unknown threat given observed threat
- This is equal to the probability of threat given observations
Optimal Detector
- Neyman-Pearson optimal detector defined using likelihood-ratio test
- The harmonic threat propagation values provide an optimal detection test for a given CFAR
- Harmonic threat propagation yields N independent tests
Stochastic realization is used to show the optimality of the Harmonic Threat Propagation detection method for given diffusion models.
Smith et al., Bayesian Discovery of Threat Networks, IEEE Transactions on Signal Processing, 2013
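The statement that threat at a vertex is a weighted sum of its neighbors' threat describes a harmonic function on the graph, which can be solved as a linear system. The sketch below is a minimal illustration using NumPy, not the method from Smith et al. It assumes observed vertices are pinned to fixed values (1 for observed threat, and, as an added assumption, 0 for a vertex observed to be non-threat) and that every unobserved vertex connects to at least one observed vertex. The function name `harmonic_threat` is hypothetical.

```python
import numpy as np

def harmonic_threat(adj, observed):
    """Solve for threat values that equal the weighted average of neighbors.

    adj: symmetric (n, n) array of non-negative edge weights.
    observed: dict mapping vertex index -> fixed threat value in [0, 1].
    """
    n = adj.shape[0]
    laplacian = np.diag(adj.sum(axis=1)) - adj
    fixed = sorted(observed)
    free = [v for v in range(n) if v not in observed]
    x_fixed = np.array([observed[v] for v in fixed], dtype=float)
    # Harmonic condition on free vertices: L_ff x_f + L_fb x_b = 0.
    x_free = np.linalg.solve(laplacian[np.ix_(free, free)],
                             -laplacian[np.ix_(free, fixed)] @ x_fixed)
    threat = np.empty(n)
    threat[fixed] = x_fixed
    threat[free] = x_free
    return threat

# Path graph 0-1-2-3-4 with threat observed at vertex 0 and none at vertex 4.
path = np.zeros((5, 5))
for a in range(4):
    path[a, a + 1] = path[a + 1, a] = 1.0
print(harmonic_threat(path, {0: 1.0, 4: 0.0}))
```

On the path graph the solution decays linearly from the observed threat to the non-threat end, so vertices nearer the observation receive higher values. Thresholding these values gives one test per vertex.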

12 Outline: Motivation and Background; Detection on Graphs; Random Matrix Theoretic Bounds; Optimum Bayesian Network Detection; Inference and Estimation on Graphs; Model Selection; Future Work

13 Model Selection for Graphs
Research Question: Given an observed graph and a collection of generative graph models, which model is the observed graph closest to?
Technical Approach
- Construct graph ensembles using given generative models
- Embed each graph into feature space
- Build classifier
- Apply classifier to the observed graph
[Figure: graph instances from models $M_1$, $M_2$, $M_3$ and the observed graph $G_{obs}$ plotted in graph feature space with axes $f_1$ and $f_2$]
Discriminate generative models in topological feature space.

14 Model Selection for Graphs
Pipeline stages: GRAPH MODELS, GRAPH ENSEMBLE GENERATION, GRAPH INSTANCES, FEATURE EMBEDDING, FEATURE DATA, GRAPH CLASSIFICATION MODEL, GRAPH CLASSIFIER
Realistic Models
- Community structure
- Power law
- Sparsity
- Large n
Topological Measures
- Density, triad count, etc.
Random Forest
- Robust to noise
- Computationally feasible
Theoretical Detectability bound
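The pipeline on these two slides (generate ensembles, embed in feature space, build a classifier, apply it to the observed graph) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' implementation: the two generative models, the two features (density and transitivity), and every function name are hypothetical, and a nearest-centroid rule stands in for the Random Forest named on the slide to keep the example dependency-free.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, rng):
    """Edge set of G(n, p)."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}

def planted_partition(n, p_in, p_out, rng):
    """Two communities (even and odd vertices) with denser within-group edges."""
    return {(i, j) for i, j in combinations(range(n), 2)
            if rng.random() < (p_in if i % 2 == j % 2 else p_out)}

def graph_features(n, edges):
    """Embed a graph as (density, transitivity)."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    density = len(edges) / (n * (n - 1) / 2)
    # Each triangle is counted once per edge, hence the division by 3.
    triangles = sum(len(nbrs[i] & nbrs[j]) for i, j in edges) / 3
    triples = sum(len(s) * (len(s) - 1) / 2 for s in nbrs)
    transitivity = 3 * triangles / triples if triples else 0.0
    return (density, transitivity)

def model_centroids(models, n, trials, rng):
    """Average feature vector of an ensemble drawn from each model."""
    centroids = {}
    for name, sample in models.items():
        feats = [graph_features(n, sample(n, rng)) for _ in range(trials)]
        centroids[name] = tuple(sum(col) / trials for col in zip(*feats))
    return centroids

def nearest_model(centroids, feats):
    """Stand-in classifier: pick the model with the closest centroid."""
    return min(centroids, key=lambda name: sum(
        (a - b) ** 2 for a, b in zip(centroids[name], feats)))

rng = random.Random(0)
models = {
    "erdos_renyi": lambda n, r: erdos_renyi(n, 0.26, r),
    "planted_partition": lambda n, r: planted_partition(n, 0.5, 0.02, r),
}
centroids = model_centroids(models, 40, 20, rng)
observed = planted_partition(40, 0.5, 0.02, rng)
print(nearest_model(centroids, graph_features(40, observed)))
```

The two models here are tuned to similar edge density, so the density feature alone separates them poorly and transitivity carries most of the signal. A tree ensemble such as a Random Forest would replace `nearest_model` when more features and more models are involved.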

15 Outline: Motivation and Background; Detection on Graphs; Future Work

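16 Operational Network Detection
[Figure: CANDiD and Operational CANDiD placed on two axes, Topological Realism and Data Realism]
Topological Realism labels: Cliques, Community Structure
Data Realism labels: Single Source, Perfect Knowledge, Multi-Source, Data Uncertainty
Application labels: Commercial, HADR, DoD/IC, Cyber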

17 Multisource and Uncertainty Preliminary Results
Multi-INT: Detection
- Bayesian model: Multi-INT threat at $v$ determined by neighboring threat
- [Figure: threat network with observation $o$ and vertex $v$, linked through MOVINT track, GEOINT, TXT, and SIGINT sources with weights $\psi_1$ to $\psi_4$; symmetric adjacency matrix with positive entries; threat probabilities $P(o)$, $P(2 \mid o)$, $P(3 \mid o)$, $P(4 \mid o)$, $P(5 \mid o)$, $P(v \mid o)$]
Multi-INT: Aggregation
- Detect communities based on various observed types of interactions
- Boosting approach enables community detection via aggregation
- Performance meets the theoretical bound for the underlying graph
- [Figure: Probability of Detection versus Probability of False Alarm for cluster detection in Web of Science; fusion significantly reduces false alarms]
Uncertainty
- Uncertainty models: missing edges, false edges, missing vertices, metadata confusion
- Hypothesis: Data fusion recovers performance
- [Figure: table of edge probabilities, e.g. $P(e) = 1/2$ for $e = (1,2)$]
Recent efforts demonstrate the potential of multi-observation fusion. Operational CANDiD will lead to relevant performance bounds.
Smith et al., IEEE Trans. SP, 2014; Collins and Smith, Proc. FUSION, 2014; Caceres et al., Proc. SDM Workshop Mining Large Networks and Graphs, 2014; Miller and Arcolano, Proc. ICASSP, 2014

18 Papers
- Xiao Zhang, Raj Rao Nadakuditi, Mark E. J. Newman, Spectra of random graphs with community structure and arbitrary degrees, Phys. Rev. E (2014).
- Rajmonda S. Caceres, Kevin C. Carter, Jeremy Kun, A Boosting Approach to Learning Graph Representations, SDM Workshop: Mining Large Networks and Graphs (2014).
- Benjamin A. Miller, Nicholas Arcolano, Spectral subgraph detection with corrupt observations, Proc. ICASSP (2014).
- Steven T. Smith, Edward K. Kao, Kenneth D. Senne, Garrett Bernstein, Scott Philips, Bayesian Discovery of Threat Networks, IEEE Transactions on Signal Processing (2013).
- Benjamin A. Miller, Nadya T. Bliss, Patrick J. Wolfe, and Michelle S. Beard, Detection Theory for Graphs, Lincoln Laboratory Journal (2013).
THANK YOU

19 BACKUP
CANDiD - GraphEx 2014, RSC, 08/21/2014

20 CANDiD Overview
Current Evaluation Methods and Their Uncertainty
- Run detection method on observed data, then validate any detections. Uncertainty: Could another detection method have done better?
- Run detection method on many simulated graphs, then evaluate performance across simulations. Uncertainty: Does detection performance transfer to observed data?
CANDiD Focus Areas
- Closed-Form Bounds: Random Matrix Theory, Optimal Network Detection, Membership Estimation Bounds
- Model Selection: Topological Classification
Establishing closed-form bounds and a framework for model selection addresses the uncertainties of current evaluation methods.

21 Space-Time Threat Propagation*
Space-Time Graph: $G_T = (V_T, E_T)$
[Figure: space-time graph with axes Space and Time; vertex legend: Non-Threat, Inferred Threat, Observed Threat]
Bayesian Prior:
$P(\text{site } s_i \text{ red}) = \alpha_s \left[ \dfrac{\lambda}{|N(s_i)|} \sum_{v \in N(s_i)} P(v \text{ red}) + (1 - \lambda) \max_{v \in N(s_i)} P(v \text{ red}) \right]$
Optimum Neyman-Pearson network detection: maximizes Probability of Detection
[Figure: PD (%) versus NFA for STTP, BFS, and Random]
Algorithm: For each node, propagate threat belief to neighboring nodes
- Harmonic analysis on graph
- Comparable to Google's PageRank algorithm
Space-Time graph analytics infers threat at network sites based on a Bayesian propagation model.
*Smith et al., Lincoln Laboratory Journal (2012)
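The site-level formula on this slide is garbled in the transcription. One plausible reading is that the probability a site is "red" blends the mean and the maximum of its neighbors' probabilities, weighted by $\lambda$ and scaled by $\alpha_s$. The sketch below implements that reading only, as an assumption rather than the published definition. The function name and the example numbers are hypothetical.

```python
def site_threat(neighbor_probs, lam, alpha):
    """Assumed update: alpha * (lam * mean(neighbors) + (1 - lam) * max(neighbors)).

    neighbor_probs: P(v red) for each neighbor v of the site.
    lam: weight in [0, 1] on the neighborhood mean.
    alpha: site-specific scaling factor.
    """
    if not neighbor_probs:
        raise ValueError("site has no neighbors")
    mean = sum(neighbor_probs) / len(neighbor_probs)
    return alpha * (lam * mean + (1.0 - lam) * max(neighbor_probs))

# Three neighbors with mean 0.4 and maximum 0.6.
print(site_threat([0.2, 0.6, 0.4], lam=0.5, alpha=0.9))
```

With `lam` near 1 the site follows the neighborhood average, and with `lam` near 0 it follows its single most threatening neighbor.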

Anomaly Detection in Very Large Graphs Modeling and Computational Considerations

Anomaly Detection in Very Large Graphs Modeling and Computational Considerations Anomaly Detection in Very Large Graphs Modeling and Computational Considerations Benjamin A. Miller, Nicholas Arcolano, Edward M. Rutledge and Matthew C. Schmidt MIT Lincoln Laboratory Nadya T. Bliss ASURE

More information

SPECTRAL SUBGRAPH DETECTION WITH CORRUPT OBSERVATIONS. Benjamin A. Miller and Nicholas Arcolano

SPECTRAL SUBGRAPH DETECTION WITH CORRUPT OBSERVATIONS. Benjamin A. Miller and Nicholas Arcolano 204 IEEE International Conference on Acoustic Speech and Signal Processing (ICASSP) SPECTRAL SUBGRAPH DETECTION WITH CORRUPT OBSERVATIONS Benjamin A. Miller and Nicholas Arcolano Lincoln Laboratory Massachusetts

More information

Scott Philips, Edward Kao, Michael Yee and Christian Anderson. Graph Exploitation Symposium August 9 th 2011

Scott Philips, Edward Kao, Michael Yee and Christian Anderson. Graph Exploitation Symposium August 9 th 2011 Activity-Based Community Detection Scott Philips, Edward Kao, Michael Yee and Christian Anderson Graph Exploitation Symposium August 9 th 2011 23-1 This work is sponsored by the Office of Naval Research

More information

Detection Theory for Graphs

Detection Theory for Graphs Detection Theory for Graphs Benjamin A. Miller, Nadya T. Bliss, Patrick J. Wolfe, and Michelle S. Beard Graphs are fast emerging as a common data structure used in many scientific and engineering fields.

More information

Eigenspace Analysis for Threat Detection in Social Networks

Eigenspace Analysis for Threat Detection in Social Networks 14th International Conference on Information Fusion Chicago, Illinois, USA, July 5-8, 011 Eigenspace Analysis for Threat Detection in Social Networks Benjamin A. Miller, Michelle S. Beard and Nadya T.

More information

Perfect Power Law Graphs: Generation, Sampling, Construction, and Fitting

Perfect Power Law Graphs: Generation, Sampling, Construction, and Fitting Perfect Power Law Graphs: Generation, Sampling, Construction, and Fitting Jeremy Kepner SIAM Annual Meeting, Minneapolis, July 9, 2012 This work is sponsored by the Department of the Air Force under Air

More information

Sub-Graph Detection Theory

Sub-Graph Detection Theory Sub-Graph Detection Theory Jeremy Kepner, Nadya Bliss, and Eric Robinson This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions,

More information

Analysis and Mapping of Sparse Matrix Computations

Analysis and Mapping of Sparse Matrix Computations Analysis and Mapping of Sparse Matrix Computations Nadya Bliss & Sanjeev Mohindra Varun Aggarwal & Una-May O Reilly MIT Computer Science and AI Laboratory September 19th, 2007 HPEC2007-1 This work is sponsored

More information

Spectral Methods for Network Community Detection and Graph Partitioning

Spectral Methods for Network Community Detection and Graph Partitioning Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection

More information

Sampling Large Graphs for Anticipatory Analysis

Sampling Large Graphs for Anticipatory Analysis Sampling Large Graphs for Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance Extreme Computing Conference September 16, 2015

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

Social Behavior Prediction Through Reality Mining

Social Behavior Prediction Through Reality Mining Social Behavior Prediction Through Reality Mining Charlie Dagli, William Campbell, Clifford Weinstein Human Language Technology Group MIT Lincoln Laboratory This work was sponsored by the DDR&E / RRTO

More information

Graph Exploitation Testbed

Graph Exploitation Testbed Graph Exploitation Testbed Peter Jones and Eric Robinson Graph Exploitation Symposium April 18, 2012 This work was sponsored by the Office of Naval Research under Air Force Contract FA8721-05-C-0002. Opinions,

More information

Clustering: Classic Methods and Modern Views

Clustering: Classic Methods and Modern Views Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering

More information

Streaming Graph Challenge: Stochastic Block Partition - draft - Steven Smith, Edward Kao, Jeremy Kepner, Michael Hurley, Sanjeev Mohindra

Streaming Graph Challenge: Stochastic Block Partition - draft - Steven Smith, Edward Kao, Jeremy Kepner, Michael Hurley, Sanjeev Mohindra Streaming Graph Challenge: Stochastic Block Partition - draft - Steven Smith, Edward Kao, Jeremy Kepner, Michael Hurley, Sanjeev Mohindra http://graphchallenge.org Outline Introduction Data Sets Graph

More information

Sampling informative/complex a priori probability distributions using Gibbs sampling assisted by sequential simulation

Sampling informative/complex a priori probability distributions using Gibbs sampling assisted by sequential simulation Sampling informative/complex a priori probability distributions using Gibbs sampling assisted by sequential simulation Thomas Mejer Hansen, Klaus Mosegaard, and Knud Skou Cordua 1 1 Center for Energy Resources

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Introduction to network metrics

Introduction to network metrics Universitat Politècnica de Catalunya Version 0.5 Complex and Social Networks (2018-2019) Master in Innovation and Research in Informatics (MIRI) Instructors Argimiro Arratia, argimiro@cs.upc.edu, http://www.cs.upc.edu/~argimiro/

More information

Level-set MCMC Curve Sampling and Geometric Conditional Simulation

Level-set MCMC Curve Sampling and Geometric Conditional Simulation Level-set MCMC Curve Sampling and Geometric Conditional Simulation Ayres Fan John W. Fisher III Alan S. Willsky February 16, 2007 Outline 1. Overview 2. Curve evolution 3. Markov chain Monte Carlo 4. Curve

More information

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008 Lesson 4 Random graphs Sergio Barbarossa Graph models 1. Uncorrelated random graph (Erdős, Rényi) N nodes are connected through n edges which are chosen randomly from the possible configurations 2. Binomial

More information

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Scalable Clustering of Signed Networks Using Balance Normalized Cut Scalable Clustering of Signed Networks Using Balance Normalized Cut Kai-Yang Chiang,, Inderjit S. Dhillon The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Oct.

More information

CS281 Section 9: Graph Models and Practical MCMC

CS281 Section 9: Graph Models and Practical MCMC CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs

More information

Diffusion and Clustering on Large Graphs

Diffusion and Clustering on Large Graphs Diffusion and Clustering on Large Graphs Alexander Tsiatas Thesis Proposal / Advancement Exam 8 December 2011 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of

More information

Separating Objects and Clutter in Indoor Scenes

Separating Objects and Clutter in Indoor Scenes Separating Objects and Clutter in Indoor Scenes Salman H. Khan School of Computer Science & Software Engineering, The University of Western Australia Co-authors: Xuming He, Mohammed Bennamoun, Ferdous

More information

LLMORE: Mapping and Optimization Framework

LLMORE: Mapping and Optimization Framework LORE: Mapping and Optimization Framework Michael Wolf, MIT Lincoln Laboratory 11 September 2012 This work is sponsored by Defense Advanced Research Projects Agency (DARPA) under Air Force contract FA8721-05-C-0002.

More information

Recap: Gaussian (or Normal) Distribution. Recap: Minimizing the Expected Loss. Topics of This Lecture. Recap: Maximum Likelihood Approach

Recap: Gaussian (or Normal) Distribution. Recap: Minimizing the Expected Loss. Topics of This Lecture. Recap: Maximum Likelihood Approach Truth Course Outline Machine Learning Lecture 3 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Probability Density Estimation II 2.04.205 Discriminative Approaches (5 weeks)

More information

Data fusion and multi-cue data matching using diffusion maps

Data fusion and multi-cue data matching using diffusion maps Data fusion and multi-cue data matching using diffusion maps Stéphane Lafon Collaborators: Raphy Coifman, Andreas Glaser, Yosi Keller, Steven Zucker (Yale University) Part of this work was supported by

More information

An Empirical Analysis of Communities in Real-World Networks

An Empirical Analysis of Communities in Real-World Networks An Empirical Analysis of Communities in Real-World Networks Chuan Sheng Foo Computer Science Department Stanford University csfoo@cs.stanford.edu ABSTRACT Little work has been done on the characterization

More information

Empirical risk minimization (ERM) A first model of learning. The excess risk. Getting a uniform guarantee

Empirical risk minimization (ERM) A first model of learning. The excess risk. Getting a uniform guarantee A first model of learning Let s restrict our attention to binary classification our labels belong to (or ) Empirical risk minimization (ERM) Recall the definitions of risk/empirical risk We observe the

More information

Basics of Network Analysis

Basics of Network Analysis Basics of Network Analysis Hiroki Sayama sayama@binghamton.edu Graph = Network G(V, E): graph (network) V: vertices (nodes), E: edges (links) 1 Nodes = 1, 2, 3, 4, 5 2 3 Links = 12, 13, 15, 23,

More information

Cluster-based 3D Reconstruction of Aerial Video

Cluster-based 3D Reconstruction of Aerial Video Cluster-based 3D Reconstruction of Aerial Video Scott Sawyer (scott.sawyer@ll.mit.edu) MIT Lincoln Laboratory HPEC 12 12 September 2012 This work is sponsored by the Assistant Secretary of Defense for

More information

Distributed Detection in Sensor Networks: Connectivity Graph and Small World Networks

Distributed Detection in Sensor Networks: Connectivity Graph and Small World Networks Distributed Detection in Sensor Networks: Connectivity Graph and Small World Networks SaeedA.AldosariandJoséM.F.Moura Electrical and Computer Engineering Department Carnegie Mellon University 5000 Forbes

More information

Chapter 11. Network Community Detection

Chapter 11. Network Community Detection Chapter 11. Network Community Detection Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Outline

More information

Path Length. 2) Verification of the Algorithm and Code

Path Length. 2) Verification of the Algorithm and Code Path Length ) Introduction In calculating the average path length, we must find the shortest path from a source node to all other nodes contained within the graph. Previously, we found that by using an

More information

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: AK232 Fall 2016 Graph Data: Social Networks Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Clustering Sequences with Hidden. Markov Models. Padhraic Smyth CA Abstract

Clustering Sequences with Hidden. Markov Models. Padhraic Smyth CA Abstract Clustering Sequences with Hidden Markov Models Padhraic Smyth Information and Computer Science University of California, Irvine CA 92697-3425 smyth@ics.uci.edu Abstract This paper discusses a probabilistic

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

Modeling Movie Character Networks with Random Graphs

Modeling Movie Character Networks with Random Graphs Modeling Movie Character Networks with Random Graphs Andy Chen December 2017 Introduction For a novel or film, a character network is a graph whose nodes represent individual characters and whose edges

More information

Turing Workshop on Statistics of Network Analysis

Turing Workshop on Statistics of Network Analysis Turing Workshop on Statistics of Network Analysis Day 1: 29 May 9:30-10:00 Registration & Coffee 10:00-10:45 Eric Kolaczyk Title: On the Propagation of Uncertainty in Network Summaries Abstract: While

More information

Signal Processing on Databases

Signal Processing on Databases Signal Processing on Databases Jeremy Kepner Lecture 0: Introduction 3 October 2012 This work is sponsored by the Department of the Air Force under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations,

More information

Cycles in Random Graphs

Cycles in Random Graphs Cycles in Random Graphs Valery Van Kerrebroeck Enzo Marinari, Guilhem Semerjian [Phys. Rev. E 75, 066708 (2007)] [J. Phys. Conf. Series 95, 012014 (2008)] Outline Introduction Statistical Mechanics Approach

More information

This paper describes an analytical approach to the parametric analysis of target/decoy

This paper describes an analytical approach to the parametric analysis of target/decoy Parametric analysis of target/decoy performance1 John P. Kerekes Lincoln Laboratory, Massachusetts Institute of Technology 244 Wood Street Lexington, Massachusetts 02173 ABSTRACT As infrared sensing technology

More information

DataSToRM: Data Science and Technology Research Environment

DataSToRM: Data Science and Technology Research Environment The Future of Advanced (Secure) Computing DataSToRM: Data Science and Technology Research Environment This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering

More information

Collective classification in network data

Collective classification in network data 1 / 50 Collective classification in network data Seminar on graphs, UCSB 2009 Outline 2 / 50 1 Problem 2 Methods Local methods Global methods 3 Experiments Outline 3 / 50 1 Problem 2 Methods Local methods

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Naïve Bayes for text classification

Naïve Bayes for text classification Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

More information

The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing

The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing UW Biostatistics Working Paper Series 9-6-2005 The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing John D. Storey University of Washington, jstorey@u.washington.edu Suggested

More information

node2vec: Scalable Feature Learning for Networks

node2vec: Scalable Feature Learning for Networks node2vec: Scalable Feature Learning for Networks A paper by Aditya Grover and Jure Leskovec, presented at Knowledge Discovery and Data Mining 16. 11/27/2018 Presented by: Dharvi Verma CS 848: Graph Database

More information

Classification. 1 o Semestre 2007/2008

Classification. 1 o Semestre 2007/2008 Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class

More information

Intro to Random Graphs and Exponential Random Graph Models

Intro to Random Graphs and Exponential Random Graph Models Intro to Random Graphs and Exponential Random Graph Models Danielle Larcomb University of Denver Danielle Larcomb Random Graphs 1/26 Necessity of Random Graphs The study of complex networks plays an increasingly

More information

Multivariate Data Analysis and Machine Learning in High Energy Physics (V)

Multivariate Data Analysis and Machine Learning in High Energy Physics (V) Multivariate Data Analysis and Machine Learning in High Energy Physics (V) Helge Voss (MPI K, Heidelberg) Graduierten-Kolleg, Freiburg, 11.5-15.5, 2009 Outline last lecture Rule Fitting Support Vector

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Passive Differential Matched-field Depth Estimation of Moving Acoustic Sources

Passive Differential Matched-field Depth Estimation of Moving Acoustic Sources Lincoln Laboratory ASAP-2001 Workshop Passive Differential Matched-field Depth Estimation of Moving Acoustic Sources Shawn Kraut and Jeffrey Krolik Duke University Department of Electrical and Computer

More information

Kernels for Structured Data

Kernels for Structured Data T-122.102 Special Course in Information Science VI: Co-occurence methods in analysis of discrete data Kernels for Structured Data Based on article: A Survey of Kernels for Structured Data by Thomas Gärtner

More information

Topological Classification of Data Sets without an Explicit Metric

Topological Classification of Data Sets without an Explicit Metric Topological Classification of Data Sets without an Explicit Metric Tim Harrington, Andrew Tausz and Guillaume Troianowski December 10, 2008 A contemporary problem in data analysis is understanding the

More information




