Sampling Large Graphs for Anticipatory Analysis
|
|
- Norah Mosley
- 6 years ago
- Views:
Transcription
1 Sampling Large Graphs for Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance Extreme Computing Conference September 16, 2015 This work is sponsored by the Intelligence Advanced Research Projects Activity (IARPA) under Air Force Contract #FA C Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.
2 Outline Introduction Sampling Techniques Link Prediction Experimental Setup Results Summary
3 Common Big Data Challenge Operators Analysts Commanders Users Rapidly increasing - Data volume - Data velocity - Data variety - Data veracity (security) Data Gap Users & Beyond Data <html> OSINT Weather HUMINT C2 Ground Maritime Air Space Cyber
4 What is a Graph? Graphs are mathematical representations of physical or logical relationships between sets of objects A graph is a pair of sets: A set of vertices/ nodes (entities) A set of edges/links (connections, transactions, and relationships)
5 Applications of Graph Analytics NSA-RD Brain scale... ISR Social 100 billion vertices, 100 trillion edges Cyber2.08 mna bytes 2 (molar bytes) adjacency Biomatrix 2.84 PB adjacency list 2.84 PB edge list Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011 Graphs represent entities and relationships detected through multiple data Graphs represent relationships between individuals or documents Graphs represent Graphs represent 2 N = mol communication connectivity between Paul Burkhardt, Chris Waring An NSA Big Graph experiment patterns of computers brain regions on a network ~1B 1T regions and ~1K 1M entities and connections ~10K 10M individuals and interactions ~1M 1B network events GOAL: Identify anomalous patterns GOAL: Identify hidden social networks GOAL: Detect attack or malicious software A 23 1 connections GOAL: Detect regions affected by a neurological condition
6 Big Data Challenge NSA-RD v1 Brain scale billion vertices, 100 trillion edges Memory required for storing Sparse Matrix (GB) 2,097, mna bytes 2 (molar bytes) adjacency matrix 2.84 PB adjacency list Petabyte 2.84 PB edge list NSA-RD v1 Web scale , billion vertices, 1 trillion edges 271 EB adjacency matrix 29.5 TB adjacency list 29.1 TB edge list 32,768 NSA-RD v1 Social scale... Brain Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), NA = mol 1 Paul Burkhardt, Chris Waring An NSA Big Graph experiment 1 billion vertices, 100 billion edges 111 PB adjacency matrix 4, TB adjacency list Terabyte 512 Web graph from the SNAP database ( TB edge list Internet graph from the Opte Project ( Paul Burkhardt, Chris Waring 64 An NSA Big Graph experiment Web Social Twitter graph from Gephi dataset ( Cyber 8 Paul Burkhardt, Chris Waring An NSA Big Graph experiment P. Burkhardt and C. Waring, An NSA Big Graph experiment, National Security Agency, Tech. Rep. NSA-RD v1, ISR ,096 65,536 1,048,576 NSA Big Graph Experiment Graph Vertices (millions) How can these massive graphs be leveraged to provide accurate anticipatory intelligence?
7 Outline Introduction Sampling Techniques Link Prediction Experimental Setup Results Summary
8 Methods for Sampling Graphs Edge sampling methods Consider each edges individually, keep it in the sample based on some criterion Vertex sampling methods Sample individual vertices or local substructures Random walk methods Walk from vertex to vertex by some criterion, keep edges along the path Snowball sampling methods Start with a set of seed vertices, grow the sample from those
9 Random Edge Sampling Each edge in the graph has equal probability of being included in the sample Methods Used Popularity-Based Sampling Probability of sampling a given edge is proportional to a decreasing function of the vertex degrees Wedge Sampling Choose a vertex at random, and randomly select two of its adjacent edges to be included in the sample
10 Random Edge Sampling Each edge in the graph has equal probability of being included in the sample Methods Used Popularity-Based Sampling Probability of sampling a given edge is proportional to a decreasing function of the vertex degrees Wedge Sampling Choose a vertex at random, and randomly select two of its adjacent edges to be included in the sample
11 Random Edge Sampling Each edge in the graph has equal probability of being included in the sample Methods Used Popularity-Based Sampling Probability of sampling a given edge is proportional to a decreasing function of the vertex degrees Wedge Sampling Choose a vertex at random, and randomly select two of its adjacent edges to be included in the sample
12 Random Edge Sampling Each edge in the graph has equal probability of being included in the sample Methods Used Popularity-Based Sampling Probability of sampling a given edge is proportional to a decreasing function of the vertex degrees Wedge Sampling Choose a vertex at random, and randomly select two of its adjacent edges to be included in the sample
13 Random Edge Sampling Each edge in the graph has equal probability of being included in the sample Methods Used Popularity-Based Sampling Probability of sampling a given edge is proportional to a decreasing function of the vertex degrees Wedge Sampling Choose a vertex at random, and randomly select two of its adjacent edges to be included in the sample
14 Random Edge Sampling Each edge in the graph has equal probability of being included in the sample Methods Used Popularity-Based Sampling Probability of sampling a given edge is proportional to a decreasing function of the vertex degrees Wedge Sampling Choose a vertex at random, and randomly select two of its adjacent edges to be included in the sample
15 Random Edge Sampling Each edge in the graph has equal probability of being included in the sample Methods Used Popularity-Based Sampling Probability of sampling a given edge is proportional to a decreasing function of the vertex degrees Wedge Sampling Choose a vertex at random, and randomly select two of its adjacent edges to be included in the sample
16 Random Edge Sampling Each edge in the graph has equal probability of being included in the sample Methods Used Popularity-Based Sampling Probability of sampling a given edge is proportional to a decreasing function of the vertex degrees Wedge Sampling Choose a vertex at random, and randomly select two of its adjacent edges to be included in the sample
17 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
18 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
19 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
20 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
21 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
22 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
23 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
24 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
25 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
26 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
27 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
28 Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Methods Used, Cont d
29 Methods Used, Cont d Random Walk Sampling Move along edges from vertex to vertex, adding edges to the sample along the way At each step, with probability 0.15, jump to any vertex in the graph Random Area Sampling Choose a set of seed vertices, and add all edges adjacent to those vertices to the sample Selected methods comprise a broad array of network sampling techniques
30 Outline Introduction Sampling Techniques Link Prediction Experimental Setup Results Summary
31 Link Prediction Scenario: Two vertices currently do not share an edge Question: How likely are these vertices to form a connection in the near future?? Examples: How likely is a connection to be formed between two people in a social network? How likely are two political entities to engage in diplomatic or military conflict?? Link prediction: Anticipation of new connections that currently do not exist
32 Link Prediction Statistics Common neighbors Vertices with more neighbors in common are more likely to connect Jaccard s coefficient Normalize common neighbors by total number of neighbors Low-rank approximation Use a low-rank approximation for the square of the adjacency matrix 2 U Σ 2 U T Tensor decomposition Break data into vertex-vertextime factors, extrapolate along temporal axis vertex vertex time time
33 How Does Sampling Hurt? If edge sampling probability is p, probability of maintaining a common neighbor is p 2 This smears the distribution of common neighbors, inhibiting detection Sample 1/8 of Edges For eigenvector-based techniques, removing edges causes the distribution of eigenvalues to contract The outlier eigenvalues, which provide the most information, contract more quickly eigenvalue
34 Outline Introduction Sampling Techniques Link Prediction Experimental Setup Results Summary
35 Experiments aggregate input graph sampling sampling sampling sampling sampling prediction statistics prediction statistics prediction statistics prediction statistics prediction statistics evaluation Simulated graph Method Random edge Popularity-based Random walk Random area Wedge Amount Factor by which data size should be reduced Link Prediction Common neighbors Jaccard s coefficient Low-rank approximation Tensor decomposition Prediction performance Running time
36 Simulation Setup Graph at time t Normalized common neighbors Random changes Graph at time t+1 BTER Model: Simulate graphs evolving over several time steps First time step: Create a graph from a generative model Future time steps: Compute common neighbors Normalize using cosine similarity Randomly select based on scores Create new random edges from generative model Simulates chance encounters Randomly remove edges
37 Simulated Graphs Approximately 10,000 vertices, average degree of 17, 5% increase in edge count per time step Two degree distributions: one lighter tailed and one heavier tailed Variations designed to determine most robust sampling methods
38 Performance Metrics Area under the ROC curve (AUC) Measure area under the receiver operating characteristic (ROC) curve, which maps false alarm rate to detection rate Running time Time required (in Matlab implementation) to compute prediction statistics Probability of Detection ROC shaded area = AUC = Probability of False Alarm Memory Footprint Storage required for prediction statistics Fundamental metrics for anticipatory network analytics: Predictive performance Latency Computing resource use
39 Outline Introduction Sampling Techniques Link Prediction Experimental Setup Results Summary
40 Link Prediction in Baseline Simulation Common Neighbors Low-Rank Approximation Jaccard s Coefficient Tensor Decomposition Common neighbors and Jaccard s coefficient are virtually identical Random area sampling has the most significant initial performance drop in all cases Random walk sampling improves AUC at a higher sampling factor with tensor factorization
41 Time for Link Prediction Common Neighbors Jaccard s Coefficient Common neighbors and Jaccard: time is substantially reduced as sampling becomes more aggressive Best initial decrease is with random area sampling However, this comes with the biggest decrease in predictive performance
42 Time for Link Prediction Low-Rank Common Approximation Neighbors Tensor Jaccard s Decomposition Coefficient Low Rank: time increases with higher sampling rates Changes in the distribution of eigenvalues alters convergence rate Tensor: High sampling rates significantly reduces runtime Random area sampling produces high number of isolated vertices, reduces the dimensionality of the adjacency matrix
43 Outline Introduction Sampling Techniques Link Prediction Experimental Setup Results Summary
44 Summary Network analytics are of tremendous importance for anticipatory analytics Forecast future connections Networks of interest are often extremely large, and data sampling enables computation on a dataset of a reasonable size Study considered the effect of sampling on anticipatory network analytics Random edge sampling is typically the most stable Random area sampling loses performance the fastest Promising areas for future work: test on larger simulated and real graph data, investigate emergent community detection, aggregate samples for higher predictive capabilities, integration with high-performance graph analytics hardware
45 Backup
Sampling Operations on Big Data
Sampling Operations on Big Data Vijay Gadepally, Taylor Herr, Luke Johnson, Lauren Milechin, Maja Milosavljevic, Benjamin A. Miller Lincoln Laboratory Massachusetts Institute of Technology Lexington, MA
More informationAnomaly Detection in Very Large Graphs Modeling and Computational Considerations
Anomaly Detection in Very Large Graphs Modeling and Computational Considerations Benjamin A. Miller, Nicholas Arcolano, Edward M. Rutledge and Matthew C. Schmidt MIT Lincoln Laboratory Nadya T. Bliss ASURE
More information[CoolName++]: A Graph Processing Framework for Charm++
[CoolName++]: A Graph Processing Framework for Charm++ Hassan Eslami, Erin Molloy, August Shi, Prakalp Srivastava Laxmikant V. Kale Charm++ Workshop University of Illinois at Urbana-Champaign {eslami2,emolloy2,awshi2,psrivas2,kale}@illinois.edu
More informationBoosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search
Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search Jialiang Zhang, Soroosh Khoram and Jing Li 1 Outline Background Big graph analytics Hybrid
More informationGraph Exploitation Testbed
Graph Exploitation Testbed Peter Jones and Eric Robinson Graph Exploitation Symposium April 18, 2012 This work was sponsored by the Office of Naval Research under Air Force Contract FA8721-05-C-0002. Opinions,
More informationCS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp
CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as
More informationDataSToRM: Data Science and Technology Research Environment
The Future of Advanced (Secure) Computing DataSToRM: Data Science and Technology Research Environment This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering
More informationThanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman
Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman http://www.mmds.org Overview of Recommender Systems Content-based Systems Collaborative Filtering J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive
More informationGeneral Instructions. Questions
CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These
More informationGraphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech
CSE 6242/ CX 4242 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationScalable Selective Traffic Congestion Notification
Scalable Selective Traffic Congestion Notification Győző Gidófalvi Division of Geoinformatics Deptartment of Urban Planning and Environment KTH Royal Institution of Technology, Sweden gyozo@kth.se Outline
More informationAnalysis and Mapping of Sparse Matrix Computations
Analysis and Mapping of Sparse Matrix Computations Nadya Bliss & Sanjeev Mohindra Varun Aggarwal & Una-May O Reilly MIT Computer Science and AI Laboratory September 19th, 2007 HPEC2007-1 This work is sponsored
More informationKara Greenfield, William Campbell, Joel Acevedo-Aviles
Kara Greenfield, William Campbell, Joel Acevedo-Aviles GraphEx 2014 8/21/2014 This work was sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA8721-05-C-0002. Opinions,
More informationJaccard Coefficients as a Potential Graph Benchmark
Jaccard Coefficients as a Potential Graph Benchmark Peter M. Kogge McCourtney Prof. of CSE Univ. of Notre Dame IBM Fellow (retired) Please Sir, I want more 1 Outline Motivation Jaccard Coefficients A MapReduce
More informationLatent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017
Latent Space Model for Road Networks to Predict Time-Varying Traffic Presented by: Rob Fitzgerald Spring 2017 Definition of Latent https://en.oxforddictionaries.com/definition/latent Latent Space Model?
More informationSocial Behavior Prediction Through Reality Mining
Social Behavior Prediction Through Reality Mining Charlie Dagli, William Campbell, Clifford Weinstein Human Language Technology Group MIT Lincoln Laboratory This work was sponsored by the DDR&E / RRTO
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu Customer X Buys Metalica CD Buys Megadeth CD Customer Y Does search on Metalica Recommender system suggests Megadeth
More informationA New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader
A New Parallel Algorithm for Connected Components in Dynamic Graphs Robert McColl Oded Green David Bader Overview The Problem Target Datasets Prior Work Parent-Neighbor Subgraph Results Conclusions Problem
More informationTechnical Brief: Domain Risk Score Proactively uncover threats using DNS and data science
Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science 310 Million + Current Domain Names 11 Billion+ Historical Domain Profiles 5 Million+ New Domain Profiles Daily
More informationMining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams
/9/7 Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them
More informationData Sources for Cyber Security Research
Data Sources for Cyber Security Research Melissa Turcotte mturcotte@lanl.gov Advanced Research in Cyber Systems, Los Alamos National Laboratory 14 June 2018 Background Advanced Research in Cyber Systems,
More informationDeep Character-Level Click-Through Rate Prediction for Sponsored Search
Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as
More informationCS535 Big Data Fall 2017 Colorado State University 10/10/2017 Sangmi Lee Pallickara Week 8- A.
CS535 Big Data - Fall 2017 Week 8-A-1 CS535 BIG DATA FAQs Term project proposal New deadline: Tomorrow PA1 demo PART 1. BATCH COMPUTING MODELS FOR BIG DATA ANALYTICS 5. ADVANCED DATA ANALYTICS WITH APACHE
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank
More informationScott Philips, Edward Kao, Michael Yee and Christian Anderson. Graph Exploitation Symposium August 9 th 2011
Activity-Based Community Detection Scott Philips, Edward Kao, Michael Yee and Christian Anderson Graph Exploitation Symposium August 9 th 2011 23-1 This work is sponsored by the Office of Naval Research
More informationWith turing you can: Identify, locate and mitigate the effects of botnets or other malware abusing your infrastructure
Decoding DNS data If you have a large DNS infrastructure, understanding what is happening with your real-time and historic traffic is difficult, if not impossible. Until now, the available network management
More informationLLMORE: Mapping and Optimization Framework
LORE: Mapping and Optimization Framework Michael Wolf, MIT Lincoln Laboratory 11 September 2012 This work is sponsored by Defense Advanced Research Projects Agency (DARPA) under Air Force contract FA8721-05-C-0002.
More informationTracking Groups in Mobile Network Traces
Tracking Groups in Mobile Network Traces Kun Tu*, Bruno Ribeiro**, Ananthram Swami***, Don Towsley* *University of Massachusetts, Amherst **Purdue University ***Army Research Lab Presented by Gayane Vardoyan
More informationHighly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture
A Cost Effective,, High g Performance,, Highly Scalable, Non-RDMA NVMe Fabric Bob Hansen,, VP System Architecture bob@apeirondata.com Storage Developers Conference, September 2015 Agenda 3 rd Platform
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu SPAM FARMING 2/11/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 2/11/2013 Jure Leskovec, Stanford
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationSignal Processing on Databases
Signal Processing on Databases Jeremy Kepner Lecture 0: Introduction 3 October 2012 This work is sponsored by the Department of the Air Force under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations,
More informationBig Data - Some Words BIG DATA 8/31/2017. Introduction
BIG DATA Introduction Big Data - Some Words Connectivity Social Medias Share information Interactivity People Business Data Data mining Text mining Business Intelligence 1 What is Big Data Big Data means
More informationEdge Classification in Networks
Charu C. Aggarwal, Peixiang Zhao, and Gewen He Florida State University IBM T J Watson Research Center Edge Classification in Networks ICDE Conference, 2016 Introduction We consider in this paper the edge
More informationReddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011
Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions
More informationLinear Regression Optimization
Gradient Descent Linear Regression Optimization Goal: Find w that minimizes f(w) f(w) = Xw y 2 2 Closed form solution exists Gradient Descent is iterative (Intuition: go downhill!) n w * w Scalar objective:
More informationBipartite Edge Prediction via Transductive Learning over Product Graphs
Bipartite Edge Prediction via Transductive Learning over Product Graphs Hanxiao Liu, Yiming Yang School of Computer Science, Carnegie Mellon University July 8, 2015 ICML 2015 Bipartite Edge Prediction
More informationLec 8: Adaptive Information Retrieval 2
Lec 8: Adaptive Information Retrieval 2 Advaith Siddharthan Introduction to Information Retrieval by Manning, Raghavan & Schütze. Website: http://nlp.stanford.edu/ir-book/ Linear Algebra Revision Vectors:
More informationCluster-based 3D Reconstruction of Aerial Video
Cluster-based 3D Reconstruction of Aerial Video Scott Sawyer (scott.sawyer@ll.mit.edu) MIT Lincoln Laboratory HPEC 12 12 September 2012 This work is sponsored by the Assistant Secretary of Defense for
More informationGraph Data Management
Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of
More informationCollaborative Filtering for Netflix
Collaborative Filtering for Netflix Michael Percy Dec 10, 2009 Abstract The Netflix movie-recommendation problem was investigated and the incremental Singular Value Decomposition (SVD) algorithm was implemented
More informationPredictive Indexing for Fast Search
Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive
More informationCS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul
1 CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul Introduction Our problem is crawling a static social graph (snapshot). Given
More informationData preprocessing Functional Programming and Intelligent Algorithms
Data preprocessing Functional Programming and Intelligent Algorithms Que Tran Høgskolen i Ålesund 20th March 2017 1 Why data preprocessing? Real-world data tend to be dirty incomplete: lacking attribute
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More informationPARALLELIZATION OF THE NELDER-MEAD SIMPLEX ALGORITHM
PARALLELIZATION OF THE NELDER-MEAD SIMPLEX ALGORITHM Scott Wu Montgomery Blair High School Silver Spring, Maryland Paul Kienzle Center for Neutron Research, National Institute of Standards and Technology
More informationBigDataBench: a Benchmark Suite for Big Data Application
BigDataBench: a Benchmark Suite for Big Data Application Wanling Gao Institute of Computing Technology, Chinese Academy of Sciences HVC tutorial in conjunction with The 19th IEEE International Symposium
More informationGraphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech
CSE 6242/ CX 4242 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John
More informationPart 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,
More informationMicrosoft Advance Threat Analytics (ATA) at LLNL NLIT Summit 2018
Microsoft Advance Threat Analytics (ATA) at LLNL NLIT Summit 2018 May, 22, 2018 John Wong wong76@llnl.gov Systems & Network Associate This work was performed under the auspices of the U.S. Department of
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #7: Recommendation Content based & Collaborative Filtering Seoul National University In This Lecture Understand the motivation and the problem of recommendation Compare
More informationRobustness of Centrality Measures for Small-World Networks Containing Systematic Error
Robustness of Centrality Measures for Small-World Networks Containing Systematic Error Amanda Lannie Analytical Systems Branch, Air Force Research Laboratory, NY, USA Abstract Social network analysis is
More informationCovert and Anomalous Network Discovery and Detection (CANDiD)
Covert and Anomalous Network Discovery and Detection (CANDiD) Rajmonda S. Caceres, Edo Airoldi, Garrett Bernstein, Edward Kao, Benjamin A. Miller, Raj Rao Nadakuditi, Matthew C. Schmidt, Kenneth Senne,
More informationFeature Selection for fmri Classification
Feature Selection for fmri Classification Chuang Wu Program of Computational Biology Carnegie Mellon University Pittsburgh, PA 15213 chuangw@andrew.cmu.edu Abstract The functional Magnetic Resonance Imaging
More informationBIG DATA TESTING: A UNIFIED VIEW
http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation
More informationCENTRALITIES. Carlo PICCARDI. DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy
CENTRALITIES Carlo PICCARDI DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy email carlo.piccardi@polimi.it http://home.deib.polimi.it/piccardi Carlo Piccardi
More informationDemystifying movie ratings 224W Project Report. Amritha Raghunath Vignesh Ganapathi Subramanian
Demystifying movie ratings 224W Project Report Amritha Raghunath (amrithar@stanford.edu) Vignesh Ganapathi Subramanian (vigansub@stanford.edu) 9 December, 2014 Introduction The past decade or so has seen
More informationCollective Mind. Early Warnings of Systematic Failures of Equipment. Dr. Artur Dubrawski. Dr. Norman Sondheimer. Auton Lab Carnegie Mellon University
Collective Mind Early Warnings of Systematic Failures of Equipment Dr. Artur Dubrawski Auton Lab Carnegie Mellon University Dr. Norman Sondheimer University of Massachusetts Amherst 1 Collective Mind Unique
More informationMassive Data Analysis
Professor, Department of Electrical and Computer Engineering Tennessee Technological University February 25, 2015 Big Data This talk is based on the report [1]. The growth of big data is changing that
More informationHomework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)
Virginia Tech. Computer Science CS 5614 (Big) Data Management Systems Fall 2014, Prakash Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in
More informationIEEE 802 network enhancements for the next decade Industry Connections Activity Initiation Document (ICAID) Version: 1.
IEEE 802 network enhancements for the next decade Industry Connections Activity Initiation Document (ICAID) Version: 1.3, 2017-03-16 IC17-001-01 Approved by the IEEE-SASB 23 March 2017 Instructions Instructions
More informationMission Impossible: Surviving Today s Flood of Critical Data
August 2017 Mission Impossible: Surviving Today s Flood of Critical Data Matt Rutledge Senior Vice President Business Marketing 8/10/17 Data Explosion Half a billion terabytes created daily Will continue
More informationActiveClean: Interactive Data Cleaning For Statistical Modeling. Safkat Islam Carolyn Zhang CS 590
ActiveClean: Interactive Data Cleaning For Statistical Modeling Safkat Islam Carolyn Zhang CS 590 Outline Biggest Takeaways, Strengths, and Weaknesses Background System Architecture Updating the Model
More informationCSCI5070 Advanced Topics in Social Computing
CSCI5070 Advanced Topics in Social Computing Irwin King The Chinese University of Hong Kong king@cse.cuhk.edu.hk!! 2012 All Rights Reserved. Outline Scale-Free Networks Generation Properties Analysis Dynamic
More informationLink Prediction for Social Network
Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue
More informationWHITE PAPER. Operationalizing Threat Intelligence Data: The Problems of Relevance and Scale
WHITE PAPER Operationalizing Threat Intelligence Data: The Problems of Relevance and Scale Operationalizing Threat Intelligence Data: The Problems of Relevance and Scale One key number that is generally
More informationNonparametric Importance Sampling for Big Data
Nonparametric Importance Sampling for Big Data Abigael C. Nachtsheim Research Training Group Spring 2018 Advisor: Dr. Stufken SCHOOL OF MATHEMATICAL AND STATISTICAL SCIENCES Motivation Goal: build a model
More informationAn Automated System for Data Attribute Anomaly Detection
Proceedings of Machine Learning Research 77:95 101, 2017 KDD 2017: Workshop on Anomaly Detection in Finance An Automated System for Data Attribute Anomaly Detection David Love Nalin Aggarwal Alexander
More informationSimilarity Ranking in Large- Scale Bipartite Graphs
Similarity Ranking in Large- Scale Bipartite Graphs Alessandro Epasto Brown University - 20 th March 2014 1 Joint work with J. Feldman, S. Lattanzi, S. Leonardi, V. Mirrokni [WWW, 2014] 2 AdWords Ads Ads
More informationObject and Action Detection from a Single Example
Object and Action Detection from a Single Example Peyman Milanfar* EE Department University of California, Santa Cruz *Joint work with Hae Jong Seo AFOSR Program Review, June 4-5, 29 Take a look at this:
More informationPageRank. CS16: Introduction to Data Structures & Algorithms Spring 2018
PageRank CS16: Introduction to Data Structures & Algorithms Spring 2018 Outline Background The Internet World Wide Web Search Engines The PageRank Algorithm Basic PageRank Full PageRank Spectral Analysis
More informationCase Study: Social Network Analysis. Part II
Case Study: Social Network Analysis Part II https://sites.google.com/site/kdd2017iot/ Outline IoT Fundamentals and IoT Stream Mining Algorithms Predictive Learning Descriptive Learning Frequent Pattern
More informationBubbleNet: A Cyber Security Dashboard for Visualizing Patterns
BubbleNet: A Cyber Security Dashboard for Visualizing Patterns Sean McKenna 1,2 Diane Staheli 2 Cody Fulcher 2 Miriah Meyer 1 1 University of Utah 2 MIT Lincoln Laboratory The Lincoln Laboratory portion
More informationPackage mmtsne. July 28, 2017
Type Package Title Multiple Maps t-sne Author Benjamin J. Radford Package mmtsne July 28, 2017 Maintainer Benjamin J. Radford Version 0.1.0 An implementation of multiple maps
More information2013 AWS Worldwide Public Sector Summit Washington, D.C.
2013 AWS Worldwide Public Sector Summit Washington, D.C. EMR for Fun and for Profit Ben Butler Sr. Manager, Big Data butlerb@amazon.com @bensbutler Overview 1. What is big data? 2. What is AWS Elastic
More informationReduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs
Reduce and Aggregate: Similarity Ranking in Multi-Categorical Bipartite Graphs Alessandro Epasto J. Feldman*, S. Lattanzi*, S. Leonardi, V. Mirrokni*. *Google Research Sapienza U. Rome Motivation Recommendation
More informationNode Similarity. Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California
Node Similarity Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California rgera@nps.edu Motivation We talked about global properties Average degree, average clustering, ave
More informationFinding a needle in Haystack: Facebook's photo storage
Finding a needle in Haystack: Facebook's photo storage The paper is written at facebook and describes a object storage system called Haystack. Since facebook processes a lot of photos (20 petabytes total,
More informationGraFBoost: Using accelerated flash storage for external graph analytics
GraFBoost: Using accelerated flash storage for external graph analytics Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu and Arvind MIT CSAIL Funded by: 1 Large Graphs are Found Everywhere in Nature
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationCS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering
W10.B.0.0 CS435 Introduction to Big Data W10.B.1 FAQs Term project 5:00PM March 29, 2018 PA2 Recitation: Friday PART 1. LARGE SCALE DATA AALYTICS 4. RECOMMEDATIO SYSTEMS 5. EVALUATIO AD VALIDATIO TECHIQUES
More informationSentiment analysis under temporal shift
Sentiment analysis under temporal shift Jan Lukes and Anders Søgaard Dpt. of Computer Science University of Copenhagen Copenhagen, Denmark smx262@alumni.ku.dk Abstract Sentiment analysis models often rely
More informationWeka ( )
Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised
More informationTELCOM2125: Network Science and Analysis
School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning
More informationFraud Mobility: Exploitation Patterns and Insights
WHITEPAPER Fraud Mobility: Exploitation Patterns and Insights September 2015 2 Table of Contents Introduction 3 Study Methodology 4 Once a SSN has been Compromised, it Remains at Risk 4 Victims Remain
More informationImproved Detection of Low-Profile Probes and Denial-of-Service Attacks*
Improved Detection of Low-Profile Probes and Denial-of-Service Attacks* William W. Streilein Rob K. Cunningham, Seth E. Webster Workshop on Statistical and Machine Learning Techniques in Computer Intrusion
More informationLink Analysis in the Cloud
Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)
More informationSection 7.12: Similarity. By: Ralucca Gera, NPS
Section 7.12: Similarity By: Ralucca Gera, NPS Motivation We talked about global properties Average degree, average clustering, ave path length We talked about local properties: Some node centralities
More informationMSA220 - Statistical Learning for Big Data
MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu //8 Jure Leskovec, Stanford CS6: Mining Massive Datasets High dim. data Graph data Infinite data Machine learning
More informationUNSUPERVISED LEARNING FOR ANOMALY INTRUSION DETECTION Presented by: Mohamed EL Fadly
UNSUPERVISED LEARNING FOR ANOMALY INTRUSION DETECTION Presented by: Mohamed EL Fadly Outline Introduction Motivation Problem Definition Objective Challenges Approach Related Work Introduction Anomaly detection
More informationInsight: that s for NSA Decision making: that s for Google, Facebook. so they find the best way to push out adds and products
What is big data? Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
More informationMachine Learning Applications for Data Center Optimization
Machine Learning Applications for Data Center Optimization Jim Gao, Google Ratnesh Jamidar Indian Institute of Technology, Kanpur October 27, 2014 Outline Introduction Methodology General Background Model
More informationMulti-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection
Multi-Centrality Graph Spectral Decompositions and Their Application to Cyber Intrusion Detection Dr. Pin-Yu Chen 1 Dr. Sutanay Choudhury 2 Prof. Alfred Hero 1 1 Department of Electrical Engineering and
More informationBuilding an Effective Threat Intelligence Capability. Haider Pasha, CISSP, C EH Director, Security Strategy Emerging Markets Office of the CTO
Building an Effective Threat Intelligence Capability Haider Pasha, CISSP, C EH Director, Security Strategy Emerging Markets Office of the CTO The Race To Digitize Automotive Telematics In-vehicle entertainment
More informationThe HPEC Challenge Benchmark Suite
The HPEC Challenge Benchmark Suite Ryan Haney, Theresa Meuse, Jeremy Kepner and James Lebak Massachusetts Institute of Technology Lincoln Laboratory HPEC 2005 This work is sponsored by the Defense Advanced
More informationUNCLASSIFIED. R-1 ITEM NOMENCLATURE PE D8Z: Data to Decisions Advanced Technology FY 2012 OCO
Exhibit R-2, RDT&E Budget Item Justification: PB 2012 Office of Secretary Of Defense DATE: February 2011 BA 3: Advanced Development (ATD) COST ($ in Millions) FY 2010 FY 2011 Base OCO Total FY 2013 FY
More information