Scalable Network Analysis

Size: px
Start display at page:

Download "Scalable Network Analysis"

Transcription

1 Inderjit S. Dhillon University of Texas at Austin COMAD, Ahmedabad, India Dec 20, 2013

2 Outline Unstructured Data - Scale & Diversity Evolving Networks Machine Learning Problems arising in Networks Recommender Systems Link Prediction Sign Prediction Formulation as Missing Value Estimation Scalable Algorithms NOMAD: Distributed matrix completion algorithm Results on Applications Conclusions 2

3 Structured Data Data organized into fields: Relational databases, spreadsheets, XML Highly optimized for storage & retrieval (e.g. using SQL) 3

4 Structured Data Focus is on data format, efficient storage & search Less or no uncertainty in semantics: e.g. businesses know the fields of the data 4

5 Unstructured Data Modern data is unstructured and diverse Networks, Graphs Text Images, Videos 5

6 Unstructured Data Much greater growth rate 6

7 Unstructured Data Dynamic aspects of unstructured data: Constantly evolving Uncertainties abound: What should I ask of the data? Seek insights Heterogeneity renders traditional database models inadequate 7

8 Unstructured Data Buzzwords - Big Data & Data Science Machine Learning: Predictive models for data Engineering perspective: Scale matters 8

9 Network Graphs Social networks (Friendship) APQ6 AQP1 AQP5 Bipartite networks (Membership, Ratings, etc.) MIP AQP2 GK Gene networks (Functional interaction) 9

10 Graph Evolution Social networks are highly dynamic Constantly grow, change quickly over time Users arrive/leave, relationships form/dissolve Understanding graph evolution is important 10

11 Graphs meet Machine Learning Network analysis: Understanding structure & evolution of networks Formulate predictive problems on the adjacency matrix of the graph Confluence of graph theory & machine learning 11

12 Users Recommender Systems Movies Link Prediction Netflix problem: 100M ratings, 0.5M users, 20K movies A Toy Problem In Comparison! Facebook growth 12

13 Users Recommender Systems Movies 13

14 Link Prediction in Social Networks?? Network at time T Network at time T + 1 Problem: Infer missing relationships from a given snapshot of the network 14

15 Predicting gene-disease links? 15

16 Signed Social Networks Like Dislike Like / Dislike? Sign Prediction Problem: Given a snapshot of the signed social network, predict the signs of missing edges 16

17 Formulation as Missing Value Estimation

18 Users Low-rank Matrix Completion Movies 18

19 Users Low-rank Matrix Completion Movies? 19

20 Low-rank Matrix Completion 20

21 Low-rank Matrix Completion min W 2R m k,h2r n k X (i,j)2 (A ij W i H T j ) 2 + (kw k 2 F + khk 2 F ) 21

22 Low-rank Matrix Completion? min W 2R m k,h2r n k X (i,j)2 (A ij W i H T j ) 2 + (kw k 2 F + khk 2 F ) 22

23 Low-rank Matrix Completion 3.44 min W 2R m k,h2r n k X (i,j)2 (A ij W i H T j ) 2 + (kw k 2 F + khk 2 F ) 23

24 Link Prediction Can be posed as matrix completion problem A (t) = Issue: Only positive relationships are observed Test Link Score (4,5) 1 (1,4) 1 (3,4) 1 (1,5) 1 24

25 Link Prediction Formulate Biased Matrix Completion Problem: min W 2R n k,h2r n k X (i,j)2 (A ij W i H T j ) 2 + X (i,j) /2 (A ij W i H T j ) 2 + (kw k 2 F + khk 2 F ) A (t) = Test Link Score (4,5) 0.59 (1,4) 0.59 (3,4) 0.67 (1,5)

26 Signed Social Networks Balanced Not balanced Friend of a friend is a friend Enemy of an enemy is a friend Social Balance [Harary,1953]: In real-world signed networks, triangles tend to be balanced 26

27 Signed Social Networks Theorem: All triangles in a network are balanced if and only if there exist two antagonistic groups. 27

28 Signed Social Networks Weakly Balanced Not balanced Relaxation: Weak balance Allow triangles with all negative edges 28

29 Signed Social Networks Theorem: All triangles in a network are weakly balanced if and only if there exist k antagonistic groups. 29

30 Sign Prediction Theorem: A k-weakly balanced signed network has rank at most k. Sign inference can be posed as low-rank matrix completion Theorem: If there are no small groups, the underlying network can be exactly recovered, under certain conditions. K. Chiang et al. Prediction and Clustering in Signed Networks: A Local to Global Perspective. To appear in JMLR. 30

31 Scalable Algorithms

32 Stochastic Gradient Sample random index (i,j) and update corresponding factors: w i w i ((A ij w T i h j )h j + w i ) h j h j ((A ij w T i h j )w i + h j ) Time per update O(k) Effective for very large-scale problems 32

33 Distributed Stochastic Gradient Descent (DSGD) [Gemulla et al. KDD 2011] Decoupled updates Easy to parallelize But communication & computation are interleaved 33

34 DSGD 34

35 DSGD Curse of the last reducer 35

36 DSGD 36

37 NOMAD Goal: Keep CPU & network simultaneously busy. Non-locking stochastic Multi-machine algorithm for Asynchronous & Decentralized matrix factorization Asynchronous distributed solution. 37

38 NOMAD Nomadic variables queue h 1 h 5 h 4 h 3 Native variables w 1 w 2 w 3 Workers w 4 h 2 38

39 NOMAD Nomadic variables queue h 1 h 5 h 3 Native variables Push h 4 w 1 w 2 w 3 Workers w 4 h 2 h 4 39

40 NOMAD 40

41 NOMAD 41

42 NOMAD 42

43 NOMAD 43

44 NOMAD 44

45 NOMAD 45

46 NOMAD 46

47 NOMAD 47

48 NOMAD 48

49 NOMAD 49

50 NOMAD 50

51 NOMAD Algorithm 1. Initialize: Randomly assign h j columns to worker queues 2. Parallel Foreach q in {1,2,...,p} 3. If queue[q] not empty then 4. (j, h j ) queue[q].pop() 5. for ( i, j) in (q) do j 6. Do SGD updates 7. end for 8. Sample q uniformly from {1,2,...,p} 9. queue[q ].push( (j, h j ) 10. end if 11. Parallel End 51

52 NOMAD Algorithm 1. Initialize: Randomly assign h j columns to worker queues 2. Parallel Foreach q in {1,2,...,p} 3. If queue[q] not empty then 4. (j, h j ) queue[q].pop() 5. for ( i, j) in (q) do j 6. Do SGD updates Distributed setting: 7. end for Write over the network 8. Sample q uniformly from {1,2,...,p} 9. queue[q ].push( (j, h j ) 10. end if 11. Parallel End Concurrent object 52

53 Algorithm Complexity Average space required per worker: O(mk/p + nk/p + /p) Average time for one sweep: O( k/p) 53

54 Results on Applications

55 Recommender Systems Netflix dataset: 2,649,429 users, 17,770 movies, ~100M ratings Multicore Distributed 55

56 Recommender Systems Synthetic dataset: ~85M users, 17,770 items, ~8.5B observations 56

57 Link Prediction Flickr dataset: 1.9M users & 42M links. Test set: sampled 5K users. 57

58 Predicting Gene-Disease Links CATAPULT(Blom et al.,2013) Katz Biased MF Test pair rank CATAPULT(Blom et al.,2013) Katz Biased MF Singleton gene rank Pr(Rank <= x) Pr(Rank <= x) Rank Rank 58

59 Sign Prediction Epinions dataset (+ve & -ve reviews): 131K nodes, 840K edges, 15% edges negative. MF-ALS is faster and achieves higher accuracy. MF-ALS takes 455 secs on network with 1.1M nodes & 120M edges 59

60 Conclusions Rapid growth of unstructured data demands scalable machine learning solutions for analysis Machine learning problems arising in network analysis can be cast in the matrix completion framework Our proposed asynchronous distributed algorithm NOMAD outperforms state-of-the-art matrix completion solvers Beyond Matrix Completion: Asynchronous distributed framework for solving machine learning problems 60

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Scalable Clustering of Signed Networks Using Balance Normalized Cut Scalable Clustering of Signed Networks Using Balance Normalized Cut Kai-Yang Chiang,, Inderjit S. Dhillon The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Oct.

More information

Social and Information Network Analysis Positive and Negative Relationships

Social and Information Network Analysis Positive and Negative Relationships Social and Information Network Analysis Positive and Negative Relationships Cornelia Caragea Department of Computer Science and Engineering University of North Texas June 14, 2016 To Recap Triadic Closure

More information

Positive and Negative Relations

Positive and Negative Relations Positive and Negative Relations CMSC 498J: Social Media Computing Department of Computer Science University of Maryland Spring 2016 Hadi Amiri hadi@umd.edu Lecture Topics Balanced and Unbalanced Networks

More information

Link Prediction for Social Network

Link Prediction for Social Network Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue

More information

Link Sign Prediction and Ranking in Signed Directed Social Networks

Link Sign Prediction and Ranking in Signed Directed Social Networks Noname manuscript No. (will be inserted by the editor) Link Sign Prediction and Ranking in Signed Directed Social Networks Dongjin Song David A. Meyer Received: date / Accepted: date Abstract Signed directed

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6 - Section - Spring 7 Lecture Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS6) Project Project Deadlines Feb: Form teams of - people 7 Feb:

More information

Positive and Negative Links

Positive and Negative Links Positive and Negative Links Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz May 4, 2015 Elisabeth Lex (KTI, TU Graz) Networks May 4, 2015 1 / 66 Outline 1 Repetition 2 Motivation 3 Structural Balance

More information

CS224W Final Report Emergence of Global Status Hierarchy in Social Networks

CS224W Final Report Emergence of Global Status Hierarchy in Social Networks CS224W Final Report Emergence of Global Status Hierarchy in Social Networks Group 0: Yue Chen, Jia Ji, Yizheng Liao December 0, 202 Introduction Social network analysis provides insights into a wide range

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 60 - Section - Fall 06 Lecture Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS6) Recommender Systems The Long Tail (from: https://www.wired.com/00/0/tail/)

More information

Distributed stochastic gradient descent for link prediction in signed social networks

Distributed stochastic gradient descent for link prediction in signed social networks Zhang et al. EURASIP Journal on Advances in Signal Processing (2019) 2019:3 https://doi.org/10.1186/s13634-019-0601-0 EURASIP Journal on Advances in Signal Processing RESEARCH Open Access Distributed stochastic

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu 10/10/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

More information

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Scalable Clustering of Signed Networks Using Balance Normalized Cut Scalable Clustering of Signed Networks Using Balance Normalized Cut Kai-Yang Chiang Dept. of Computer Science University of Texas at Austin kychiang@cs.utexas.edu Joyce Jiyoung Whang Dept. of Computer

More information

Graph Mining: Overview of different graph models

Graph Mining: Overview of different graph models Graph Mining: Overview of different graph models Davide Mottin, Konstantina Lazaridou Hasso Plattner Institute Graph Mining course Winter Semester 2016 Lecture road Anomaly detection (previous lecture)

More information

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks Archana Sulebele, Usha Prabhu, William Yang (Group 29) Keywords: Link Prediction, Review Networks, Adamic/Adar,

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering

More information

IMP: A Message-Passing Algorithm for Matrix Completion

IMP: A Message-Passing Algorithm for Matrix Completion IMP: A Message-Passing Algorithm for Matrix Completion Henry D. Pfister (joint with Byung-Hak Kim and Arvind Yedla) Electrical and Computer Engineering Texas A&M University ISTC 2010 Brest, France September

More information

V2: Measures and Metrics (II)

V2: Measures and Metrics (II) - Betweenness Centrality V2: Measures and Metrics (II) - Groups of Vertices - Transitivity - Reciprocity - Signed Edges and Structural Balance - Similarity - Homophily and Assortative Mixing 1 Betweenness

More information

Supervised Random Walks

Supervised Random Walks Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate

More information

Analyzing Stochastic Gradient Descent for Some Non- Convex Problems

Analyzing Stochastic Gradient Descent for Some Non- Convex Problems Analyzing Stochastic Gradient Descent for Some Non- Convex Problems Christopher De Sa Soon at Cornell University cdesa@stanford.edu stanford.edu/~cdesa Kunle Olukotun Christopher Ré Stanford University

More information

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, Jiawei Han University of Illinois at Urbana-Champaign (UIUC) Facebook Inc. U.S. Army Research

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Practice Questions for Midterm

Practice Questions for Midterm Practice Questions for Midterm - 10-605 Oct 14, 2015 (version 1) 10-605 Name: Fall 2015 Sample Questions Andrew ID: Time Limit: n/a Grade Table (for teacher use only) Question Points Score 1 6 2 6 3 15

More information

QUINT: On Query-Specific Optimal Networks

QUINT: On Query-Specific Optimal Networks QUINT: On Query-Specific Optimal Networks Presenter: Liangyue Li Joint work with Yuan Yao (NJU) -1- Jie Tang (Tsinghua) Wei Fan (Baidu) Hanghang Tong (ASU) Node Proximity: What? Node proximity: the closeness

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties

More information

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016 + Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

More information

Graph Data Management Systems in New Applications Domains. Mikko Halin

Graph Data Management Systems in New Applications Domains. Mikko Halin Graph Data Management Systems in New Applications Domains Mikko Halin Introduction Presentation is based on two papers Graph Data Management Systems for New Application Domains - Philippe Cudré-Mauroux,

More information

Conflict Graphs for Parallel Stochastic Gradient Descent

Conflict Graphs for Parallel Stochastic Gradient Descent Conflict Graphs for Parallel Stochastic Gradient Descent Darshan Thaker*, Guneet Singh Dhillon* Abstract We present various methods for inducing a conflict graph in order to effectively parallelize Pegasos.

More information

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Nadathur Satish, Narayanan Sundaram, Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming

More information

Machine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari

Machine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari Machine Learning Basics: Stochastic Gradient Descent Sargur N. srihari@cedar.buffalo.edu 1 Topics 1. Learning Algorithms 2. Capacity, Overfitting and Underfitting 3. Hyperparameters and Validation Sets

More information

CSE 258. Web Mining and Recommender Systems. Advanced Recommender Systems

CSE 258. Web Mining and Recommender Systems. Advanced Recommender Systems CSE 258 Web Mining and Recommender Systems Advanced Recommender Systems This week Methodological papers Bayesian Personalized Ranking Factorizing Personalized Markov Chains Personalized Ranking Metric

More information

Massive Data Analysis

Massive Data Analysis Professor, Department of Electrical and Computer Engineering Tennessee Technological University February 25, 2015 Big Data This talk is based on the report [1]. The growth of big data is changing that

More information

WHITE PAPER NGINX An Open Source Platform of Choice for Enterprise Website Architectures

WHITE PAPER NGINX An Open Source Platform of Choice for Enterprise Website Architectures ASHNIK PTE LTD. White Paper WHITE PAPER NGINX An Open Source Platform of Choice for Enterprise Website Architectures Date: 10/12/2014 Company Name: Ashnik Pte Ltd. Singapore By: Sandeep Khuperkar, Director

More information

DeepWalk: Online Learning of Social Representations

DeepWalk: Online Learning of Social Representations DeepWalk: Online Learning of Social Representations ACM SIG-KDD August 26, 2014, Rami Al-Rfou, Steven Skiena Stony Brook University Outline Introduction: Graphs as Features Language Modeling DeepWalk Evaluation:

More information

Resource and Performance Distribution Prediction for Large Scale Analytics Queries

Resource and Performance Distribution Prediction for Large Scale Analytics Queries Resource and Performance Distribution Prediction for Large Scale Analytics Queries Prof. Rajiv Ranjan, SMIEEE School of Computing Science, Newcastle University, UK Visiting Scientist, Data61, CSIRO, Australia

More information

Spectral Clustering and Community Detection in Labeled Graphs

Spectral Clustering and Community Detection in Labeled Graphs Spectral Clustering and Community Detection in Labeled Graphs Brandon Fain, Stavros Sintos, Nisarg Raval Machine Learning (CompSci 571D / STA 561D) December 7, 2015 {btfain, nisarg, ssintos} at cs.duke.edu

More information

CSE 158 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

CSE 158 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize) CSE 158 Lecture 8 Web Mining and Recommender Systems Extensions of latent-factor models, (and more on the Netflix prize) Summary so far Recap 1. Measuring similarity between users/items for binary prediction

More information

Learning Low-rank Transformations: Algorithms and Applications. Qiang Qiu Guillermo Sapiro

Learning Low-rank Transformations: Algorithms and Applications. Qiang Qiu Guillermo Sapiro Learning Low-rank Transformations: Algorithms and Applications Qiang Qiu Guillermo Sapiro Motivation Outline Low-rank transform - algorithms and theories Applications Subspace clustering Classification

More information

Batch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data

Batch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data Batch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data Jesse Read 1, Albert Bifet 2, Bernhard Pfahringer 2, Geoff Holmes 2 1 Department of Signal Theory and Communications Universidad

More information

Recommender System. What is it? How to build it? Challenges. R package: recommenderlab

Recommender System. What is it? How to build it? Challenges. R package: recommenderlab Recommender System What is it? How to build it? Challenges R package: recommenderlab 1 What is a recommender system Wiki definition: A recommender system or a recommendation system (sometimes replacing

More information

Harp-DAAL for High Performance Big Data Computing

Harp-DAAL for High Performance Big Data Computing Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big

More information

Graph-Parallel Problems. ML in the Context of Parallel Architectures

Graph-Parallel Problems. ML in the Context of Parallel Architectures Case Study 4: Collaborative Filtering Graph-Parallel Problems Synchronous v. Asynchronous Computation Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 20 th, 2014

More information

CS 5220: Parallel Graph Algorithms. David Bindel

CS 5220: Parallel Graph Algorithms. David Bindel CS 5220: Parallel Graph Algorithms David Bindel 2017-11-14 1 Graphs Mathematically: G = (V, E) where E V V Convention: V = n and E = m May be directed or undirected May have weights w V : V R or w E :

More information

Parallelism. CS6787 Lecture 8 Fall 2017

Parallelism. CS6787 Lecture 8 Fall 2017 Parallelism CS6787 Lecture 8 Fall 2017 So far We ve been talking about algorithms We ve been talking about ways to optimize their parameters But we haven t talked about the underlying hardware How does

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 4, 2016 Outline Multi-core v.s. multi-processor Parallel Gradient Descent Parallel Stochastic Gradient Parallel Coordinate Descent Parallel

More information

Performance and Scalability

Performance and Scalability Measuring Critical Manufacturing cmnavigo: Affordable MES Performance and Scalability for Time-Critical Industry Environments Critical Manufacturing, 2015 Summary Proving Performance and Scalability Time-Critical

More information

Non-exhaustive, Overlapping k-means

Non-exhaustive, Overlapping k-means Non-exhaustive, Overlapping k-means J. J. Whang, I. S. Dhilon, and D. F. Gleich Teresa Lebair University of Maryland, Baltimore County October 29th, 2015 Teresa Lebair UMBC 1/38 Outline Introduction NEO-K-Means

More information

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

DeepFace: Closing the Gap to Human-Level Performance in Face Verification DeepFace: Closing the Gap to Human-Level Performance in Face Verification Report on the paper Artem Komarichev February 7, 2016 Outline New alignment technique New DNN architecture New large dataset with

More information

Machine Learning Basics. Sargur N. Srihari

Machine Learning Basics. Sargur N. Srihari Machine Learning Basics Sargur N. srihari@cedar.buffalo.edu 1 Overview Deep learning is a specific type of ML Necessary to have a solid understanding of the basic principles of ML 2 Topics Stochastic Gradient

More information

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017 CPSC 340: Machine Learning and Data Mining Recommender Systems Fall 2017 Assignment 4: Admin Due tonight, 1 late day for Monday, 2 late days for Wednesday. Assignment 5: Posted, due Monday of last week

More information

modern database systems lecture 10 : large-scale graph processing

modern database systems lecture 10 : large-scale graph processing modern database systems lecture 1 : large-scale graph processing Aristides Gionis spring 18 timeline today : homework is due march 6 : homework out april 5, 9-1 : final exam april : homework due graphs

More information

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell Constrained Convolutional Neural Networks for Weakly Supervised Segmentation Deepak Pathak, Philipp Krähenbühl and Trevor Darrell 1 Multi-class Image Segmentation Assign a class label to each pixel in

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Modularity CMSC 858L

Modularity CMSC 858L Modularity CMSC 858L Module-detection for Function Prediction Biological networks generally modular (Hartwell+, 1999) We can try to find the modules within a network. Once we find modules, we can look

More information

MIND: Machine Learning based Network Dynamics. Dr. Yanhui Geng Huawei Noah s Ark Lab, Hong Kong

MIND: Machine Learning based Network Dynamics. Dr. Yanhui Geng Huawei Noah s Ark Lab, Hong Kong MIND: Machine Learning based Network Dynamics Dr. Yanhui Geng Huawei Noah s Ark Lab, Hong Kong Outline Challenges with traditional SDN MIND architecture Experiment results Conclusion Challenges with Traditional

More information

Uncovering the Formation of Triadic Closure in Social Networks. Zhanpeng Fang and Jie Tang Tsinghua University

Uncovering the Formation of Triadic Closure in Social Networks. Zhanpeng Fang and Jie Tang Tsinghua University Uncovering the Formation of Triadic Closure in Social Networks Zhanpeng Fang and Jie Tang Tsinghua University 1 Triangle Laws Triangle is one of most basic human groups in social networks Friends of friends

More information

Structure of Social Networks

Structure of Social Networks Structure of Social Networks Outline Structure of social networks Applications of structural analysis Social *networks* Twitter Facebook Linked-in IMs Email Real life Address books... Who Twitter #numbers

More information

Challenges in Multiresolution Methods for Graph-based Learning

Challenges in Multiresolution Methods for Graph-based Learning Challenges in Multiresolution Methods for Graph-based Learning Michael W. Mahoney ICSI and Dept of Statistics, UC Berkeley ( For more info, see: http: // www. stat. berkeley. edu/ ~ mmahoney or Google

More information

Decentralized and Distributed Machine Learning Model Training with Actors

Decentralized and Distributed Machine Learning Model Training with Actors Decentralized and Distributed Machine Learning Model Training with Actors Travis Addair Stanford University taddair@stanford.edu Abstract Training a machine learning model with terabytes to petabytes of

More information

Why do we need graph processing?

Why do we need graph processing? Why do we need graph processing? Community detection: suggest followers? Determine what products people will like Count how many people are in different communities (polling?) Graphs are Everywhere Group

More information

Parallel Stochastic Gradient Descent: The case for native GPU-side GPI

Parallel Stochastic Gradient Descent: The case for native GPU-side GPI Parallel Stochastic Gradient Descent: The case for native GPU-side GPI J. Keuper Competence Center High Performance Computing Fraunhofer ITWM, Kaiserslautern, Germany Mark Silberstein Accelerated Computer

More information

TGNet: Learning to Rank Nodes in Temporal Graphs. Qi Song 1 Bo Zong 2 Yinghui Wu 1,3 Lu-An Tang 2 Hui Zhang 2 Guofei Jiang 2 Haifeng Chen 2

TGNet: Learning to Rank Nodes in Temporal Graphs. Qi Song 1 Bo Zong 2 Yinghui Wu 1,3 Lu-An Tang 2 Hui Zhang 2 Guofei Jiang 2 Haifeng Chen 2 TGNet: Learning to Rank Nodes in Temporal Graphs Qi Song 1 Bo Zong 2 Yinghui Wu 1,3 Lu-An Tang 2 Hui Zhang 2 Guofei Jiang 2 Haifeng Chen 2 1 2 3 Node Ranking in temporal graphs Temporal graphs have been

More information

Complex-Network Modelling and Inference

Complex-Network Modelling and Inference Complex-Network Modelling and Inference Lecture 8: Graph features (2) Matthew Roughan http://www.maths.adelaide.edu.au/matthew.roughan/notes/ Network_Modelling/ School

More information

Paradigm Shift of Database

Paradigm Shift of Database Paradigm Shift of Database Prof. A. A. Govande, Assistant Professor, Computer Science and Applications, V. P. Institute of Management Studies and Research, Sangli Abstract Now a day s most of the organizations

More information

Convex Optimization MLSS 2015

Convex Optimization MLSS 2015 Convex Optimization MLSS 2015 Constantine Caramanis The University of Texas at Austin The Optimization Problem minimize : f (x) subject to : x X. The Optimization Problem minimize : f (x) subject to :

More information

Balanced and partitionable signed graphs

Balanced and partitionable signed graphs A. Mrvar: Balanced and partitionable signed graphs 1 Balanced and partitionable signed graphs Signed graphs Signed graph is an ordered pair (G,σ), where: G = (V,L) is a graph with a set of vertices V and

More information

COMP6237 Data Mining Data Mining & Machine Learning with Big Data. Jonathon Hare

COMP6237 Data Mining Data Mining & Machine Learning with Big Data. Jonathon Hare COMP6237 Data Mining Data Mining & Machine Learning with Big Data Jonathon Hare jsh2@ecs.soton.ac.uk Contents Going to look at two case-studies looking at how we can make machine-learning algorithms work

More information

Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds

Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds Kevin Hsieh Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, Onur Mutlu Machine Learning and Big

More information

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization Pedro Ribeiro (DCC/FCUP & CRACS/INESC-TEC) Part 1 Motivation and emergence of Network Science

More information

CS224W Project: Recommendation System Models in Product Rating Predictions

CS224W Project: Recommendation System Models in Product Rating Predictions CS224W Project: Recommendation System Models in Product Rating Predictions Xiaoye Liu xiaoye@stanford.edu Abstract A product recommender system based on product-review information and metadata history

More information

Extracting Communities from Networks

Extracting Communities from Networks Extracting Communities from Networks Ji Zhu Department of Statistics, University of Michigan Joint work with Yunpeng Zhao and Elizaveta Levina Outline Review of community detection Community extraction

More information

Metric Learning for Large-Scale Image Classification:

Metric Learning for Large-Scale Image Classification: Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka

More information

A Brief Look at Optimization

A Brief Look at Optimization A Brief Look at Optimization CSC 412/2506 Tutorial David Madras January 18, 2018 Slides adapted from last year s version Overview Introduction Classes of optimization problems Linear programming Steepest

More information

ActiveClean: Interactive Data Cleaning For Statistical Modeling. Safkat Islam Carolyn Zhang CS 590

ActiveClean: Interactive Data Cleaning For Statistical Modeling. Safkat Islam Carolyn Zhang CS 590 ActiveClean: Interactive Data Cleaning For Statistical Modeling Safkat Islam Carolyn Zhang CS 590 Outline Biggest Takeaways, Strengths, and Weaknesses Background System Architecture Updating the Model

More information

FastText. Jon Koss, Abhishek Jindal

FastText. Jon Koss, Abhishek Jindal FastText Jon Koss, Abhishek Jindal FastText FastText is on par with state-of-the-art deep learning classifiers in terms of accuracy But it is way faster: FastText can train on more than one billion words

More information

Apache Giraph. for applications in Machine Learning & Recommendation Systems. Maria Novartis

Apache Giraph. for applications in Machine Learning & Recommendation Systems. Maria Novartis Apache Giraph for applications in Machine Learning & Recommendation Systems Maria Stylianou @marsty5 Novartis Züri Machine Learning Meetup #5 June 16, 2014 Apache Giraph for applications in Machine Learning

More information

Networks in economics and finance. Lecture 1 - Measuring networks

Networks in economics and finance. Lecture 1 - Measuring networks Networks in economics and finance Lecture 1 - Measuring networks What are networks and why study them? A network is a set of items (nodes) connected by edges or links. Units (nodes) Individuals Firms Banks

More information

Case Study 4: Collaborative Filtering. GraphLab

Case Study 4: Collaborative Filtering. GraphLab Case Study 4: Collaborative Filtering GraphLab Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin March 14 th, 2013 Carlos Guestrin 2013 1 Social Media

More information

Local Community Detection in Dynamic Graphs Using Personalized Centrality

Local Community Detection in Dynamic Graphs Using Personalized Centrality algorithms Article Local Community Detection in Dynamic Graphs Using Personalized Centrality Eisha Nathan, Anita Zakrzewska, Jason Riedy and David A. Bader * School of Computational Science and Engineering,

More information

Epilog: Further Topics

Epilog: Further Topics Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Epilog: Further Topics Lecture: Prof. Dr. Thomas

More information

Web Personalization & Recommender Systems

Web Personalization & Recommender Systems Web Personalization & Recommender Systems COSC 488 Slides are based on: - Bamshad Mobasher, Depaul University - Recent publications: see the last page (Reference section) Web Personalization & Recommender

More information

Introduction to Data Management. Lecture #1 (Course Trailer )

Introduction to Data Management. Lecture #1 (Course Trailer ) Introduction to Data Management Lecture #1 (Course Trailer ) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Topics v Welcome to one

More information

Latent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017

Latent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017 Latent Space Model for Road Networks to Predict Time-Varying Traffic Presented by: Rob Fitzgerald Spring 2017 Definition of Latent https://en.oxforddictionaries.com/definition/latent Latent Space Model?

More information

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena CSE 190 Lecture 16 Data Mining and Predictive Analytics Small-world phenomena Another famous study Stanley Milgram wanted to test the (already popular) hypothesis that people in social networks are separated

More information

over Multi Label Images

over Multi Label Images IBM Research Compact Hashing for Mixed Image Keyword Query over Multi Label Images Xianglong Liu 1, Yadong Mu 2, Bo Lang 1 and Shih Fu Chang 2 1 Beihang University, Beijing, China 2 Columbia University,

More information

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea Descent w/modification Descent w/modification Descent w/modification Descent w/modification CPU Descent w/modification Descent w/modification Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

More information

Algorithmic and Economic Aspects of Networks. Nicole Immorlica

Algorithmic and Economic Aspects of Networks. Nicole Immorlica Algorithmic and Economic Aspects of Networks Nicole Immorlica Syllabus 1. Jan. 8 th (today): Graph theory, network structure 2. Jan. 15 th : Random graphs, probabilistic network formation 3. Jan. 20 th

More information

Dynamic and Historical Shortest-Path Distance Queries on Large Evolving Networks by Pruned Landmark Labeling

Dynamic and Historical Shortest-Path Distance Queries on Large Evolving Networks by Pruned Landmark Labeling 2014/04/09 @ WWW 14 Dynamic and Historical Shortest-Path Distance Queries on Large Evolving Networks by Pruned Landmark Labeling Takuya Akiba (U Tokyo) Yoichi Iwata (U Tokyo) Yuichi Yoshida (NII & PFI)

More information

Mining Data that Changes. 17 July 2015

Mining Data that Changes. 17 July 2015 Mining Data that Changes 17 July 2015 Data is Not Static Data is not static New transactions, new friends, stop following somebody in Twitter, But most data mining algorithms assume static data Even a

More information

Graph based machine learning with applications to media analytics

Graph based machine learning with applications to media analytics Graph based machine learning with applications to media analytics Lei Ding, PhD 9-1-2011 with collaborators at Outline Graph based machine learning Basic structures Algorithms Examples Applications in

More information

JAVASCRIPT CHARTING. Scaling for the Enterprise with Metric Insights Copyright Metric insights, Inc.

JAVASCRIPT CHARTING. Scaling for the Enterprise with Metric Insights Copyright Metric insights, Inc. JAVASCRIPT CHARTING Scaling for the Enterprise with Metric Insights 2013 Copyright Metric insights, Inc. A REVOLUTION IS HAPPENING... 3! Challenges... 3! Borrowing From The Enterprise BI Stack... 4! Visualization

More information

Progress Report: Collaborative Filtering Using Bregman Co-clustering

Progress Report: Collaborative Filtering Using Bregman Co-clustering Progress Report: Collaborative Filtering Using Bregman Co-clustering Wei Tang, Srivatsan Ramanujam, and Andrew Dreher April 4, 2008 1 Introduction Analytics are becoming increasingly important for business

More information

LDA for Big Data - Outline

LDA for Big Data - Outline LDA FOR BIG DATA 1 LDA for Big Data - Outline Quick review of LDA model clustering words-in-context Parallel LDA ~= IPM Fast sampling tricks for LDA Sparsified sampler Alias table Fenwick trees LDA for

More information

Know your neighbours: Machine Learning on Graphs

Know your neighbours: Machine Learning on Graphs Know your neighbours: Machine Learning on Graphs Andrew Docherty Senior Research Engineer andrew.docherty@data61.csiro.au www.data61.csiro.au 2 Graphs are Everywhere Online Social Networks Transportation

More information

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data

More information

arxiv: v1 [cs.si] 8 Jun 2012

arxiv: v1 [cs.si] 8 Jun 2012 Multi-Scale Link Prediction Donghyuk Shin Si Si Inderjit S. Dhillon arxiv:1206.1891v1 [cs.si] 8 Jun 2012 Abstract The automated analysis of social networks has become an important problem due to the proliferation

More information

STREAMING RANKING BASED RECOMMENDER SYSTEMS

STREAMING RANKING BASED RECOMMENDER SYSTEMS STREAMING RANKING BASED RECOMMENDER SYSTEMS Weiqing Wang, Hongzhi Yin, Zi Huang, Qinyong Wang, Xingzhong Du, Quoc Viet Hung Nguyen University of Queensland, Australia & Griffith University, Australia July

More information

Survey on Process in Scalable Big Data Management Using Data Driven Model Frame Work

Survey on Process in Scalable Big Data Management Using Data Driven Model Frame Work Survey on Process in Scalable Big Data Management Using Data Driven Model Frame Work Dr. G.V. Sam Kumar Faculty of Computer Studies, Ministry of Education, Republic of Maldives ABSTRACT: Data in rapid

More information

Databases 2 (VU) ( / )

Databases 2 (VU) ( / ) Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:

More information