Structured prediction using the network perceptron

Ta-tsen Soong, joint work with Stuart Andrews and Prof. Tony Jebara

Motivation
- A lot of data is network-structured: social networks, citation networks, biological networks.
- It is important to discover the network structure y given the node attributes x. In biology, the network structure encodes how cellular processes are regulated.
- The network structure induces dependencies within y: edges are not i.i.d., so this is a structured prediction problem.
(Figure: a protein-protein interaction network.)

Properties of Networks
What properties make the edges in a network inter-dependent?
- Degree distribution: many biological networks follow a power law, with many nodes that have few neighbors and a few hub nodes with many neighbors.
(Figure: degree (connectivity) distribution; Barabási & Oltvai, 2004, Nature Reviews Genetics.)

Properties of Networks (continued)
- Network growth by preferential attachment: new nodes prefer to connect to nodes that already have many links, so a node's degree increase depends on its old degree. This can be used to infer incomplete networks; a short simulation below illustrates the effect.
- Other useful properties: clustering coefficients, motifs, modules, network diameter, etc.
(Figure: Barabási & Oltvai, 2004, Nature Reviews Genetics.)
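To make the preferential-attachment picture concrete, here is a minimal simulation (not from the talk) using the networkx Barabási-Albert generator; the graph size and attachment parameter are arbitrary illustrative choices.

```python
import networkx as nx
from collections import Counter

# Barabasi-Albert growth: each new node attaches to m=3 existing nodes,
# chosen with probability proportional to their current degree.
G = nx.barabasi_albert_graph(n=2000, m=3, seed=0)

# The resulting degree distribution is heavy-tailed: many low-degree
# nodes and a few highly connected hubs.
hist = Counter(d for _, d in G.degree())
for degree in sorted(hist):
    print(degree, hist[degree])
```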

Our Goals
- Focus on the protein-protein interaction network: given a set of nodes (proteins), find the network structure (edges) indicating which proteins interact with which other proteins.
- Non-i.i.d. structured learning with network degree constraints.
- Kernelization to take advantage of high-dimensional features.
- Transductive learning.

Overview
- Model edge prediction as a maximum-weight b-matching problem: given a similarity matrix S and degree constraints $\sum_j y_{j,k} = b_k$ for every node k, find the adjacency matrix Y.
(Figure: two nodes with an edge indicating the interaction between proteins 1 and 2; a sketch of the matching step follows.)
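The talk does not specify which b-matching solver is used. As one illustration, the maximum-weight b-matching step can be approximated with a linear-programming relaxation via scipy.optimize.linprog; the function name and setup below are my own, not from the talk.

```python
import numpy as np
from scipy.optimize import linprog

def b_matching_lp(S, b):
    """LP relaxation of maximum-weight b-matching.

    S : symmetric (n, n) similarity matrix.
    b : (n,) exact degree targets (sum(b) must be even for feasibility).
    Returns a symmetric edge-indicator matrix, fractional in general.
    """
    n = S.shape[0]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    c = np.array([-S[i, j] for (i, j) in edges])  # linprog minimizes
    # One equality row per node: sum of incident edge variables = b_k.
    A_eq = np.zeros((n, len(edges)))
    for e, (i, j) in enumerate(edges):
        A_eq[i, e] = A_eq[j, e] = 1.0
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, 1), method="highs")
    Y = np.zeros((n, n))
    for e, (i, j) in enumerate(edges):
        Y[i, j] = Y[j, i] = res.x[e]
    return Y
```

In practice an exact combinatorial b-matching solver would replace this, and the LP solution would need rounding when fractional.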

Learning the similarity
- From node features x_i, x_j to a similarity S_ij: correlation, Euclidean distance, etc.
- We want to learn a good transformation from x to S such that the output Y is most compatible with the true network structure:
  $Y^* = \arg\max_Y \sum_{j,k} s_{j,k}\, y_{j,k}$, subject to the degree constraints $\sum_j y_{j,k} = b_k$, where $s_{j,k} = w^\top f_{j,k}$.
- The weight vector w is learned by perceptron updates; one possible pairwise feature map is sketched below.
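The talk does not pin down the pairwise feature map f_{j,k}. A minimal sketch, assuming symmetric combinations of the node attributes (elementwise product and absolute difference), looks like this:

```python
import numpy as np

def pairwise_features(X):
    """Map node attributes X (n, d) to pairwise features F (n, n, 2d).

    f_{j,k} here concatenates the elementwise product and the absolute
    difference of x_j and x_k, two common symmetric choices; the talk
    does not specify the actual feature map.
    """
    prod = X[:, None, :] * X[None, :, :]
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return np.concatenate([prod, diff], axis=-1)

# The learned similarity is then s_{j,k} = w^T f_{j,k}, i.e.:
#   S = pairwise_features(X) @ w
```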

Algorithms

i.i.d. perceptron:
  Input: {(x_i, y_i), i = 1, ..., n}, x_i ∈ R^d, y_i ∈ {-1, +1}; w^(0) = 0
  for t = 1 to max_iteration
    for i = 1 to num_examples
      if sign(<w^(t), x_i>) ≠ y_i
        w^(t+1) = w^(t) + x_i y_i

network perceptron*:
  Input: X = [x_11, x_12, ..., x_ij, ..., x_nn], Y = [y_11, y_12, ..., y_ij, ..., y_nn]; w^(0) = 0
  for t = 1 to max_iteration
    Y^(t) = argmax_Y (w^(t)ᵀ X) · Y, s.t. Y satisfies the degree constraints (lower and upper bounds)
    for all i, j
      if y^(t)_ij ≠ y^true_ij
        w^(t+1) = w^(t) + x_ij y^true_ij

*The algorithm can use transductive learning and be kernelized. Details not shown.
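Putting the pieces together, here is a compact sketch of the network perceptron loop. The greedy matching is a crude stand-in for the exact maximum-weight b-matching step, and the feature tensor F and degree targets b are assumed given; none of these names come from the talk.

```python
import numpy as np

def greedy_b_matching(S, b):
    """Greedy stand-in for max-weight b-matching: accept edges in
    decreasing score order while both endpoints have residual degree."""
    n = S.shape[0]
    deg = np.zeros(n, dtype=int)
    Y = -np.ones((n, n))                       # -1 = non-edge, +1 = edge
    scored = sorted(((S[i, j], i, j)
                     for i in range(n) for j in range(i + 1, n)),
                    reverse=True)
    for _, i, j in scored:
        if deg[i] < b[i] and deg[j] < b[j]:
            Y[i, j] = Y[j, i] = 1.0
            deg[i] += 1
            deg[j] += 1
    return Y

def network_perceptron(F, Y_true, b, max_iter=20):
    """F: (n, n, d) pairwise features; Y_true: (n, n) in {-1, +1}."""
    n, _, d = F.shape
    w = np.zeros(d)
    for _ in range(max_iter):
        S = F @ w                              # s_{j,k} = w^T f_{j,k}
        Y_hat = greedy_b_matching(S, b)
        wrong = np.triu(Y_hat != Y_true, 1)    # mispredicted pairs
        if not wrong.any():
            break
        for i, j in zip(*np.nonzero(wrong)):   # additive update on mistakes
            w += F[i, j] * Y_true[i, j]
    return w
```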

Experiments
- Network data: the DIP database* protein-protein interaction network, with 2,312 proteins and 5,299 interactions. Labels (y): known interactions are labeled +1; unseen protein pairs are labeled -1.
- Features (x): microarray data from 349 experiments, i.e., a 2,312 x 349 matrix. ICA / PCA were used to reduce dimensionality (see the sketch below).
*Xenarios I, et al. (2000). DIP: The Database of Interacting Proteins. Nucleic Acids Research.
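For the dimensionality-reduction step, a typical scikit-learn version would look like the following; the number of components is an illustrative guess, not a value reported in the talk.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Placeholder for the 2,312 x 349 microarray expression matrix.
X = np.random.randn(2312, 349)

# Keep a reduced set of components (50 here is arbitrary).
X_pca = PCA(n_components=50).fit_transform(X)
X_ica = FastICA(n_components=50, random_state=0, max_iter=1000).fit_transform(X)
```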

Results
- Exact degree constraints
- Looser (non-exact) degree constraints
- Effect of features
- Kernel diagonal regularization to deal with non-separable data

Exact degree constraints
- 1,000 positives, 1,000 negatives; 5-fold cross-validation.
- The algorithm reaches low error very quickly, because the tight degree constraints strongly restrict the feasible networks.

Looser degree constraints
- More realistic in real-world problems, e.g., bounds of the true degree -2 / +5.
- 1,000 positives, 1,000 negatives; linear kernel.
- Performance is better than random, but the perceptron oscillates. Why?

Features
- Are microarray data good features? To test whether the algorithm works on separable data, we added two separable random Gaussian bumps of the same dimensions to the original positive and negative features, respectively.
- The algorithm works on the separable data, which suggests the microarray features themselves may not be very informative.

Kernel diagonal
- Add a constant C to the kernel diagonal to cope with non-separable (microarray) data; this helps the algorithm converge.
- Cross-validate to find the C that gives the best performance.
(Figure: area under the curve as a function of C.)
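The kernel-diagonal trick itself is one line; a minimal sketch, with a placeholder Gram matrix and an assumed cross-validation grid for C:

```python
import numpy as np

def regularize_kernel(K, C):
    """Add a constant C to the kernel diagonal. This acts like a ridge
    term, letting the perceptron tolerate non-separable pairs."""
    return K + C * np.eye(K.shape[0])

# Candidate values of C would be chosen by cross-validating AUC:
K = np.eye(5)                                  # toy Gram matrix
for C in [0.01, 0.1, 1.0, 10.0]:
    K_reg = regularize_kernel(K, C)
```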

Future work
- Add more meaningful features, e.g., function annotation and sub-cellular localization.
- Add degree-bound estimation.
- Add topological constraints.
- Memory-efficient kernel computation.
- Take into account how close each node's degree is to its expected degree.