Learning Transferable Features with Deep Adaptation Networks

Transcription:

Learning Transferable Features with Deep Adaptation Networks
Mingsheng Long, Yue Cao, Jianmin Wang, Michael I. Jordan
Presented by Changyou Chen, October 30, 2015

Outline
1 Introduction
2 Deep Adaptation Networks
3 Experiments

Contribution
Proposes a deep architecture for transfer learning, based on a deep convolutional neural network (AlexNet [Krizhevsky et al., 2012]). It is essentially a CNN with a particular regularizer, chosen so that the distance between the distributions generating the source-domain and target-domain data is minimized.

Transfer learning (domain adaptation)
This paper considers the unsupervised/semi-supervised domain adaptation setting. We are given a labeled source domain $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ generated from a distribution $p$, and a target domain generated from a distribution $q$ that is either unlabeled, $\mathcal{D}_t = \{x_j^t\}_{j=1}^{n_t}$ (unsupervised), or partially labeled, $\mathcal{D}_t = \{(x_i^t, y_i^t)\}_{i=1}^{m_t} \cup \{x_j^t\}$ (semi-supervised); $p$ and $q$ are usually unknown. Transfer learning aims to build a classifier $y = \theta(x)$ that minimizes the target risk $\epsilon_t(\theta) = \Pr_{(x,y) \sim q}[\theta(x) \neq y]$ using source supervision. In this paper the classifier is built on a CNN, the AlexNet [Krizhevsky et al., 2012].

The AlexNet [Krizhevsky et al., 2012]
An 8-layer deep model: the first five layers are convolutional, layers 6-7 are fully connected, and the last layer is a softmax layer. It achieved state-of-the-art classification performance on ImageNet in 2012 and is a building block for many state-of-the-art models.

Multi-kernel maximum mean discrepancy (MK-MMD)
MK-MMD measures the distance between distributions in a reproducing kernel Hilbert space (RKHS). Let $\mathcal{H}_k$ be an RKHS with kernel $k$. The mean embedding of a distribution $p$ in $\mathcal{H}_k$ is the unique element $\mu_k(p)$ such that $\mathbb{E}_{x \sim p} f(x) = \langle f, \mu_k(p) \rangle_{\mathcal{H}_k}$ for all $f \in \mathcal{H}_k$. Given the feature map $\phi$, MK-MMD defines the RKHS distance between the mean embeddings of $p$ and $q$ as

$$d_k^2(p,q) \triangleq \big\| \mathbb{E}_p[\phi(x^s)] - \mathbb{E}_q[\phi(x^t)] \big\|_{\mathcal{H}_k}^2 \quad (1)$$

The kernel associated with $\phi$ is a convex combination of $m$ PSD kernels $\{k_u\}$:

$$\mathcal{K} \triangleq \Big\{ k = \sum_{u=1}^m \beta_u k_u : \sum_{u=1}^m \beta_u = 1,\ \beta_u \ge 0\ \forall u \Big\} \quad (2)$$
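As a concrete illustration, here is a minimal NumPy sketch of a (biased, quadratic-time) MK-MMD estimate of Eq. (1) with a fixed Gaussian-kernel family; the bandwidths and the uniform weights beta_u are illustrative placeholders, not the values DAN learns.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows."""
    sq = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-sq / (2 * sigma**2))

def mk_mmd2(xs, xt, sigmas=(1.0, 2.0, 4.0), betas=None):
    """Biased estimate of d_k^2(p, q) under the multi-kernel family of Eq. (2)."""
    if betas is None:
        betas = np.full(len(sigmas), 1.0 / len(sigmas))  # uniform beta_u
    mmd2 = 0.0
    for beta, sigma in zip(betas, sigmas):
        k_ss = gaussian_kernel(xs, xs, sigma).mean()  # E_p E_p k(x^s, x^s')
        k_tt = gaussian_kernel(xt, xt, sigma).mean()  # E_q E_q k(x^t, x^t')
        k_st = gaussian_kernel(xs, xt, sigma).mean()  # E_p E_q k(x^s, x^t)
        mmd2 += beta * (k_ss + k_tt - 2 * k_st)
    return mmd2

# Two samples from the same Gaussian give a near-zero distance; a mean shift
# in the second sample gives a clearly larger one.
rng = np.random.default_rng(0)
xs  = rng.normal(0.0, 1.0, size=(200, 16))
xs2 = rng.normal(0.0, 1.0, size=(200, 16))
xt  = rng.normal(0.5, 1.0, size=(200, 16))
print(mk_mmd2(xs, xs2), mk_mmd2(xs, xt))
```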

Outline
1 Introduction
2 Deep Adaptation Networks
3 Experiments

Deep adaptation networks (DAN)
Two parallel AlexNets with sharing (a sketch follows below):
- one for the source domain, the other for the target domain
- the first 5 convolutional layers are shared; the first three are fixed after pretraining on the source domain, and the last two are fine-tuned during training using the target-domain data
- the last 3 layers are individual feedforward nets
- the last layer is the softmax layer
- MK-MMD regularizes the model so that the distributions p and q generating the data are close to each other in the RKHS
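A minimal PyTorch sketch of this layout, assuming the pretrained AlexNet from torchvision (>= 0.13); for brevity it uses a single stream applied to both domains' batches (equivalent for the shared layers, with the per-domain fc copies collapsed into one), and the layer names merely mirror the slide's conv1-conv5 / fc6-fc8 description.

```python
import torch.nn as nn
from torchvision import models

class DAN(nn.Module):
    def __init__(self, num_classes=31):
        super().__init__()
        backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        self.features = backbone.features  # conv1-conv5 (shared)
        self.fc6 = nn.Sequential(nn.Flatten(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU())
        self.fc7 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU())
        self.fc8 = nn.Linear(4096, num_classes)  # softmax applied inside the loss
        # Freeze conv1-conv3 (the first three Conv2d modules in `features`).
        convs = [m for m in self.features if isinstance(m, nn.Conv2d)]
        for conv in convs[:3]:
            for p in conv.parameters():
                p.requires_grad = False

    def forward(self, x):
        h = self.features(x)
        h6 = self.fc6(h)
        h7 = self.fc7(h6)
        h8 = self.fc8(h7)
        # The hidden representations D^l (l = 6, 7, 8) feed the MK-MMD penalty.
        return h8, (h6, h7, h8)
```

During training, a source batch and a target batch pass through the same module; the classification loss uses the source logits, while the MK-MMD penalty of the next slide is computed between source and target representations at each of fc6-fc8.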

Deep adaptation networks (DAN)
Let $\Theta = \{W^l, b^l\}_{l=1}^L$ be the set of all model parameters. The objective function of DAN is

$$\min_\Theta \underbrace{\frac{1}{n_a} \sum_{i=1}^{n_a} J(\theta(x_i^a), y_i^a)}_{\text{CNN}} + \lambda \underbrace{\sum_{l=6}^{8} d_k^2(\mathcal{D}_s^l, \mathcal{D}_t^l)}_{\text{MK-MMD}} \quad (3)$$

- $(x_i^a, y_i^a)$: labeled input data; $\theta(x_i^a)$: softmax output; $J$: cross-entropy loss function
- $\mathcal{D}^l = \{h_i^l\}$: the $l$-th layer hidden representations
- $d_k^2(\mathcal{D}_s^l, \mathcal{D}_t^l) = \mathbb{E}_{x^s, x^{s\prime}} k(x^s, x^{s\prime}) + \mathbb{E}_{x^t, x^{t\prime}} k(x^t, x^{t\prime}) - 2\,\mathbb{E}_{x^s, x^t} k(x^s, x^t)$: the MK-MMD between source and target

SGD uses an unbiased linear-time estimate of $d_k^2(\mathcal{D}_s^l, \mathcal{D}_t^l)$:

$$d_k^2(\mathcal{D}_s^l, \mathcal{D}_t^l) = \frac{2}{n_s} \sum_{i=1}^{n_s/2} g_k(z_i), \quad (4)$$

with quad-tuples $z_i \triangleq (x_{2i-1}^s, x_{2i}^s, x_{2i-1}^t, x_{2i}^t)$ and $g_k(z_i) \triangleq k(x_{2i-1}^s, x_{2i}^s) + k(x_{2i-1}^t, x_{2i}^t) - k(x_{2i-1}^s, x_{2i}^t) - k(x_{2i}^s, x_{2i-1}^t)$.
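A minimal sketch of the linear-time estimator in Eq. (4), assuming equally sized source and target mini-batches stored as NumPy rows; a single Gaussian kernel stands in for the multi-kernel sum, which would simply weight one such estimate per kernel by beta_u.

```python
import numpy as np

def gaussian_k(a, b, sigma=1.0):
    """Row-wise Gaussian kernel k(a_i, b_i)."""
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2 * sigma**2))

def linear_mmd2(xs, xt, sigma=1.0):
    """Unbiased O(n) estimate of MMD^2 over the quad-tuples z_i of Eq. (4)."""
    n = (min(len(xs), len(xt)) // 2) * 2  # use an even number of samples
    s1, s2 = xs[0:n:2], xs[1:n:2]         # x^s_{2i-1}, x^s_{2i}
    t1, t2 = xt[0:n:2], xt[1:n:2]         # x^t_{2i-1}, x^t_{2i}
    g = (gaussian_k(s1, s2, sigma) + gaussian_k(t1, t2, sigma)
         - gaussian_k(s1, t2, sigma) - gaussian_k(s2, t1, sigma))
    return (2.0 / n) * g.sum()
```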

Deep adaptation networks (DAN)
The gradient of objective (3) with respect to the layer-$l$ parameters is

$$\nabla_{\Theta^l} = \frac{\partial J(z_i)}{\partial \Theta^l} + \lambda \frac{\partial g_k(z_i^l)}{\partial \Theta^l} \quad (5)$$

- $\partial J(z_i)/\partial \Theta^l$ is the same as in a standard CNN
- $\partial g_k(z_i^l)/\partial \Theta^l$ can also be calculated easily

Learning the kernel weights $\beta$: following [Gretton et al., 2012], this is equivalent to the quadratic program

$$\min_{\mathbf{d}^T \beta = 1,\ \beta \ge 0} \beta^T (Q + \varepsilon I) \beta, \quad (6)$$

with $Q$ a matrix constructed from the $g_k(z_i)$.
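The constrained QP in Eq. (6) is small (one variable per kernel), so any off-the-shelf solver works. A sketch with SciPy's SLSQP, where the toy $Q$, $\mathbf{d}$, and $\varepsilon$ are illustrative placeholders rather than quantities estimated from actual $g_{k_u}(z_i)$ statistics:

```python
import numpy as np
from scipy.optimize import minimize

def solve_beta(Q, d, eps=1e-3):
    """min_beta beta^T (Q + eps I) beta  s.t.  d^T beta = 1, beta >= 0."""
    m = len(d)
    A = Q + eps * np.eye(m)
    res = minimize(
        lambda beta: beta @ A @ beta,
        np.full(m, 1.0 / m),                      # start from uniform weights
        method="SLSQP",
        bounds=[(0.0, None)] * m,                 # beta_u >= 0
        constraints=[{"type": "eq", "fun": lambda beta: d @ beta - 1.0}],
    )
    return res.x

# Toy example with m = 3 kernels.
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
Q = M @ M.T                     # a PSD stand-in for the covariance of g_k
d = np.array([0.5, 1.0, 0.8])   # per-kernel mean MMD estimates
print(solve_beta(Q, d))
```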

Theoretical property
Theorem. Let $\epsilon_s(\theta)$ and $\epsilon_t(\theta)$ denote the expected risks $\Pr_{(x,y)}[\theta(x) \neq y]$ on the source and target domains, respectively. Then

$$\epsilon_t(\theta) \le \epsilon_s(\theta) + 2 d_k(p, q) + C, \quad (7)$$

where $C$ is a constant accounting for the complexity of the hypothesis space and the risk of an ideal hypothesis for both domains.

Outline
1 Introduction
2 Deep Adaptation Networks
3 Experiments

Datasets
Office-31:
- 4,652 images in 31 categories, collected from Amazon (A), Webcam (W), and DSLR (D)
- evaluated transfers: A → W, D → W, W → D, A → D, D → A, W → A

Office-10 + Caltech-10:
- the 10 categories shared by the Office-31 and Caltech-256 (C) datasets
- evaluated transfers: A → C, W → C, D → C, C → A, C → W, C → D

Setup
Compared against the shallow models TCA [Pan et al., 2011] and GFK [Gong et al., 2012], and the deep models CNN [Krizhevsky et al., 2012], LapCNN [Weston et al., 2008], and DDC [Tzeng et al., 2014]. Several variants of DAN are evaluated:
- DAN_7: MK-MMD imposed on layer 7 only
- DAN_8: MK-MMD imposed on layer 8 only
- DAN_SK: DAN with a single-kernel MMD
Gaussian kernels with varying bandwidths (variances) are used. Convolutional layers conv1-conv3 are fixed after pretraining AlexNet on the source data; conv4-conv5 and the fully connected layers fc6-fc8 are fine-tuned.

Office-31: unsupervised (results table not transcribed)

Office-10 + Caltech-10: unsupervised (results table not transcribed)

Office-31: semi-supervised
- Deep models outperform shallow models.
- Existing deep models cannot cope well with the challenge of domain discrepancy.
- Adapting the distributions at multiple layers (DAN) is better than single-layer adaptation (DAN_7 or DAN_8), and also better than the single-kernel variant (DAN_SK).

Feature embedding with t-SNE (visualization not transcribed)

Thanks for your attention!

References

Gretton, A., Sriperumbudur, B., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., Fukumizu, K.: Optimal kernel choice for large-scale two-sample tests. NIPS (2012)

Pan, S. J., Tsang, I. W., Kwok, J. T., Yang, Q.: Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks (2011)

Gong, B., Shi, Y., Sha, F., Grauman, K.: Geodesic flow kernel for unsupervised domain adaptation. CVPR (2012)

Krizhevsky, A., Sutskever, I., Hinton, G. E.: ImageNet classification with deep convolutional neural networks. NIPS (2012)

Weston, J., Ratle, F., Collobert, R.: Deep learning via semi-supervised embedding. ICML (2008)

Tzeng, E., Hoffman, J., Zhang, N., Saenko, K., Darrell, T.: Deep domain confusion: Maximizing for domain invariance. Technical report, arXiv:1412.3474 (2014)