Deep Learning via Semi-Supervised Embedding. Jason Weston, Frederic Ratle and Ronan Collobert Presented by: Janani Kalyanam

Size: px

Start display at page:

Download "Deep Learning via Semi-Supervised Embedding. Jason Weston, Frederic Ratle and Ronan Collobert Presented by: Janani Kalyanam"

Britton Stevenson
6 years ago
Views:

1 Deep Learning via Semi-Supervised Embedding Jason Weston, Frederic Ratle and Ronan Collobert Presented by: Janani Kalyanam

2 Review Deep Learning Extract low-level features first. Extract more complicated features as we progress. Called pre-training. Perform supervised task at the end, and fine tune weights by back propagation.

3 Review Deep Learning, contd. Training Algorithm Labels, y Back propagation Input

4 Review Deep Learning, contd. Input Input, approximation mimimize x f dec (f enc (x)) 2

5 Authors point of view Shallow methods give nice insights to the problem, but are restrictive. Deep methods are complicated. Moreover, all the unsupervised methods proposed, like RBMs or auto-associators seem to be different from existing unsupervised learning techniques.

6 Authors point of view Shallow methods give nice insights to the problem, but are restrictive. Deep methods are complicated. Moreover, all the unsupervised methods proposed, like RBMs or auto-associators seem to be different from existing unsupervised learning techniques. Why not try to borrow the nice ideas from shallow methods and put it in a deep learning framework?

7 Proposition Choose an unsupervised learning algorithm (that already exists in shallow literature) Choose a model with deep architecture The unsupervised learning is plugged into any layer of the architecture as an auxiliary task (as opposed to learning the unsupervised task first, and then performing fine-tuning, or back propagation) Train supervised and unsupervised tasks simultaneously

8 Outline Review some embedding algorithms Embedding algorithms used in a shallow architecure How do the authors apply the embedding algorithm to a deep architecture? Experiments

9 Outline Review some embedding algorithms Embedding algorithms used in a shallow architecure How do the authors apply the embedding algorithm to a deep architecture? Experiments

10 Review: Embedding Algorithms General formulation minimize α U L(f(x i,α),f(x j,α),w ij ) i,j=1 f(x) R n is an embedding to be learned given x R d L is a loss function between pairs of example W is a matrix of similarity/dissimilarity

11 Review: Embedding Algorithm (contd.) Multidimensional Scaling Preserves distances between points while embedding them into a lower dimensional space L(f i,f j,w ij ) = ( f i f j W ij ) 2

12 Review: Embedding Algorithm contd. Laplacian Eigen Maps Create a sparse, connected graph using some notion of neighbors. Create a weight matrix using k-nn or heat kernals: exp (x i x j )/(scaling)

13 Review: Embedding Algorithms contd. Formulation: L(f i,f j,w ij ) = ij ij W ij f i f j 2 Impose suitable constraints to prevent trivial solutions.

14 Review: Embedding Algorithm (contd.) Margin based loss function for Siamese Networks Encourages similar examples to be close, and separates dissimilar ones at least by margin m { fi f L(f i,f j,w ij ) = j 2 if W ij = 1 max(0,m f i f j 2 ) if W ij = 0

15 Outline Review some embedding algorithms Embedding algorithms used in a shallow architecure How do the authors apply the embedding algorithm to a deep architecture? Experiments

16 Review: Embedding in shallow architecture Label Propagation min L i=1 L+U f i y i 2 +λ i,j=1 W ij f i f j 2 Encourages examples with high similarity value to get the same label Due to transitivity, neighbors or neighbors also get the same label

17 Example

18 Example, contd

19 Example, contd

20 Review: Embedding in shallow architecture LapSVM min w 2 +γ L i=1 L+U H(y i,f(x i ))+λ i,j=1 W ij f(x i ) f(x j ) 2 First two terms are from SVM formulation, third term includes unlabeled data

21 Outline Review some embedding algorithms Embedding algorithms used in a shallow architecure How do the authors apply the embedding algorithm to a deep architecture? Experiments

22 Semi-supervised learning in Deep architecture

23 Semi-supervised learning in Deep architecture Deep learning set-up f(x) = h 3 (h 2 (h 1 (x))) Supervised training is to minimize L(f(x i ),y i ) Can add unsupervised training to any of the layers Output: L(f(xi ),f(x j ),W ij ) Intermediate: L(h 2 (h 1 (x i )),h 2 (h 1 (x j )),W ij ) Auxiliary: L(e(xi ),e(x j ),W ij ) where e(x i ) = e(h 2 (h 1 (x)))

24 Outline Review some embedding algorithms Embedding algorithms used in a shallow architecure How do the authors apply the embedding algorithm to a deep architecture? Experiments

25 Experiments Small datasets Train g50c Text Uspst SVM TSVM LapSVM NN EmbedNN

26 MNIST 2 layers, crossvalidate over the number of hidden units, and learning rate. W ij is binary according to 10-nn criterion. Train 1h 6h 1k 3k SVM TSVM RBM SESM DBN-NCA DBN-rNCA NN EmbedNN-O EmbedI EmbedA

27 MNIST 50 hidden units on each layer. Classical NN compared to EmbedNN-O and EmberNN-ALL More sophisticated experiments on video data by taking images from consecutive streams for encoding W matrix. Train NN EmbedNN-O EmbedNN-ALL

28 Take away.. Unsupervised learning as an auxiliary tasks seems to work well.

29 The End Thank you!

Large Scale Manifold Transduction

Large Scale Manifold Transduction Michael Karlen, Jason Weston, Ayse Erkan & Ronan Collobert NEC Labs America, Princeton, USA Ećole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland New York University,