Regression in Deep Learning: Siamese and Triplet Networks
1 Regression in Deep Learning: Siamese and Triplet Networks
Tu Bui, John Collomosse. Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, United Kingdom
Leonardo Ribeiro, Tiago Nazare, Moacir Ponti. Institute of Mathematics and Computer Sciences (ICMC), University of Sao Paulo, Brazil
2 Content
- The regression problem
- Siamese network and contrastive loss
- Triplet network and triplet loss
- Training tricks
- Regression application: sketch-based image retrieval
- Limitations and future work
3 Revolution of deep learning in classification
[Figure: top-5 error (%) of ImageNet ILSVRC winners by year, lower is better. Entries shown: shallow methods, AlexNet (15.3), ZFNet, GoogleNet, ResNet, Ensemble (2.99), SENet (2.25). Human performance is marked at 6%.]
4 Classification vs. Regression
Classification
- Discrete set of outputs
- Output: label/class/category
Regression
- Continuous-valued output
- Output: embedding feature vector $(x_1, x_2, x_3, x_4, \dots, x_n)$
5 Regression example: intra-domain learning
- Face identification (Schroff et al. CVPR 2015)
- Tracking (Wang & Gupta ICCV)
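6 Regression example: cross-domain learning
Multi-modality visual search: a query such as "duck" can arrive in any modality, and each modality has its own model mapping into a shared embedding space.
- Language model: Skip-gram
- 3D model: voxnet
- Photo model: AlexNet
- Sketch model: SketchANet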
7 Conventional methods for cross-domain regression
- Step 1: extract local features (SIFT, HoG, SURF) from the source and target data, then aggregate them into global features (BoW, GMM).
- Step 2: multiply the source global features by a learnable transform matrix M; the transformed features and the target global features then live in one embedding space.
Problem: this assumes a linear transformation between the two domains.
8 End-to-end regression with deep learning
End-to-end learning with a multi-stream network:
- Source data -> Layer 1 -> Layer 2 -> ... -> Layer n -> embedding space
- Target data -> Layer 1 -> Layer 2 -> ... -> Layer m -> embedding space
9 End-to-end regression with multi-stream networks
Open questions:
- Network designs?
- Loss function to be used?
10 Using output of classification model as feature?
- Not intuitive: different objective function.
- Cross-domain learning: training a classification network for each domain separately does not guarantee a common embedding.
[Figure: two separate networks, each with fc6, fc7 and its own softmax loss]
12 Siamese network and contrastive loss
- Siamese (2-branch) network with branches $f_{W_1}$ and $g_{W_2}$.
- Given an input training pair $(x_1, x_2)$:
o Label: $y = 0$ if $(x_1, x_2)$ is a similar pair, $y = 1$ if it is a dissimilar pair.
o Network output: $a = f(W_1, x_1)$, $p = g(W_2, x_2)$.
o Euclidean distance between outputs: $D(W_1, W_2, x_1, x_2) = \lVert a - p \rVert_2 = \lVert f(W_1, x_1) - g(W_2, x_2) \rVert_2$.
- The loss $L(a, p)$ is computed on the two branch outputs.
13 Siamese network and contrastive loss
- Contrastive loss equation:
$$L(W_1, W_2, x_1, x_2) = \frac{1}{2}(1 - y)\,D^2 + \frac{1}{2}\,y\,\{\max(0,\; m - D)\}^2$$
where $D = \lVert a - p \rVert_2 = \lVert f(W_1, x_1) - g(W_2, x_2) \rVert_2$, and $y = 0$ for a similar pair $(x_1, x_2)$, $y = 1$ for a dissimilar pair.
- Margin $m$: desirable distance for a dissimilar pair $(x_1, x_2)$.
- Training: $\arg\min_{W_1, W_2} L$
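14 Siamese network and contrastive loss
Contrastive loss functions:
- Standard form*:
$$L(a, p) = \frac{1}{2}(1 - y)\,D^2 + \frac{1}{2}\,y\,\{\max(0,\; m - D)\}^2$$
- Alternative form**:
$$L(a, p) = \frac{1}{2}(1 - y)\,D^2 + \frac{1}{2}\,y\,\max(0,\; m - D^2)$$
[Figure: for each form, the loss curves $L_{y=0}$ and $L_{y=1}$]
*Hadsell et al. CVPR 2006 **Chopra et al. CVPR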
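The two contrastive loss forms can be compared on a single pair of embeddings. The following is a minimal sketch in plain Python, not the authors' implementation; the function names and the `alternative` flag are illustrative assumptions, and the label convention follows the slides (y = 0 similar, y = 1 dissimilar).

```python
import math


def euclidean(a, p):
    """Euclidean distance D between two embedding vectors."""
    return math.sqrt(sum((ai - pi) ** 2 for ai, pi in zip(a, p)))


def contrastive_loss(a, p, y, m=1.0, alternative=False):
    """Contrastive loss for one pair.

    y = 0 for a similar pair, y = 1 for a dissimilar pair.
    alternative=False: 1/2 (1-y) D^2 + 1/2 y max(0, m - D)^2
    alternative=True:  1/2 (1-y) D^2 + 1/2 y max(0, m - D^2)
    """
    d = euclidean(a, p)
    if alternative:
        hinge = max(0.0, m - d * d)
    else:
        hinge = max(0.0, m - d) ** 2
    return 0.5 * (1 - y) * d * d + 0.5 * y * hinge


# A dissimilar pair that is closer than the margin is penalised in both forms.
standard = contrastive_loss([0.0, 0.0], [0.5, 0.0], y=1, m=1.0)
alt = contrastive_loss([0.0, 0.0], [0.5, 0.0], y=1, m=1.0, alternative=True)
```

A similar pair (y = 0) only contributes the squared-distance term, so both forms agree on it; they differ only in how a dissimilar pair inside the margin is penalised.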
16 Triplet network and triplet loss
- Triplet (3-branch) network with branches $f_{W_1}$, $g_{W_2}$, $g_{W_2}$.
o Given a training triplet $(x_a, x_p, x_n)$: $x_a$ anchor; $x_p$ positive (similar to $x_a$); $x_n$ negative (dissimilar to $x_a$).
o Pos/neg branches always share weights.
o Anchor branch can share weights (intra-domain learning) or not (cross-domain learning).
o Network outputs: $a = f(W_1, x_a)$, $p = g(W_2, x_p)$, $n = g(W_2, x_n)$.
- The loss $L(a, p, n)$ is computed on the three branch outputs.
17 Triplet network and triplet loss
Triplet loss equation:
$$L(a, p, n) = \frac{1}{2}\max\{0,\; m + D^2(a, p) - D^2(a, n)\}$$
o Standard form*: $D(u, v) = \lVert u - v \rVert_2$
o Alternative form**: $D(u, v) = 1 - \dfrac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}$
*Schroff et al. CVPR 2015 **Wang et al. ICCV 2015
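The triplet loss with its two distance choices can be sketched for a single triplet as below. This is an illustrative plain-Python example under the definitions on this slide; the function names and the `dist` parameter are assumptions introduced here.

```python
import math


def l2_distance(u, v):
    """Standard form: D(u, v) = ||u - v||_2."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def cosine_distance(u, v):
    """Alternative form: D(u, v) = 1 - (u . v) / (||u||_2 ||v||_2)."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(a, p, n, m=1.0, dist=l2_distance):
    """L(a, p, n) = 1/2 max(0, m + D^2(a, p) - D^2(a, n))."""
    return 0.5 * max(0.0, m + dist(a, p) ** 2 - dist(a, n) ** 2)


# The negative is far enough from the anchor: no loss.
easy = triplet_loss([0.0, 0.0], [1.0, 0.0], [3.0, 0.0], m=1.0)
# The negative sits on top of the positive: the margin is violated.
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [1.0, 0.0], m=1.0)
```

Passing `dist=cosine_distance` switches to the alternative form without changing the loss function itself.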
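18 Siamese vs. Triplet
[Figure: positions of a, p and n before training, after training with contrastive loss, and after training with triplet loss, each with margin m]
- Contrastive loss: $L(a, p) = \frac{1}{2}(1 - y)\lVert a - p \rVert_2^2 + \frac{1}{2}\,y\,\max(0,\; m - \lVert a - p \rVert_2^2)$
- Triplet loss: $L(a, p, n) = \frac{1}{2}\max(0,\; m + \lVert a - p \rVert_2^2 - \lVert a - n \rVert_2^2)$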
19 Siamese or triplet?
Which one works better depends on the data, training strategy, network design and more:
- Siamese superior
o Radenovic et al. ECCV
- Triplet superior
o Hoffer & Ailon. SBPR
o Bui et al. arxiv
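21 Training trick #1: solving gradient collapsing problem
- The gradient collapsing problem:
$$L = \frac{1}{2N}\sum_{i=1}^{N}\max\{0,\; m + \lVert a_i - p_i \rVert_2^2 - \lVert a_i - n_i \rVert_2^2\}$$
Margin m = 1.0
[Figure: expected versus actual arrangement of a, p and n after training with margin m]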
22 Training trick #1
- Solutions for gradient collapsing:
o Combine regression and classification loss for better regularisation.
o Change the loss function:
$$L = \frac{1}{2N}\sum_{i=1}^{N}\max\{0,\; m + \lVert k a_i - p_i \rVert_2^2 - \lVert k a_i - n_i \rVert_2^2\}$$
[Figure: loss surface L(a, p, n) with a saddle point where a, p and n coincide]
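To make the collapsed configuration concrete, the batch loss can be evaluated at a point where a, p and n coincide. The sketch below is plain Python with hypothetical names; it only evaluates the loss and does not show how the gain k affects training. With k = 1 and a = p = n every difference term is zero, so the loss sits at m/2 and the gradient with respect to the embeddings (which is built from those differences) vanishes, which is consistent with the saddle point on the slide.

```python
def squared_distance(u, v):
    """Squared Euclidean distance ||u - v||_2^2."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def batch_triplet_loss(anchors, positives, negatives, m=1.0, k=1.0):
    """L = 1/(2N) sum_i max(0, m + ||k a_i - p_i||^2 - ||k a_i - n_i||^2).

    k = 1.0 gives the original loss; other values apply the gain k to
    the anchor embedding as in the modified loss.
    """
    total = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        ka = [k * x for x in a]
        total += max(0.0, m + squared_distance(ka, p) - squared_distance(ka, n))
    return total / (2 * len(anchors))


# Collapsed embeddings: all three branches output the same point.
point = [0.3, 0.3]
collapsed = batch_triplet_loss([point], [point], [point], m=1.0)

# A well-separated triplet incurs no loss.
separated = batch_triplet_loss([[1.0, 0.0]], [[1.0, 0.0]], [[3.0, 0.0]], m=1.0)
```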
23 Training trick #2: dimensionality reduction
- Conventional methods:
o Redundant analysis on a fixed set of features.
o E.g. Principal Component Analysis (PCA), product quantisation, etc.
- Dimensionality reduction in CNN: part of the training process.
o FC7 output (4096x1x1) * conv filter (fc) (128x4096x1x1) + bias (128x1x1) = FC out (128x1x1)
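The shape arithmetic of the reduction layer can be checked with a small fully connected layer. This is a plain-Python sketch with randomly initialised weights and hypothetical function names; in the network itself these weights would be learned together with the rest of the model.

```python
import random


def make_fc_layer(in_dim=4096, out_dim=128, seed=0):
    """Create weights (out_dim x in_dim) and a bias (out_dim) for an FC layer."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def fc_forward(x, weights, bias):
    """Compute W x + b: a 4096-d FC7 feature becomes a 128-d embedding."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


weights, bias = make_fc_layer()
fc7_feature = [1.0] * 4096          # stand-in for a real FC7 activation
embedding = fc_forward(fc7_feature, weights, bias)
```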
24 Training trick #3: hard negative mining
- Random pairing: positive and negative samples are selected randomly.
- Hard negative mining: the negative example is the nearest irrelevant neighbour to the anchor.
- Hard positive mining: the positive example is the farthest relevant neighbour to the anchor.
[Figure: example triplets built from duck photo, duck 3D, swan photo and cat photo samples]
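The two mining rules above can be sketched as a search over candidate embeddings. This plain-Python example is an illustration only: the function names, the `(embedding, label)` candidate format and the use of squared Euclidean distance are assumptions, not details from the slides.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def mine_hard_triplet(anchor, anchor_label, candidates):
    """Return (hard_positive_index, hard_negative_index).

    candidates is a list of (embedding, label) tuples.
    Hard positive: farthest candidate with the anchor's label.
    Hard negative: nearest candidate with a different label.
    """
    positives = [(squared_distance(anchor, emb), idx)
                 for idx, (emb, label) in enumerate(candidates)
                 if label == anchor_label]
    negatives = [(squared_distance(anchor, emb), idx)
                 for idx, (emb, label) in enumerate(candidates)
                 if label != anchor_label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative candidate")
    _, hard_positive = max(positives)
    _, hard_negative = min(negatives)
    return hard_positive, hard_negative


candidates = [
    ([0.1, 0.0], "duck"),
    ([2.0, 0.0], "duck"),
    ([0.5, 0.0], "swan"),
    ([5.0, 0.0], "cat"),
]
hard_pos, hard_neg = mine_hard_triplet([0.0, 0.0], "duck", candidates)
```

Here the farthest duck (index 1) is the hard positive and the nearby swan (index 2) is the hard negative.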
25 Training trick #4: layer sharing
- Consider sharing the anchor branch with the pos/neg branches:
o Full-share
o No-share
o Partial-share
26 Other training tricks
- Data augmentation:
o Random crop, rotation, scaling, flip, whitening
- Dropout:
o Randomly disable neurons
- Regularisation:
o Add parameter magnitude to loss
o $L_{total}(W, X) = L_{contrastive,triplet}(W, X) + \lVert W \rVert_2$
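The regularised objective adds the parameter magnitude to the data loss. A minimal plain-Python sketch follows; the `weight_decay` coefficient is an assumption added here for illustration (the formula as transcribed shows no coefficient, which corresponds to `weight_decay=1.0`).

```python
import math


def l2_norm(weights):
    """||W||_2 over a flat list of parameters."""
    return math.sqrt(sum(w * w for w in weights))


def total_loss(data_loss, weights, weight_decay=1.0):
    """L_total = L_contrastive_or_triplet + weight_decay * ||W||_2."""
    return data_loss + weight_decay * l2_norm(weights)


regularised = total_loss(0.5, [3.0, 4.0])
```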
28 Regression application: sketch-based image retrieval (SBIR)
Search for a particular image in your mind?
29 Text search?
30 Sketch-based Image Retrieval (SBIR)
[Figure: sketch query and retrieval results]
31 Existing applications
- Google Emoji Search
- Detexify: LaTeX symbol search
32 Challenges
Free-hand sketches are usually messy.
[Figure: horse category, Flickr-330 dataset, Hu et al]
33 Challenges
Various levels of abstraction.
[Figure: house and crocodile sketches, TU-Berlin dataset, Eitz et al]
34 Challenges
Domain gap: a sketch does not always describe the real-life object accurately.
- Caricature: cat's whisker, hedgehog's spine
- Anthropomorphism: smiling spider?
- Simplification
- Viewpoint
[Figure: category "person walking", TU-Berlin]
35 Challenges
Limited number of sketch datasets:
- Flickr15K [Hu et al. 2013]: 330 sketches + 15k [...] classes
- TU-Berlin [Eitz et al. 2012]: 20k sketches @ 250 classes
- Sketchy [Sangkloy et al. 2016]: ~75k sketches [...]k [...] classes
o New: Google Quickdraw: 50M [...] classes
36 SBIR evaluation metric
- Evaluation metrics:
o Mean Average Precision (mAP)
o Precision-recall (PR) curve
o Kendall rank correlation coefficient
$$P(k) = \frac{\#\,\text{relevant in top } k \text{ results}}{k}$$
$$AP = \frac{\sum_{k=1}^{N} P(k)\,rel(k)}{\#\,\text{relevant images}}$$
$$mAP = \frac{1}{|Q|}\sum_{q \in Q} AP(q)$$
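The AP and mAP definitions translate directly into a few lines of code. The sketch below is plain Python and assumes each query's ranked result list has already been converted to 0/1 relevance flags; the function names and the optional `num_relevant` argument are illustrative.

```python
def average_precision(relevance, num_relevant=None):
    """AP = sum_k P(k) * rel(k) / (# relevant images).

    relevance: ranked list of 0/1 flags, best match first.
    num_relevant: total relevant images in the database; defaults to the
    number of relevant items present in the ranked list.
    """
    if num_relevant is None:
        num_relevant = sum(relevance)
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k      # P(k) evaluated at a relevant rank
    return total / num_relevant


def mean_average_precision(rankings):
    """mAP = mean of AP over all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


ap_first = average_precision([1, 0, 1, 0])    # (1/1 + 2/3) / 2
ap_second = average_precision([0, 1])         # (1/2) / 1
map_score = mean_average_precision([[1, 0, 1, 0], [0, 1]])
```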
37 Background
Conventional shallow SBIR framework:
- Photo database -> edge extraction -> edge map -> feature extraction -> index file (entries #1, #2, #3, ..., #N)
- Query -> matching against the index file
38 Background: hand-crafted features
Structure tensor [Eitz, 2010]
[Figure: structure tensor of image derivatives computed over windows W, followed by a dictionary]
Flickr15K benchmark, mAP (%):
- Structure Tensor [Eitz, 2010]: 7.98
39 Background: hand-crafted features
Shape context [Mori, 2005]
Flickr15K benchmark, mAP (%):
- Structure Tensor [Eitz, 2010]: 7.98
- Shape Context [Mori, 2005]: 8.14
40 Background: hand-crafted features
Self similarity [Shechtman, 2007]
Flickr15K benchmark, mAP (%):
- Structure Tensor [Eitz, 2010]: 7.98
- Shape Context [Mori, 2005]: 8.14
- SSIM [Shechtman, 2007]: 9.57
41 Background: hand-crafted features
SIFT [Lowe, 2004], HoG [Dalal, 2005]
Flickr15K benchmark, mAP (%):
- Structure Tensor [Eitz, 2010]: 7.98
- Shape Context [Mori, 2005]: 8.14
- SSIM [Shechtman, 2007]: 9.57
- SIFT [Lowe, 2004]: 9.11
- HoG [Dalal, 2005]: [...]
42 Background: hand-crafted features
GF-HoG [Hu et al. CVIU 2013], Color GF-HoG [Bui et al. ICCV 2015]
Flickr15K benchmark, mAP (%):
- Structure Tensor [Eitz, 2010]: 7.98
- Shape Context [Mori, 2005]: 8.14
- SSIM [Shechtman, 2007]: 9.57
- SIFT [Lowe, 2004]: 9.11
- HoG [Dalal, 2005]: [...]
- GF-HoG [Hu, 2013]: [...]
- Color GF-HoG [Bui, 2015]: [...]
43 Background: hand-crafted features
PerceptualEdge [Qi, 2015]
[Figure: gPb edge map versus perceptual edge map]
Flickr15K benchmark, mAP (%):
- Structure Tensor [Eitz, 2010]: 7.98
- Shape Context [Mori, 2005]: 8.14
- SSIM [Shechtman, 2007]: 9.57
- SIFT [Lowe, 2004]: 9.11
- HoG [Dalal, 2005]: [...]
- GF-HoG [Hu, 2013]: [...]
- Color GF-HoG [Bui, 2015]: [...]
- PerceptualEdge [Qi, 2015]: [...]
44 Background: deep features
- Siamese network with contrastive loss
- Qi et al. ICIP 2016
o Sketch-edgemap
o Fully shared
Flickr15K benchmark, mAP (%):
- Structure Tensor [Eitz, 2010]: 7.98
- Shape Context [Mori, 2005]: 8.14
- SSIM [Shechtman, 2007]: 9.57
- SIFT [Lowe, 2004]: 9.11
- HoG [Dalal, 2005]: [...]
- GF-HoG [Hu, 2013]: [...]
- Color GF-HoG [Bui, 2015]: [...]
- PerceptualEdge [Qi, 2015]: [...]
- Siamese network [Qi, 2016]: [...]
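45 Triplet network for SBIR
- Sketch-edgemap
- CNN architecture: Sketch-A-Net [Yu, 2015]
- Output dimension: 100
- Shared layers: Conv 4-5, FC 6-8
- Loss, with k = 2.0:
$$L = \frac{1}{2N}\sum_{i=1}^{N}\max\{0,\; m + \lVert k a_i - p_i \rVert_2^2 - \lVert k a_i - n_i \rVert_2^2\}$$
[Figure: branches a, p and n, each with conv layers C1 onwards followed by fc6, fc7 and fc8]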
46 Training procedure
Images:
- 25k photos: 100 photos/class.
- Edge extraction: gPb [Arbelaez, 2011].
- Mean subtraction, random crop/rotation/scaling/flip.
Sketches:
- 20k sketches: 20s training, 60s validation per class.
- Skeletonisation.
- Mean subtraction, random crop/rotation/scaling/flip.
- Random stroke removal.
Triplet formation:
- Random selection of pos/neg samples.
Training:
- 10k epochs.
- Multistep decreasing learning rate.
- k = [...]
[Figure: augmentation examples: crop, rotation, scaling, flip, stroke removal]
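Random triplet formation can be sketched as sampling a class, then drawing the anchor sketch and positive edge map from it and the negative edge map from another class. The plain-Python example below is illustrative: the dictionary layout, file names and function name are hypothetical, and the actual sampling scheme used in training may differ.

```python
import random


def random_triplet(sketches_by_class, edgemaps_by_class, rng):
    """Pick (anchor sketch, positive edge map, negative edge map) at random.

    The anchor and positive come from the same class; the negative comes
    from a different, randomly chosen class.
    """
    classes = sorted(set(sketches_by_class) & set(edgemaps_by_class))
    pos_class = rng.choice(classes)
    neg_class = rng.choice([c for c in sorted(edgemaps_by_class) if c != pos_class])
    anchor = rng.choice(sketches_by_class[pos_class])
    positive = rng.choice(edgemaps_by_class[pos_class])
    negative = rng.choice(edgemaps_by_class[neg_class])
    return anchor, positive, negative


# Hypothetical toy data: file names are prefixed with their class.
sketches = {"duck": ["duck_s1", "duck_s2"], "cat": ["cat_s1"]}
edgemaps = {"duck": ["duck_e1"], "cat": ["cat_e1", "cat_e2"]}
triplet = random_triplet(sketches, edgemaps, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible across runs.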
47 Results
Flickr15K benchmark, mAP (%):
- Structure Tensor [Eitz, 2010]: 7.98
- Shape Context [Mori, 2005]: 8.14
- SSIM [Shechtman, 2007]: 9.57
- SIFT [Lowe, 2004]: 9.11
- HoG [Dalal, 2005]: [...]
- GF-HoG [Hu, 2013]: [...]
- Colour GF-HoG [Bui, 2015]: [...]
- PerceptualEdge [Qi, 2015]: [...]
- Single CNN: [...]
- Siamese network [Qi, 2016]: [...]
- Triplet full-share [Bui, 2016]: [...]
- Triplet no-share [Bui, 2016]: [...]
- Triplet half-share [Bui, 2016]: [...]
48 Sketch-photo direct matching
[Figure: triplet network (a, p, n) and its loss-versus-epochs curve, annotated "Training failure"]
49 Sketch-photo direct matching
- Architecture labels: SketchANet hybrid, AlexNet, AlexNet
- Losses: a softmax loss on each branch plus the triplet loss
- Loss weights: x1.0, x2.0
- Special layers: dimensional reduction, normalisation
50 Multi-stage training procedure
Stage 1: train unshared layers
- Train the sketch branch from scratch.
- Finetune the image branch from AlexNet.
Stage 2: train shared layers
- Form a 2-branch network with pretrained weights.
- Freeze unshared layers.
- Train the shared layers with contrastive loss + softmax loss.
Stage 3: regression with triplet loss
- Form a triplet network.
- Unfreeze all layers.
- Train the whole network with triplet loss + softmax loss.
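The three stages can be written down as a small schedule that records which layer groups are trainable and which losses are active. This is an illustrative plain-Python data structure, not the authors' training script; the key names, the "shared"/"unshared" group labels and the use of softmax loss in stage 1 are assumptions read off the slide.

```python
# Hypothetical description of the three training stages.
STAGES = [
    {
        "name": "stage 1: train unshared layers",
        "network": "separate sketch and image branches",
        "trainable": {"unshared"},
        "losses": ("softmax",),          # assumption: classification pre-training
    },
    {
        "name": "stage 2: train shared layers",
        "network": "2-branch",
        "trainable": {"shared"},         # unshared layers are frozen
        "losses": ("contrastive", "softmax"),
    },
    {
        "name": "stage 3: regression with triplet loss",
        "network": "triplet",
        "trainable": {"unshared", "shared"},   # everything unfrozen
        "losses": ("triplet", "softmax"),
    },
]


def is_trainable(stage, layer_group):
    """Return True if the given layer group is updated in this stage."""
    return layer_group in stage["trainable"]


frozen_in_stage2 = [g for g in ("unshared", "shared")
                    if not is_trainable(STAGES[1], g)]
```

A training loop would walk through `STAGES` in order, freezing or unfreezing layer groups according to `is_trainable` before each stage starts.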
51 Training results
[Figure: training curves. Phase 1: sketch branch, image branch. Phase 2: Siamese network. Phase 3: triplet network.]
52 Results
Flickr15K benchmark, mAP (%):
- Structure Tensor [Eitz, 2010]: 7.98
- Shape Context [Mori, 2005]: 8.14
- SSIM [Shechtman, 2007]: 9.57
- SIFT [Lowe, 2004]: 9.11
- HoG [Dalal, 2005]: [...]
- GF-HoG [Hu, 2013]: [...]
- Colour GF-HoG [Bui, 2015]: [...]
- PerceptualEdge [Qi, 2015]: [...]
- Single CNN: [...]
- Siamese network [Qi, 2016]: [...]
- Sketch-edgemap triplet [Bui, 2016]: [...]
- Sketch-photo triplet: [...]
53 Layer visualisation
- SketchANet: 64 filters of size 15x15 in the conv1 layer
- AlexNet: 96 filters of size 11x11 in the conv1 layer
54 SBIR example
55 Demo: SketchSearch
Sketch-based Image Retrieval
57 Limitations
o Hard to train a regression model.
o Need labelled datasets.
o Real-life sketches can be very complicated.
[Figure: Guernica by Pablo Picasso]
58 Future work
o Multi-domain regression, e.g. 3D, text, photo, sketch, depth-map, cartoon (Castrejon, 2016; Siddiquie, 2014)
[Figure: a "duck" query handled by a language model, 3D model, photo model and sketch model, all mapping into one embedding space]
o Toward unsupervised deep learning:
- Labelled image set, unlabelled or no sketch set (Radenovic, 2017)
- Completely unsupervised: auto-encoder, Generative Adversarial Network (GAN)
59 Thank you for listening
More informationAnnouncements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14
Announcements Computer Vision I CSE 152 Lecture 14 Homework 3 is due May 18, 11:59 PM Reading: Chapter 15: Learning to Classify Chapter 16: Classifying Images Chapter 17: Detecting Objects in Images Given
More informationYelp Restaurant Photo Classification
Yelp Restaurant Photo Classification Rajarshi Roy Stanford University rroy@stanford.edu Abstract The Yelp Restaurant Photo Classification challenge is a Kaggle challenge that focuses on the problem predicting
More informationCharacterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager
Characterization and Benchmarking of Deep Learning Natalia Vassilieva, PhD Sr. Research Manager Deep learning applications Vision Speech Text Other Search & information extraction Security/Video surveillance
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Unsupervised Learning: Kmeans, GMM, EM Readings: Barber 20.1-20.3 Stefan Lee Virginia Tech Tasks Supervised Learning x Classification y Discrete x Regression
More informationarxiv: v1 [cs.cv] 16 Mar 2017
Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval Li Liu, Fumin Shen 2, Yuming Shen, Xianglong Liu 3, and Ling Shao arxiv:73.565v [cs.cv] 6 Mar 27 School of Computing Science, University
More informationMartian lava field, NASA, Wikipedia
Martian lava field, NASA, Wikipedia Old Man of the Mountain, Franconia, New Hampshire Pareidolia http://smrt.ccel.ca/203/2/6/pareidolia/ Reddit for more : ) https://www.reddit.com/r/pareidolia/top/ Pareidolia
More informationUnsupervised Learning of Spatiotemporally Coherent Metrics
Unsupervised Learning of Spatiotemporally Coherent Metrics Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun arxiv 2015. Presented by Jackie Chu Contributions Insight between slow feature
More informationTowards Weakly- and Semi- Supervised Object Localization and Semantic Segmentation
Towards Weakly- and Semi- Supervised Object Localization and Semantic Segmentation Lecturer: Yunchao Wei Image Formation and Processing (IFP) Group University of Illinois at Urbanahttps://weiyc.githu Champaign
More informationComputer Vision Lecture 16
Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period
More informationIs Bigger CNN Better? Samer Hijazi on behalf of IPG CTO Group Embedded Neural Networks Summit (enns2016) San Jose Feb. 9th
Is Bigger CNN Better? Samer Hijazi on behalf of IPG CTO Group Embedded Neural Networks Summit (enns2016) San Jose Feb. 9th Today s Story Why does CNN matter to the embedded world? How to enable CNN in
More information2028 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 26, NO. 4, APRIL 2017
2028 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 26, NO. 4, APRIL 2017 Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition Zhe Wang, Limin Wang, Yali Wang, Bowen
More informationFace Recognition A Deep Learning Approach
Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar 2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison
More informationStructured Prediction using Convolutional Neural Networks
Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer
More informationFei-Fei Li & Justin Johnson & Serena Yeung
Lecture 9-1 Administrative A2 due Wed May 2 Midterm: In-class Tue May 8. Covers material through Lecture 10 (Thu May 3). Sample midterm released on piazza. Midterm review session: Fri May 4 discussion
More informationDeepIndex for Accurate and Efficient Image Retrieval
DeepIndex for Accurate and Efficient Image Retrieval Yu Liu, Yanming Guo, Song Wu, Michael S. Lew Media Lab, Leiden Institute of Advance Computer Science Outline Motivation Proposed Approach Results Conclusions
More informationInception and Residual Networks. Hantao Zhang. Deep Learning with Python.
Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)
More informationCS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016
CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional
More informationHENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage
HENet: A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage Qiuyu Zhu Shanghai University zhuqiuyu@staff.shu.edu.cn Ruixin Zhang Shanghai University chriszhang96@shu.edu.cn
More informationFuzzy Set Theory in Computer Vision: Example 3
Fuzzy Set Theory in Computer Vision: Example 3 Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Purpose of these slides are to make you aware of a few of the different CNN architectures
More informationEncoder-Decoder Networks for Semantic Segmentation. Sachin Mehta
Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta Outline > Overview of Semantic Segmentation > Encoder-Decoder Networks > Results What is Semantic Segmentation? Input: RGB Image Output:
More informationarxiv: v1 [cs.cv] 30 Jul 2016
Localizing and Orienting Street Views Using Overhead Imagery Nam N. Vo and James Hays Georgia Institute of Technology {namvo,hays}@gatech.edu arxiv:1608.00161v1 [cs.cv] 30 Jul 2016 Abstract. In this paper
More informationDeconvolutions in Convolutional Neural Networks
Overview Deconvolutions in Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Deconvolutions in CNNs Applications Network visualization
More informationSu et al. Shape Descriptors - III
Su et al. Shape Descriptors - III Siddhartha Chaudhuri http://www.cse.iitb.ac.in/~cs749 Funkhouser; Feng, Liu, Gong Recap Global A shape descriptor is a set of numbers that describes a shape in a way that
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationSketching with Style: Visual Search with Sketches and Aesthetic Context
Sketching with Style: Visual Search with Sketches and Aesthetic Context John Collomosse 1,2 Tu Bui 1 Michael Wilber 3 Chen Fang 2 Hailin Jin 2 1 CVSSP, University of Surrey 2 Adobe Research 3 Cornell Tech
More informationYOLO9000: Better, Faster, Stronger
YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object
More informationCS 6501: Deep Learning for Computer Graphics. Training Neural Networks II. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Training Neural Networks II Connelly Barnes Overview Preprocessing Initialization Vanishing/exploding gradients problem Batch normalization Dropout Additional
More information