Perceptual Loss for Convolutional Neural Network Based Optical Flow Estimation. Zong-qing LU, Xiang ZHU and Qing-min LIAO *
2017 2nd International Conference on Software, Multimedia and Communication Engineering (SMCE 2017)

Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
*Corresponding author

Keywords: Convolutional neural networks, Optical flow, Auto-encoder.

Abstract. Convolutional neural networks (CNNs) have been used successfully in optical flow estimation as learned patch-based descriptors. In this work, rather than training feature descriptors via CNNs, we develop an end-to-end fully convolutional network that estimates optical flow from a pair of images. Motivated by its success in image transformation tasks, we use a perceptual loss function to train the network. We train a deep convolutional auto-encoder on optical flow fields to obtain a high-level representation of motion structure rather than image texture. The perceptual loss function is then defined by high-level features extracted from the pretrained encoder. No conventional variational refinement is performed. Experiments show that the network achieves competitive performance on the challenging MPI Sintel and Flying Chairs datasets.

Introduction

Optical flow is a topic of great interest in video analysis. Because it represents motion information well, optical flow is widely used in object tracking [1], action recognition [2], video stabilization [3] and video frame prediction [4]. Because of illumination changes, deformations, repetitive patterns and occlusions, optical flow estimation is a highly ill-posed problem. Although decades of research have brought significant progress, it remains unsolved.

Related Work

Conventional methods [5, 6] estimate optical flow by minimizing a global energy function that is the weighted sum of a data term and a prior term:
$$E_{\text{global}} = E_{\text{data}} + \lambda E_{\text{prior}} \tag{1}$$

Viewed as a conditional random field, the data term $E_{\text{data}}$, which penalizes the association of dissimilar patches, can be treated as unary potentials, while the prior term $E_{\text{prior}}$, which constrains the ill-posed problem, can be treated as pairwise potentials [7, 8, 9].

Recently, computer vision tasks, especially per-pixel prediction tasks [10, 11, 12], have enjoyed a performance boost from deep learning methods, and convolutional neural networks have been used successfully in optical flow estimation. One approach [13] trains a feedforward convolutional neural network in a supervised manner, using a per-pixel loss function to measure the difference between the output and the ground-truth flow. Another approach uses a convolutional neural network as a feature descriptor [14, 15]; a patch-matching method is then applied to the local features extracted by the trained network. Both the end-to-end approach and the feature-descriptor approach use the convolutional neural network as the data term. Thus, variational refinement [16] is required as a post-processing step to supply the missing prior term.

Although most recent deep-learning-based optical flow methods do not use the network to impose constraints between pixels, pairwise regularization is widely used in per-pixel prediction tasks: a pretrained network [10], a recurrent network [11] or a simple low-level smoothness term [12] is used, depending on the task. Many of the most successful conventional optical flow estimation methods benefit from well-designed robust prior terms.
Inspired by these methods, we propose a learned prior term. We first train a variational auto-encoder to extract high-level motion features. A perceptual loss for optical flow is then defined on the feature maps of different layers of the trained encoder, and this loss is used to train an end-to-end convolutional neural network for optical flow estimation. The perceptual loss function takes the role of the pairwise potentials, i.e. the prior term in the global energy function.

Contributions

Our contributions are twofold. First, we demonstrate that one can extract motion information using a pretrained variational auto-encoder. Second, we show that applying the encoder to an optical flow estimation network, without variational refinement, achieves competitive performance on different datasets.

Method

Given a pair of consecutive frames, $\text{frame}_1$ and $\text{frame}_2$, a deep convolutional neural network $\mathrm{Net}_\Theta$ is learned to estimate the per-pixel optical flow field $w = (u, v)$ between the two frames, where $\Theta$ denotes the parameters of the network and $u$ and $v$ are the horizontal and vertical components of the optical flow, respectively.

Figure 1. The proposed framework, with two convolutional neural networks: one for optical flow estimation and the other for defining the perceptual loss. $\text{feat}_i$ is the feature extracted from the $i$-th level of the perceptual loss network $\phi$. (Low-level loss functions are not shown in this figure.)

The network is trained using a combined loss function consisting of a per-pixel loss $L_{epe}$, a smoothness loss $L_{smooth}$ and a perceptual loss $L_\phi$ defined by a pretrained loss network $\phi$. The loss network remains fixed during training. As shown in Figure 1, the optical flow estimation network transforms the concatenated frame pair $\text{frame}_{1,2}$ into the optical flow $w$:

$$w = \mathrm{Net}_\Theta(\text{frame}_{1,2}) \tag{2}$$

The network is then trained by stochastic gradient descent to minimize the combined loss function.
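The combined objective can be sketched in a few lines of plain Python, following the three terms that the Loss Functions section defines in Eqs. (3) to (5). This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names, the forward-difference gradient, the weight `alpha` on the smoothness term, and the `feature_levels` callables that stand in for the fixed loss network φ are all hypothetical, and the source does not say how the three terms are weighted against each other.

```python
import math


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def epe_loss(flow, flow_gt):
    """Eq. (3): mean Euclidean distance between predicted and true (u, v)."""
    return mean(
        math.hypot(u - u_gt, v - v_gt)
        for row, row_gt in zip(flow, flow_gt)
        for (u, v), (u_gt, v_gt) in zip(row, row_gt)
    )


def gradients(flow):
    """Forward differences (du/dx, du/dy, dv/dx, dv/dy) on the interior grid.

    Assumed discretisation; requires a field of at least 2 x 2 pixels.
    """
    h, w = len(flow), len(flow[0])
    return [
        [
            (
                flow[y][x + 1][0] - flow[y][x][0],
                flow[y + 1][x][0] - flow[y][x][0],
                flow[y][x + 1][1] - flow[y][x][1],
                flow[y + 1][x][1] - flow[y][x][1],
            )
            for x in range(w - 1)
        ]
        for y in range(h - 1)
    ]


def smooth_loss(flow, flow_gt):
    """Eq. (4): the same distance as Eq. (3), applied to the flow gradients."""
    return mean(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(g, g_gt)))
        for row, row_gt in zip(gradients(flow), gradients(flow_gt))
        for g, g_gt in zip(row, row_gt)
    )


def perceptual_loss(flow, flow_gt, feature_levels, weights):
    """Eq. (5): weighted feature distances.

    feature_levels is a list of callables, each mapping a flow field to a flat
    list of floats. They are hypothetical stand-ins for the frozen encoder.
    """
    total = 0.0
    for feat, lam in zip(feature_levels, weights):
        f, f_gt = feat(flow), feat(flow_gt)
        total += lam * mean(abs(a - b) for a, b in zip(f, f_gt))
    return total


def combined_loss(flow, flow_gt, feature_levels, weights, alpha=1.0):
    """Sum of the three terms; alpha is an assumed smoothness weight."""
    return (
        epe_loss(flow, flow_gt)
        + alpha * smooth_loss(flow, flow_gt)
        + perceptual_loss(flow, flow_gt, feature_levels, weights)
    )


# Toy 2 x 2 example: the prediction is offset from the truth by (3, 4) everywhere.
flow_gt = [[(1.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 0.0)]]
flow = [[(4.0, 4.0), (4.0, 4.0)], [(4.0, 4.0), (4.0, 4.0)]]
toy_levels = [lambda f: [u + v for row in f for (u, v) in row]]  # hypothetical feature
loss = combined_loss(flow, flow_gt, toy_levels, weights=[0.5])
```

In the toy example the endpoint term is 5, the smoothness term vanishes because a constant offset leaves the gradients unchanged, and the toy feature contributes 0.5 × 7, which shows how the three terms respond to different kinds of error.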
Network Architecture

We follow the FlowNet Simple architecture [13]. The frame pair is concatenated. We first apply a 10-layer convolutional neural network directly to the concatenated input, and then apply 4 sub-pixel convolution layers that upscale the low-resolution optical flow estimate to a high-resolution field. Each sub-pixel convolution layer consists of a normal convolution layer followed by a pixel shuffling layer. The pixel shuffling layer [17] is a periodic shuffling operator that rearranges the elements of an $H \times W \times C r^2$ tensor into a tensor of shape $rH \times rW \times C$. This operation is more efficient than the popular deconvolution layer. Skip connections between corresponding layers in the downscaling and upscaling phases let low-level information pass directly across the network.

Loss Functions

The loss function consists of a per-pixel loss, a smoothness loss and a perceptual loss. The most common error measure for optical flow evaluation is the endpoint error (EPE), so we use it as the per-pixel loss:

$$L_{epe} = \frac{1}{N} \sum \sqrt{(u - u_{gt})^2 + (v - v_{gt})^2} \tag{3}$$

A smoothness prior is widely used as the prior term in conventional methods. Since the loss function is a distance between the estimate and the ground truth, we let the smoothness term preserve the edge structure:

$$L_{smooth} = \frac{1}{N} \sum \sqrt{(\nabla u - \nabla u_{gt})^2 + (\nabla v - \nabla v_{gt})^2} \tag{4}$$

Finally, we use a loss network $\phi$ to define a perceptual loss function that measures perceptual differences. Let $\text{feat}_i$ denote the output feature map of the $i$-th layer. The perceptual loss (written $L_\phi$ above) is defined as

$$L_{perceptual} = \sum_i \lambda_i \frac{1}{N} \sum \sqrt{\left(\text{feat}_i(w) - \text{feat}_i(w_{gt})\right)^2} \tag{5}$$

It is trivial to show that the smoothness loss defined above is a special case of the perceptual loss. However, the pairwise potentials are complicated enough that the perceptual term should be defined by a network. The pretrained loss network used in [10] was trained for image classification. The constraint it provides encodes a texture prior of natural images and is therefore not suitable for optical flow estimation. What is needed is a pretrained network that extracts motion and structure information while discarding texture.

Variational Auto-encoder

For optical flow fields there are no labels with which to train a classification network. To learn high-level motion features, the network has to be trained in an unsupervised manner. Variational auto-encoders [18] make this approach tractable.

Figure 2. The graphical model.

As shown in Figure 2, the optical flow dataset contains N samples of optical flow w.
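Returning briefly to the sub-pixel layers of the Network Architecture section, the periodic shuffling they rely on can be made concrete with a short plain-Python sketch. It rearranges an H × W × C·r² nested list into an rH × rW × C one. The function name and the channel ordering (sub-pixel row first, then sub-pixel column, then output channel) are assumptions made for illustration; implementations of the operator in [17] differ in how they order channels, and the source does not specify which ordering it uses.

```python
def pixel_shuffle(x, r):
    """Rearrange an H x W x (C*r*r) nested list into (r*H) x (r*W) x C.

    Assumed layout: input channel (i*r + j)*C + k feeds output channel k of
    the sub-pixel at row offset i and column offset j.
    """
    h, w, depth = len(x), len(x[0]), len(x[0][0])
    if depth % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = depth // (r * r)
    out = [[[0.0] * c for _ in range(r * w)] for _ in range(r * h)]
    for y in range(h):
        for col in range(w):
            for i in range(r):
                for j in range(r):
                    for k in range(c):
                        out[y * r + i][col * r + j][k] = x[y][col][(i * r + j) * c + k]
    return out


# A single low-resolution pixel with 4 channels becomes a 2 x 2 patch with 1 channel.
patch = pixel_shuffle([[[0, 1, 2, 3]]], 2)
```

No arithmetic is performed on the values: the layer only moves data, and all learned computation stays in the convolution that precedes it, at low resolution.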
We assume that a latent random variable $z$, which contains the motion information, is drawn from a prior distribution $p_\theta(z)$, and that the datum $w^{(i)}$ is then generated from a conditional distribution $p_\theta(w \mid z)$. The architecture of the variational auto-encoder is shown in Table 1. The encoder codes the input flow into the latent variable by approximating the posterior, $q(z \mid w) \approx p_\theta(z \mid w)$. It takes the optical flow datum $w$ as input and outputs the parameters $\mu$ and $\sigma$ of the approximate posterior $q(z \mid w)$. After a latent variable $z$ is sampled from this distribution, the decoder reconstructs the optical flow.

Table 1. Variational auto-encoder architecture. (Entries shown as … are missing from the transcription.)

Encoder                        Decoder
Layer         Filter/Stride    Layer                  Filter
Input         -                Reparametrize          -
Conv          … / 2            Conv                   …
Conv          … / 2            Conv                   …
Conv          … / 2            Pixel Shuffle (2×2)    -
Conv          …                Conv                   …
Conv5_1: μ    …                Pixel Shuffle (4×4)    -
Conv5_2: σ    …                Output                 -

The variational auto-encoder is trained with a loss function defined as the sum of the squared error between the input optical flow $w$ and the generated field $\hat{w}$ and the Kullback-Leibler (KL) divergence between the distribution produced by the encoder and the prior distribution:

$$L_{VAE} = \lVert w - \hat{w} \rVert^2 + \mathrm{KL}\left(q(z \mid w) \,\Vert\, p(z)\right) \tag{6}$$

To train the variational auto-encoder with backpropagation, the reparametrization trick [18] is used.

Experiments

We use the Flying Chairs dataset [13] to train the variational auto-encoder. The reconstruction results are shown below, and we report the performance of our optical flow estimation network on the MPI-Sintel [19] and Flying Chairs datasets.

Implementation Details

The architecture of our optical flow estimation network is similar to the FlowNet Simple architecture, with a few differences that speed up training. Each convolution layer except the last, in both the optical flow estimation network and the variational auto-encoder, is followed by a batch-norm layer and an in-place activation layer, a leaky ReLU with its negative slope set to 0.1. In the expanding part of the network, pixel shuffle layers are used instead of deconvolution layers. The configuration of the variational auto-encoder is the same as that of the estimation network.

The variational auto-encoder is trained using Adam with the default parameter values $\beta_1 = 0.9$ and $\beta_2 = $ …. We set the learning rate to … and end training at 500k iterations.
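The objective in Eq. (6) and the reparametrization trick can be sketched as follows. This is a sketch under assumptions rather than the authors' code: it takes the prior to be a standard normal, p(z) = N(0, I), and q(z | w) to be a diagonal Gaussian, as in [18], so that the KL term has a closed form; the source does not state its prior explicitly. The function names are hypothetical, and flat Python lists stand in for tensors.

```python
import math
import random


def reparameterize(mu, sigma, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    The randomness is isolated in eps, so z is a deterministic,
    differentiable function of mu and sigma.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), assumed prior."""
    return 0.5 * sum(
        m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma)
    )


def vae_loss(w, w_rec, mu, sigma):
    """Eq. (6): squared reconstruction error plus the KL term."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(w, w_rec))
    return reconstruction + kl_to_standard_normal(mu, sigma)


# Example: a two-dimensional latent code drawn with a fixed seed.
rng = random.Random(0)
z = reparameterize([0.5, -0.5], [1.0, 2.0], rng)
```

The KL term is zero exactly when the encoder outputs μ = 0 and σ = 1, so it pulls the latent codes toward the prior while the reconstruction term pulls them toward codes that preserve the flow field.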
The optical flow estimation network is trained using Adam with the same parameters, but with the learning rate set to $10^{-4}$ and halved every 100k iterations after 300k iterations to prevent gradient explosion. Training ends at 600k iterations. We fine-tune the network on the MPI-Sintel dataset with a low learning rate. Data augmentation is performed to prevent overfitting.

Figure 3. (a) A visualization of the latent manifold that "generates" the optical flow field; (b) ground truth of an optical flow field from the Flying Chairs dataset; (c) reconstruction of the optical flow field from the ground truth.
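The step schedule described for the estimation network can be written as a small function. This is a sketch under two assumptions: the base rate is read as 1e-4, and the first halving takes effect at iteration 300k (the source does not say whether the first halving happens at 300k or at 400k). The function name and its parameters are hypothetical.

```python
def learning_rate(iteration, base_lr=1e-4, start=300_000, step=100_000):
    """Constant rate until `start`, then halved once per `step` iterations.

    Assumes the first halving applies at iteration == start.
    """
    if iteration < start:
        return base_lr
    halvings = (iteration - start) // step + 1
    return base_lr * 0.5 ** halvings


# Rates at the start of training and at each 100k boundary up to 500k.
schedule = [learning_rate(i) for i in range(0, 600_000, 100_000)]
```

Under these assumptions the rate is halved three times by the time training ends at 600k iterations, so the final steps run at one eighth of the base rate.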
Variational Auto-encoder

To visualize the latent manifold that generates the optical flow field, we scan the latent plane, sampling latent points at regular intervals and generating the corresponding optical flow field for each point. The results are shown in Figure 3a. The decoded optical flow fields indicate the motion associated with the corresponding latent variable. The figure differs somewhat from a flow color-coding map, but it clearly shows a mapping between the latent variable and the optical flow. (The lattice pattern in the figure arises because the input patch size is too small.) The variational auto-encoder takes a ground-truth optical flow field as input and reconstructs it. A reconstruction result is shown in Figure 3b and 3c. The encoder extracts the motion and structure information, while the texture is lost.

Comparison with the State-of-the-Art

Figure 4 shows examples of our results on the MPI-Sintel dataset. The average endpoint errors on the Flying Chairs and MPI-Sintel datasets are reported in Table 2.

Figure 4. Examples on the MPI-Sintel dataset. Panels: frames, our results, ground truth.

Table 2. Comparison with the state-of-the-art methods. (Entries shown as … are missing from the transcription.)

Method             Flying Chairs    MPI-Sintel
EpicFlow [16]      …                …
FlowNetS+v [13]    …                …
FlowNetS+ft+v      …                …
PatchBatch [14]    …                …
Ours               …                …

That our result on Flying Chairs is worse than that of FlowNetS is expected, because extra loss terms are used in training. Unless the network parameters converge to the global minimum, the endpoint error will not be better than that of a network trained with the endpoint error alone. Nevertheless, on MPI-Sintel the proposed method demonstrates that the perceptual loss, acting as pairwise potentials, leads to better generalization. Note that our results involve no variational refinement as post-processing.

Conclusion

In this paper, we demonstrate that one can extract motion and structure information from optical flow fields using a pretrained variational auto-encoder.
The high-level information extracted by the encoder can further be used to train an optical flow estimation network. We show that, without any variational refinement, an optical flow estimation network trained with a perceptual loss defined by the motion features extracted from the encoder achieves competitive performance on different datasets.
Acknowledgments

This work was supported by Shenzhen STP (JCYJ ).

References

[1] Kalal, Zdenek, Krystian Mikolajczyk, and Jiri Matas. "Tracking-learning-detection." IEEE Transactions on Pattern Analysis and Machine Intelligence 34.7 (2012).
[2] Guo, Kai, Prakash Ishwar, and Janusz Konrad. "Action recognition using sparse representation on covariance manifolds of optical flow." Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on. IEEE.
[3] Liu, Shuaicheng, et al. "Steadyflow: Spatially smooth optical flow for video stabilization." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[4] Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." arXiv preprint (2015).
[5] Horn, Berthold KP, and Brian G. Schunck. "Determining optical flow." Artificial Intelligence (1981).
[6] Zach, Christopher, Thomas Pock, and Horst Bischof. "A duality based approach for realtime TV-L1 optical flow." Joint Pattern Recognition Symposium. Springer Berlin Heidelberg.
[7] Chen, Qifeng, and Vladlen Koltun. "Full flow: Optical flow estimation by global optimization over regular grids." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[8] Bailer, Christian, Bertram Taetz, and Didier Stricker. "Flow fields: Dense correspondence fields for highly accurate large displacement optical flow estimation." Proceedings of the IEEE International Conference on Computer Vision.
[9] Menze, Moritz, Christian Heipke, and Andreas Geiger. "Discrete optimization for optical flow." German Conference on Pattern Recognition. Springer International Publishing.
[10] Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. "Perceptual losses for real-time style transfer and super-resolution." European Conference on Computer Vision. Springer International Publishing.
[11] Zheng, Shuai, et al. "Conditional random fields as recurrent neural networks." Proceedings of the IEEE International Conference on Computer Vision.
[12] Eigen, David, and Rob Fergus. "Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture." Proceedings of the IEEE International Conference on Computer Vision.
[13] Dosovitskiy, Alexey, et al. "Flownet: Learning optical flow with convolutional networks." Proceedings of the IEEE International Conference on Computer Vision.
[14] Gadot, David, and Lior Wolf. "Patchbatch: a batch augmented loss for optical flow." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[15] Schuster, Tal, Lior Wolf, and David Gadot. "Optical Flow Requires Multiple Strategies (but only one network)." arXiv preprint (2016).
[16] Revaud, Jerome, et al. "Epicflow: Edge-preserving interpolation of correspondences for optical flow." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[17] Shi, Wenzhe, et al. "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network." Computer Vision and Pattern Recognition (2016).
[18] Kingma, Diederik P., and Max Welling. "Auto-Encoding Variational Bayes." stat 1050 (2014): 1.
[19] Butler, Daniel J., et al. "A naturalistic open source movie for optical flow evaluation." European Conference on Computer Vision (2012).
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 1 Outline Background semantic segmentation Objective,
More informationLEARNING TO GENERATE CHAIRS WITH CONVOLUTIONAL NEURAL NETWORKS
LEARNING TO GENERATE CHAIRS WITH CONVOLUTIONAL NEURAL NETWORKS Alexey Dosovitskiy, Jost Tobias Springenberg and Thomas Brox University of Freiburg Presented by: Shreyansh Daftry Visual Learning and Recognition
More informationPerceptron: This is convolution!
Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image
More informationCOMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 7. Image Processing COMP9444 17s2 Image Processing 1 Outline Image Datasets and Tasks Convolution in Detail AlexNet Weight Initialization Batch Normalization
More informationRecurrent Convolutional Neural Networks for Scene Labeling
Recurrent Convolutional Neural Networks for Scene Labeling Pedro O. Pinheiro, Ronan Collobert Reviewed by Yizhe Zhang August 14, 2015 Scene labeling task Scene labeling: assign a class label to each pixel
More informationA Deep Learning Approach to Vehicle Speed Estimation
A Deep Learning Approach to Vehicle Speed Estimation Benjamin Penchas bpenchas@stanford.edu Tobin Bell tbell@stanford.edu Marco Monteiro marcorm@stanford.edu ABSTRACT Given car dashboard video footage,
More informationFlow-Based Video Recognition
Flow-Based Video Recognition Jifeng Dai Visual Computing Group, Microsoft Research Asia Joint work with Xizhou Zhu*, Yuwen Xiong*, Yujie Wang*, Lu Yuan and Yichen Wei (* interns) Talk pipeline Introduction
More informationImproving Image Segmentation Quality Via Graph Theory
International Symposium on Computers & Informatics (ISCI 05) Improving Image Segmentation Quality Via Graph Theory Xiangxiang Li, Songhao Zhu School of Automatic, Nanjing University of Post and Telecommunications,
More information08 An Introduction to Dense Continuous Robotic Mapping
NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy
More informationSEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic
SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks
More informationSmart Content Recognition from Images Using a Mixture of Convolutional Neural Networks *
Smart Content Recognition from Images Using a Mixture of Convolutional Neural Networks * Tee Connie *, Mundher Al-Shabi *, and Michael Goh Faculty of Information Science and Technology, Multimedia University,
More informationLearning visual odometry with a convolutional network
Learning visual odometry with a convolutional network Kishore Konda 1, Roland Memisevic 2 1 Goethe University Frankfurt 2 University of Montreal konda.kishorereddy@gmail.com, roland.memisevic@gmail.com
More informationStudy of Residual Networks for Image Recognition
Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks
More informationSupplemental Material for End-to-End Learning of Video Super-Resolution with Motion Compensation
Supplemental Material for End-to-End Learning of Video Super-Resolution with Motion Compensation Osama Makansi, Eddy Ilg, and Thomas Brox Department of Computer Science, University of Freiburg 1 Computation
More informationResearch on Pruning Convolutional Neural Network, Autoencoder and Capsule Network
Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network Tianyu Wang Australia National University, Colledge of Engineering and Computer Science u@anu.edu.au Abstract. Some tasks,
More informationKnow your data - many types of networks
Architectures Know your data - many types of networks Fixed length representation Variable length representation Online video sequences, or samples of different sizes Images Specific architectures for
More informationDeep Optical Flow Estimation Via Multi-Scale Correspondence Structure Learning
Deep Optical Flow Estimation Via Multi-Scale Correspondence Structure Learning Shanshan Zhao 1, Xi Li 1,2,, Omar El Farouk Bourahla 1 1 Zhejiang University, Hangzhou, China 2 Alibaba-Zhejiang University
More informationPresentation Outline. Semantic Segmentation. Overview. Presentation Outline CNN. Learning Deconvolution Network for Semantic Segmentation 6/6/16
6/6/16 Learning Deconvolution Network for Semantic Segmentation Hyeonwoo Noh, Seunghoon Hong,Bohyung Han Department of Computer Science and Engineering, POSTECH, Korea Shai Rozenberg 6/6/2016 1 2 Semantic
More informationDeep Models for 3D Reconstruction
Deep Models for 3D Reconstruction Andreas Geiger Autonomous Vision Group, MPI for Intelligent Systems, Tübingen Computer Vision and Geometry Group, ETH Zürich October 12, 2017 Max Planck Institute for
More informationSelf-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz Supplemental Material
Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz Supplemental Material Ayush Tewari 1,2 Michael Zollhöfer 1,2,3 Pablo Garrido 1,2 Florian Bernard 1,2 Hyeongwoo
More informationEVALUATION OF DEEP LEARNING BASED STEREO MATCHING METHODS: FROM GROUND TO AERIAL IMAGES
EVALUATION OF DEEP LEARNING BASED STEREO MATCHING METHODS: FROM GROUND TO AERIAL IMAGES J. Liu 1, S. Ji 1,*, C. Zhang 1, Z. Qin 1 1 School of Remote Sensing and Information Engineering, Wuhan University,
More informationLearning Feature Hierarchies for Object Recognition
Learning Feature Hierarchies for Object Recognition Koray Kavukcuoglu Computer Science Department Courant Institute of Mathematical Sciences New York University Marc Aurelio Ranzato, Kevin Jarrett, Pierre
More informationConvolutional Neural Networks
Lecturer: Barnabas Poczos Introduction to Machine Learning (Lecture Notes) Convolutional Neural Networks Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
More informationFace Recognition A Deep Learning Approach
Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar 2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison
More informationRotation Invariance Neural Network
Rotation Invariance Neural Network Shiyuan Li Abstract Rotation invariance and translate invariance have great values in image recognition. In this paper, we bring a new architecture in convolutional neural
More informationLearning and Recognizing Visual Object Categories Without First Detecting Features
Learning and Recognizing Visual Object Categories Without First Detecting Features Daniel Huttenlocher 2007 Joint work with D. Crandall and P. Felzenszwalb Object Category Recognition Generic classes rather
More information3D Shape Segmentation with Projective Convolutional Networks
3D Shape Segmentation with Projective Convolutional Networks Evangelos Kalogerakis 1 Melinos Averkiou 2 Subhransu Maji 1 Siddhartha Chaudhuri 3 1 University of Massachusetts Amherst 2 University of Cyprus
More informationOptical flow. Cordelia Schmid
Optical flow Cordelia Schmid Motion field The motion field is the projection of the 3D scene motion into the image Optical flow Definition: optical flow is the apparent motion of brightness patterns in
More informationDeconvolutions in Convolutional Neural Networks
Overview Deconvolutions in Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Deconvolutions in CNNs Applications Network visualization
More informationFully Convolutional Network for Depth Estimation and Semantic Segmentation
Fully Convolutional Network for Depth Estimation and Semantic Segmentation Yokila Arora ICME Stanford University yarora@stanford.edu Ishan Patil Department of Electrical Engineering Stanford University
More informationFlow Estimation. Min Bai. February 8, University of Toronto. Min Bai (UofT) Flow Estimation February 8, / 47
Flow Estimation Min Bai University of Toronto February 8, 2016 Min Bai (UofT) Flow Estimation February 8, 2016 1 / 47 Outline Optical Flow - Continued Min Bai (UofT) Flow Estimation February 8, 2016 2
More informationDeep generative models of natural images
Spring 2016 1 Motivation 2 3 Variational autoencoders Generative adversarial networks Generative moment matching networks Evaluating generative models 4 Outline 1 Motivation 2 3 Variational autoencoders
More informationSupplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains
Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains Jiahao Pang 1 Wenxiu Sun 1 Chengxi Yang 1 Jimmy Ren 1 Ruichao Xiao 1 Jin Zeng 1 Liang Lin 1,2 1 SenseTime Research
More informationP-CNN: Pose-based CNN Features for Action Recognition. Iman Rezazadeh
P-CNN: Pose-based CNN Features for Action Recognition Iman Rezazadeh Introduction automatic understanding of dynamic scenes strong variations of people and scenes in motion and appearance Fine-grained
More informationDisguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601
Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network Nathan Sun CIS601 Introduction Face ID is complicated by alterations to an individual s appearance Beard,
More information3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington
3D Object Recognition and Scene Understanding from RGB-D Videos Yu Xiang Postdoctoral Researcher University of Washington 1 2 Act in the 3D World Sensing & Understanding Acting Intelligent System 3D World
More informationVideo Frame Interpolation Using Recurrent Convolutional Layers
2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM) Video Frame Interpolation Using Recurrent Convolutional Layers Zhifeng Zhang 1, Li Song 1,2, Rong Xie 2, Li Chen 1 1 Institute of
More informationAutoencoders. Stephen Scott. Introduction. Basic Idea. Stacked AE. Denoising AE. Sparse AE. Contractive AE. Variational AE GAN.
Stacked Denoising Sparse Variational (Adapted from Paul Quint and Ian Goodfellow) Stacked Denoising Sparse Variational Autoencoding is training a network to replicate its input to its output Applications:
More informationILLUMINATION ROBUST OPTICAL FLOW ESTIMATION BY ILLUMINATION-CHROMATICITY DECOUPLING. Sungheon Park and Nojun Kwak
ILLUMINATION ROBUST OPTICAL FLOW ESTIMATION BY ILLUMINATION-CHROMATICITY DECOUPLING Sungheon Park and Nojun Kwak Graduate School of Convergence Science and Technology, Seoul National University, Korea
More informationDeep Learning for Computer Vision II
IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L
More information