Visual Place Recognition in Changing Environments with Time-Invariant Image Patch Descriptors


Boris Ivanovic
Stanford University

Abstract

Feature descriptors for images are a mature area of study within computer vision, and as a result, researchers now have access to many attribute-invariant features (e.g. scale, shift, rotation). However, changes to environments over time, i.e. across weather and seasons, still pose a serious problem for current image matching systems. As the use of detailed 3D maps and visual Simultaneous Localization and Mapping (SLAM) for robotics becomes more widespread, the ability to match image points across different weather conditions, illumination, seasons, and vegetation growth becomes a more important problem to solve. In this paper, we propose a method to learn a time-invariant image patch descriptor that can reliably match regions in images across the large-scale scenery changes caused by different weather and seasons. We use Convolutional Neural Networks (CNNs) to learn representations of image patches, and in particular train a Siamese network with pairs of matching and non-matching patches to enforce descriptor similarity and dissimilarity. We enforce this by minimizing the Euclidean distance between descriptors of matching patches and maximizing it between descriptors of non-matching patches during training. To improve representation generalization, we work with the seldom-used, large-scale Archive of Many Outdoor Scenes (AMOS) dataset.

Figure 1. Top row: Images of a forest as the season changes from summer on the left, to fall in the middle, to winter on the right. Bottom row: Images of St. Louis, MO in the summer on the left and winter on the right.

1. Introduction

With the growth of autonomous vehicles for consumer use and their use of visual Simultaneous Localization and Mapping (SLAM), the ability for vision systems to work in all conditions and seasons is paramount. As a result, visual place recognition over a long period of time has been identified as one of the core requirements of any modern robotic system that operates reliably in the real world [8]. Specifically, visual place recognition is the problem of identifying locations previously visited by an agent, enabling the agent to localize itself in an environment during navigation and perform pose estimation, scale-drift correction, map updating, etc. Visual place recognition over a long period of time is the more difficult problem of identifying locations previously visited by an agent where the agent's knowledge of the location is from a different season or weather condition, causing the visual appearance in memory to differ vastly from what is currently seen. Examples of these visual appearance differences can be seen in Fig. 1.

We present an approach to visual place recognition over a long period of time based on local image features. Specifically, we propose a method to learn a time-invariant image patch descriptor that can reliably match regions in images across the large-scale scenery changes caused by different weather and seasons. We use CNNs to learn representations of image patches, and in particular train a Siamese network with pairs of corresponding and non-corresponding patches to enforce descriptor similarity and dissimilarity for better patch matching. Fig. 2 illustrates our work's overall idea.

Figure 2. Illustration of our method. On the left is our training Siamese CNN architecture, and on the right is how we use the trained CNN to generate image patch representations, which we then compare via Euclidean distance to determine which patches match.

2. Previous Work

Previous approaches for visual place recognition over a long period of time include matching image sequences [9], learning to predict appearance changes to simplify the problem of matching images [12], using holistic image descriptors obtained from Convolutional Neural Networks (CNNs) [14], and combining local image features with CNN feature descriptors to match patches across images and provide resilience to viewpoint changes [10]. More recent work focuses on combining local patch features and holistic descriptors with multi-scale superpixel grids to provide the accurate matching performance of holistic descriptors while still maintaining the viewpoint invariance of local patches [11]. Additionally, work has been done to create an algorithm that identifies sets of heterogeneous features that are invariant to the types of changes that occur across seasons and weather, simplifying the problem of image matching [5].

Our work is most similar to Neubert and Protzel's work [10], as it also combines a feature detector with a CNN feature descriptor. The main differences are that Neubert and Protzel use the vectorized third convolutional layer (conv3) of the VGG-M network [2] as a feature descriptor and perform Hough-based patch matching, whereas our work uses a different CNN architecture and performs direct patch matching via the smallest Euclidean distance between the patches' descriptors. Our work also takes cues from the work of Simo-Serra et al. [13], as we are both trying to learn discriminative feature descriptors. As a result, we use the same CNN architecture and Siamese layout for training.

The main contribution of our work is an approach to visual place recognition over a long period of time based on local image features, using a novel data extraction scheme that generates more general time-invariant image patch descriptors, as well as a novel patch dataset for this task.

3. Method

3.1. Data Collection and Feature Extraction

Dataset Selection

This work uses a seldom-used dataset for the task of visual place recognition: the Archive of Many Outdoor Scenes (AMOS) [6]. It is a very large collection of more than 1 billion images from almost 30,000 static cameras located around the world. The reason for using such a dataset rather than one of the more popular ones for this task (e.g. the Nordland dataset [9]) is that we aim to create a more general image patch descriptor that can be used in a variety of environments (e.g. cities, forests, roads) rather than specific scenes such as a railroad. Another candidate dataset that we may work with in the future is the Long-term Observation of Scenes with Tracks (LOST) dataset [1], comprised of videos taken from streaming outdoor webcams. The LOST dataset is also very large, with more than 150 million video frames captured to date. The main difference between the LOST and AMOS datasets is that all videos in the LOST dataset are taken during the same 30-minute daytime interval (noon local time), which may hinder the diversity of the data collected (e.g. sunsets and sunrises, poor illumination conditions at dusk, city lights, road lights, etc.).
Day/Night Classification and Pruning

The AMOS dataset contains both daytime and nighttime images, which is beneficial for data diversity; however, a majority of the cameras in the dataset have poor low-light performance, and thus the nighttime images are mostly featureless and black. In order to exclude these images from training, we have to identify which images are taken at night (undesirable) and which are taken during the day (desirable). First instincts may indicate that we should just use the timestamps of the images to determine day/night boundaries. However, all timestamps are in GMT and there is no accompanying camera location information. Even if camera location and local time of capture were known, it is difficult to set day/night boundaries manually when they vary so heavily across seasons and locations. Thus, in order to avoid complex rule-based classification, we used the intuition that nighttime images have more dark pixels than daytime images to formulate this as a binary classification problem using image pixel values as the features.

As a result, we decided to train a simple SVM classifier on the frequency of image pixel values. However, using the pixel values directly would create very high-dimensional feature vectors, so we chose to bin the pixel values into 4 bins per channel. By binning the image pixel values, we separate the 256 pixel values of each channel into 4 bins: Very Dark [0, 64), Dark [64, 128), Light [128, 192), and Very Light [192, 256), and then count how many pixels in the image fall into each bin, making it easier to group dark and light images together. Finally, the counts are normalized so that this feature extraction works with images of any size. Fig. 3 illustrates this process. With these features, we expect nighttime images to have much higher values in the Very Dark and Dark bins, whereas daytime images would have much higher values in the Light and Very Light bins.

With the features chosen and defined, we hand-picked training data for this classifier from different cameras in the AMOS dataset and selected 476 daytime and 420 nighttime images. After training, the classifier was able to linearly separate the data, leading to perfect classification of day and night images. Fig. 4 shows some classification results.

Figure 3. Illustration of the process and results of extracting features for the day/night classifier for two example images.

Figure 4. Example classifications from the SVM classifier, where the top row were classified as Day and the bottom row were classified as Night.
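A minimal sketch of this feature extraction and classifier, assuming RGB uint8 images; the bin edges and normalization follow the description above, while the specific SVM configuration and all function and variable names are our own assumptions, not the paper's code:

```python
import numpy as np
from sklearn.svm import LinearSVC

def day_night_features(image):
    """12-D feature: 4 brightness bins per color channel, normalized by pixel count.

    Bins follow the paper: Very Dark [0,64), Dark [64,128),
    Light [128,192), Very Light [192,256).
    """
    counts = []
    for c in range(3):  # one 4-bin histogram per color channel
        hist, _ = np.histogram(image[..., c], bins=[0, 64, 128, 192, 256])
        counts.extend(hist)
    feats = np.array(counts, dtype=np.float64)
    return feats / image[..., 0].size  # normalize so any image size works

# Hypothetical usage: `day_imgs` and `night_imgs` are lists of HxWx3 uint8 arrays.
X = np.stack([day_night_features(im) for im in day_imgs + night_imgs])
y = np.array([1] * len(day_imgs) + [0] * len(night_imgs))  # 1 = day, 0 = night
clf = LinearSVC().fit(X, y)  # a linear SVM; the paper reports linearly separable data
```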

Image Patch Pair Extraction

In order to extract pairs of matching patches across different conditions, we capitalized on the fact that the AMOS dataset contains images from static cameras. For example, if the front door of a building in the summertime is contained in a 64x64 patch centered at (x, y), then that same door will be found in a 64x64 patch centered at the same (x, y) coordinates in a wintertime image.

Next, patches from the image must be chosen. To do this, we perform keypoint detection with SIFT on the image and select the 10 points with the highest response (the response specifies the strength of the found keypoint). We then define a 64x64 patch centered at each of the 10 keypoints and use each as one of our patches. To find a corresponding patch, we pick a random image in a different environmental condition (e.g. season) and extract patches at the same coordinates as in the original image. In order to prevent data duplication (i.e. all top 10 keypoints lying within 3 pixels of each other, creating nearly identical patches), we choose the top 10 keypoints such that no two have overlapping patches.

These are the steps to find matching patch pairs. The process for finding non-corresponding patch pairs is very similar; the only difference is that when it comes time to choose a patch in the randomly selected other image, we choose patch coordinates from one of the other non-overlapping patches in our image, guaranteeing that the patch extracted from the other image shares no intentional similarity with the original patch. Fig. 5 illustrates our overall image patch pair extraction process for corresponding patches, and Fig. 6 shows a few extracted patch pairs.
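A sketch of this extraction logic using OpenCV's SIFT detector; the non-overlap test and all names are our own approximations of the procedure described above, not the paper's implementation:

```python
import cv2
import numpy as np

PATCH = 64  # patch side length, per the paper

def top_nonoverlapping_keypoints(gray, k=10):
    """Strongest-response SIFT keypoints whose 64x64 patches do not overlap."""
    kps = cv2.SIFT_create().detect(gray, None)
    kps = sorted(kps, key=lambda p: p.response, reverse=True)
    chosen = []
    for kp in kps:
        x, y = kp.pt
        # keep the patch fully inside the image
        if (x < PATCH // 2 or y < PATCH // 2 or
                x > gray.shape[1] - PATCH // 2 or y > gray.shape[0] - PATCH // 2):
            continue
        # patches overlap iff both |dx| < 64 and |dy| < 64
        if all(abs(x - cx) >= PATCH or abs(y - cy) >= PATCH for cx, cy in chosen):
            chosen.append((x, y))
        if len(chosen) == k:
            break
    return chosen

def crop(img, x, y):
    x, y = int(round(x)), int(round(y))
    return img[y - PATCH // 2:y + PATCH // 2, x - PATCH // 2:x + PATCH // 2]

# Hypothetical usage: `summer` and `winter` are aligned frames from one static camera.
pts = top_nonoverlapping_keypoints(cv2.cvtColor(summer, cv2.COLOR_BGR2GRAY))
matching = [(crop(summer, x, y), crop(winter, x, y)) for x, y in pts]
# Non-matching pairs: pair a patch with a *different* keypoint's coordinates.
non_matching = [(crop(summer, *pts[i]), crop(winter, *pts[(i + 1) % len(pts)]))
                for i in range(len(pts))]
```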
Dataset Statistics

Using our patch extraction method, our dataset contains patches from daytime images of 16 different scenes in both the summer and winter. Dates vary from 2006 to . The specific composition of the dataset is 55,137 pairs of matching patches and 62,444 pairs of non-matching patches.

3.2. CNN Image Patch Descriptor

Model Architecture

We use a three-layer CNN architecture similar to Simo-Serra et al. [13], since we also aim to compute discriminative feature descriptors for image patches. Table 1 details the architecture of the network.
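Since the text specifies only the kernel sizes (7x7, 6x6, and 5x5, per Fig. 10), the 64x64 input, and the 128-dimensional output, the following PyTorch sketch fills in the rest (channel counts, pooling, and Tanh nonlinearities) with assumptions borrowed from Simo-Serra et al. [13]; it is an approximation, not the paper's exact network:

```python
import torch
import torch.nn as nn

class PatchDescriptor(nn.Module):
    """Three-layer CNN mapping a 64x64 patch to a 128-D descriptor (approximate)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7), nn.Tanh(), nn.MaxPool2d(2),   # 64 -> 58 -> 29
            nn.Conv2d(32, 64, kernel_size=6), nn.Tanh(), nn.MaxPool2d(2),  # 29 -> 24 -> 12
            nn.Conv2d(64, 128, kernel_size=5), nn.Tanh(),                  # 12 -> 8
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),                         # -> 128-D vector
        )

    def forward(self, x):
        return self.net(x)

desc = PatchDescriptor()
f = desc(torch.randn(1, 3, 64, 64))  # f.shape == torch.Size([1, 128])
```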

Figure 5. Illustration of the process to generate corresponding image patches. On the left is an image of a certain type (in this case summer), in the middle is the image with SIFT keypoints drawn on, and on the right is a randomly chosen image of a different type (in this case winter) to match with. The red dashed boxes indicate 64x64 image patches and the horizontal red lines indicate which patches are matched.

Figure 6. A sample of 64x64 corresponding (top) and non-corresponding (bottom) patches extracted from the AMOS dataset.

Table 1. Architecture of our three-layer CNN.

Model Training

We aim to compute whether two examples are similar or not based on the similarity of their feature descriptors. This is an instance of semi-supervised embedding as defined in Weston et al. [16], and is also why a Siamese CNN architecture like the one outlined in Simo-Serra et al. [13] fits the task well (i.e. identical copies of the same function with shared weights, plus a distance-measuring layer to compute similarity). As a result, we use the margin-based loss function proposed by Hadsell et al. [4], which encourages similar examples to be close and dissimilar ones to be at least some distance away from each other:

L(f_i, f_j, W_{ij}) = \begin{cases} \|f_i - f_j\|_2 & \text{if } W_{ij} = 1 \\ \max(0,\, m - \|f_i - f_j\|_2) & \text{if } W_{ij} = 0 \end{cases} \qquad (1)

where f_i is the representation of image patch x_i, f_j is the representation of image patch x_j, W_{ij} is a label indicating whether patches x_i and x_j should be similar (W_{ij} = 1) or dissimilar (W_{ij} = 0), and m is the minimum distance by which dissimilar patches should be separated.

Fig. 2 shows the Siamese CNN architecture used for training.
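A direct PyTorch sketch of Eq. 1, batched over pairs; the function and variable names are ours, `margin` plays the role of m, and the Siamese property comes from reusing one descriptor network for both patches:

```python
import torch

def contrastive_loss(f_i, f_j, w, margin=1.0):
    """Eq. 1: pull matching descriptors together, push non-matching past `margin`.

    f_i, f_j: (B, 128) descriptor batches from the same (shared-weight) network.
    w: (B,) labels, 1.0 for matching pairs and 0.0 for non-matching pairs.
    """
    d = torch.norm(f_i - f_j, p=2, dim=1)                   # Euclidean distances
    loss = w * d + (1 - w) * torch.clamp(margin - d, min=0)  # hinge on non-matches
    return loss.mean()

# Hypothetical usage with the descriptor network sketched earlier:
# f_i, f_j = desc(patches_a), desc(patches_b)  # same `desc` -> Siamese weights
# loss = contrastive_loss(f_i, f_j, labels.float())
```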

4. Experiments

4.1. AlexNet's Representation of Different Seasons

In order to gauge how current CNNs perform on the task of grouping together similar environments across different weather conditions and seasons, we decided to visualize how AlexNet's fc7 layer [7] (trained on ImageNet [3]) represents a handful of images that differ in season, time of day, weather conditions, and vegetation growth. To do this, we performed t-distributed Stochastic Neighbor Embedding (t-SNE) [15] and plotted the images at their resulting representation vectors. Fig. 7 shows the results. As can be seen, AlexNet does manage to group similar scenes together (i.e. all the forest images were together, all the construction images were together, etc.). However, due to the heavy variations in color and illumination, scenes in different conditions are not represented alike, as can be seen from the distances between specific images within the similar-scene groups.

Figure 7. t-SNE plot of AlexNet's fc7 representation of various scenes with appearance changes. In the top left are images of St. Louis, MO in the summer and winter; in the bottom left are images of a forest in the summer, fall, and winter; in the middle are images of a school building in the summer and winter; in the top right are images of a construction site in a city during the day, dusk, and night.

4.2. Patch Matching Performance

In order to test the model's performance on patch matching, we devised an experiment where:

1. A query patch's 128-dimensional representation vector is obtained from a trained model.
2. A selection of patches from the same location in the opposite season is chosen (including the query patch's corresponding patch) and their representations obtained.
3. The choice patches are ranked in ascending order of their representation's Euclidean distance to the query patch's representation.

With this experiment, we are looking for the query patch's corresponding patch to have the smallest representation distance from the query patch; a sketch of the ranking computation follows below. A visualization of this experiment is shown in Fig. 8, and the model's performance on our dataset is detailed in Table 2. When the experiment was run, usually 15 choice patches were presented (making the total number of choice patches 16, including the query patch's corresponding patch).

Figure 8. A visualization of the patch matching accuracy test for two query patches. The query patches are shown on the left and the choice patches are shown on the right, in ascending order of their representation's Euclidean distance from the query patch. A green border indicates that the patch is the query's corresponding patch. We desire the patch with the green border to be as close as possible to the query patch (i.e. to be the leftmost of the choice patches). The top trial would count as accurate in all of the top 1, 3, and 5 accuracy cutoffs, whereas the bottom trial would only count as accurate in the top 3 and 5 accuracy cutoffs.

Accuracy Cutoff    Matching Accuracy
Top 1              %
Top 3              %
Top 5              %

Table 2. The model's performance on our patch dataset. The accuracy cutoffs indicate the position within which the query's corresponding patch must be ranked in order to count as a correct result.

The fact that the accuracy increases quickly with respect to the number of included top results is promising, as it shows that the query patch's corresponding patch is still close to the query patch even when the model is wrong.
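The ranking experiment above reduces to a nearest-neighbor test in descriptor space; a minimal sketch, assuming the descriptors are precomputed NumPy arrays (the names and the `trials` structure are ours):

```python
import numpy as np

def match_rank(query_desc, choice_descs, true_idx):
    """Rank of the true corresponding patch among the choices (0 = best match)."""
    dists = np.linalg.norm(choice_descs - query_desc, axis=1)  # Euclidean distances
    order = np.argsort(dists)                                  # ascending distance
    return int(np.where(order == true_idx)[0][0])

# Top-k accuracy over many trials, mirroring Table 2's cutoffs.
# Hypothetical: `trials` is a list of (query_desc, choice_descs, true_idx) tuples.
# ranks = [match_rank(q, C, t) for q, C, t in trials]
# for k in (1, 3, 5):
#     print(f"Top {k}: {np.mean([r < k for r in ranks]):.1%}")
```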
4.3. Representation Analysis

Fig. 9 shows a t-SNE visualization of the model's representation of patches from our dataset. Unlike typical uses of t-SNE plots, we are not looking for global patterns across the latent space with this work. Instead, we are looking for local groups of patches corresponding to the same scene in different seasons, as this shows that the model has learned to represent different appearances of the same scene similarly. As can be seen, our model does produce clusters of the same scene in different seasons, lending credence to the notion that our model generates time-invariant image patch descriptors.

Figure 9. A t-SNE visualization of the model's representation of 300 patches from our dataset. Circled in red are local groups of patches corresponding to the same scene in different seasons.
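This qualitative analysis can be reproduced with scikit-learn's t-SNE applied to the 128-D descriptors; a minimal sketch, where the perplexity, the stand-in data, and the point count are arbitrary choices rather than values from the paper:

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical: `descs` would be a (300, 128) array of patch descriptors;
# random data stands in here so the snippet runs on its own.
descs = np.random.randn(300, 128).astype(np.float32)
xy = TSNE(n_components=2, perplexity=30, init="pca",
          random_state=0).fit_transform(descs)   # (300, 2) embedding
# Plot `xy` and look for local clusters of the same scene across seasons,
# e.g. with matplotlib: plt.scatter(xy[:, 0], xy[:, 1])
```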

4.4. Weight Visualization

Fig. 10 shows some weights from each layer of a trained model. Unfortunately, due to the sizes of the weight matrices and input patches, it is difficult to make out any specific patterns in the weights. Ideally, we would also have performed an analysis of which patches maximally activate specific neurons, to test for the occurrence of semantically-activated neurons such as a foliage neuron, a shadow neuron, etc.; a sketch of such an analysis follows below.

Figure 10. Visualization of four 7x7 weight matrices from the first convolutional layer (top), four 6x6 weight matrices from the second convolutional layer (middle), and four 5x5 weight matrices from the third convolutional layer (bottom), with the specific location in the model indicated in each plot title.
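The neuron-activation analysis proposed above could be sketched as follows, assuming the hypothetical PyTorch descriptor model from the Section 3.2 sketch; the hook-based approach and all names are our assumptions, since the paper did not carry out this experiment:

```python
import torch

def top_activating_patches(model, layer, patches, unit, k=5):
    """Indices of the k patches that maximally activate one convolutional unit."""
    acts = []
    hook = layer.register_forward_hook(
        lambda m, inp, out: acts.append(out[:, unit].amax(dim=(1, 2))))
    with torch.no_grad():
        model(patches)                      # patches: (N, 3, 64, 64) tensor
    hook.remove()
    return torch.topk(acts[0], k).indices   # e.g. a candidate "foliage neuron"

# Hypothetical usage: inspect unit 7 of the third conv layer of `desc`.
# idx = top_activating_patches(desc, desc.net[6], patch_batch, unit=7)
```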

5. Conclusion

We propose an approach to visual place recognition over a long period of time based on local image features, using a novel data extraction scheme that generates time-invariant image patch descriptors. We use Convolutional Neural Networks (CNNs) to learn representations of image patches, and in particular train a Siamese network with pairs of matching and non-matching patches to enforce descriptor similarity and dissimilarity. To improve representation generalization, we work with the seldom-used Archive of Many Outdoor Scenes (AMOS) dataset and show a method to easily extract corresponding and non-corresponding image patches from it.

References

[1] A. Abrams, J. Tucek, N. Jacobs, and R. Pless. LOST: Longterm Observation of Scenes (with Tracks). In IEEE Workshop on Applications of Computer Vision (WACV). Acceptance rate: 44%.
[2] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference.
[3] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li. ImageNet: A large-scale hierarchical image database. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), June 2009, Miami, Florida, USA.
[4] R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 06), volume 2.
[5] F. Han, X. Yang, Y. Deng, M. Rentschler, D. Yang, and H. Zhang. Life-long place recognition by shared representative appearance learning. In Workshop on Robotics: Science and Systems, Ann Arbor, Michigan, June.
[6] N. Jacobs, N. Roman, and R. Pless. Consistent Temporal Variations in Many Outdoor Scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-6, June. Acceptance rate: 23.4%.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25. Curran Associates, Inc.
[8] S. M. Lowry, N. Sünderhauf, P. Newman, J. J. Leonard, D. D. Cox, P. I. Corke, and M. J. Milford. Visual place recognition: A survey. IEEE Transactions on Robotics, 32(1):1-19.
[9] M. Milford and G. Wyeth. SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights. In N. Papanikolopoulos, editor, IEEE International Conference on Robotics and Automation (ICRA 2012), River Centre, Saint Paul, Minnesota. IEEE.
[10] P. Neubert and P. Protzel. Local region detector + CNN based landmarks for practical place recognition in changing environments. In ECMR.
[11] P. Neubert and P. Protzel. Beyond holistic descriptors, keypoints, and fixed patches: Multiscale superpixel grids for place recognition in changing environments. IEEE Robotics and Automation Letters, 1(1), Jan.
[12] P. Neubert, N. Sünderhauf, and P. Protzel. Superpixel-based appearance change prediction for long-term navigation across seasons. Robotics and Autonomous Systems, 69:15-27.
[13] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In Proceedings of the International Conference on Computer Vision (ICCV).
[14] N. Sünderhauf, F. Dayoub, S. Shirazi, B. Upcroft, and M. Milford. On the performance of ConvNet features for place recognition. CoRR.
[15] L. van der Maaten and G. E. Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9.
[16] J. Weston, F. Ratle, H. Mobahi, and R. Collobert. Deep Learning via Semi-supervised Embedding. Springer Berlin Heidelberg, Berlin, Heidelberg.
