Acquiring semantically meaningful models for robotic localization, mapping and target recognition.


Principal Investigator: J. Košecká

Project Abstract

The goal of this proposal is to develop novel representations and techniques for localization, mapping and target recognition from videos of indoor and urban outdoor environments. The proposed techniques would facilitate enhanced navigation capabilities by means of visual sensing and enable scalable, long-term navigation, target detection and tracking in outdoor and indoor environments. The attained representations will also be applicable to human-robot interaction and to the enhancement of human navigational and decision-making capabilities, and will provide compact, semantically meaningful summaries of the acquired sensory experience. The proposed representations will be governed by principles of compositionality, facilitate bottom-up learning, enable efficient inference and can be adapted to the task at hand. The main novelty of the proposed approach is the use of both 3D and 2D geometric and photometric cues, computed either from video sequences or from novel RGB-D cameras, which provide synchronized video and range data at frame rate. Video poses challenges related to more extreme variations in viewpoint and scale, dramatic changes in lighting, and large amounts of clutter and occlusion, but it also enables computation of 3D structure and motion cues, which can aid segmentation and recognition of object and non-object categories. The proposed work can be partitioned into four main research topics:

1. Development of novel matching and alignment strategies for RGB-D video streams, which facilitate robust localization, loop closing and mapping of indoor environments.

2. Development of robust visual mapping and localization algorithms, which will enable semantic labeling of outdoor environments using photometric and geometric cues from video.

3. Development of novel features and representations for weakly supervised and self-supervised strategies for learning models of object and non-object categories from video using 3D geometric and photometric cues.

4. Development of active strategies for model acquisition and target detection exploiting both geometric and photometric cues.

The proposed techniques will be evaluated extensively in the context of unmanned mobile vehicles, where additional sensing and control authority is available, as well as in the context of wearable computing, where the sole goal is that of model building.

Figure 1: Examples of (a) sparse point cloud representation; (b) original panorama and (c) dense piecewise planar reconstruction of an urban street.

1 Statement of Objectives

2 Research Challenges and the State of the Art

While maps comprised of dense or sparse clouds of points are often suitable for accurate localization, they are often insufficient for more advanced planning and reasoning tasks. Several recent efforts for building semantic maps have been considered in robot mapping applications. In the work of [?], semantic labels were associated either with individual locations, such as kitchen, corridor or printer room, or with individual image regions [?]. In the majority of these approaches, the features used to infer the different semantic categories were derived from both 3D range data and photometric cues. In outdoor settings, the final semantic labeling problem has been formulated as a MAP assignment of labels to image regions in a Markov Random Field framework [?]. Example labels include road, building, pedestrian, sky and tree. Additional aspects of the proposed problems studied in isolation include label propagation strategies and the use of a sparse set of 3D features to guide the semantic labeling [?,?,?]. In the work of [?], the authors demonstrated that even the use of simple geometric features can lead to successful semantic segmentation of various semantic categories. The scale of these experiments was rather small, however; they proceeded in a fully supervised setting and have mostly been applied in indoor settings. Alternative strategies for endowing the environment with additional information about rooms use the occupancy grids directly [?]. These approaches discard the original images after building a map, which we believe throws away many useful cues not captured in the map. In previous work [?] we have used the constraints of urban environments to reconstruct, in the multi-view stereo setting, 3D models from challenging imagery captured from a moving vehicle in unconstrained and difficult lighting conditions. The obtained reconstructions correctly captured the piecewise planar structure of the environments, but they were computationally expensive and did not attain more informative semantic labels (see Figure 1). Several authors have recently demonstrated impressive single-view reconstruction systems. The authors of [?] pose the problem as a multi-class segmentation problem, with labels corresponding to 3D geometry, while the authors of [?] reason simultaneously about geometry and object labels.

Figure 2: (a) example environment; (b) most likely labeling (classes are color coded: car, bus, sidewalk, street, building); (c) partitioning of the image into superpixels consistent with occlusion boundaries.

2.1 Related Work on Target Recognition and Semantic Segmentation

The second research theme will strive to associate more detailed semantic labels with individual image regions and video sequences. This problem is referred to as semantic segmentation and deals with the issues of simultaneous segmentation and recognition of different object and non-object categories in an image. An example of the desired semantic segmentation of a street scene is shown in Figure 2. The interplay between the processes of segmentation and recognition has gained increased momentum in the field of computer vision, due to recent advances in fully supervised object recognition strategies as well as improvements in fully unsupervised segmentation algorithms. The majority of the existing approaches tackle object recognition and semantic parsing in a fully or weakly supervised setting, where full pixel-wise segmentations or bounding boxes are used to train discriminative classifiers; the majority of the datasets consider a single background category or contain images with a small number of objects and little clutter (e.g. PASCAL VOC, MSRC21). Approaches which consider object and scene recognition simultaneously have been proposed in [?], where the output of sliding-window object detectors was combined with traditional semantic segmentation strategies. The authors of [?] emphasized the importance of contextual relationships and demonstrated that in the presence of strong contextual cues, even weaker representations of objects are sufficient and not all object categories have to be sought. Additional efforts related to the discovery of geometric semantic labels and contextual cues from a single image have been proposed by [?], demonstrating the role played by geometry as a contextual cue for object detection. An extensive review of the use and the types of contextual relationships can be found in [?]. The existing work on multi-class segmentation typically differs in the choice of elementary regions for which the labels are sought, the types of features used to characterize them, the means of integrating spatial information, and the techniques for learning and inference given the model. In [?,?,?,?] the authors used larger windows or superpixels, characterized by features such as color, texture moments or histograms, dominant orientations and shape, where the likelihoods of the observations are typically obtained in a discriminative setting.
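The superpixel-based pipeline described above can be illustrated with a minimal sketch: compute a per-superpixel descriptor (here a normalized per-channel color histogram; real systems add texture, orientation and shape cues) that a discriminative classifier would then consume. All data and function names here are illustrative, not part of any cited system.

```python
import numpy as np

def superpixel_color_histograms(image, labels, n_bins=4):
    """Compute a color histogram descriptor for every superpixel.

    image:  (H, W, 3) float array with values in [0, 1].
    labels: (H, W) int array assigning each pixel to a superpixel id.
    Returns an (n_superpixels, 3 * n_bins) array: one normalized
    histogram per color channel, concatenated.
    """
    n_sp = labels.max() + 1
    feats = np.zeros((n_sp, 3 * n_bins))
    for sp in range(n_sp):
        mask = labels == sp
        for c in range(3):
            hist, _ = np.histogram(image[..., c][mask],
                                   bins=n_bins, range=(0.0, 1.0))
            feats[sp, c * n_bins:(c + 1) * n_bins] = hist / max(mask.sum(), 1)
    return feats

# Tiny synthetic image: left half blue-ish ("sky"), right half gray ("road"),
# pre-segmented into two superpixels.
H, W = 8, 8
image = np.zeros((H, W, 3))
image[:, :4] = [0.1, 0.3, 0.9]
image[:, 4:] = [0.5, 0.5, 0.5]
labels = np.zeros((H, W), dtype=int)
labels[:, 4:] = 1

feats = superpixel_color_histograms(image, labels)
# Each descriptor sums to 3 (one unit-mass histogram per channel).
assert np.allclose(feats.sum(axis=1), 3.0)
```

Descriptors like these are what the discriminative classifiers mentioned above are trained on; the two regions here produce clearly separable histograms.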

Figure 3: Class-class co-occurrence matrices with example images for the 11-class CamVid and 21-class MSRC21 datasets. Rows and columns of the matrices correspond to class labels and the numbers stand for class-class co-occurrence frequency in the datasets. White stands for zero occurrence. Notice the sparsity of the matrix for the MSRC dataset, meaning that usually only 2-5 objects are present in an image, while in CamVid usually all 11 classes appear simultaneously.

On the other hand, there are existing approaches to object detection and recognition which consider the object recognition problem separately. The representations, datasets and evaluation strategies for these vary. The representations used for object detection and recognition are dominated by successful sliding-window detectors [?,?], which have been used with success for detection and recognition of objects which subtend a rectangular window (e.g. faces, cars, pedestrians). A class of more structured models has been inspired by grammar-based models, which generalize deformable part models by representing objects using variable hierarchical structures. While the foundations of these types of models were laid out some time ago [?,?], in experimental settings they have typically been outperformed by simpler models [?]. Recent advances in building more structured but at the same time effective models using mixtures of multi-scale deformable part models have been proposed in [?]. These models represent objects as collections of parts arranged in a deformable configuration. Each part captures local appearance properties, while deformations are characterized by spring-like connections between certain pairs of parts.
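The class-class co-occurrence statistics shown in Figure 3 can be computed directly from ground-truth label maps. The sketch below uses two toy "images"; the counting convention (number of images in which two classes both occur) is one plausible reading of the figure, stated here as an assumption.

```python
import numpy as np

def class_cooccurrence(label_images, n_classes):
    """Count how often pairs of classes appear together in the same image.

    label_images: iterable of 2D integer label maps (one per image).
    Returns an (n_classes, n_classes) symmetric matrix; entry (a, b) is
    the number of images in which classes a and b both occur.
    """
    M = np.zeros((n_classes, n_classes), dtype=int)
    for lab in label_images:
        present = np.unique(lab)
        for a in present:
            for b in present:
                M[a, b] += 1
    return M

# Two toy label maps: one containing classes {0, 1}, one containing {1, 2}.
img1 = np.array([[0, 0], [1, 1]])
img2 = np.array([[1, 2], [2, 2]])
M = class_cooccurrence([img1, img2], n_classes=3)
# Class 1 appears in both images; classes 0 and 2 never co-occur,
# so their entry stays white (zero) as in the MSRC matrix of Figure 3.
assert M[1, 1] == 2 and M[0, 2] == 0
```

A dense matrix (as for CamVid) signals that most classes appear in every image, which limits how much co-occurrence alone can disambiguate labels.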
2.2 Challenges of Target Recognition and Semantic Segmentation

Semantic segmentation as typically studied in computer vision uses a fully supervised approach for learning models of the desired categories, and the existing approaches are evaluated on commonly used object detection, recognition or semantic segmentation datasets (e.g. PASCAL VOC, Microsoft MSRC21). The images in these datasets mostly contain a small number (2-5) of object/background classes, and there is typically one object in the center which takes up a dominant portion of the image; see Figure 3.

3 Proposed Research

The common theme of the proposed research is to develop novel representations of objects and environments using photometric and geometric cues which will be suited for more robust and scalable localization, target detection and semantic mapping by unmanned mobile and aerial vehicles. The proposed representations could also be utilized for better human-robot interaction and be used

as a basis for the development of tools for parsing and browsing large quantities of videos acquired in urban outdoor and indoor environments.

Figure 4: Pose estimation. (a) Trajectory estimated by our method from images (blue) and by GPS (green), visualized in Google Maps by GPSVisualizer. Our trajectory is put into the world coordinate system by aligning the first two estimated poses in the sequence with the GPS. (b) Three images captured along the trajectory at the places marked in the map on the left.

Next we briefly outline some of our preliminary work and discuss in more detail the proposed directions of research, which tackle the challenges outlined in the previous section.

3.1 Preliminary Work on Localization and Mapping

In our previous work we have developed several approaches for motion recovery and reconstruction of sparse sets of 3D features, as well as dense 3D multi-view reconstruction suitable for recovering geometry in difficult indoor and outdoor environments [?]. For the problem of visual odometry, we employed omnidirectional images and techniques based on matching robust scale-invariant features, followed by computation of the epipolar geometry and batch-based non-linear optimization. The omnidirectional panoramas were composed from four perspective images covering in total 360 deg horizontally and 127 deg vertically. For pose estimation we matched SURF features [?] between each consecutive image pair along the sequence and exploited favorable properties of omnidirectional cameras for motion estimation. The spherical representation of the omnidirectional image allows us to construct corresponding 3D rays p, p' for established tentative point matches u <-> u'. The tentative matches were validated through RANSAC-based epipolar geometry estimation formulated on their 3D rays, i.e. p'^T E p = 0, yielding the essential matrix E = T̂R, where T̂ is the skew-symmetric matrix of the translation [?]. Treating the images as being captured by a central omnidirectional camera was beneficial in many aspects. As the 3D rays are spatially well distributed and cover a large part of the space, the result is a very stable estimate of the essential matrix, as studied in [?]. Moreover, improved convergence of RANSAC was achieved by sampling rays uniformly from each of the four subparts of the panorama. The large field of view (FOV) especially contributed towards better disambiguation of the rotation and translation obtained from the essential matrix. The scale of the translation was estimated by a linear closed-form 1-point algorithm on corresponding 3D points triangulated by DLT [?] from the previous image pair and the current one. This strategy differs from the well known and often used technique [?] in which the full pose of the third camera, i.e. the translation, its scale, and the rotation, is estimated through the 3-point algorithm. Figure 4 shows an example of pose estimation in an outdoor setting, compared to GPS data. Notice that the GPS position can be biased or can contain small jumps, especially when the satellites in one half of the hemisphere are occluded by nearby buildings. Furthermore, GPS itself does not provide rotation information unless equipped with a compass, making visual odometry a suitable complement to GPS.

3.2 Preliminary Work on Semantic Labeling

We have recently completed preliminary work on semantic labeling of street scenes [?], where we explored the use of non-parametric methods based on large visual vocabularies of image features, computed over small superpixels, for the representation and segmentation of object and background categories. We showed that the effort to capture the local spatial context of visual words is of fundamental importance, providing interesting insight into part-based representations.
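Returning to the visual odometry pipeline of Section 3.1, the ray-based epipolar constraint can be sketched with a small synthetic check. This is not the full RANSAC estimator: it only constructs E = T̂R from an assumed relative pose (convention x2 = R x1 + t) and verifies that true ray correspondences satisfy p'^T E p = 0, which is exactly the inlier test a RANSAC loop would apply to tentative matches.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix T̂ such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def epipolar_residuals(E, rays1, rays2):
    """|p'^T E p| for each pair of unit 3D rays (one row per match)."""
    return np.abs(np.einsum('ij,jk,ik->i', rays2, E, rays1))

rng = np.random.default_rng(0)
# Assumed two-view geometry: rotation about z by 0.1 rad, translation t.
angle = 0.1
R = np.array([[np.cos(angle), -np.sin(angle), 0],
              [np.sin(angle),  np.cos(angle), 0],
              [0, 0, 1]])
t = np.array([1.0, 0.2, 0.0])
E = skew(t) @ R                       # essential matrix E = T̂ R

X = rng.uniform(-2, 2, size=(50, 3)) + np.array([0, 0, 5])  # 3D points
rays1 = X / np.linalg.norm(X, axis=1, keepdims=True)         # rays in cam 1
X2 = X @ R.T + t                                             # points in cam 2
rays2 = X2 / np.linalg.norm(X2, axis=1, keepdims=True)       # rays in cam 2

res = epipolar_residuals(E, rays1, rays2)
inliers = res < 1e-9      # RANSAC-style inlier test on the spherical rays
assert inliers.all()      # true matches satisfy the epipolar constraint
```

In the actual pipeline, E is hypothesized from minimal ray samples inside RANSAC and this residual decides inliers; working on unit rays rather than image points is what makes the test valid for the full panoramic field of view.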
These ingredients, along with novel means of integrating spatial relationships, were combined in a probabilistic framework yielding a second-order Markov Random Field (MRF), where the final labeling is obtained as the MAP solution for the labels given an image. The labeling L was estimated as the Maximum A Posteriori (MAP) solution

L* = argmax_L P(L | V) = argmax_L P(V | L) P(L),    (1)

where L are the desired labels and V are the photometric features. The above MAP inference problem can in turn be converted into the following energy minimization problem:

argmin_L ( sum_{i=1}^{S} E_app(l_i) + lambda sum_{(i,j) in E} E_smooth(l_i, l_j) ).    (2)

In our problem domain, we considered 11 labels of object and non-object categories (e.g. sky, road, building, tree, pavement, pedestrian, car, sign, column/pole, etc.) and the experiments were carried out on a dataset of 450 images of street scenes. The key challenge of the above formulation is the choice of the individual terms of the energy function and their spatial interactions. We propose to enhance the existing approaches in several ways.
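A toy version of the energy in Eq. (2) illustrates the interplay of its terms. This sketch makes two simplifying assumptions not taken from the proposal: a Potts penalty stands in for E_smooth, and iterated conditional modes (ICM), a simple greedy scheme, stands in for the actual MAP inference; all data are synthetic.

```python
import numpy as np

def energy(labels, unary, edges, lam=1.0):
    """E(L) = sum_i E_app(l_i) + lam * sum_{(i,j)} [l_i != l_j] (Potts)."""
    e_app = sum(unary[i, labels[i]] for i in range(len(labels)))
    e_smooth = sum(labels[i] != labels[j] for i, j in edges)
    return e_app + lam * e_smooth

def icm(unary, edges, lam=1.0, sweeps=5):
    """Iterated conditional modes: greedy per-site refinement of Eq. (2)."""
    n_sites, n_labels = unary.shape
    labels = unary.argmin(axis=1)          # start from unary-only labeling
    nbrs = {i: [] for i in range(n_sites)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(sweeps):
        for i in range(n_sites):
            costs = unary[i] + lam * np.array(
                [sum(l != labels[j] for j in nbrs[i])
                 for l in range(n_labels)])
            labels[i] = costs.argmin()
    return labels

# Three superpixels in a chain; the middle one has a noisy appearance term
# that, taken alone, would pick the wrong label.
unary = np.array([[0.0, 2.0],
                  [1.0, 0.9],
                  [0.0, 2.0]])
edges = [(0, 1), (1, 2)]
labels = icm(unary, edges, lam=1.0)
assert list(labels) == [0, 0, 0]   # smoothness overrides the noisy site
```

In practice the MAP solution of such energies is computed with graph-cut style solvers rather than ICM; the point here is only how the smoothness term corrects locally ambiguous appearance evidence.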

Figure 5: (a) An example street scene view and (b) a sparse set of 3D points recovered from video of the street scene; even a coarse triangulation reveals 3D structural properties of the scene; (c) coarse semantic segmentation into 11 different object and background categories; (d) preliminary results of our method.

References

[1] Special issue on vision. IEEE Transactions on Robotics.
[2] A. Torralba, K. Murphy, and W. Freeman. Contextual models for object detection using boosted random fields. In Advances in Neural Information Processing Systems 17.
[3] H. Bay, A. Ess, T. Tuytelaars, and L.J. Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding (CVIU), 110(3).
[4] A. Berg, F. Grabler, and J. Malik. Parsing images of architectural scenes. In ICCV.
[5] T. Brodsky, C. Fermüller, and Y. Aloimonos. Directions of motion fields are hardly ever ambiguous. IJCV, 1(26):5-24.
[6] G. Brostow, J. Fauqueur, and R. Cipolla. Semantic object classes in video: A high-definition ground truth database. Pattern Recognition Letters, 30(2):88-97.
[7] G. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla. Segmentation and recognition using structure from motion point clouds. In ECCV, pages I:44-57.
[8] P. Buschka and A. Saffiotti. A virtual sensor for room detection. In IEEE Intelligent Robots and Systems.
[9] Canesta.
[10] M. Chandraker, J. Lim, and D. Kriegman. Moving in stereo: Efficient structure and motion using lines. In ICCV.
[11] M. Cummins and P. Newman. Highly scalable appearance-only SLAM - FAB-MAP 2.0. In RSS.
[12] A. Davison, I. Reid, N. Molton, and O. Stasse. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[13] J. Yuen et al. LabelMe video. In ICCV.
[14] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15] C. Galleguillos and S. Belongie. Context based object categorization: A critical survey. Computer Vision and Image Understanding (CVIU), 114.
[16] S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In CVPR.
[17] S. Gould, T. Gao, and D. Koller. Region-based segmentation and object detection. In NIPS.
[18] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition.
[19] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments. In ISER.
[20] D. Hoiem, A. Efros, and M. Hebert. Geometric context from a single image. In ICCV.
[21] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 81(1):2-23.
[22] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In IEEE Int. Symp. on Mixed and Augmented Reality.
[23] K. Konolige. Projected texture stereo. In IEEE International Conference on Robotics and Automation.
[24] K. Konolige, M. Calonder, J. Bowman, P. Michelich, J.D. Chen, P. Fua, and V. Lepetit. View-based maps. In RSS.
[25] J. Kosecka and W. Zhang. Video compass. In ECCV.
[26] M. Bosse, R. Rikoski, J. Leonard, and S. Teller. Omnidirectional structure from motion using vanishing points and 3D lines. In Visual Computer, volume 19.
[27] B. Mičušík and J. Košecká. Multi-view superpixel stereo in man-made environments. International Journal of Computer Vision.
[28] B. Mičušík and J. Košecká. Piecewise planar city modeling from street view panoramic sequences. In CVPR.
[29] B. Mičušík and J. Košecká. Semantic segmentation of street scenes by superpixel co-occurrence and 3D geometry. In IEEE Workshop on Video-Oriented Object and Event Classification.
[30] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition.
[31] D. Nistér. Preemptive RANSAC for live structure and motion estimation. Machine Vision and Applications (MVA), 16(5).
[32] P. Kohli, L. Ladicky, and P. Torr. Robust higher order potentials for enforcing label consistency. In CVPR.
[33] P. Smith, I. Reid, and A.J. Davison. Real-time monocular SLAM with straight lines.
[34] M. Posner and P. Newman. Learning hierarchical models of scenes, objects, and parts. In Robotics and Autonomous Systems, 56(11).

[35] PrimeSense.
[36] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In ICCV.
[37] B. Russell, A. Torralba, C. Liu, and R. Fergus. Object recognition by scene alignment. In NIPS.
[38] S. Gould, J. Rodgers, D. Cohen, G. Elidan, and D. Koller. Multi-class segmentation with relative location prior. IJCV, 80(3).
[39] S. Savarese, J. Winn, and A. Criminisi. Discriminative object class models of appearance and shape by correlatons. In CVPR.
[40] S. Se, D. Lowe, and J. Little. Mobile robot localization and mapping with uncertainty using scale-invariant features. International Journal of Robotics Research.
[41] G. Singh and J. Kosecka. Visual loop closing using gist descriptors in Manhattan world. In Proc. of IEEE Int. Conf. on Robotics and Automation, Workshop on Omnidirectional Vision.
[42] C. Stachniss, O. Martínez-Mozos, A. Rottmann, and W. Burgard. Semantic labelling of places. In International Symposium on Robotics Research.
[43] E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. Learning hierarchical models of scenes, objects, and parts. In ICCV.
[44] M. Tomono. Robust 3D SLAM with a stereo camera, based on an edge-point ICP algorithm. In IEEE Int. Conference on Robotics and Automation.
[45] A. Vazquez-Reina, S. Avidan, H. Pfister, and E. Miller. Multiple hypothesis video segmentation from superpixel flows. In ECCV.
[46] P. Viola and M. Jones. Robust real-time object detection. IJCV, 2(57).
[47] J. Xiao and L. Quan. Multiple view semantic segmentation for street view images. In ICCV.
[48] Y. Yin and S. Geman. Context and hierarchy in a probabilistic image model. In CVPR.
[49] S. Zhu and D. Mumford. A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2(4).


More information

Closing the Loop in Scene Interpretation

Closing the Loop in Scene Interpretation Closing the Loop in Scene Interpretation Derek Hoiem Beckman Institute University of Illinois dhoiem@uiuc.edu Alexei A. Efros Robotics Institute Carnegie Mellon University efros@cs.cmu.edu Martial Hebert

More information

Deformable Part Models

Deformable Part Models CS 1674: Intro to Computer Vision Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 9, 2016 Today: Object category detection Window-based approaches: Last time: Viola-Jones

More information

Visual Object Recognition

Visual Object Recognition Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial Visual Object Recognition Bastian Leibe Computer Vision Laboratory ETH Zurich Chicago, 14.07.2008 & Kristen Grauman Department

More information

Contexts and 3D Scenes

Contexts and 3D Scenes Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem Administrative stuffs Final project presentation Dec 1 st 3:30 PM 4:45 PM Goodwin Hall Atrium Grading Three

More information

Multi-view Stereo. Ivo Boyadzhiev CS7670: September 13, 2011

Multi-view Stereo. Ivo Boyadzhiev CS7670: September 13, 2011 Multi-view Stereo Ivo Boyadzhiev CS7670: September 13, 2011 What is stereo vision? Generic problem formulation: given several images of the same object or scene, compute a representation of its 3D shape

More information

3D layout propagation to improve object recognition in egocentric videos

3D layout propagation to improve object recognition in egocentric videos 3D layout propagation to improve object recognition in egocentric videos Alejandro Rituerto, Ana C. Murillo and José J. Guerrero {arituerto,acm,josechu.guerrero}@unizar.es Instituto de Investigación en

More information

Local features and image matching. Prof. Xin Yang HUST

Local features and image matching. Prof. Xin Yang HUST Local features and image matching Prof. Xin Yang HUST Last time RANSAC for robust geometric transformation estimation Translation, Affine, Homography Image warping Given a 2D transformation T and a source

More information

Semantic Segmentation of Street-Side Images

Semantic Segmentation of Street-Side Images Semantic Segmentation of Street-Side Images Michal Recky 1, Franz Leberl 2 1 Institute for Computer Graphics and Vision Graz University of Technology recky@icg.tugraz.at 2 Institute for Computer Graphics

More information

Structured Models in. Dan Huttenlocher. June 2010

Structured Models in. Dan Huttenlocher. June 2010 Structured Models in Computer Vision i Dan Huttenlocher June 2010 Structured Models Problems where output variables are mutually dependent or constrained E.g., spatial or temporal relations Such dependencies

More information

Supervised learning. y = f(x) function

Supervised learning. y = f(x) function Supervised learning y = f(x) output prediction function Image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the

More information

Can Similar Scenes help Surface Layout Estimation?

Can Similar Scenes help Surface Layout Estimation? Can Similar Scenes help Surface Layout Estimation? Santosh K. Divvala, Alexei A. Efros, Martial Hebert Robotics Institute, Carnegie Mellon University. {santosh,efros,hebert}@cs.cmu.edu Abstract We describe

More information

Geometric Context from Videos

Geometric Context from Videos 2013 IEEE Conference on Computer Vision and Pattern Recognition Geometric Context from Videos S. Hussain Raza Matthias Grundmann Irfan Essa Georgia Institute of Technology, Atlanta, GA, USA http://www.cc.gatech.edu/cpl/projects/videogeometriccontext

More information

Multi-Class Image Labeling with Top-Down Segmentation and Generalized Robust P N Potentials

Multi-Class Image Labeling with Top-Down Segmentation and Generalized Robust P N Potentials FLOROS ET AL.: MULTI-CLASS IMAGE LABELING WITH TOP-DOWN SEGMENTATIONS 1 Multi-Class Image Labeling with Top-Down Segmentation and Generalized Robust P N Potentials Georgios Floros 1 floros@umic.rwth-aachen.de

More information

CS 558: Computer Vision 13 th Set of Notes

CS 558: Computer Vision 13 th Set of Notes CS 558: Computer Vision 13 th Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Overview Context and Spatial Layout

More information

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients ThreeDimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients Authors: Zhile Ren, Erik B. Sudderth Presented by: Shannon Kao, Max Wang October 19, 2016 Introduction Given an

More information

Nonparametric Semantic Segmentation for 3D Street Scenes

Nonparametric Semantic Segmentation for 3D Street Scenes 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) November 3-7, 2013. Tokyo, Japan Nonparametric Semantic Segmentation for 3D Street Scenes Hu He and Ben Upcroft Abstract

More information

Instance-level recognition part 2

Instance-level recognition part 2 Visual Recognition and Machine Learning Summer School Paris 2011 Instance-level recognition part 2 Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique,

More information

Decomposing a Scene into Geometric and Semantically Consistent Regions

Decomposing a Scene into Geometric and Semantically Consistent Regions Decomposing a Scene into Geometric and Semantically Consistent Regions Stephen Gould sgould@stanford.edu Richard Fulton rafulton@cs.stanford.edu Daphne Koller koller@cs.stanford.edu IEEE International

More information

IDE-3D: Predicting Indoor Depth Utilizing Geometric and Monocular Cues

IDE-3D: Predicting Indoor Depth Utilizing Geometric and Monocular Cues 2016 International Conference on Computational Science and Computational Intelligence IDE-3D: Predicting Indoor Depth Utilizing Geometric and Monocular Cues Taylor Ripke Department of Computer Science

More information

Methods for Representing and Recognizing 3D objects

Methods for Representing and Recognizing 3D objects Methods for Representing and Recognizing 3D objects part 1 Silvio Savarese University of Michigan at Ann Arbor Object: Building, 45º pose, 8-10 meters away Object: Person, back; 1-2 meters away Object:

More information

Local Features and Bag of Words Models

Local Features and Bag of Words Models 10/14/11 Local Features and Bag of Words Models Computer Vision CS 143, Brown James Hays Slides from Svetlana Lazebnik, Derek Hoiem, Antonio Torralba, David Lowe, Fei Fei Li and others Computer Engineering

More information

Automatic Photo Popup

Automatic Photo Popup Automatic Photo Popup Derek Hoiem Alexei A. Efros Martial Hebert Carnegie Mellon University What Is Automatic Photo Popup Introduction Creating 3D models from images is a complex process Time-consuming

More information

Semantic Parsing of Street Scene Images Using 3D LiDAR Point Cloud

Semantic Parsing of Street Scene Images Using 3D LiDAR Point Cloud 2013 IEEE International Conference on Computer Vision Workshops Semantic Parsing of Street Scene Images Using 3D LiDAR Point Cloud Pouria Babahajiani Tampere University of Technology Tampere, Finland pouria.babahajiani@tut.fi

More information

Selection of Scale-Invariant Parts for Object Class Recognition

Selection of Scale-Invariant Parts for Object Class Recognition Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract

More information

Scale and Rotation Invariant Color Features for Weakly-Supervised Object Learning in 3D Space

Scale and Rotation Invariant Color Features for Weakly-Supervised Object Learning in 3D Space Scale and Rotation Invariant Color Features for Weakly-Supervised Object Learning in 3D Space Asako Kanezaki Tatsuya Harada Yasuo Kuniyoshi Graduate School of Information Science and Technology, The University

More information

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H.

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H. Nonrigid Surface Modelling and Fast Recovery Zhu Jianke Supervisor: Prof. Michael R. Lyu Committee: Prof. Leo J. Jia and Prof. K. H. Wong Department of Computer Science and Engineering May 11, 2007 1 2

More information

Probabilistic Location Recognition using Reduced Feature Set

Probabilistic Location Recognition using Reduced Feature Set Probabilistic Location Recognition using Reduced Feature Set Fayin Li and Jana Košecá Department of Computer Science George Mason University, Fairfax, VA 3 Email: {fli,oseca}@cs.gmu.edu Abstract The localization

More information

AR Cultural Heritage Reconstruction Based on Feature Landmark Database Constructed by Using Omnidirectional Range Sensor

AR Cultural Heritage Reconstruction Based on Feature Landmark Database Constructed by Using Omnidirectional Range Sensor AR Cultural Heritage Reconstruction Based on Feature Landmark Database Constructed by Using Omnidirectional Range Sensor Takafumi Taketomi, Tomokazu Sato, and Naokazu Yokoya Graduate School of Information

More information

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009 Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context

More information

Category-level localization

Category-level localization Category-level localization Cordelia Schmid Recognition Classification Object present/absent in an image Often presence of a significant amount of background clutter Localization / Detection Localize object

More information

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14 Announcements Computer Vision I CSE 152 Lecture 14 Homework 3 is due May 18, 11:59 PM Reading: Chapter 15: Learning to Classify Chapter 16: Classifying Images Chapter 17: Detecting Objects in Images Given

More information

SuperParsing: Scalable Nonparametric Image Parsing with Superpixels

SuperParsing: Scalable Nonparametric Image Parsing with Superpixels SuperParsing: Scalable Nonparametric Image Parsing with Superpixels Joseph Tighe and Svetlana Lazebnik Dept. of Computer Science, University of North Carolina at Chapel Hill Chapel Hill, NC 27599-3175

More information

EE290T : 3D Reconstruction and Recognition

EE290T : 3D Reconstruction and Recognition EE290T : 3D Reconstruction and Recognition Acknowledgement Courtesy of Prof. Silvio Savarese. Introduction There was a table set out under a tree in front of the house, and the March Hare and the Hatter

More information

EECS 442 Computer vision. 3D Object Recognition and. Scene Understanding

EECS 442 Computer vision. 3D Object Recognition and. Scene Understanding EECS 442 Computer vision 3D Object Recognition and Scene Understanding Interpreting the visual world Object: Building 8-10 meters away Object: Traffic light Object: Car, ¾ view 2-3 meters away How can

More information

Simultaneous Multi-class Pixel Labeling over Coherent Image Sets

Simultaneous Multi-class Pixel Labeling over Coherent Image Sets Simultaneous Multi-class Pixel Labeling over Coherent Image Sets Paul Rivera Research School of Computer Science Australian National University Canberra, ACT 0200 Stephen Gould Research School of Computer

More information

Using the Forest to See the Trees: Context-based Object Recognition

Using the Forest to See the Trees: Context-based Object Recognition Using the Forest to See the Trees: Context-based Object Recognition Bill Freeman Joint work with Antonio Torralba and Kevin Murphy Computer Science and Artificial Intelligence Laboratory MIT A computer

More information

Flow Estimation. Min Bai. February 8, University of Toronto. Min Bai (UofT) Flow Estimation February 8, / 47

Flow Estimation. Min Bai. February 8, University of Toronto. Min Bai (UofT) Flow Estimation February 8, / 47 Flow Estimation Min Bai University of Toronto February 8, 2016 Min Bai (UofT) Flow Estimation February 8, 2016 1 / 47 Outline Optical Flow - Continued Min Bai (UofT) Flow Estimation February 8, 2016 2

More information

Joint Inference in Image Databases via Dense Correspondence. Michael Rubinstein MIT CSAIL (while interning at Microsoft Research)

Joint Inference in Image Databases via Dense Correspondence. Michael Rubinstein MIT CSAIL (while interning at Microsoft Research) Joint Inference in Image Databases via Dense Correspondence Michael Rubinstein MIT CSAIL (while interning at Microsoft Research) My work Throughout the year (and my PhD thesis): Temporal Video Analysis

More information

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Adding spatial information Forming vocabularies from pairs of nearby features doublets

More information

Removing Moving Objects from Point Cloud Scenes

Removing Moving Objects from Point Cloud Scenes Removing Moving Objects from Point Cloud Scenes Krystof Litomisky and Bir Bhanu University of California, Riverside krystof@litomisky.com, bhanu@ee.ucr.edu Abstract. Three-dimensional simultaneous localization

More information

Revisiting 3D Geometric Models for Accurate Object Shape and Pose

Revisiting 3D Geometric Models for Accurate Object Shape and Pose Revisiting 3D Geometric Models for Accurate Object Shape and Pose M. 1 Michael Stark 2,3 Bernt Schiele 3 Konrad Schindler 1 1 Photogrammetry and Remote Sensing Laboratory Swiss Federal Institute of Technology

More information

Visual Recognition and Search April 18, 2008 Joo Hyun Kim

Visual Recognition and Search April 18, 2008 Joo Hyun Kim Visual Recognition and Search April 18, 2008 Joo Hyun Kim Introduction Suppose a stranger in downtown with a tour guide book?? Austin, TX 2 Introduction Look at guide What s this? Found Name of place Where

More information

Line Image Signature for Scene Understanding with a Wearable Vision System

Line Image Signature for Scene Understanding with a Wearable Vision System Line Image Signature for Scene Understanding with a Wearable Vision System Alejandro Rituerto DIIS - I3A, University of Zaragoza, Spain arituerto@unizar.es Ana C. Murillo DIIS - I3A, University of Zaragoza,

More information

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Estimating Human Pose in Images. Navraj Singh December 11, 2009 Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks

More information

Correcting User Guided Image Segmentation

Correcting User Guided Image Segmentation Correcting User Guided Image Segmentation Garrett Bernstein (gsb29) Karen Ho (ksh33) Advanced Machine Learning: CS 6780 Abstract We tackle the problem of segmenting an image into planes given user input.

More information

Part based models for recognition. Kristen Grauman

Part based models for recognition. Kristen Grauman Part based models for recognition Kristen Grauman UT Austin Limitations of window-based models Not all objects are box-shaped Assuming specific 2d view of object Local components themselves do not necessarily

More information

Object and Class Recognition I:

Object and Class Recognition I: Object and Class Recognition I: Object Recognition Lectures 10 Sources ICCV 2005 short courses Li Fei-Fei (UIUC), Rob Fergus (Oxford-MIT), Antonio Torralba (MIT) http://people.csail.mit.edu/torralba/iccv2005

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Extrinsic camera calibration method and its performance evaluation Jacek Komorowski 1 and Przemyslaw Rokita 2 arxiv:1809.11073v1 [cs.cv] 28 Sep 2018 1 Maria Curie Sklodowska University Lublin, Poland jacek.komorowski@gmail.com

More information

Optimizing Monocular Cues for Depth Estimation from Indoor Images

Optimizing Monocular Cues for Depth Estimation from Indoor Images Optimizing Monocular Cues for Depth Estimation from Indoor Images Aditya Venkatraman 1, Sheetal Mahadik 2 1, 2 Department of Electronics and Telecommunication, ST Francis Institute of Technology, Mumbai,

More information

DEPTH AND GEOMETRY FROM A SINGLE 2D IMAGE USING TRIANGULATION

DEPTH AND GEOMETRY FROM A SINGLE 2D IMAGE USING TRIANGULATION 2012 IEEE International Conference on Multimedia and Expo Workshops DEPTH AND GEOMETRY FROM A SINGLE 2D IMAGE USING TRIANGULATION Yasir Salih and Aamir S. Malik, Senior Member IEEE Centre for Intelligent

More information

Instance-level recognition II.

Instance-level recognition II. Reconnaissance d objets et vision artificielle 2010 Instance-level recognition II. Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique, Ecole Normale

More information

High Level Computer Vision

High Level Computer Vision High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de http://www.d2.mpi-inf.mpg.de/cv Please Note No

More information

Spatially Constrained Location Prior for Scene Parsing

Spatially Constrained Location Prior for Scene Parsing Spatially Constrained Location Prior for Scene Parsing Ligang Zhang, Brijesh Verma, David Stockwell, Sujan Chowdhury Centre for Intelligent Systems School of Engineering and Technology, Central Queensland

More information

Orientation-Aware Scene Understanding for Mobile Cameras

Orientation-Aware Scene Understanding for Mobile Cameras Orientation-Aware Scene Understanding for Mobile Cameras Jing Wang Georgia Inst. of Technology Atlanta, Georgia, USA jwang302@gatech.edu Grant Schindler Georgia Inst. of Technology Atlanta, Georgia, USA

More information

LEARNING BOUNDARIES WITH COLOR AND DEPTH. Zhaoyin Jia, Andrew Gallagher, Tsuhan Chen

LEARNING BOUNDARIES WITH COLOR AND DEPTH. Zhaoyin Jia, Andrew Gallagher, Tsuhan Chen LEARNING BOUNDARIES WITH COLOR AND DEPTH Zhaoyin Jia, Andrew Gallagher, Tsuhan Chen School of Electrical and Computer Engineering, Cornell University ABSTRACT To enable high-level understanding of a scene,

More information

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce Object Recognition Computer Vision Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce How many visual object categories are there? Biederman 1987 ANIMALS PLANTS OBJECTS

More information

CEng Computational Vision

CEng Computational Vision CEng 583 - Computational Vision 2011-2012 Spring Week 4 18 th of March, 2011 Today 3D Vision Binocular (Multi-view) cues: Stereopsis Motion Monocular cues Shading Texture Familiar size etc. "God must

More information

Lecture 12 Recognition

Lecture 12 Recognition Institute of Informatics Institute of Neuroinformatics Lecture 12 Recognition Davide Scaramuzza 1 Lab exercise today replaced by Deep Learning Tutorial Room ETH HG E 1.1 from 13:15 to 15:00 Optional lab

More information

Jakob Engel, Thomas Schöps, Daniel Cremers Technical University Munich. LSD-SLAM: Large-Scale Direct Monocular SLAM

Jakob Engel, Thomas Schöps, Daniel Cremers Technical University Munich. LSD-SLAM: Large-Scale Direct Monocular SLAM Computer Vision Group Technical University of Munich Jakob Engel LSD-SLAM: Large-Scale Direct Monocular SLAM Jakob Engel, Thomas Schöps, Daniel Cremers Technical University Munich Monocular Video Engel,

More information

Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model

Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model Johnson Hsieh (johnsonhsieh@gmail.com), Alexander Chia (alexchia@stanford.edu) Abstract -- Object occlusion presents a major

More information

arxiv: v1 [cs.cv] 1 Aug 2017

arxiv: v1 [cs.cv] 1 Aug 2017 Dense Piecewise Planar RGB-D SLAM for Indoor Environments Phi-Hung Le and Jana Kosecka arxiv:1708.00514v1 [cs.cv] 1 Aug 2017 Abstract The paper exploits weak Manhattan constraints to parse the structure

More information

Constructing Implicit 3D Shape Models for Pose Estimation

Constructing Implicit 3D Shape Models for Pose Estimation Constructing Implicit 3D Shape Models for Pose Estimation Mica Arie-Nachimson Ronen Basri Dept. of Computer Science and Applied Math. Weizmann Institute of Science Rehovot 76100, Israel Abstract We present

More information

Robot Localization based on Geo-referenced Images and G raphic Methods

Robot Localization based on Geo-referenced Images and G raphic Methods Robot Localization based on Geo-referenced Images and G raphic Methods Sid Ahmed Berrabah Mechanical Department, Royal Military School, Belgium, sidahmed.berrabah@rma.ac.be Janusz Bedkowski, Łukasz Lubasiński,

More information

Semantic Video Segmentation From Occlusion Relations Within a Convex Optimization Framework

Semantic Video Segmentation From Occlusion Relations Within a Convex Optimization Framework Semantic Video Segmentation From Occlusion Relations Within a Convex Optimization Framework Brian Taylor, Alper Ayvaci, Avinash Ravichandran, and Stefano Soatto University of California, Los Angeles Honda

More information