Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning
1 Allan Zelener Dissertation Proposal December 12th, 2016 Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning
2 Overview 1. Introduction to 3D Object Identification 2. Completed Work Part-based Object Classification of Vehicle Point Clouds. CNN-based Object Segmentation in LIDAR with Missing Points. 3. Proposed Work Joint localization, segmentation, classification, and 3D pose estimation. Depth-sensitive localization. Depth-sensitive subpixel methods for segmentation. Spatial transformers for pose estimation. Domain adaptation and shape completion from synthetic data. Timeline for completion.
3 Identifying 3D Objects Real world objects have a 3D shape and a position in a 3D scene. Objects may be oriented with respect to some reference pose. These object properties are associated with their semantic class.
4 Identifying 3D Objects
5 Identifying Objects in 2D Images Fei-Fei, Karpathy, Johnson
6 Identifying 3D Objects in 2D Images 3D oriented CAD models mapped to 2D image regions. Approximate 3D shape based on selected models. Relative 3D position and scale may still be ambiguous. Visual perspective cues are required to estimate object properties. Xiang et al., ObjectNet3D: A Large Scale Database for 3D Object Recognition
7 Identifying 3D Objects in 3D Images 3D sensors provide accurate pointwise depth measurements. Object position and scale can be determined from a single 3D image. Song et al., SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
8 Challenges in 3D Images Manual Labeling of 3D Point Cloud Missing measurements due to sensor properties. Partial 3D data based on limited viewpoints. Difficult large-scale annotation compared to 2D images. Feature representations for 3D properties.
9 Completed Work Classification of Vehicle Parts in Unstructured 3D Point Clouds RANSAC point clustering for planar parts. Part-based structured model for classifying parts and overall object class. Classification of Vehicle Parts in Unstructured 3D Point Clouds, Zelener, Mordohai, and Stamos, 3DV, 2014.
10 Local Feature Extraction Density-weighted spin images. Dense sampling of keypoints on a uniformly spaced voxel grid. Normals oriented away from the object centroid. K-means clustering to generate a bag-of-words codebook. Baseline object descriptor is the normalized count vector of codebook features. K-means spin image codebook with k = 50.
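The quantization step behind this bag-of-words descriptor can be sketched in a few lines. This is a minimal NumPy version that assumes the k-means codebook has already been learned; the function name and toy data are illustrative, not from the original implementation.

```python
import numpy as np

def bow_descriptor(features, codebook):
    """Assign each local feature (e.g. a spin image) to its nearest
    codeword and return the normalized count vector.
    features: (N, D) array; codebook: (k, D) array of k-means centers."""
    # squared distance from every feature to every codeword
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # hard assignment to nearest codeword
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()   # normalized counts

# toy example: three 2D "features", a 2-word codebook
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
code = np.array([[0.0, 0.0], [5.0, 5.0]])
desc = bow_descriptor(feats, code)  # two features fall in word 0, one in word 1
```

With real spin images the feature dimension D would be the flattened spin-image size and k = 50 as on the slide.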
11 Automatic Part Segmentation Iterative RANSAC plane fitting. Candidate planes from faces of the convex hull. Robust re-estimation of planes using PCA. For vehicles, five planar parts cover most of the surface. Convex hull examples colored by segmentation order.
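The RANSAC-plus-PCA loop described here can be sketched as follows. This is a simplified version (random 3-point hypotheses rather than convex-hull faces, a single plane rather than the iterative multi-part loop); names and thresholds are illustrative assumptions.

```python
import numpy as np

def fit_plane_pca(points):
    """Least-squares plane through points (N, 3) via PCA:
    the normal is the direction of least variance."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]  # (point on plane, unit normal)

def ransac_plane(points, n_iter=100, thresh=0.05, seed=0):
    """Keep the 3-point hypothesis with the most inliers, then
    robustly re-estimate the plane from those inliers with PCA."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), 3, replace=False)]
        c, n = fit_plane_pca(sample)
        inliers = np.abs((points - c) @ n) < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_plane_pca(points[best_inliers])
```

In the actual pipeline the candidate planes come from convex-hull faces and the loop repeats on the residual points to peel off successive parts.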
12 Part-Level Features Spin image bag-of-words. Average height h̄. Horizontal/vertical indicator I(n) = 0 if nᵀz > cos(π/4), 1 otherwise. Mean, median, and max of plane-fit errors. Eigenvalues from plane fitting λ1, λ2, λ3 (in descending order). Linearity (λ1 − λ2) and planarity (λ2 − λ3).
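The eigenvalue-based features can be computed directly from the covariance of a part's points. A minimal sketch (function name is illustrative; the linearity/planarity contrasts follow the λ1 − λ2 and λ2 − λ3 definitions on the slide):

```python
import numpy as np

def shape_features(points):
    """Covariance eigenvalues (descending) of a 3D point set and the
    derived linearity / planarity contrasts used as part features."""
    cov = np.cov(points.T)                          # 3x3 covariance
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]    # λ1 >= λ2 >= λ3
    linearity = lam[0] - lam[1]
    planarity = lam[1] - lam[2]
    return lam, linearity, planarity
```

A line-like part gives high linearity (one dominant eigenvalue); a planar part like a roof gives high planarity (two comparable eigenvalues, one near zero).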
13 Pairwise Part Features Dot product of normals, n1ᵀn2. Absolute difference in average heights, |h̄1 − h̄2|. Distance between centroids, ‖c1 − c2‖. Closest distance between points, min over i ∈ P1, j ∈ P2 of ‖p1,i − p2,j‖. Coplanarity as mean, median, and max cross-plane fit errors.
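These pairwise relations are cheap to compute once each part's points, normal, and average height are known. A sketch under those assumptions (the dict keys and brute-force closest-point search are my choices, not the original code):

```python
import numpy as np

def pairwise_features(p1, p2, n1, n2, h1, h2):
    """Geometric relations between two candidate parts:
    p1, p2: (N, 3) point sets; n1, n2: unit normals; h1, h2: avg heights."""
    c1, c2 = p1.mean(axis=0), p2.mean(axis=0)
    # brute-force all-pairs squared distances for the closest-point feature
    d2 = ((p1[:, None, :] - p2[None, :, :]) ** 2).sum(axis=2)
    return {
        "normal_dot": float(n1 @ n2),
        "height_diff": abs(h1 - h2),
        "centroid_dist": float(np.linalg.norm(c1 - c2)),
        "closest_dist": float(np.sqrt(d2.min())),
    }
```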
14 Structured Part Modeling Generalized HMM as a sequence of parts and a final class variable. Trained discriminatively by a structured averaged perceptron. Parts reordered in sequence based on I(n) and average height. Graphical model: part labels a_1, …, a_n, class variable c, observations x_1, …, x_n.
15 Experimental Results for Part Classification Evaluation on Ottawa dataset with 155 sedans and 67 SUVs. Structured part modeling provides increased performance for part classification. Manual segmentation provides a further increase for classification of all parts per object. Part Classification Comparison
16 Experimental Results for Object Classification SP gives significant gains over baseline perceptron model. Manual segmentation with SP exceeds unstructured baselines. Sedan vs SUV Object Classification No Part Segmentation Part Segmentation
17 Comparison Between Automatic and Manual Segmentation Under-segmentation from unbounded plane fitting. Merged semantic part classes like roof-hood and roof-trunk. Inconsistent labeling behavior at boundaries and noisy points. Automatic Manual
18 Conclusions for Part-based Classification PROS RANSAC segmentation is robust to many complexities of 3D data. Structured part-based method shows improvement over bag-of-words with local features. Pairwise features based on geometric properties improve classification performance. CONS RANSAC segmentation is not equivalent to semantic segmentation. Labeling ground truth parts for every possible object class may be infeasible. RANSAC segmentation, features, and structure model are determined before training the classifier.
19 CNN-Based Object Segmentation Segmentation on LIDAR scanning grid with missing points. CNN training procedure for LIDAR data. CNN-based features extracted from small set of initial feature maps for 3D images. CNN-Based Object Segmentation in Urban LIDAR with Missing Points, Zelener and Stamos, 3DV, 2016.
20 Missing Points in LIDAR Contiguous LIDAR scanlines form 2.5D grid of scanner measurements. Laser reflection causes missing points on objects in the grid. We can label and infer over these positions. Missing Points in Gray on Scanning Grid Missing Points on Vehicles are Labeled
21 Preprocessing Pipeline Sample positive and negative locations in a large LIDAR scene piece. Extract an M × M patch as input to the CNN. Predict labels for the central K × K region, K ≪ M. (M = 64, K = 8)
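The patch/target geometry can be made concrete with a short sketch (indexing convention and function names are assumptions; boundary handling is left to the caller):

```python
import numpy as np

def extract_patch(grid, r, c, M=64):
    """M x M crop of the scanning grid centered at (r, c); the caller
    must ensure the window stays inside the grid."""
    return grid[r - M // 2 : r + M // 2, c - M // 2 : c + M // 2]

def central_target(M=64, K=8):
    """Slice selecting the central K x K target region of an M x M patch."""
    lo = (M - K) // 2
    return slice(lo, lo + K)
```

So each training example pairs a 64 × 64 input patch with labels for only its central 8 × 8 points, letting neighboring samples provide context without overlapping targets.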
22 Initial Feature Maps Compute normalized feature maps from 3D points in the M × M patch. Assume values are ~N(0, 1), truncated to [−6, 6] within each patch. Missing data are given the max value (6) in the clip range. Feature maps shown: relative depth and relative height.
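The per-patch normalization with the missing-point convention can be sketched as follows (the epsilon guard and function name are my additions):

```python
import numpy as np

def normalize_map(values, missing):
    """Per-patch standardization clipped to [-6, 6]; positions with no
    laser return get the max clip value so the network can tell them apart.
    values: float array; missing: boolean mask of the same shape."""
    valid = values[~missing]
    z = (values - valid.mean()) / (valid.std() + 1e-8)  # statistics from valid points only
    z = np.clip(z, -6.0, 6.0)
    z[missing] = 6.0  # sentinel for missing measurements
    return z
```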
23 Initial Feature Maps Angle and missing mask describe sensor properties. Angle normalized as before; missing mask takes values in {0, 1}. Feature maps shown: angle and missing mask.
24 Initial Feature Maps Signed angle from Hadjiliadis and Stamos, 3DPVT. SignedAngle(p) = acos(ẑ · v2) · sgn(v1 · v2), where v1 and v2 are the vectors between consecutive points along the scanning direction. Horizontal surfaces at 90 degrees. Vertical surfaces at 0 degrees. Sharp changes yield a negative sign.
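Reading the slide's formula literally, a minimal sketch; the choice of v1 = p − p_prev and v2 = p_next − p for the consecutive-point vectors is my assumption about the original convention:

```python
import numpy as np

def signed_angle(p_prev, p, p_next):
    """acos(z_hat . v2) * sgn(v1 . v2) for consecutive scanline points
    (assumed convention: v1 = p - p_prev, v2 = p_next - p)."""
    z_hat = np.array([0.0, 0.0, 1.0])
    v1 = p - p_prev
    v2 = p_next - p
    v2n = v2 / np.linalg.norm(v2)
    ang = np.degrees(np.arccos(np.clip(z_hat @ v2n, -1.0, 1.0)))
    return ang * np.sign(v1 @ v2)  # sign flips at sharp direction changes
```

Consistent with the slide: a horizontal run of points gives 90 degrees, a vertical run gives 0 degrees.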
25 Model Overview Baseline CNN architecture. ReLU nonlinear activation functions. L2-regularization on affine layers. Dropout regularization on the final layer. Predict a binary label for each point in the K × K target. Total model loss: L(x, y) = −Σ_{k=1..K²} [y_k log p_k + (1 − y_k) log(1 − p_k)] + λ Σ_{l=1..L} ‖W_l‖², i.e. binary cross-entropy plus L2-regularization. Architecture: input patch (64, 64, 5) → conv 5 × 5 (32, 32, 32) → conv 5 × 5 (16, 16, 64) → affine → affine → output labels (K²).
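The loss on this slide, written out directly (the epsilon guard against log(0) is my addition):

```python
import numpy as np

def total_loss(y, p, weights, lam=1e-4):
    """Binary cross-entropy over the K^2 target points plus an L2
    penalty on the affine-layer weight matrices.
    y: binary labels; p: predicted probabilities; weights: list of arrays."""
    eps = 1e-12  # numerical guard against log(0)
    bce = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    l2 = lam * sum(np.sum(W ** 2) for W in weights)
    return bce + l2
```

With perfect predictions the cross-entropy term vanishes and only the regularization penalty remains.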
26 Results from Vehicle Point Detection using CNN (patch size 64 × 64, target size 8 × 8). Test pieces: nyc_0 (in-sample) and nyc_1. Color legend: true positive yellow, true negative dark blue, false positive cyan, false negative orange.
27 nyc_0 (in-sample) test: recall 0.85, precision 0.73.
28 nyc_1 test: recall 0.85, precision 0.73.
29 Experimental Results Input Feature Map Comparison. Key: D = depth, H = height, A = angle, S = signed angle, M = missing mask.
30 Impact of Using Missing Point Labels DHASM with Missing Point Labels DHASM with No Missing Point Labels Training with missing point labels improves precision. Missing point labels allow for complete segmentation.
31 Experimental Results Use of Missing Point Labels. Key: NML = no missing labels.
32 Conclusions for CNN-Based Segmentation CNN for LIDAR learned using a sampling based training pipeline. We can predict class labels over missing points in LIDAR. Incorporating missing points improves precision. Input feature maps that describe 3D shape and sensor properties have a significant effect on performance.
33 Proposed Work Extend CNN model to multiclass object localization, segmentation, classification, and pose estimation in 3D images. Examine design and structure of CNN components for 3D images: Depth-sensitive localization. Depth-sensitive subpixel methods for segmentation. Spatial transformers for pose estimation. Utilize domain adaptation from synthetic data for auxiliary training data and missing point reconstruction.
34 Novelty of Proposed Work Multi-task model for all tasks. Previous models address at most three of the proposed tasks. Addition of 3D object pose estimation. Improve performance on all tasks by adapting current state-of-the-art techniques to the 3D object domain. Balance between 2.5D image and 3D voxel representations. Incorporation of additional datasets. Comparison across urban LIDAR and indoor RGB-D domains. Missing point estimation from synthetic data or multi-view reconstruction. Domain adaptation from synthetic datasets.
35 2D Object Localization in LIDAR (In Progress) Preliminary results at 0.8 confidence threshold. Based on YOLO single-shot architecture. Can be used for region proposal or extended to 3D bounding boxes.
36 Google Street View Dataset Ground Truth Pose Labeling Automatic fit of bounding boxes; PCA to fit non-axis-aligned boxes. Manual tool to (a) select the front face (shown in a different color) for orientation (a default is selected automatically) and (b) change the size/position/orientation of boxes in case of incomplete objects.
37 Multi-task Model for Object Identification Shared representation can be applicable for multiple tasks. Tasks: Object localization, segmentation, classification, and pose estimation. Error signal for each task trains weights for shared representation. Source: Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades
38 Multi-task Model for Object Identification Straightforward extension to orientation estimation. Assume objects are upright; estimate rotation about the gravity axis. Source: Dai et al., Instance-aware Semantic Segmentation via Multi-task Network Cascades
39 Localization for 3D Objects in Voxel Space 3D voxel input representation (TSDF). Voxel gives relative position, anchor box gives shape prior. Network estimates adjustments for box position and dimensions. Source: Song and Xiao, Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
40 Depth-Sensitive Localization We aim to maintain a non-volumetric 2.5D input representation. Partition the viewing volume and consider localization in depth slices z1, z2, z3, z4. A 2D convolution over the 2.5D input (W, H, F_in) predicts a 3D box map of shape (X, Y, Z · A · 6): for anchor (a_x, a_y, a_z) at position (x_i, y_i, z_i), b_x = x_i + dx, b_y = y_i + dy, b_z = z_i + dz, b_width = a_x · s_x, b_height = a_y · s_y, b_depth = a_z · s_z.
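The box decoding on this slide can be sketched directly: offsets shift the anchor position and scale factors stretch the anchor dimensions. (This follows the slide's literal parameterization; the common Faster R-CNN alternative uses log-scale factors, i.e. size · exp(s). Names are illustrative.)

```python
import numpy as np

def decode_box(anchor, deltas):
    """Turn network outputs (dx, dy, dz, sx, sy, sz) into a 3D box.
    anchor: (cx, cy, cz, w, h, d) center and dimensions."""
    cx, cy, cz, w, h, d = anchor
    dx, dy, dz, sx, sy, sz = deltas
    center = np.array([cx + dx, cy + dy, cz + dz])  # b_x = x_i + dx, etc.
    size = np.array([w * sx, h * sy, d * sz])       # b_width = a_x * s_x, etc.
    return center, size
```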
41 Subpixel Convolutions Pooled CNN features can still encode higher resolution information. Upscale back through deconvolution or subpixel convolution. Used in state-of-the-art segmentation networks. Padded Image Zero-padded Sub-Pixel Image Subpixel Filter Filter Activations Source: Shi et al., Is the deconvolution layer the same as a convolutional layer?
42 Subpixel Convolutions Independent subpixel filter weights can be separated. All convolutions are in low resolution then interleaved to upsample at the end of the network. Padded Image Separate Filters Filter Activations Combined Filter Activations Source: Shi et al., Is the deconvolution layer the same as a convolutional layer?
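The final interleaving step (often called pixel shuffle) can be sketched in NumPy: r² low-resolution channel groups are rearranged into an r×-upsampled map, so all convolutions run at low resolution and only this cheap reshuffle happens at the end.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r^2, H, W) feature maps into (C, H*r, W*r):
    each group of r^2 channels supplies one sub-pixel offset."""
    c_r2, H, W = x.shape
    C = c_r2 // (r * r)
    x = x.reshape(C, r, r, H, W)    # split channels into (dy, dx) offsets
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, dy, W, dx)
    return x.reshape(C, H * r, W * r)
```

For a 1 × 1 input with four channels and r = 2, the four channel values become the four pixels of a 2 × 2 output.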
43 Position-Sensitive Score Maps Subpixel-like features can be specialized for a given task. Source: Dai et al., R-FCN: Object Detection via Region-based Fully Convolutional Networks
44 Depth-Sensitive Score Maps We can extend this approach to be depth-sensitive. A convolution produces k³ · (C + 1) score maps, one per depth-sensitive position bin (top-left-back, top-left-center, …, bottom-right-front); pooling and voting over the k × k × k bins yields the C + 1 class scores.
45 Spatial Transformers for Pose Estimation General method for parameterized transforms between feature maps. Interpolation of transformed sampling grid. Estimated transformation is related to 3D object pose.
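The core of a spatial transformer is sampling the input along a transformed grid. A minimal nearest-neighbor sketch for an in-plane rotation (real spatial transformers use bilinear interpolation so the transform parameters stay differentiable; names here are illustrative):

```python
import numpy as np

def rotate_grid_sample(img, theta):
    """Sample img (H, W) along a grid rotated by theta radians about the
    image center, using nearest-neighbor interpolation."""
    H, W = img.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    out = np.zeros_like(img)
    c, s = np.cos(theta), np.sin(theta)
    for i in range(H):
        for j in range(W):
            # map each output coordinate back into the input image
            y, x = i - cy, j - cx
            yi = int(round(c * y - s * x + cy))
            xi = int(round(s * y + c * x + cx))
            if 0 <= yi < H and 0 <= xi < W:
                out[i, j] = img[yi, xi]
    return out
```

For pose estimation the idea runs in reverse: the network regresses theta (or a fuller transform) that best aligns the object's features to a canonical pose.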
46 Complete Model Sketch Pipeline: 2.5D image feature maps → down convs → shared feature maps → multi-scale depth-sensitive localization → ROI pooling and spatial transformer → depth-sensitive segmentation, classification, and pose estimation.
47 Timeline for Completion December 2016 Select and prepare new datasets for experiments. Annotate Street View dataset with object bounding boxes. Extend current localization and segmentation implementations for baselines. Begin implementation of classification and pose estimation baselines. January 2017 Complete implementation of baseline models and begin training models for evaluation on a chosen dataset. Implement baseline multi-task model.
48 Timeline for Completion February 2017 Begin some experiments with architectures using: Depth-sensitive localization. Depth-sensitive subpixel convolution for segmentation. 3D object pose estimation with spatial transformers. March 2017 Prepare paper for ICCV 2017 submission including experiments on: Multi-task learning for 3D object identification. One of the proposed depth-sensitive experimental architectures. Consider additional experiments on domain adaptation and missing point reconstruction.
49 Timeline for Completion April 2017 Dissertation writing. Continuation of experiments. May 2017 Dissertation defense. Prepare paper submission to 3DV 2017 containing additional experiments.
50 Additional Slides
51 Google Street View Dataset Google R5 Street View dataset. All but two pieces of NYC 0 used for training. Remaining runs used for evaluation.
52 KITTI Dataset 3D bounding boxes for vehicles, cyclists, and pedestrians in LIDAR. Precise segmentation labels not included in benchmark.
53 Synthia Dataset Synthetic urban scenes for simulated RGB-D scans. Exact labels for semantic segmentation but 3D poses are not given. Domain adaptation required for effective use on real-world data. Missing point reconstruction task can be simulated.
54 Indoor RGB-D Datasets SUN RGB-D and SceneNN. Class, segmentation, and oriented 3D bounding boxes included. Reconstructed shape can be used for missing points.
55 Assumptions for Proposed Work Single 3D image from LIDAR sensor sweep or RGB-D camera. Excludes video, multiview registration, and volumetric sensors. Possible shape completion only for missing (non-occluded) scan points. Excluding complete volumetric shape reconstruction and database matching. Hua et al., SceneNN: A Scene Meshes Dataset with annotations Wu et al., 3D ShapeNets: A Deep Representation for Volumetric Shapes
More informationContexts and 3D Scenes
Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem Administrative stuffs Final project presentation Nov 30 th 3:30 PM 4:45 PM Grading Three senior graders (30%)
More informationFinal Exam Study Guide
Final Exam Study Guide Exam Window: 28th April, 12:00am EST to 30th April, 11:59pm EST Description As indicated in class the goal of the exam is to encourage you to review the material from the course.
More informationGeometric Reconstruction Dense reconstruction of scene geometry
Lecture 5. Dense Reconstruction and Tracking with Real-Time Applications Part 2: Geometric Reconstruction Dr Richard Newcombe and Dr Steven Lovegrove Slide content developed from: [Newcombe, Dense Visual
More informationMultiview Reconstruction
Multiview Reconstruction Why More Than 2 Views? Baseline Too short low accuracy Too long matching becomes hard Why More Than 2 Views? Ambiguity with 2 views Camera 1 Camera 2 Camera 3 Trinocular Stereo
More informationLearning to Segment Object Candidates
Learning to Segment Object Candidates Pedro Pinheiro, Ronan Collobert and Piotr Dollar Presented by - Sivaraman, Kalpathy Sitaraman, M.S. in Computer Science, University of Virginia Facebook Artificial
More informationBus Detection and recognition for visually impaired people
Bus Detection and recognition for visually impaired people Hangrong Pan, Chucai Yi, and Yingli Tian The City College of New York The Graduate Center The City University of New York MAP4VIP Outline Motivation
More informationGeometric Registration for Deformable Shapes 3.3 Advanced Global Matching
Geometric Registration for Deformable Shapes 3.3 Advanced Global Matching Correlated Correspondences [ASP*04] A Complete Registration System [HAW*08] In this session Advanced Global Matching Some practical
More informationStructured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov
Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter
More informationA Systems View of Large- Scale 3D Reconstruction
Lecture 23: A Systems View of Large- Scale 3D Reconstruction Visual Computing Systems Goals and motivation Construct a detailed 3D model of the world from unstructured photographs (e.g., Flickr, Facebook)
More information3D Shape Modeling by Deformable Models. Ye Duan
3D Shape Modeling by Deformable Models Ye Duan Previous Work Shape Reconstruction from 3D data. Volumetric image datasets. Unorganized point clouds. Interactive Mesh Editing. Vertebral Dataset Vertebral
More informationPointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space Sikai Zhong February 14, 2018 COMPUTER SCIENCE Table of contents 1. PointNet 2. PointNet++ 3. Experiments 1 PointNet Property
More informationSeparating Objects and Clutter in Indoor Scenes
Separating Objects and Clutter in Indoor Scenes Salman H. Khan School of Computer Science & Software Engineering, The University of Western Australia Co-authors: Xuming He, Mohammed Bennamoun, Ferdous
More informationBridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza
Bridging the Gap Between Local and Global Approaches for 3D Object Recognition Isma Hadji G. N. DeSouza Outline Introduction Motivation Proposed Methods: 1. LEFT keypoint Detector 2. LGS Feature Descriptor
More informationSSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang
SSD: Single Shot MultiBox Detector Author: Wei Liu et al. Presenter: Siyu Jiang Outline 1. Motivations 2. Contributions 3. Methodology 4. Experiments 5. Conclusions 6. Extensions Motivation Motivation
More informationClassification of objects from Video Data (Group 30)
Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time
More informationTRAFFIC SIGN RECOGNITION USING A MULTI-TASK CONVOLUTIONAL NEURAL NETWORK
TRAFFIC SIGN RECOGNITION USING A MULTI-TASK CONVOLUTIONAL NEURAL NETWORK Dr. S.V. Shinde Arshiya Sayyad Uzma Shaikh Department of IT Department of IT Department of IT Pimpri Chinchwad college of Engineering
More informationDeep learning for object detection. Slides from Svetlana Lazebnik and many others
Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep
More informationObject detection with CNNs
Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals
More informationOctree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs Supplementary Material
Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs Supplementary Material Peak memory usage, GB 10 1 0.1 0.01 OGN Quadratic Dense Cubic Iteration time, s 10
More informationEE795: Computer Vision and Intelligent Systems
EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational
More informationPointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Charles R. Qi* Hao Su* Kaichun Mo Leonidas J. Guibas
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation Charles R. Qi* Hao Su* Kaichun Mo Leonidas J. Guibas Big Data + Deep Representation Learning Robot Perception Augmented Reality
More informationPreviously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011
Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition
More informationRobot localization method based on visual features and their geometric relationship
, pp.46-50 http://dx.doi.org/10.14257/astl.2015.85.11 Robot localization method based on visual features and their geometric relationship Sangyun Lee 1, Changkyung Eem 2, and Hyunki Hong 3 1 Department
More informationPedestrian Detection Using Correlated Lidar and Image Data EECS442 Final Project Fall 2016
edestrian Detection Using Correlated Lidar and Image Data EECS442 Final roject Fall 2016 Samuel Rohrer University of Michigan rohrer@umich.edu Ian Lin University of Michigan tiannis@umich.edu Abstract
More informationPredicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool
Predicting ground-level scene Layout from Aerial imagery Muhammad Hasan Maqbool Objective Given the overhead image predict its ground level semantic segmentation Predicted ground level labeling Overhead/Aerial
More informationCAP 6412 Advanced Computer Vision
CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha
More information3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.
3D Computer Vision Structured Light II Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Introduction
More informationMartian lava field, NASA, Wikipedia
Martian lava field, NASA, Wikipedia Old Man of the Mountain, Franconia, New Hampshire Pareidolia http://smrt.ccel.ca/203/2/6/pareidolia/ Reddit for more : ) https://www.reddit.com/r/pareidolia/top/ Pareidolia
More informationBeyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Adding spatial information Forming vocabularies from pairs of nearby features doublets
More information3D model classification using convolutional neural network
3D model classification using convolutional neural network JunYoung Gwak Stanford jgwak@cs.stanford.edu Abstract Our goal is to classify 3D models directly using convolutional neural network. Most of existing
More informationSupplementary: Cross-modal Deep Variational Hand Pose Estimation
Supplementary: Cross-modal Deep Variational Hand Pose Estimation Adrian Spurr, Jie Song, Seonwook Park, Otmar Hilliges ETH Zurich {spurra,jsong,spark,otmarh}@inf.ethz.ch Encoder/Decoder Linear(512) Table
More informationMULTI-LEVEL 3D CONVOLUTIONAL NEURAL NETWORK FOR OBJECT RECOGNITION SAMBIT GHADAI XIAN LEE ADITYA BALU SOUMIK SARKAR ADARSH KRISHNAMURTHY
MULTI-LEVEL 3D CONVOLUTIONAL NEURAL NETWORK FOR OBJECT RECOGNITION SAMBIT GHADAI XIAN LEE ADITYA BALU SOUMIK SARKAR ADARSH KRISHNAMURTHY Outline Object Recognition Multi-Level Volumetric Representations
More information