3D Perception. CS 4495 Computer Vision. Kelsey Hawkins, Robotics.


Motivation. What do animals, people, and robots want to do with vision?
- Detect and recognize objects/landmarks: Is that a banana or a snake? A cup or a plate?
- Find the location of objects with respect to themselves: I want to grasp a fruit or tool, so where should I put my body/arm? Changes in elevation: steps, rocks, inclined planes.
- Determine shape: What is the physical 3D structure of this object? Where does the object end and the background begin?
- Find obstacles and map the environment: How do I get my body/arm from A to B without hitting things?
- Others: tracking, dynamics, etc.

Weaknesses of Images: surface geometry; color inconsistency.

Weaknesses of Monocular Vision: scale ambiguity; lack of texture; background-foreground similarity.

Potential solution: 3D Sensing pointclouds.org

Types of 3D Sensing
- Passive 3D sensing: work with naturally occurring light; exploit geometry or known properties of scenes.
- Active 3D sensing: project light or sound out into the environment and see how it reacts; encode some pattern which can be found in the sensor.

Passive 3D Sensors: stereo rigs; shape from focus (Nayar, Watanabe, and Noguchi 1996).

Active Photometric Stereo

Active: Time of Flight. Bounce a signal off of a surface and record the time it takes to come back: X = v·t/2.
- LIDAR / laser / range finder
- SONAR / sound / transceiver
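
For example, a sonar ping traveling at roughly 340 m/s that returns after 10 ms gives X = 340 × 0.01 / 2 = 1.7 m; the division by two accounts for the round trip out and back.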

Active: Structured Light. Remember stereo? Let's replace one camera with a projector. Instead of looking for the same features in both images, we look for a known feature we've projected onto the scene.

Active Structured Light. Zhang, Li et al., "Rapid shape acquisition..."

Active Infrared Structured Light

How the Kinect works. A cylindrical lens only focuses light in one direction. (PrimeSense patent 2010/0290698)

How the Kinect works (figure sequence from PrimeSense patent 2010/0290698)

How the Kinect works. Pseudo-random speckle pattern. (PrimeSense patent 2010/0290698)

2D vs. 3D Analysis Tools

                                     2D                     3D
Representation                       Image (u,v)            Depth image (u,v,d); point cloud (x,y,z)
1st order differential geometry      Image gradients        Surface normals
2nd order differential geometry      Second moment matrix   Principal curvature
Corner detection                     Harris image           Surface variation
Feature extraction                   HOG                    Point Feature Histograms; spin images
Geometric model fitting              Hough transform        Clustering + RANSAC
Alignment                            SSD window filter      Iterative Closest Point (ICP)

Depth Images
Advantages:
- Dense representation
- Gives intuition about occlusion and free space
- Depth discontinuities are just edges on the image
Disadvantages:
- Viewpoint dependent, can't merge
- Doesn't capture physical geometry; need actual 3D locations

Point Clouds. Take every depth pixel and put it out in the world. What can this representation tell us? What information do we lose? (R. Rusu's PCL presentation)
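
To make this concrete, here is a minimal sketch of that back-projection, assuming a pinhole camera with known intrinsics (focal lengths fx, fy and principal point cx, cy); the function name and parameters are illustrative, not from the slides:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an (N,3) point cloud.

    Pinhole model: x = (u - cx) * d / fx, y = (v - cy) * d / fy, z = d.
    """
    v, u = np.indices(depth.shape)           # per-pixel row (v) and column (u)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                # drop invalid zero-depth pixels
```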

Point Clouds
Advantages:
- Viewpoint independent
- Captures surface geometry
- Points represent physical locations
Disadvantages:
- Sparse representation
- Lost information about free space and unknown space
- Variable density based on distance from sensor
(R. Rusu's PCL presentation)

Point Clouds and Surfaces Point clouds are sampled from the surfaces of the objects perceived The concept of volume is inferred, not perceived

Surfaces. Let's say we'd like to learn the geometry around a point in our cloud. What is the simplest surface representation we could use to approximate the surface about a point? The tangent plane: defined by a normal, a first-order approximation.

Surfaces. To understand how we can characterize surfaces, we can look to differential geometry. A surface is a 2D manifold in 3D space, with parametric representation $f : \mathbb{R}^2 \to \mathbb{R}^3$, $f(u,v) = (x, y, z)$. How u and v are oriented with respect to the surface is irrelevant.

Surfaces (figures: a surface patch swept out by the parameter directions u and v)

Surface Normals. We want to estimate the function $f(u,v)$. What can we do to estimate it? A Taylor series, i.e. a first-order approximation at $(u_0, v_0)$:

$$f(u,v) \approx f(u_0,v_0) + [\,u-u_0,\; v-v_0\,] \begin{bmatrix} \frac{\partial f}{\partial u}(u_0,v_0) \\ \frac{\partial f}{\partial v}(u_0,v_0) \end{bmatrix}$$

Surface Normals. We have a problem though: we don't have a $(u,v)$ basis, and infinitely many exist! Take a sample of 3D points we believe lie on $f(u,v)$ around $(u_0,v_0)$, and stack them (centered at $f(u_0,v_0)$):

$$A = \begin{bmatrix} f(u_1,v_1)^T \\ \vdots \\ f(u_n,v_n)^T \end{bmatrix} = \begin{bmatrix} x_1 & y_1 & z_1 \\ \vdots & \vdots & \vdots \\ x_n & y_n & z_n \end{bmatrix} \approx \begin{bmatrix} u_1-u_0 & v_1-v_0 \\ \vdots & \vdots \\ u_n-u_0 & v_n-v_0 \end{bmatrix} \begin{bmatrix} \frac{\partial f}{\partial u}(u_0,v_0)^T \\ \frac{\partial f}{\partial v}(u_0,v_0)^T \end{bmatrix}$$

Find $\mathbf{n}$ such that $A\mathbf{n} = 0$. We've done this before (take the last eigenvector).

Surface Normals.

$$A\mathbf{n} = \begin{bmatrix} u_1-u_0 & v_1-v_0 \\ \vdots & \vdots \\ u_n-u_0 & v_n-v_0 \end{bmatrix} \begin{bmatrix} \frac{\partial f}{\partial u}^T \\ \frac{\partial f}{\partial v}^T \end{bmatrix} \mathbf{n} = 0 \;\Rightarrow\; \frac{\partial f}{\partial u}^T \mathbf{n} = 0, \quad \frac{\partial f}{\partial v}^T \mathbf{n} = 0$$

This $\mathbf{n}$ (the normal) is perpendicular to both partials, $\mathbf{n} \perp \frac{\partial f}{\partial u}$ and $\mathbf{n} \perp \frac{\partial f}{\partial v}$, regardless of basis choice. The surface normal is a first-order approximation of the surface at the point, invariant to the choice of basis.

Surface Normals. The size of the patch is like the width of the Gaussian in an image gradient calculation. We can use normals to find planes.
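
A minimal sketch of this normal estimation, assuming we already have an (n,3) array of a point's neighbors (the patch); the function name is illustrative:

```python
import numpy as np

def estimate_normal(patch):
    """Estimate the surface normal of an (n,3) array of neighboring points.

    Fits the tangent plane by taking the eigenvector of the 3x3 covariance
    matrix with the smallest eigenvalue (the "last eigenvector" above).
    """
    centered = patch - patch.mean(axis=0)
    cov = centered.T @ centered / len(patch)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # direction of least variation
    # The sign is ambiguous; in practice, flip the normal toward the sensor.
    return normal
```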

Principal Curvature Second order approximation

Surface Variation.

$$A = \begin{bmatrix} f(u_1,v_1)^T \\ \vdots \\ f(u_n,v_n)^T \end{bmatrix} = \begin{bmatrix} x_1 & y_1 & z_1 \\ \vdots & \vdots & \vdots \\ x_n & y_n & z_n \end{bmatrix}, \qquad A = USV^T = U \begin{bmatrix} s_2 & 0 & 0 \\ 0 & s_1 & 0 \\ 0 & 0 & s_0 \end{bmatrix} \begin{bmatrix} v_2 & v_1 & v_0 \end{bmatrix}^T$$

Here $v_0$ is the normal, and $v_2, v_1$ span the principal curvature directions.

$$\text{surface variation} = \frac{s_0^2}{s_0^2 + s_1^2 + s_2^2}$$

This is equivalent to finding the eigenvalues/eigenvectors of the covariance matrix $A^T A$.
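
A small sketch of the surface-variation computation under the same assumptions (an (n,3) neighborhood array; the name is illustrative):

```python
import numpy as np

def surface_variation(patch):
    """Surface variation of an (n,3) patch from the singular values of the
    centered sample matrix A (equivalently, eigenvalues of A^T A).

    With s2 >= s1 >= s0, variation = s0^2 / (s0^2 + s1^2 + s2^2):
    near 0 on a flat plane, approaching 1/3 for isotropic scatter.
    """
    A = patch - patch.mean(axis=0)
    s = np.linalg.svd(A, compute_uv=False)   # descending order: s2, s1, s0
    s_sq = s ** 2
    return s_sq[-1] / s_sq.sum()
```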

Normals / Surface Variation Demo

Feature Extraction. Suppose we want a denser description of the local surface function; we want to find unique patches of surface geometry. What type of invariance do we need? We need viewpoint invariance: translation + orientation. Invariance to color and texture comes automatically!

Point Feature Histograms. Remember SIFT? We're going to use roughly the same idea: use the normal at the point to establish a dominant orientation, then build a histogram of the orientations of normals in the general region with respect to the original.

Point Feature Histograms. At a point, take a ball of points around it. For every pair of points, find the relationship between the two points and their normals. The relationship must be frame independent. (R. Rusu's thesis)

Point Feature Histograms. Reduce the pair $(x_1, y_1, z_1, n_{x_1}, n_{y_1}, n_{z_1})$, $(x_2, y_2, z_2, n_{x_2}, n_{y_2}, n_{z_2})$ to 4 variables. (R. Rusu's thesis)

Point Feature Histograms. Find these four variables for every pair in the ball and build a 5x5x5x5 histogram of them. Often the distance variable is excluded, leaving a 5x5x5 histogram; in this case, we have a 125-long feature vector. Use this just like a SIFT feature descriptor. Usually a sped-up version, Fast Point Feature Histograms, is used for real-time applications.
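
A sketch of the four pair variables, following the Darboux-frame construction in Rusu's thesis; exact sign and axis conventions vary between write-ups, and the degenerate case where the baseline is parallel to the first normal is ignored here:

```python
import numpy as np

def pfh_pair_features(p1, n1, p2, n2):
    """The 4 PFH variables for one point pair (points p1, p2; unit normals
    n1, n2). A local frame (u, v, w) built at p1 makes the three angles and
    the distance independent of the viewpoint frame.
    """
    d_vec = p2 - p1
    d = np.linalg.norm(d_vec)
    u = n1                                   # axis 1: the source normal
    v = np.cross(u, d_vec / d)               # axis 2: normal to u and baseline
    v /= np.linalg.norm(v)
    w = np.cross(u, v)                       # axis 3: completes the frame
    alpha = v @ n2                           # target normal vs. v
    phi = u @ (d_vec / d)                    # baseline vs. source normal
    theta = np.arctan2(w @ n2, u @ n2)       # in-plane angle of target normal
    return alpha, phi, theta, d
```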

Spin Images. Rotate a half-plane about the normal at a point, project all points onto it, and build a histogram. (A. Johnson's 1997 thesis)

Comparison of 3D Descriptors. Alexandre, L., 3D Descriptors for Object and Category Recognition: a Comparative Evaluation.

Alignment. PFH correspondences + RANSAC can be good at estimating an initial alignment, but often the alignment is off by a little bit. Or perhaps we already have a good estimate of the alignment of two point clouds from some other source: the viewpoints are roughly in the same place, or we used SIFT in 2D. How can we remove that last bit of error?

Aligning 3D Data Slides stolen from Ronen Gvili

Corresponding Point Set Alignment. Let M be a model point set and S a scene point set. We assume: 1. $N_M = N_S$. 2. Each point $s_i$ corresponds to $m_i$. (Slides stolen from Ronen Gvili)

Corresponding Point Set Alignment. The MSE objective function:

$$f(R,T) = \frac{1}{N_S} \sum_{i=1}^{N_S} \left\| m_i - \mathrm{Rot}(s_i) - \mathrm{Trans} \right\|^2, \qquad f(q) = \frac{1}{N_S} \sum_{i=1}^{N_S} \left\| m_i - R(q_R)\, s_i - q_t \right\|^2$$

The alignment is: $(\mathrm{rot}, \mathrm{trans}, d_{mse}) = \Phi(M, S)$. (Slides stolen from Ronen Gvili)

Aligning 3D Data If correct correspondences are known, can find correct relative rotation/translation Slides stolen from Ronen Gvili
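
With known correspondences, the MSE objective above has a closed-form minimizer. A minimal sketch using the SVD-based (Kabsch-style) solution; the function name is illustrative:

```python
import numpy as np

def align_corresponding(M, S):
    """Least-squares rigid alignment of scene S onto model M (both (n,3),
    with row i of S corresponding to row i of M): m_i ~ R s_i + t.
    """
    mu_m, mu_s = M.mean(axis=0), S.mean(axis=0)
    H = (S - mu_s).T @ (M - mu_m)            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ D @ U.T
    t = mu_m - R @ mu_s
    return R, t
```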

Aligning 3D Data How to find correspondences: User input? Feature detection? Signatures? Alternative: assume closest points correspond Slides stolen from Ronen Gvili

Aligning 3D Data Converges if starting position close enough Slides stolen from Ronen Gvili

The Algorithm
Init the error to ∞.
While error > threshold:
1. Calculate correspondence: Y = CP(M, S), e
2. Calculate alignment: (rot, trans, d) = Φ(M, Y)
3. Apply alignment: S' = rot(S) + trans
4. Update error: d' = d
(Slides stolen from Ronen Gvili)
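
A compact sketch of the loop, reusing the hypothetical align_corresponding from the earlier sketch and SciPy's cKDTree for the closest-point step:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(M, S, n_iters=50, tol=1e-6):
    """Minimal ICP: repeatedly match closest points, then solve the
    corresponding-point alignment in closed form (align_corresponding).
    """
    tree = cKDTree(M)                        # model is fixed; index it once
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(n_iters):
        dists, idx = tree.query(S)           # closest-point correspondences
        R, t = align_corresponding(M[idx], S)
        S = S @ R.T + t                      # apply alignment to the scene
        R_total, t_total = R @ R_total, R @ t_total + t
        err = (dists ** 2).mean()            # MSE before this update
        if prev_err - err < tol:             # stop when MSE stops improving
            break
        prev_err = err
    return R_total, t_total, err
```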

Convergence Theorem The ICP algorithm always converges monotonically to a local minimum with respect to the MSE distance objective function. Slides stolen from Ronen Gvili

RANSAC Segmentation. RANSAC is a very general algorithm:
- Have some model we want to fit
- Some reasonable percentage of the dataset fits the model
- Find the best model by subsampling, fitting, reprojecting, and evaluating the model
- Plane model: $ax + by + cz + d = 0$
- A limited cylinder model: $(x-a)^2 + (y-b)^2 = r^2$
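
A minimal sketch of RANSAC plane segmentation (the threshold and iteration count are illustrative, not from the slides):

```python
import numpy as np

def ransac_plane(points, n_iters=500, inlier_thresh=0.01):
    """Fit a plane n.x + d = 0 to an (N,3) cloud by RANSAC.

    Repeatedly fit a plane to 3 random points and keep the model that
    explains the most points within inlier_thresh (units of the cloud).
    """
    best_inliers = np.zeros(len(points), dtype=bool)
    rng = np.random.default_rng()
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)       # plane normal from the sample
        norm = np.linalg.norm(n)
        if norm < 1e-12:                     # degenerate (collinear) sample
            continue
        n /= norm
        d = -n @ p0
        inliers = np.abs(points @ n + d) < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers: least-squares plane via smallest singular vector.
    inl = points[best_inliers]
    c = inl.mean(axis=0)
    n = np.linalg.svd(inl - c)[2][-1]
    return n, -n @ c, best_inliers
```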

RANSAC Cylinder Segmentation pointclouds.org

Point Cloud Software
- Point Cloud Library (PCL): http://pointclouds.org
- Robot Operating System (ROS), a framework for building systems: http://www.ros.org
- Drivers for Kinect and other PrimeSense sensors: http://www.ros.org/wiki/openni_launch