Recursive Grammars for Scene Parsing a.k.a. RANSACing Offices

Caspar Anderegg (cja58), CS '12
Bill Best (wpb47), MEng '12
with Abhishek Anand

May 15, 2012

Abstract

Our goal was to use recursive grammars to construct parse trees within segmented point-clouds. Similar methods have been used successfully on flat images, but never with 3D data. We have written a virtual scanner to convert Wavefront object files to point-clouds. Additionally, we created a module to segment point-clouds using a generalized RANSAC algorithm. Both were incorporated into a ROS package that we are contributing to the official listing. Our RANSAC module interfaces directly with Abhishek's system for parse-tree construction and labeling.

1 Introduction

Many algorithms for object identification rely on single pixels or small collections of pixels identified as important features. These systems usually do not take into account the relations between features or the shape of the overall objects. When these systems segment an image, there is generally not a one-to-one correspondence between segments and objects in the scene: multiple objects might be included in a single segment, or a single object may be split into many segments. Cognitive science research in human perception has shown that there is ongoing feedback between the recognition areas and the segmentation areas of the brain.

After splitting an object into atomic segments, we use a recursive grammar to merge the segments. Atomic segments are merged into contiguous components (for example, a plane forming the front of a computer tower), and these components are in turn merged into complete objects. Such a segment hierarchy allows a whole new class of features to be used in object identification.

2 Related Work

Previous projects [Girshick, R., P. Felzenszwalb, and D. McAllester. 2011] and [Zhao, Y., and S.-C. Zhu. 2011] have successfully used recursive grammars for object recognition in two-dimensional images. These projects generated many candidate parse trees, then used maximum likelihood estimation to select the most probable candidate given the scene. Neither project attempted to use 3D data.

Abhishek Anand and Hema Swetha Koppula have done work on scene understanding from Kinect point-cloud data. They are able to use local visual appearance and shape cues to label objects in home and office scenes. Abhishek has created a graphical interface to simplify the construction and labeling of parse trees, given segmented point-clouds. As part of this project, we interface with that existing software.

3 Approach

3.1 Data Acquisition

One of the major challenges was data acquisition. We wanted to use digital models from the Google Sketchup 3D Warehouse, converted into point-clouds. These files are downloaded in a proprietary Sketchup format, but can be exported as Wavefront Object files (.obj) and Material Libraries (.mtl). We had intended to find a ROS package that would convert .obj files into Point Cloud Library .pcd files, but were unable to find a suitable converter. The best we could find was a Python script that simply stripped out the face information and left the vertices.

Figure 1: A synthetically generated point-cloud (from several angles). The cloud was rendered from a model taken from the Google Sketchup 3D Warehouse and uses all three occlusion models.

Thus, we wrote a virtual scanner to convert .obj/.mtl files into .pcd files. The scanner samples each face using barycentric coordinates. The view direction, sample rate, and other parameters can be set via command-line arguments. The scanner supports several noise models: no noise, Gaussian noise, and Kinect camera noise as described in [Liu, Yiping. 2012]. The Kinect noise model is one-dimensional Gaussian noise (in the direction of the view vector) that scales proportionally with the distance from the camera. Where v is the view direction, e is the location of the eye, and p is the location of the rendered point in space, the noise follows the equation below:

$$\mathrm{Noise} \sim \mathcal{N}\!\left(0,\; 3.5 \times 10^{-3} \cdot \left\lVert \operatorname{proj}_{\hat{v}}(e - p) \right\rVert^{2}\right)$$
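To make the scanner's two core steps concrete, here is a minimal sketch in Python with NumPy. The function names, the example triangle, and the camera pose are our own illustration, not the package's actual interface:

```python
import numpy as np

def sample_triangle(a, b, c, n):
    """Draw n points uniformly from the triangle (a, b, c) using
    barycentric coordinates."""
    u, v = np.random.rand(n, 1), np.random.rand(n, 1)
    outside = (u + v) > 1.0            # reflect samples that land outside
    u[outside], v[outside] = 1.0 - u[outside], 1.0 - v[outside]
    return a + u * (b - a) + v * (c - a)

def add_kinect_noise(points, eye, view_dir):
    """Perturb each point along the view direction with Gaussian noise
    whose variance is 3.5e-3 times the squared depth, as in the text."""
    v_hat = view_dir / np.linalg.norm(view_dir)
    depth = (points - eye) @ v_hat                # projection onto view dir
    sigma = np.sqrt(3.5e-3) * np.abs(depth)       # std dev grows with depth
    return points + np.random.normal(0.0, sigma)[:, None] * v_hat

# Example: 500 noisy samples from one face of a model.
a, b, c = np.array([0., 0., 2.]), np.array([1., 0., 2.]), np.array([0., 1., 3.])
cloud = add_kinect_noise(sample_triangle(a, b, c, 500),
                         eye=np.zeros(3), view_dir=np.array([0., 0., 1.]))
```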

The scanner also supports three levels of visibility culling. At the most basic level, backface culling removes any face whose normal points away from the camera. The next level up is occlusion culling, which removes a face if none of its vertices are visible to the camera. The highest level is trace culling, which checks whether each sampled point is visible to the camera. These culling levels can be used together to choose an appropriate trade-off between speed and realism.
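As an illustration of the cheapest level, the backface test reduces to a sign check on a dot product. The sketch below (hypothetical names, NumPy for brevity) keeps a face only when its normal has a component pointing toward the eye:

```python
import numpy as np

def backface_cull(faces, normals, eye):
    """faces: (n, 3, 3) triangle vertices; normals: (n, 3) face normals.
    Keep a face only when its normal points back toward the camera."""
    to_eye = eye - faces.mean(axis=1)             # centroid -> eye vectors
    visible = np.einsum('ij,ij->i', normals, to_eye) > 0.0
    return faces[visible]
```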

We have downloaded almost 200 3D models in five categories of office objects. These can be converted to point-clouds as needed, using different perspectives and noise profiles. An example point-cloud generated by our virtual scanner can be seen in Figure 1.

Figure 2: Two point-clouds of a computer tower. The image on the left shows a point-cloud generated by our virtual scanner. The image on the right shows the same point-cloud after segmentation by our generalized RANSAC algorithm.

3.2 Point-cloud Segmentation

We wanted to run a generalized RANSAC segmentation on these point-clouds, as outlined in [Schnabel, Ruwen, Roland Wahl, and Reinhard Klein. 2007]. This generalized RANSAC needed to be able to fit multiple shapes (planes, spheres, cylinders, etc.). We hoped to find a ROS package implementing such a generalized RANSAC, but again we were unable to find a suitable package: all of the implementations we found fit only planes.

We therefore created a ROS package that runs a generalized iterative RANSAC segmentation on a point-cloud file. The algorithm generates a tree of candidate shape combinations, then performs a breadth-first search of this tree to identify the smallest set of shapes that fits some critical majority of the points in the model.

Figure 3: Two point-clouds of a computer printer. Again, the left-hand side is the generated point-cloud and the right-hand side is the segmented cloud. Note that the rounded corners have been fit with cylinders.
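The sketch below conveys the peel-and-repeat structure of this segmentation in deliberately simplified form: it greedily extracts one RANSAC plane at a time instead of performing the breadth-first search over candidate shape combinations, and the threshold, coverage target, and minimum inlier count are illustrative values only:

```python
import numpy as np

def fit_plane_ransac(pts, iters=200, thresh=0.01):
    """Return (inlier mask, (normal, offset)) for the best plane found."""
    best_mask, best_model = None, None
    for _ in range(iters):
        p0, p1, p2 = pts[np.random.choice(len(pts), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-12:
            continue                              # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        mask = np.abs((pts - p0) @ n) < thresh    # point-to-plane distance test
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (n, -n @ p0)
    return best_mask, best_model

def segment(pts, coverage=0.96, min_inliers=10):
    """Peel off shapes until `coverage` of the points are explained."""
    remaining = np.ones(len(pts), dtype=bool)
    segments = []
    while remaining.sum() >= 3 and remaining.sum() > (1 - coverage) * len(pts):
        mask, model = fit_plane_ransac(pts[remaining])
        if mask is None or mask.sum() < min_inliers:
            break                                 # nothing large enough remains
        idx = np.flatnonzero(remaining)[mask]     # map back to global indices
        segments.append((idx, model))
        remaining[idx] = False
    return segments                               # unclaimed points are outliers
```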

For our RANSAC algorithm, we used three basic shapes: planes, spheres, and cylinders. Planes and spheres were already available to us as random-consensus model functions in PCL, so implementing them was relatively simple. We decided not to implement shapes such as rectangular prisms, since they are already representable by multiple planes and, given the number of degrees of freedom in a rectangular prism, would take too long to converge on a final answer. We selected cylinders as the third shape because we found that, after planes, cylinders were the next most common shape. While a cylindrical surface in a model could be represented by multiple planes, we elected to fit a cylinder instead: although fitting a cylinder takes longer (due to its greater number of degrees of freedom), it means fitting fewer shapes overall, which made it the better option. Since we were working with a specific subset of possible objects (office objects), we observed that shapes other than planes, spheres, and cylinders were rare enough to be represented as combinations of the supported shapes rather than as their true shape, without significant loss of time or of final efficacy in Abhishek's code.

There are some useful shapes that are currently not supported by PCL (the cone and the torus) but are scheduled to be implemented at a future date. Our code contains a framework that allows these shapes to be easily included once they are available.

3.3 Labeling and Parse-tree Construction

For parse-tree construction and labeling, we interface with Abhishek's system. The RANSAC module outputs a segmented point-cloud, a neighbor map between segments, and an initial parse tree. These files can be passed directly into Abhishek's cfg3d ParseTreeLabelerF for parse-tree construction and labeling.

Figure 4: An example parse tree created from the tower shown in Figure 2.

The constructed parse trees are output in .dot format, which allows parse trees to be built or edited by hand if required (a sketch of this serialization step appears after the grammar below). For an example parse tree constructed with the labeler, see Figure 4. The grammar extracted from the parse tree might look as follows:

ROOT → CPUTower
CPUTower → CPUTop CPUFront CPUSide
         | CPUFront CPUSide
         | CPUFront CPUSide CPUTop
CPUTop → Plane
CPUFront → Plane
CPUSide → Plane
Plane → Plane plane_terminal
      | plane_terminal

(By convention, nonterminal symbols begin with uppercase letters, while terminals are teletyped and begin with lowercase letters.)
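As a toy illustration of the serialization step mentioned above, the sketch below writes a hypothetical Figure 4 style tree to .dot; the real labeler emits its own node identifiers and attributes:

```python
# Hypothetical parse tree: parent -> list of children.
tree = {"CPUTower": ["CPUTop", "CPUFront", "CPUSide"],
        "CPUTop": ["plane_0"], "CPUFront": ["plane_1"], "CPUSide": ["plane_2"]}

# Serialize as a Graphviz digraph so it can be viewed or edited by hand.
with open("tower.dot", "w") as f:
    f.write("digraph ParseTree {\n")
    for parent, children in tree.items():
        for child in children:
            f.write(f'  "{parent}" -> "{child}";\n')
    f.write("}\n")
```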

3.4 ROS Package and Dataset

As part of the project, we created a ROS package that we are in the process of adding to the official ROS listing. We also created a synthetic data set with over 1000 point-clouds rendered from almost 200 Sketchup objects.

4 Results

We performed trials to see how many shapes the RANSAC module used to fit different classes of objects. Results can be found in the table below.

Type                Average Shapes
Keyboards           3.61
Computer Mice       2.89
CPU Towers          6.30
Computer Monitors   5.63
Printers            7.47

In the RANSAC module, we found that the percentage of points not fit by the RANSAC model (the number deemed outliers) was 3.70% on average. These results were computed with a small threshold and outlier cutoff value, giving fine detail on a single object; a larger threshold and outlier cutoff value would give detail closer to that of a full scene.

5 Conclusion

While we were not able to accomplish everything we had planned due to time restrictions, overall we obtained good results. As noted above, we were able to reduce the number of points not associated with a shape to about 3.7% per object. We feel this is a reasonable allowance of error for the needs of the project going forward.

Further, we were able to describe point-clouds with a minimal number of constituent shapes. For example, a keyboard comprises (on average) 3.61 shapes: a plane for the front edge, a plane for the top of the body, a plane for all of the keys, and sometimes an extra plane depending on the viewing angle. This is essentially the minimum number of shapes that could describe the object, and we see this pattern across all objects. Fitting a minimal number of shapes is important because it gives a more accurate representation of what the object is composed of; needless extra shapes would be akin to added noise.

We were also able to successfully integrate our code with the current labeling code. The RANSAC program allows us to associate more shapes with the objects we want to label, giving us a deeper understanding of the objects themselves, as well as of the scene they are in. Finally, we were pleased with our model scanner, which allowed us to easily create .pcd files from .obj files. It produced accurate point-clouds to which we could add noise to simulate a Kinect sensor, and it allows .pcd files to be produced from multiple angles without spending the time to capture the images with a real Kinect. We have combined these tools in a ROS package that we hope to have on the official listing soon.

6 Future Work

Going forward, we need to create a dataset of parse trees using the labeler. Using this dataset, we could extract a grammar of shapes and relations between shapes, which we could then train with an SVM (similar to SVM-cfg by Thorsten Joachims). From this, we hope to be able to RANSAC an entire office and label all objects in it with a reasonable degree of accuracy. Even more exciting, we would be able to parse relations between the objects themselves, allowing an even deeper understanding of the composition of the scene.

7 Acknowledgements

We would like to thank the following individuals:
- Ashutosh Saxena, course professor, CS 4758, Cornell University
- Abhishek Anand, Ph.D. mentor, Cornell University
- Ken Conley, ROS liaison, Willow Garage

References

Anand, Abhishek, Sherwin Li, and Paul Heran Yang. "Understanding 3D Scenes Using Visual Grammars." Cornell University, Ithaca: 15 December 2011.

Girshick, R., P. Felzenszwalb, and D. McAllester. "Object Detection with Grammar Models." NIPS: 2011.

Liu, Yiping. "Precision of the Kinect Sensor." ROS Wiki: 2011. Web. Accessed 2 March 2012.

Schnabel, Ruwen, Roland Wahl, and Reinhard Klein. "Efficient RANSAC for Point-Cloud Shape Detection." Computer Graphics Forum: 2007.

Zhao, Y., and S.-C. Zhu. "Image Parsing with Stochastic Scene Grammar." NIPS: 2011.