Breaking it Down: The World as Legos Benjamin Savage, Eric Chu

Similar documents
Outline 7/2/201011/6/

Facial Expression Detection Using Implemented (PCA) Algorithm

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

Mapping of Hierarchical Activation in the Visual Cortex Suman Chakravartula, Denise Jones, Guillaume Leseur CS229 Final Project Report. Autumn 2008.

Final Exam Study Guide

Does the Brain do Inverse Graphics?

Does the Brain do Inverse Graphics?

ROTATION INVARIANT SPARSE CODING AND PCA

Image Processing. Image Features

Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images

Separating Speech From Noise Challenge

CS 229 Midterm Review

Facial Expression Recognition Using Non-negative Matrix Factorization

CHAPTER 5 GLOBAL AND LOCAL FEATURES FOR FACE RECOGNITION

The Detection of Faces in Color Images: EE368 Project Report

Selection of Scale-Invariant Parts for Object Class Recognition

CS 4495 Computer Vision A. Bobick. CS 4495 Computer Vision. Features 2 SIFT descriptor. Aaron Bobick School of Interactive Computing

Scale Invariant Feature Transform

Application of Support Vector Machine Algorithm in Spam Filtering

Wikipedia - Mysid

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

Dynamic Routing Between Capsules

Scale Invariant Feature Transform

Automatic Image Alignment (feature-based)

Using Geometric Blur for Point Correspondence

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

10-701/15-781, Fall 2006, Final

Recognition, SVD, and PCA

A Summary of Projective Geometry

STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Random projection for non-gaussian mixture models

Computer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction

Edge and corner detection

FACE DETECTION AND RECOGNITION OF DRAWN CHARACTERS HERMAN CHAU

3D Object Recognition using Multiclass SVM-KNN

Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels

Learning to Recognize Faces in Realistic Conditions

The SIFT (Scale Invariant Feature

Feature Extraction and Image Processing, 2 nd Edition. Contents. Preface

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

The Curse of Dimensionality

Correcting User Guided Image Segmentation

then assume that we are given the image of one of these textures captured by a camera at a different (longer) distance and with unknown direction of i

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

CS 188: Artificial Intelligence Fall 2008

FACE RECOGNITION USING INDEPENDENT COMPONENT

Determinant of homography-matrix-based multiple-object recognition

CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation

Patch-based Object Recognition. Basic Idea

Announcements. Recognition I. Gradient Space (p,q) What is the reflectance map?

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Estimating Noise and Dimensionality in BCI Data Sets: Towards Illiteracy Comprehension

K-Means Clustering 3/3/17

CSE 252B: Computer Vision II

Shape Matching. Brandon Smith and Shengnan Wang Computer Vision CS766 Fall 2007

Image Coding with Active Appearance Models

Digital Image Processing (CS/ECE 545) Lecture 5: Edge Detection (Part 2) & Corner Detection

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt.

Principal Component Analysis (PCA) is a most practicable. statistical technique. Its application plays a major role in many

A Generalized Method to Solve Text-Based CAPTCHAs

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]

Enabling a Robot to Open Doors Andrei Iancu, Ellen Klingbeil, Justin Pearson Stanford University - CS Fall 2007

Capturing, Modeling, Rendering 3D Structures

COMPUTER AND ROBOT VISION

CHAPTER 1 Introduction 1. CHAPTER 2 Images, Sampling and Frequency Domain Processing 37

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi

Facial Expression Classification with Random Filters Feature Extraction

Face Recognition using Eigenfaces SMAI Course Project

Face Recognition for Different Facial Expressions Using Principal Component analysis

ECG782: Multidimensional Digital Signal Processing

Image Segmentation and Registration

Assignment 4: CS Machine Learning

Image Features: Detection, Description, and Matching and their Applications

Probabilistic Facial Feature Extraction Using Joint Distribution of Location and Texture Information

Laser sensors. Transmitter. Receiver. Basilio Bona ROBOTICA 03CFIOR

CS 223B Computer Vision Problem Set 3

CS 231A Computer Vision (Fall 2012) Problem Set 3

Factorization with Missing and Noisy Data

Centre for Digital Image Measurement and Analysis, School of Engineering, City University, Northampton Square, London, ECIV OHB

Advanced Video Content Analysis and Video Compression (5LSH0), Module 4

Generic Face Alignment Using an Improved Active Shape Model

Computer Vision. Exercise 3 Panorama Stitching 09/12/2013. Compute Vision : Exercise 3 Panorama Stitching

ECE 176 Digital Image Processing Handout #14 Pamela Cosman 4/29/05 TEXTURE ANALYSIS

Computer Graphics Hands-on

De-identifying Facial Images using k-anonymity

Clustering and Visualisation of Data

Local Image preprocessing (cont d)

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale.

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian.

Image Processing

A Keypoint Descriptor Inspired by Retinal Computation

Computational Foundations of Cognitive Science

Epipolar Geometry in Stereo, Motion and Object Recognition

Digital Image Processing

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

CS228: Project Report Boosted Decision Stumps for Object Recognition

Finite Element Methods for the Poisson Equation and its Applications

2D Image Processing Feature Descriptors

Transcription:

Breaking it Down: The World as Legos Benjamin Savage, Eric Chu To devise a general formalization for identifying objects via image processing, we suggest a two-pronged approach of identifying principal parts and model fitting to encode spatial relationship. We begin with a relatively simple object, the cube. Additional motivation for this choice was the fact that much of the man made world is composed of cubic shaped structures, so cube recognition could serve as an input to later, more complex object recognition tasks. Our recognition formalization begins with training SVMs to identify the principle parts of an object. For our project we used an SVM to pick out cube corners like those appearing in man-made objects. In the second stage of our formalism, information from the detected locations of the principle object components is used to fit a pre-defined model to the location data. We then used the minimal least-squares affine transformation to fit an arrangement of seven corners our cube-model to various permutations of the detected corners. This model contains information about the spatial relationships between the constituent corners and successfully distinguishes between possible cubes and impossible cubes. Introduction Current object recognition algorithms face the challenge of generalization. It is possible for a machine to identify a teacup if the teacup is presented to the machine in the same size and orientation; otherwise, the machine falters in its recognition. In an attempt to model human vision and human spatial cognition, the purpose of this project is to take a step towards making object detection and recognition rotationally invariant, scale invariant, and partially occlusion invariant. By constructing object models and discovering a least-squares mapping from a suspect object to its model, we hope to achieve that end. The Overview After a cursory observation of the human visual system, we generated a list of three major elements that contribute to the flexibility of human object recognition:. Using parts and their relationships to identify the whole 2. Using context to identify the object 3. Being able to mentally manipulate / rotate the object We have successfully implemented very basic elements of each of these three components in our attempt to construct a better object recognition algorithm. Figure diagrams the general flow chart devised for this particular scheme. Image In SVM Ranker Model Fitting Figure Flow chart of algorithm. Object Out The SVM step performs the operation of decoding contextual information as well as identifying fundamental parts of an object. The Model Fitting step provides the computer with a mathematical framework with which to rotate, scale, translate, and skew the object. It also encodes the spatial relationships between the fundamental parts of an object. Using these steps in conjunction with each other, we are able to identify very basic cubes. From now on, we shall discuss how the algorithm applies to identifying cube-like objects and conclude by elaborating on how it can be generalized to a broader framework.

The SVM Since the simple building blocks of a cube are its corners, the main purpose of the SVM / machine learning algorithm is to reinforce or discourage certain classifications of pixels as cube-corners. These corners will be used later in object mappings. In early incarnations of this project, we attempted to use the Harris corner detector to detect cube-corner candidates. This approach was unsuccessful because it only used local pixel data and did not use contextual clues to make predictions. We had much more success with the SVM classifier by using large image patches and geometrical information to encode contextual knowledge into our classifier. We first converted our color image to an intensity image, and then took the gradient of that image. To recognize cube-corners in this image we extracted 5x5 pixel image patches to make sure that some amount of context was observed. Experimentation with training an SVM to directly classify image patches of this kind showed little promise for many reasons. Therefore, to both endow the SVM with a sense of the spatial ties between the various pixels in this patch, and to substantially reduce the dimension of the space, we found the projection of this 260- vector (5x5 = 260) into a 277 dimensional space. The projection basis chosen was a collection of geometric masks that were blurred significantly. Below (figure 2) are artistic renditions of 3 such spaces. These subspaces found trends in various dimensions, while remaining invariant to a good deal of noise. Furthermore, they allowed us to highlight significant characteristics of a corner in a 5x5 patch (such as an edge emanating from the center pixel). Using this technique gave superior results to using PCA to create a basis of eigen-corners, and far superior results to using the direct 5x5 cutout. In the final step, the 277 vector given by projecting into this subspace is scaled to unit length to eliminate the variation caused by differences in lighting. This 277 element vector was used as training and test vectors in an SVM. Using a Gaussian kernel in our SVM with a γ of 5, we were able to obtain less than 3% error on a leave-one-out cross-validation test. As another form of validation we ran the SVM on every single 5x5 image patch in an image not used in our training set. Every pixel is labeled with its margin from the support vectors. More confident points are highlighted in red. The results are shown below. These results were even more encouraging than the LOOCV test error. Not only did the SVM find most cube-corners in the image, but the confidence of the prediction seems to be well correlated with how well centered the corner is in the cut-out. We described this image as a corner-heatmap. (Note: our SVM was further improved after using it to create this image, but as the image took 30 minutes to generate we decided not to update this image.) Figure 2 Artistic renditions of our basis masks applied on to 5x5 image patches. 2

Thus, we will model the positions of all of the corners of the cube in the image as an affine transformation of the x(j), added to a 2-D Gaussian error term, as in equation. ( ) ( i) z i = Ax + ε () Figure 3 Original photo (left) the Cornell Box and corner-heatmap (right). Cube Regression We developed mathematical machinery to grapple with the task of identifying cubes from our detected corner data. The general idea is laid out in figure 4. We modeled a perspective of a cube (or even more generally, any object with highly distinguishable principle image components such as corners) as a set of 3-D homogeneous points X = {x(), x(2),, x(m)}. When appearing in an image, this set of points may be rotated, scaled, and translated. On top of this there may be random noise, or perhaps distortion due to the fact that the cube is being viewed from a slightly different perspective. Cube Model Note that x is in homogenous coordinates (so that A contains translation information). This model allows us to find the Likelihood of this arrangement of the corners given the parameters of the affine transformation. We even briefly toyed with the idea of viewing this problem in a Bayesian framework where there was some prior distribution on the transformation A, although we didn t have time to explore this idea in depth. We found the maximum likelihood estimate of the affine transformation to be as follows. ( T T W = S X X S X ) (2) ( i) A = Z W (3) i In equation (2), Z (i) is a 3 x m matrix of the observed corner locations (of image i) in homogenous coordinates, (see figure 5), X is also a 3 x m matrix but this time of the model points, and S is a diagonal matrix of weightings used to weight the error of each point relative to one another. (It is the inverse covariance matrix when the distribution of noise around each point is assumed to be a circular Gaussian). Observed corners with wireframe maximum likelihood estimate cube. Figure 4. Diagram of cube regression in action. 3

generated by the same model under an affine transformation A_i, for i = to n, and treating the A_i as the latent variables, we were able to find the best cube-model. We were also able to find the variance of error for each individual point to create the weighting matrix S. Experimentation showed that using this S gave superior results to the non-weighted case of simply using the identity matrix for S. The EM algorithm was extremely efficient and converged in around 0 steps. Figure 6 shows the results of this algorithm after convergence. (i) Z = 68 00 0 0 0 22 98 63 68 24 62 24 220 207 Figure 5. (Top) An example training image. (Bottom) The Z (i) -matrix used in equation 3. Note that the 2 nd point is missing (and hence all zeros). We were pleased with the resemblance of this weighted regression formula to the normal equations. In our tests we found that this worked quite well. This formula allowed us to quickly find the likelihood that a set of points was generated by a given model. As desired it is completely invariant to translation, rotation in the plane, scaling, and is mildly robust to minor distortion. We even found a method of fitting less than the full m points by setting ones to zeros in the homogenous coordinate. If the SVM did not observe a corner (either it was not in the image, or the SVM simply made an error), A was still calculated, using the remaining data from the other points (with the zero-ed column). This allowed us to realize the dream of making a model robust to partial occlusion. To generate the cube-model we derived a form of the EM algorithm. By finding the ordered locations of all visible corners in hundreds of observed cubes {Z_, Z_2,, Z_n}, we generated a training set. By assuming each training example was Figure 6. Note the seven major clusters corresponding to the seven prominent corners of a cube. Also, note that the center corner has the most variance. This mathematical side of our project was far more satisfying than the image processing portion. We were able to apply the techniques taught in class (Linear regression, EM algorithm, multivariate gaussians, maximum likelihood estimation), to achieve the goal of storing the spatial relationships of parts of an object in a mathematical model, and to create an affine transformation invariant object detection framework. We encountered one truly difficult challenge however when it came time to implement this methodology: factorial blowup. Unfortunately, the order of the corner points is important, and so there was a factorial blow up. When 50 cube corners were detected in an image, say, we had to try all 4

[( 7 C 0 )* ( 50 P 7 )] + [( 7 C )* ( 50 P 6 )] + [( 7 C 2 )* ( 50 P 5 )] + [( 7 C 3 )* ( 50 P 4 )] possible permutations (each bracketed term corresponds to a certain number of unobserved corners). There is still hope that one might solve this problem. It makes sense to use the confidence of the SVM prediction to try the most confident corners first. There are other optimizations and heuristics to try here for ordering that search, but we didn t have time to implement any of these methods. It may also be possible that this algorithm is useful just not in this particular context. It might be more suited to eliminating options rather than to find the optimal one. Using the simple heuristic (ordering by SVM confidence), we were able to identify a cube in simple cube images (see figure 7). different corners of a cube. This would have to be done in the general object case. (For example, using this framework to detect human faces, one would train a nose SVM, a right eye SVM, a mouth SVM, etc ). This drastically reduces the number of permutations of image locations to try in the model fitting step of the paradigm. We lacked enough training data to create robust SVMs for each type of corner, we would have needed several hundred more images of cubes. The other problem, (particular to the cube) is that every principle-part is the same as every other one, just rotated in 3 dimensions. Tiering is another idea to for future research to pursue. Once the method in this paper is perfected to recognize small objects, we treat collections of objects in the same fashion to recognize more complex, compound objects, such as cities, freeways, or airplane formations. By constructing, say, a model of a collection of cubes to represent a city. Finally, the weakness of the SVM is that it is a binary classifier. Ultimately, to create a machine capable of identifying all objects, one would like to not enter image patches into thousands of various SVMs. Ideally one could create an n-object-classifier, whose running time did not vary with n. This might improve detection of every kind of object, cube corners would be better defined in contrast to all other objects than with simply cube-corners and non-cube-corners. Figure 7. (Top row) Shows the result of running simple heuristics on the given cube before finally choosing the top 7. (Bottom row). Shows the results of model fitting to the top 6 or 7 points. The Future One idea we had and experimented with was to train multiple SVMs to identify the Acknowledgements We d like to acknowledge Geremy Heitz for initial guidance, Carl Erickson for mathematical assistance, and Thorsten Joachims for SVM_light. And, of course, Andrew Ng for wonderful instruction that made this possible. 5

References Cornell Box, The. Cornell University Program of Computer Graphics. 5 Dec, 2006. <http://www.graphics.cornell.edu/online/box/> Corner Detection. Wikipedia. 5 Dec, 2006. <http://en.wikipedia.org/wiki/corner_detection> Derpanis, Konstantinos G. The Harris Corner Detector. 5 Dec, 2006. <www.cse.yorku.ca/~kosta/compvis_notes/harris_detector.pdf> 6