Invariant Feature Extraction using 3D Silhouette Modeling


Jaehwan Lee 1, Sook Yoon 2, and Dong Sun Park 3
1 Department of Electronic Engineering, Chonbuk National University, Korea
2 Department of Multimedia Engineering, Mokpo National University, Korea
3 IT Convergence Research Center, Chonbuk National University, Korea

Abstract - One of the major challenges in object recognition arises from the large change in object appearance when objects are perspectively projected from 3-dimensional space onto a 2-dimensional image plane from different viewpoints. In this paper, we propose a method to extract features invariant to limited movements of objects by constructing a 3-D model from silhouettes of the objects in images with multiple viewpoints. We investigate several renowned invariant features, including SIFT[5], SURF[6], ORB[7], and BRISK[8], to find the most appropriate one for the proposed method. The simulation results show that all the invariant features tested work well and that SURF performs best in terms of matching accuracy.

Keywords: Invariant Feature, Shape From Silhouette, Intelligent Surveillance System

1 Introduction

Accurate recognition of 3-dimensional objects in 2-dimensional images is one of the most crucial and difficult tasks in image understanding. Many factors make the recognition task challenging, such as information loss from perspective transformation, illumination effects, and the varying appearance of non-rigid objects[1]. In particular, movements of non-rigid objects in 3-D space may change the images of objects so significantly that matching models in the database against input object images can fail by a large margin. Many techniques have been developed to address these difficulties, using color information, face recognition, part-based recognition, video-based gait recognition[2][3][4], etc.
Popular image-based local feature description methods such as SIFT[5], SURF[6], ORB[7], and BRISK[8] deal with the movements of objects by designing features invariant to appearance changes. These methods may work well for a given situation; however, matching accuracy may need to be improved when recognizing very flexible objects across various viewpoints.

In this paper, we propose an invariant feature extraction method based on 3-D modelling from silhouettes of objects in multiple images with different viewpoints. The method first constructs 3-D models with the shape-from-silhouette approach and then uses these models to extract invariant features applicable to any viewpoint. To determine the best feature description method for the proposed approach, we also investigate several state-of-the-art feature description methods, including the four image-based local feature description methods mentioned above.

2 Proposed Feature Extraction Method

The overall block diagram of the proposed extraction method is depicted in Fig. 1. It consists of a feature extraction block and a test phase block. The first block constructs a 3-D model from multiple images and extracts features after projecting the constructed model onto a 2-D plane according to the angle obtained in the pose detection step. This feature extraction method can generate features for any viewpoint change, which can then be compared to the actual features from test images.

We reconstruct 3-D models using Shape From Silhouette (SFS), described in [15], which requires relatively fewer images than other 3-D modelling techniques. Shape From Silhouette is a reconstruction method that builds a 3-D shape estimate of an object from silhouette images of the object[16]. In this step, a 3-D model is trained using multiple images. At the training phase, a set of reconstructed 3-D models, each representing one object, can be stored in a database.
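To make the silhouette-carving idea concrete, the following is a minimal orthographic voxel-carving sketch. It is not the calibrated-camera SFS formulation of [15][16]; the grid size, axis convention, and single-axis rotation model are simplifying assumptions of ours:

```python
import numpy as np

def carve(silhouettes, angles_deg, grid=32):
    """Minimal orthographic shape-from-silhouette (voxel carving).
    silhouettes: list of (grid, grid) boolean masks, one per viewpoint;
    each viewpoint rotates the object about the vertical (z) axis.
    A voxel survives only if it projects inside every silhouette."""
    c = (np.arange(grid) + 0.5) / grid * 2 - 1      # voxel centres in [-1, 1]
    X, Y, Z = np.meshgrid(c, c, c, indexing="ij")
    occupied = np.ones((grid,) * 3, dtype=bool)
    for mask, ang in zip(silhouettes, angles_deg):
        t = np.deg2rad(ang)
        u = X * np.cos(t) + Y * np.sin(t)           # horizontal image axis
        v = Z                                       # vertical image axis
        iu = np.clip(((u + 1) / 2 * grid).astype(int), 0, grid - 1)
        iv = np.clip(((v + 1) / 2 * grid).astype(int), 0, grid - 1)
        occupied &= mask[iv, iu]                    # carve voxels outside the silhouette
    return occupied
```

With 4, 8, or 36 masks at evenly spaced angles, each additional view carves the visual hull more tightly, which mirrors the effect of adding training images described above.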
These models can be projected onto any 2-D image plane with a specific viewpoint and then used to extract local features with one of the existing popular methods such as ORB, BRISK, SIFT, and SURF, as shown in the second and third steps. These two steps are later used to verify the existence of objects in test images, with additional information from the test stage.

At the test stage, new test images are presented to the system. A test image is first used to segment out object regions, and features are then extracted for those regions. The extracted features from the current input image are compared, for matching, with those from the 3-D models over a set of viewpoints. If the maximum matching score is above a certain threshold value, we accept that the input image contains the specific object at that viewpoint. A series of comparisons can be performed to find the best possible match.

Figure 1. Overall Block Diagram

In this paper, we focus on selecting the feature description method that yields the highest matching accuracy, among four candidates: ORB, BRISK, SIFT, and SURF. For this purpose, we use an input data set with known viewpoints representing the target objects and compare these objects to the objects reconstructed from the 3-D models. Each feature description method generates two sets of feature points, one for a target object and one for a reconstructed object. Each feature point in the target set then searches for a matching feature point in the other set. A match is defined as true if the relative distance between the two points is less than a predefined threshold value. The matching accuracy between the two sets of feature points is then determined as in Eq. 1:

    TP = number of true matching feature points
    FP = number of false matching feature points

    Matching Accuracy = TP / (TP + FP)    (1)

Fig. 2 shows an example of true and false matching. The objects on the left and right are from an input reference image and a reconstructed image with reduced size, respectively. In the figure, false matchings and true matchings between feature points are shown as red and blue line segments, respectively.

Figure 2. True and False matching

3 Experiments and Discussions

Two data sets, the Visual Geometry Group data set[13] and the Yasutaka Furukawa and Jean Ponce data set[14], are used for the experiments. The Visual Geometry Group data set contains 36 720x576 images of one object, all with different viewpoints. The Yasutaka Furukawa and Jean Ponce data set contains 24 2000x1500 images of another object. Fig. 3 shows two example images from each data set.
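The accuracy of Eq. 1 can be computed directly from the matched keypoint pairs. A small sketch (the function name and the default 10-pixel threshold are ours, the threshold following the experiment description in Section 3.2):

```python
import math

def matching_accuracy(matched_pairs, dist_threshold=10.0):
    """Eq. 1: Matching Accuracy = TP / (TP + FP).
    A matched pair counts as a true match (TP) when the two keypoint
    locations lie within dist_threshold pixels of each other;
    otherwise it counts as a false match (FP)."""
    tp = fp = 0
    for (x1, y1), (x2, y2) in matched_pairs:
        if math.hypot(x2 - x1, y2 - y1) < dist_threshold:
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp) if tp + fp else 0.0
```

For example, one pair 5 pixels apart (true) and one pair 50 pixels apart (false) give an accuracy of 0.5.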
We used the shape-from-silhouette (SFS) functions introduced by Loren Shure[11] to perform the 3-D modeling and the projection of the constructed 3-D model. We used the feature description methods implemented in the OpenCV library[12] to extract local features for ORB, BRISK, SIFT, and SURF.

Figure 3. Images used for reconstructing the 3D model: (top) Visual Geometry Group data set, (bottom) Yasutaka Furukawa and Jean Ponce data set

3.1 3D Silhouette Modeling

A 3-D model of an object is reconstructed with different numbers of images using SFS. For this experiment, we tested the reconstruction with 4, 8, and 36 images, so that the angles between two adjacent images are 90, 45, and 10 degrees, respectively. Fig. 4 shows the 3-D modeling results for the Visual Geometry Group data set. Three 3-D models are reconstructed, one for each number of training images, and the models are projected for two different viewpoints. For the original images as targets with two different viewpoints, shown in Fig. 4(a), the projected images from the three 3-D models are shown in Fig. 4(b)-(d). As expected, the more images with different viewpoints are used, the better the reconstructed image quality. Fig. 5 shows the 3-D modelling results for the Yasutaka Furukawa and Jean Ponce data set.

3.2 Matching Accuracy of Feature Extraction Methods

Before testing the matching accuracy of the invariant features from 3-D modelling, we ran a simple effectiveness and accuracy test on each feature extraction method. To assess the performance of each method, a reference image is modified with a Gaussian smoothing and a resizing operation, and matching between the original and the modified versions is performed. Fig. 6 shows the images used for this experiment: Fig. 6(a)-(c) show the 256x256 original image and the modified images after Gaussian smoothing and resizing to half size.

Figure 4. Reconstructed 3D model using Visual Geometry Group data set
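The per-point search described in Section 2, where each feature point in the target set looks for its nearest counterpart in the other set, amounts to brute-force nearest-neighbour matching over descriptors (OpenCV's BFMatcher plays this role in practice). A NumPy-only sketch of ours for binary descriptors of the kind ORB and BRISK produce:

```python
import numpy as np

def hamming_bf_match(desc_a, desc_b):
    """Brute-force nearest-neighbour matching for binary descriptors
    (packed as uint8 rows, as ORB/BRISK descriptors are).
    For every row of desc_a, returns the index of the closest row of
    desc_b under Hamming distance, plus that distance."""
    # XOR the packed bytes pairwise, then count the differing bits
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    dists = np.unpackbits(xor, axis=2).sum(axis=2)
    return dists.argmin(axis=1), dists.min(axis=1)
```

SIFT and SURF descriptors are float vectors, so the same search would use Euclidean rather than Hamming distance; the matched pairs then feed the location-distance check of Eq. 1.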

Figure 6. Images for the simple matching accuracy test

              Smoothing                       Resizing
         Target features  Accuracy (%)   Target features  Accuracy (%)
ORB           460             100             240              82
BRISK         173               9              58               0.6
SIFT         1197              95             341             100
SURF          569              73             163              47

Table 1. Matching accuracy for simple 2-D images

Figure 5. Reconstructed 3D model using Yasutaka Furukawa and Jean Ponce data set

Table 1 shows the accuracy measurement results. In this experiment, a match is defined as true if the distance between the locations of the two feature points is less than 10 pixels. As the table shows, BRISK extracts the smallest number of feature points, and its accuracy is very low. SURF produces a sufficient number of feature points, but its accuracy is not high enough. ORB and SIFT extract abundant feature points with relatively high accuracy, so we consider ORB and SIFT good features for these types of modifications. In particular, ORB is very robust to the smoothing operation and SIFT is robust to the resizing operation.

Fig. 7 shows an example of feature matching between a projected image from a 3-D model reconstructed with 8 images and the corresponding target image. The experimental results show considerably lower accuracy than in the simple image example. When comparing an actual image with an image projected from a 3-D model, SIFT has the best accuracy. We also measure the accuracy between images from a 3-D model built with more viewpoints and images from a 3-D model built with fewer viewpoints; the purpose of this comparison is to produce target images with more detail than the test images. When comparing images from one 3-D model with images from another 3-D model, SURF has the best accuracy, ahead of ORB and SIFT.

Figure 7. Feature matching example between a target image and a projected image from a 3D model

                                     Accuracy (%)
Data set  Reference   Target      ORB     BRISK    SIFT     SURF
VGG       Real        4 images     0.73    2.16     6.61     2.32
VGG       Real        8 images     0.34    1.46     4.29     3.68
VGG       36 images   4 images    18.75   18.7     23.08    24.2515
VGG       36 images   8 images    34.74   17.62    33.33    38.157
YF&JP     Real        4 images     3.94    6.34     5.56     5.60
YF&JP     8 images    4 images    38.30   16.42    16.67    13.56

Table 2. Matching accuracy of features for the reconstructed models

4 Conclusion

Identifying a 3-dimensional object that appears at different angles is a very challenging task in computer vision, even if we exclude external factors that make recognition even harder. In this paper, we used the shape-from-silhouette technique to reconstruct 3-D models of objects from multiple images. The reconstructed 3-D model is used to produce a 2-D projected image at a specific viewpoint for comparison using renowned feature description methods, including ORB, BRISK, SIFT, and SURF. The reconstructed 3-D models contain more detail as more training images are used, and when a better reconstructed model is used for testing, the matching accuracy becomes higher. Although there is some positive evidence for the automatic generation of invariant features using 3-D modelling, the matching performance of feature points is, generally speaking, not good enough with the current set of feature extraction methods, and different types of features should be developed for this purpose. We will continue to search for better features and 3-D models to automatically generate invariant features.

5 Acknowledgement

This work was supported by the Brain Korea 21 PLUS Project, National Research Foundation of Korea, and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2013R1A1A2013778).

6 References

[1] S. Fleck and W. Straber, "Privacy sensitive surveillance for assisted living - a smart camera approach", Handbook of Ambient Intelligence and Smart Environments, Springer, pp. 985-1014, 2010
[2] Amit A. Kale, Aravind Sundaresan, A. N. Rajagopalan, Naresh P. Cuntoor, Amit K. Roy Chowdhury, Volker Kruger, and Rama Chellappa, "Identification of humans using gait", IEEE Transactions on Image Processing, 13(9):1163-1173, September 2004
[3] Alper Yilmaz, Omar Javed, and Mubarak Shah, "Object Tracking: A Survey", ACM Computing Surveys, Vol. 38, No. 4, Article 13, December 2006
[4] Laurenz Wiskott, Jean-Marc Fellous, Norbert Kruger, and Christoph von der Malsburg, "Face recognition by elastic bunch graph matching", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, July 1997
[5] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60(2), pp. 91-110, 2004
[6] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features", Computer Vision - ECCV 2006, pp. 404-417, 2006
[7] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary R. Bradski, "ORB: An efficient alternative to SIFT or SURF", ICCV 2011, pp. 2564-2571
[8] Stefan Leutenegger, Margarita Chli, and Roland Y. Siegwart, "BRISK: Binary Robust Invariant Scalable Keypoints", ICCV 2011, pp. 2548-2555
[9] Pierre Moreels and Pietro Perona, "Evaluation of Features Detectors and Descriptors based on 3D objects", ICCV 2005, Vol. 1, pp. 800-807, 2005
[10] David M. W. Powers, "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation", Journal of Machine Learning Technologies, Vol. 2, Issue 1, pp. 37-63, 2011
[11] Loren Shure, "Carving a Dinosaur", http://blogs.mathworks.com/loren/2009/12/16/carving-adinosaur/ [Accessed: 2014.04.19]

[12] OpenCV user site, http://opencv.org [Accessed: 2014.05.19]
[13] Visual Geometry Group, "Dino data", Department of Engineering Science, University of Oxford, http://www.robots.ox.ac.uk/~vgg/data1.html [Accessed: 2014.04.19]
[14] Yasutaka Furukawa and Jean Ponce, "3D Photography Dataset", Beckman Institute and Department of Computer Science, University of Illinois at Urbana-Champaign
[15] Gloria Haro, "Shape from Silhouette Consensus", Pattern Recognition, Vol. 45, No. 9, pp. 3231-3244, 2012
[16] Kong-man (German) Cheung, Simon Baker, and Takeo Kanade, "Shape-from-Silhouette Across Time - Part I: Theory and Algorithms", International Journal of Computer Vision, Vol. 63, pp. 225-245, 2005