Pictures at an Exhibition

Han-I Su
Department of Electrical Engineering, Stanford University, CA 94305

Abstract—We employ an image identification algorithm for an interactive museum guide that works with pictures taken by camera phones. Our algorithm is based on the scale-invariant feature transform (SIFT). However, efficiency is critical in a real-time interactive system, so a simplified version of the SIFT algorithm is implemented in this work. Our test database includes 33 paintings at the Cantor Art Center. Each painting has 3 images that differ in scale and in perspective angle, for a total of 99 training images. We also take 20 extra pictures for stress tests such as linear motion and rotation. The results show that even though our algorithm is a simplified version of the SIFT algorithm, it is still robust enough for an interactive museum guide.

I. INTRODUCTION

An interactive museum guide can immediately provide visitors with the information they request about an exhibition. One approach is the tour-guide robot [1]. The robot's hardware includes several types of sensors, such as cameras, sonar, and infrared sensors, and its software integrates AI methods. The museum tour-guide robot is powerful; however, it costs too much and lacks scalability, that is, it cannot work distributedly at different places in the museum at the same time. Another approach is an RFID sensing board [3], but this is aimed mainly at children. Therefore, camera phones are chosen as the interactive user interface, because they are the most accessible image sensors and can work distributedly.

Image identification is a critical step for a camera-phone-based interactive museum guide, especially when the pictures taken by camera phones have low quality. The direct method for image identification is matching by correlation [2]: the database image is used as a template to search through the object image for the region with the strongest correlation.
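The direct method just described can be sketched in a few lines. The following is a minimal plain-NumPy illustration (not the report's implementation; the function name is hypothetical), and it makes the drawbacks visible: the search is exhaustive, and a template at a different scale would not be found.

```python
import numpy as np

def best_correlation_offset(image, template):
    """Slide the template over the image and return the offset with the
    strongest zero-mean normalised correlation (brute force)."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.linalg.norm(w) * t_norm
            score = float((w * t).sum() / denom) if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

# Plant the template inside a random image and recover its position.
rng = np.random.default_rng(4)
img = rng.random((20, 20))
tmpl = img[5:11, 8:14].copy()
pos, score = best_correlation_offset(img, tmpl)
assert pos == (5, 8) and score > 0.99
```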
Exhaustive computation is needed for this method, and the correlation metric is not robust to scale changes. The other method of image identification is feature detection in scale space, such as junction detection and blob detection [4], which is more robust to scale changes. In this work, we employ a simplified scale-invariant feature transform (SIFT) algorithm [5] for efficient image identification. The orientation assignment step of the SIFT algorithm is skipped, because pictures taken at an exhibition are mostly horizontal or vertical with a small rotation angle. The museum image database includes three snapshots of each of 33 paintings in the Cantor Art Center. Each image in the database is three megapixels; however, the image downsampled by a factor of 4 carries enough information for correct identification, that is, the effective size of the training images is 512 × 384. The details of the algorithm are described in Section II, and Section III presents the test results of our algorithm on the Cantor Art Center database.

II. ALGORITHM

The first step of the SIFT algorithm is scale-invariant interest point detection. We then assign a descriptor vector to the area around each interest point. Two images are matched by comparing their descriptor sets. We describe the implementation details in the following.

A. Difference of Gaussian pyramid

The scale space of an image I(x, y) is defined as the solution of the diffusion equation

∂L(x, y, t)/∂t = (1/2) ∇²L(x, y, t)   (1)

with initial condition L(x, y, 0) = I(x, y) [4]. It can be shown that convolution of the image with Gaussian kernels satisfies (1) and forms the scale-space pyramid:

L(x, y, t) = g(x, y, t) * I(x, y),   (2)

where

g(x, y, t) = (1/(2πt)) exp(−(x² + y²)/(2t)).   (3)

Fig. 1 shows an example of a Gaussian pyramid with three octaves. The leftmost images are the finest scales of each octave, and the smaller images are at coarser scales. Although more octaves increase robustness, they also increase computational complexity. Considering that the pictures can be taken at a small distance in the Cantor Art Center and that their sizes are large enough, we choose three octaves.

Blob detection finds regions that are brighter or darker than their surroundings. The Laplacian of the scale-space image L(x, y, t) has large positive values at dark blobs and strong negative values at bright blobs of extent t. The Laplacian of Gaussian can be approximated by the difference of Gaussian, which is easier to calculate:

D(x, y, t) = [g(x, y, k²t) − g(x, y, t)] * I(x, y) = L(x, y, k²t) − L(x, y, t).   (4)

Fig. 2 shows the differences between adjacent scales of Fig. 1. In this work, we generate the first octave of the Gaussian pyramid with initial scale σ₀ = √t₀ = 1.6 and choose k² = 2, generating higher scales by gradual blurring:

L(x, y, k²t) = g(x, y, (k² − 1)t) * L(x, y, t), i.e., L(x, y, 2t) = g(x, y, t) * L(x, y, t).   (5)

With gradual blurring, we can truncate the Gaussian kernel to size 4√t, which is smaller than the 4√(2t) needed for direct blurring. After generating five scales for the first octave, we downsample the third image, at scale σ = 3.2, by a factor of 2 to obtain the first image of the next octave. Repeating this process, we obtain three octaves with five scales in each octave. The difference of Gaussian pyramid is then obtained by subtracting adjacent Gaussian-blurred images, and it is used to detect interest points.

B. Local maxima and minima detection

The interest point candidates are the local maxima and minima of the difference of Gaussian.
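The pyramid construction of Sec. II-A and the candidate detection of Sec. II-B can be sketched as follows. This is a minimal NumPy/SciPy illustration under the report's parameters (t₀ = 1.6², k² = 2, three octaves, five scales); the function names are hypothetical, not the report's code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, grey_dilation, grey_erosion

# Sanity check of the gradual-blurring idea in eq. (5): blurring with
# variance 1 twice equals one blur with variance 2, up to kernel truncation.
rng = np.random.default_rng(1)
img = rng.random((128, 128))
twice = gaussian_filter(gaussian_filter(img, 1.0), 1.0)
once = gaussian_filter(img, np.sqrt(2.0))
assert np.allclose(twice, once, atol=1e-3)

def build_dog_octaves(image, n_octaves=3, n_scales=5, t0=1.6 ** 2):
    """Difference-of-Gaussian pyramid of Sec. II-A: five Gaussian levels per
    octave with the scale t doubling at each level (k^2 = 2); the third
    level (sigma = 3.2) is downsampled by 2 to seed the next octave."""
    octaves = []
    current = gaussian_filter(image.astype(float), np.sqrt(t0))  # L(., t0)
    for _ in range(n_octaves):
        t, levels = t0, [current]
        for _ in range(n_scales - 1):
            # L(x, y, 2t) = g(x, y, t) * L(x, y, t)
            levels.append(gaussian_filter(levels[-1], np.sqrt(t)))
            t *= 2.0
        octaves.append([b - a for a, b in zip(levels, levels[1:])])
        current = levels[2][::2, ::2]  # sigma 3.2 -> 1.6 after downsampling
    return octaves

def extrema_mask(lo, mid, hi):
    """Candidate mask of Sec. II-B: pixels of the DoG level `mid` that are
    extrema over the 8 in-scale neighbours (separable 3x1 / 1x3 grey-level
    dilation and erosion) and the 9 pixels at each adjacent scale."""
    mx = grey_dilation(grey_dilation(mid, size=(3, 1)), size=(1, 3))
    mn = grey_erosion(grey_erosion(mid, size=(3, 1)), size=(1, 3))
    is_max = ((mid == mx) & (mid > grey_dilation(lo, size=(3, 3)))
                          & (mid > grey_dilation(hi, size=(3, 3))))
    is_min = ((mid == mn) & (mid < grey_erosion(lo, size=(3, 3)))
                          & (mid < grey_erosion(hi, size=(3, 3))))
    return is_max | is_min

pyr = build_dog_octaves(img)
assert len(pyr) == 3 and all(len(o) == 4 for o in pyr)
assert pyr[1][0].shape == (64, 64) and pyr[2][0].shape == (32, 32)

# A single bump in the middle DoG level is the only candidate detected.
lo = np.zeros((7, 7)); hi = np.zeros((7, 7)); mid = np.zeros((7, 7))
mid[3, 3] = 1.0
mask = extrema_mask(lo, mid, hi)
assert mask[3, 3] and mask.sum() == 1
```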
Fig. 1. Gaussian pyramid with three octaves.
Fig. 2. Difference of Gaussian pyramid calculated from Fig. 1.

The local extreme values are searched for among the 8 neighbors at the same scale, the 2 corresponding points at the next higher and lower scales, and their 16 neighbors, i.e., 26 neighbors in total. We use gray-level erosion and dilation with 3 × 1 and 1 × 3 structuring elements to find the local maxima and minima at the same scale, and then compare these local maxima and minima with the adjacent scales. In order to increase the robustness of the interest points, we have to refine these interest point candidates.

C. Interest point refinement

The SIFT algorithm refines the interest point candidates by considering the Taylor expansion of the difference of Gaussian at each local extremum. Let x be the offset from the extremum:

D(x) = D + (∂D/∂x)ᵀ x + (1/2) xᵀ (∂²D/∂x²) x.   (6)

The accurate offset of the extremum is

x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x).   (7)

The Taylor expansion evaluated at this offset is

D(x̂) = D + (1/2) (∂D/∂x)ᵀ x̂.   (8)

We iteratively refine the locations of the interest points and discard the low-contrast interest points, for which D(x̂) in (8) is less than 0.01.

D. Descriptor assignment

We assume that pictures taken at the exhibition are within a small rotation angle with respect to the horizon, so the orientation assignment step of the SIFT algorithm is skipped. SIFT descriptors encode gradient histograms near interest points, so we first calculate the gradient magnitude and angle at the scale closest to that of each interest point:

m(x, y) = [(L(x + 1, y) − L(x − 1, y))² + (L(x, y + 1) − L(x, y − 1))²]^(1/2),
θ(x, y) = tan⁻¹[(L(x, y + 1) − L(x, y − 1)) / (L(x + 1, y) − L(x − 1, y))].

A window of size 16 × 16 is applied at each interest point, and the sampling period is the closest integer to the square root of the scale t. The gradient magnitudes are first weighted by a Gaussian function with σ = 8, and the gradient angles are quantized to 8 equally spaced directions. The 16 × 16 block is divided into sixteen 4 × 4 blocks with 16 block centers. The gradient magnitudes are bilinearly added to the corresponding gradient-angle bins of the four nearest centers to avoid sharp transitions in the gradient histograms. In Fig. 3, the gradient magnitude of the shadowed pixel is added to the four centers it points to: it is multiplied by (5/8)(5/8) = 25/64 and added to the angle histogram of the upper-left center, then multiplied by (3/8)(5/8) = 15/64 and added to the angle histogram of the upper-right center, and so on. In practice, we calculate the bilinear coefficient matrix in advance to generate the gradient histograms efficiently. Consider the 8 × 8 region around a block center shown in Fig. 4, and represent the pixels in this region by a matrix M. It can be shown that the bilinear coefficient vector c is

c = (1/8) [1 3 5 7 7 5 3 1]ᵀ.   (9)

The weighted value at the block center is cᵀMc. We can generalize the coefficient vector c to a matrix and compute the gradient histograms in a similar way. These 16 gradient histograms with 8 angle bins each form the 128-dimensional descriptor. The descriptor vector is normalized to achieve brightness invariance.

E. Nearest neighbor matching

We use the nearest neighbor method to match descriptors of two different images. We pick one descriptor of the object image and find the nearest and the second-nearest descriptors in the database image. If the angle to the nearest one is smaller than 0.6 times the angle to the second-nearest one, we call the descriptor matched; otherwise, the descriptor has no matching descriptor in the database image.

III. EVALUATION

Each painting has three snapshots in the Cantor Art Center database, but we only need one set of descriptors for each painting.
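The ratio test of Sec. II-E can be sketched as follows. This is a NumPy illustration with hypothetical names; the rows of both descriptor arrays are assumed L2-normalised, so the angle between two descriptors is the arccosine of their dot product.

```python
import numpy as np

def match_descriptors(query, database, ratio=0.6):
    """Nearest-neighbour matching with the angle ratio test of Sec. II-E.
    Rows of `query` and `database` are L2-normalised descriptors."""
    matches = []
    for i, q in enumerate(query):
        angles = np.arccos(np.clip(database @ q, -1.0, 1.0))
        order = np.argsort(angles)
        best, second = angles[order[0]], angles[order[1]]
        if best < ratio * second:  # accept only clearly unambiguous matches
            matches.append((i, int(order[0])))
    return matches

# A noisy copy of database descriptor 4 should match it and nothing else.
rng = np.random.default_rng(2)
db = rng.normal(size=(10, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)
noisy = db[4] + 0.01 * rng.normal(size=128)
query = (noisy / np.linalg.norm(noisy))[None, :]
assert match_descriptors(query, db) == [(0, 4)]
```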
Thus, we divide the 99 training images into three levels. The first level includes images having the most matches to images of the same painting and also having the fewest matches to images of different paintings. The second and the third levels include images which are less distinguishable. However, if the numbers of

Fig. 5. An example of linear motion with length equal to 40 along 45 degrees.
Fig. 3. The gradient of the shadowed pixel is bilinearly added to the four nearest centers.
Fig. 4. An 8 × 8 region around a 4 × 4 block center.

matches between the object image and two first-level database images are too close, or if they are too small compared with the number of descriptors, we can use the second-level or even the third-level database to increase the probability of correct identification. The thresholds for deciding whether the second or third database is used are determined experimentally. In order to test the robustness of our algorithm, it is put through the following stress tests:

1) Images outside the training set. We take 20 extra pictures at the Cantor Art Center to test against the training set of 99 images. Some of the test images have different perspective angles, and some include additional background objects. After slightly adjusting the identification threshold, we can identify all of these test images perfectly.

2) Linear motion. Linear motion is a common type of degradation in pictures taken by camera phones. Thus, we apply linear motion filtering to the test set described above. Our algorithm still identifies all 20 images correctly for linear motion of length smaller than 12 at several different angles. If the test image has a perspective angle similar to that of the training image, even a linear motion length around 40 will not cause false identification. An example of linear motion is shown in Fig. 5.

3) Rotation. Because the orientation assignment step of the SIFT algorithm is skipped, the rotation test is challenging. The result of the rotation test shows that our image identification algorithm works properly for rotation angles smaller than ten degrees. Therefore, it is reasonable to skip the orientation assignment step for the museum image database to increase efficiency. Fig. 6 shows an example of a 10-degree rotation of one of the test images.

IV. CONCLUSION

There is a trade-off between robustness and efficiency in image identification. In a real-time interactive museum guide, we should focus more on

the efficiency. Thus, we employ a simplified version of the SIFT algorithm. The results of the stress tests show that our algorithm still has enough robustness for the interactive museum guide application.

Fig. 6. An example of rotation of 10 degrees.

REFERENCES

[1] W. Burgard, A. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. The interactive museum tour-guide robot. In Proc. of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), 1998.
[2] R. C. Gonzalez, R. E. Woods, and S. L. Eddins. Digital Image Processing Using MATLAB. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2003.
[3] F. Kusunoki, M. Sugimoto, and H. Hashizume. Toward an interactive museum guide system with sensing and wireless network technologies. In WMTE '02: Proceedings of the IEEE International Workshop on Wireless and Mobile Technologies in Education, pages 99–102, Washington, DC, USA, 2002. IEEE Computer Society.
[4] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, Norwell, MA, USA, 1993.
[5] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, 2004.