Keypoint Recognition with Two-Stage Randomized Trees


PAPER  Special Section on Machine Vision and its Applications

Keypoint Recognition with Two-Stage Randomized Trees

Shoichi SHIMIZU a) and Hironobu FUJIYOSHI b), Members

SUMMARY  This paper proposes a high-precision, high-speed keypoint matching method using two-stage randomized trees (RTs). The keypoint classification uses conventional RTs for high-precision, real-time keypoint matching. However, the wide variety of view transformations for templates expressed by RTs makes it difficult to achieve high-precision classification for all transformations with a single set of RTs. To solve this problem, the proposed method classifies the template view transformations in the first stage and then, in the second stage, classifies the keypoints using the RTs that correspond to each of the view transformations classified in the first stage. Testing demonstrated that the proposed method is 88.5% more precise than SIFT, and 63.5% more precise than conventional RTs, for images in which the viewpoint of the object is rotated by 70 degrees. We have also shown that the proposed method supports real-time keypoint matching at 12 fps.

key words: keypoint matching, viewpoint estimation, randomized trees

1. Introduction

Technology for automatic recognition of specific objects in images holds promise for implementation in a variety of fields and is an important research topic in computer vision. In the field of Intelligent Transportation Systems (ITS), the automatic recognition of road signs is being studied as a way to support driving safety [1]. Another application is a product ordering system that uses automatic recognition of objects photographed by a cell phone camera [2]. Other camera-equipped devices, such as smartphones, tablets, and portable game devices, also have many uses for augmented reality (AR). An essential technology for AR is the ability to recognize, in real time, known markers in images acquired by the camera.
Implementation of such specific object recognition requires a recognition method that is robust against view changes such as image rotation, scale, illumination, and viewpoint. One approach to dealing with view changes that has attracted attention in recent years is to match corresponding points by using features obtained from local regions in the image. Conventional methods that use local features for corresponding point matching can be divided into two types: those that use high-performance local features and those that introduce a training algorithm. The former type is typified by the Scale-Invariant Feature Transform (SIFT) [3]. SIFT is robust against image rotation, changes in scale, and changes in illumination, and so is capable of highly accurate matching. In recent years, PCA-SIFT [4], which increases the descriptive power of SIFT, as well as GLOH [5], Shape Context [6], and ASIFT [7], have been proposed to achieve higher matching accuracy. However, these SIFT-based approaches suffer from high computational cost. Although faster versions of SIFT (SURF [8] and Fast Approximated SIFT [9]) have been proposed, real-time processing remains difficult at this time. On the other hand, a method has been proposed that uses a training algorithm to train Randomized Trees (RTs) for keypoint classification off-line and uses the trained trees to classify keypoints during on-line processing [10]. Reference [10] applies affine transforms to a single template image to generate training images that represent various pseudo changes in view.

Manuscript received November 7, Manuscript revised January 23, The author is with the Department of Computer Science, Chubu University, Kasugai-shi, Japan. Presently, with the Advanced Technology R&D Center, Mitsubishi Electric Corporation, Amagasaki-shi, Japan. a) Shimizu.Shoichi@ab.MitsubishiElectric.co.jp b) hf@cs.chubu.co.jp DOI: /transinf.E95.D.1766
Those images are then used to train RTs [11], resulting in keypoint classification that is robust to view changes. The RTs technique implements corresponding point search as decision tree traversal and is capable of high-speed classification of keypoints. In recent years, this method has been developed further, and there are reports that it can even run on low-memory mobile devices [12], [13]. However, methods based on Ref. [11] have low matching accuracy when there are large view changes in the image. One cause of this problem is that there are various kinds of view changes in the template represented by RTs, so a single set of RTs cannot easily achieve highly accurate keypoint classification with respect to all of the view changes. To solve this problem, we propose here a keypoint classification method that uses two-stage RTs. The viewpoints of the input image are classified in the first stage; in the second stage, keypoint classification is performed using the RTs trained with image viewpoints that are near those classified in the first stage. Because RTs that have been trained on images visually close to the input image can be used for the classification, improved keypoint classification can be expected.

2. Keypoint Classification with Randomized Trees

In this section, we explain the work of Ref. [10], which is the basis of the research reported here. Lepetit and Fua [10] proposed a keypoint classification method to which a training algorithm is introduced. In the training, RTs [11] are used to enable high-speed keypoint classification. Also, images to which affine transforms have been applied are used as training data to achieve robustness against view changes.

Copyright c 2012 The Institute of Electronics, Information and Communication Engineers

Fig. 1  Keypoint detection by approximated LoG.

2.1 Keypoint Detection

First, the keypoints needed for corresponding point matching are detected. Keypoint detection involves raster scanning of the template and calculating the Laplacian of Gaussian (LoG) response values for pixels m = (x, y), as shown in Fig. 1 (a). Pixels whose response values are larger than those of their eight neighbors are selected as keypoint candidates. In Ref. [10], LoG is approximated by Eq. (1) for faster computation:

    LoG(m) ≈ Σ_{α ∈ [0,π)} [ I_σ(m - dr_α) - 2 I_σ(m) + I_σ(m + dr_α) ],   (1)

where I_σ is the smoothed image and dr_α = (R cos α, R sin α), with R the radius. The magnitude of R is varied to obtain the largest response value, and that value of R serves as the keypoint scale value. The approximated LoG image is shown in Fig. 1 (b).

Next, keypoint candidates that are robust against affine transformations are selected to achieve robustness against view changes. The template is subjected to transformation by multiple affine parameters, and keypoint candidates are detected in the transformed images. The keypoints that are robust against affine transformation can be obtained by selecting only the keypoint candidates that were detected in many of the affine-transformed images. The selected keypoints are then used to train the RTs.

2.2 Randomized Trees

The RTs technique is a training method for multi-class discrimination problems. The technique has been attracting attention in recent years for its robustness against noise in the training samples and its fast processing with high discrimination accuracy. In Ref. [10], RTs are applied to the keypoint classification problem.
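As a concrete illustration, the approximated LoG response of Eq. (1) and the eight-neighbor candidate test can be sketched as below. This is a minimal numpy sketch, not the authors' implementation; the function names are ours, and a simple 3 × 3 box blur stands in for the Gaussian smoothing I_σ.

```python
import numpy as np

def smooth(image):
    """3x3 box blur standing in for the Gaussian smoothing I_sigma."""
    P = np.pad(image.astype(np.float64), 1, mode='edge')
    h, w = image.shape
    return sum(P[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def approx_log_response(image, R, n_angles=8):
    """Approximated LoG (Eq. 1): for each direction alpha in [0, pi), add the
    second difference I(m - dr_a) - 2 I(m) + I(m + dr_a) at radius R."""
    I = smooth(image)
    resp = np.zeros_like(I)
    for alpha in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        dx = int(round(R * np.cos(alpha)))
        dy = int(round(R * np.sin(alpha)))
        plus = np.roll(I, (-dy, -dx), axis=(0, 1))
        minus = np.roll(I, (dy, dx), axis=(0, 1))
        resp += minus - 2.0 * I + plus
    return resp

def keypoint_candidates(resp):
    """Pixels whose response is a strict maximum over their 8 neighbors."""
    h, w = resp.shape
    cands = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = resp[y - 1:y + 2, x - 1:x + 2]
            if resp[y, x] == window.max() and (window == resp[y, x]).sum() == 1:
                cands.append((x, y))
    return cands
```

In a full detector, `approx_log_response` would be evaluated over several radii R and the R giving the largest response kept as the keypoint scale, as the text describes.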
The construction of RTs is described in the following sections.

Training

First, subsets of training samples are created to train the decision trees. A subset consists of 32 × 32 patches that are centered on randomly selected keypoints. A decision tree is trained with one subset.

Decision trees

The decision tree set T = {T_1, …, T_L} comprises trees made up of nodes and leaf nodes. L is the number of subsets, which corresponds to the number of trees. A node has a condition (split function) for branching to child nodes. The number of levels and the number of nodes are set in advance.

Nodes

The node split function takes as a feature the intensity relationship of two pixels m_1 and m_2 randomly selected from a patch and determines the branch destination, as shown by Eq. (2), whose two cases indicate the left and right child nodes:

    C(m_1, m_2) = { left,  if I_σ(P, m_1) ≤ I_σ(P, m_2)
                  { right, otherwise,   (2)

where I_σ(P, m) is the intensity of pixel m in patch P. This feature captures the difference in intensity, so the output of the split function does not change unless the intensity magnitude relation of the two pixels changes. The technique is thus robust against changes in illumination. The node branching ends when the predefined node depth is reached. Terminal nodes, called leaf nodes, hold a probability distribution of keypoints.

Leaf nodes

Leaf nodes hold probability distributions P_{η(T_l, P)}(Y(P) = c) for arriving keypoints. The term η(T_l, P) is the leaf node of decision tree T_l at which input patch P arrives, and c is the recognition number of a template keypoint. Y is a function that returns a class number from a leaf node when patch P is given to the decision tree.

Classification

The keypoints are classified by the set of decision trees trained with all subsets, T = {T_1, …, T_L}. The flow of classification is illustrated in Fig. 2. The keypoints are detected in the input image and a 32 × 32 patch is constructed from each keypoint.
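A toy version of this training and traversal can be sketched as follows. This is a hedged sketch with names of our own choosing; real implementations use 32 × 32 patches and hundreds of keypoint classes, while this uses tiny patches for brevity. Each internal node draws a random pixel pair (m_1, m_2) and applies the Eq. (2) test; each leaf stores the empirical class distribution of the training patches that reached it.

```python
import random

def build_tree(samples, n_classes, depth, patch_size, rng):
    """Grow one randomized tree. samples: list of (patch, class_id), where
    patch is a patch_size x patch_size list of intensity rows."""
    if depth == 0 or len(samples) <= 1:
        # Leaf: empirical class distribution of the samples that arrived here.
        hist = [0.0] * n_classes
        for _, c in samples:
            hist[c] += 1.0
        total = sum(hist)
        if total == 0:
            return {'dist': [1.0 / n_classes] * n_classes}
        return {'dist': [h / total for h in hist]}
    # Random pixel pair m1 = (x1, y1), m2 = (x2, y2) for the Eq. (2) test.
    m1 = (rng.randrange(patch_size), rng.randrange(patch_size))
    m2 = (rng.randrange(patch_size), rng.randrange(patch_size))
    go_left = lambda p: p[m1[1]][m1[0]] <= p[m2[1]][m2[0]]
    left = [(p, c) for p, c in samples if go_left(p)]
    right = [(p, c) for p, c in samples if not go_left(p)]
    return {'m1': m1, 'm2': m2,
            'left': build_tree(left, n_classes, depth - 1, patch_size, rng),
            'right': build_tree(right, n_classes, depth - 1, patch_size, rng)}

def leaf_dist(tree, patch):
    """Traverse to the leaf reached by patch (Eq. 2 at each node) and
    return its class probability distribution."""
    while 'dist' not in tree:
        (x1, y1), (x2, y2) = tree['m1'], tree['m2']
        tree = tree['left'] if patch[y1][x1] <= patch[y2][x2] else tree['right']
    return tree['dist']
```

Classification over a forest then averages `leaf_dist` across all trees and takes the argmax, as in Eq. (3) below.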
The keypoint patches are input to the decision trees to obtain the leaf node probability distributions of the keypoints, P_{η(T_l, P)}(Y(P) = c), by tree traversal. Then, the keypoint that has the greatest mean probability is output as the corresponding point matching result according to Eq. (3):

    Ŷ(P) = argmax_c (1/L) Σ_{l=1}^{L} P_{η(T_l, P)}(Y(P) = c).   (3)

2.3 Problems

Fig. 2  Keypoint classification [10].
Fig. 3  Corresponding point matching result according to Ref. [10].

The RTs keypoint classification method trains the trees with images that include all of the affine transformations. Thus, there are many different views even for the same keypoint, and it is difficult to correctly represent them all with a single set of RTs. Furthermore, because affine transforms are used to generate the training images, it is not possible to represent rotation in three dimensions. Another problem is that generating the training images by random selection is inefficient. Those problems reduce matching accuracy, as shown in Fig. 3. To solve those problems, we represent input image view changes and keypoint classification with two-stage RTs. In generating the training images, we use Euler angles to represent rotation in three dimensions. We thus solve the inefficiency problem by setting equidistant rotation parameters for each of the X, Y and Z axes. The proposed method is described below.

3. Proposed Method: Corresponding Point Matching with Two-Stage Randomized Trees

The proposed method deals with changes in template viewpoint and keypoint classification by training two-stage RTs. In the first stage, the input image viewpoints are classified. Viewpoint classes are groups of the many different viewpoints of a training image that are clustered around K representative viewpoints. In the second stage, keypoint classification is done using the RTs trained with the training images that belong to the viewpoint classes identified in the first stage. In this way, the keypoints can be classified with RTs trained with images that have viewpoints close to the input image. The processing flow for the proposed method is shown in Fig. 4. First, training images that represent many different viewpoints of the template are generated and viewpoint clustering is done as preprocessing. Next, the two-stage RTs are trained.

Fig. 4  Processing flow for the proposed method.

3.1 Generation of Training Images

The training image generation and viewpoint clustering are described in detail below.

3.1.1 Three-Dimensional Rotation Training Images

In Ref. [10], the affine transform parameters for generating the training images are selected randomly, which creates a problem of inefficiency. To solve this problem, the proposed method uses Euler angles to represent rotation in three dimensions when generating training images (Fig. 5). The affine transform matrix using the Euler angles in a 2-D coordinate system is given by A in Eq. (4):

    A = [ cos ψ  -sin ψ ] [ cos θ  0 ] [ cos φ  -sin φ ]
        [ sin ψ   cos ψ ] [   0    1 ] [ sin φ   cos φ ],   (4)

where A is a 2 × 2 matrix for transformation in a 2-D coordinate system. The problem of inefficiency is solved by generating the training images from the affine transform matrix A, whose Euler angle rotation parameters ψ, θ and φ are set at equal intervals. In the research reported here, the rotation ranges for the parameters are φ ∈ [0°, 360°), θ ∈ [0°, 90°), and ψ ∈ [0°, 360°). The interval for φ, θ, and ψ is 5°, and

93,312 training images are generated from one template image. Examples of the images generated for ψ = 90° and various values of θ and φ are shown in Fig. 6.

Fig. 5  Viewpoint.
Fig. 6  Example of training image generation.

Next, viewpoint clustering is performed. When clustering by the Euler-angle X, Y, and Z axis rotation parameters, the periodicity of the rotation cannot be represented. Therefore, the proposed method clusters the viewpoints by k-means clustering, using the generated patch images as features. Thus, even for image rotations of 0° or 359°, images of close viewpoints can be clustered together. The flow of clustering is shown in Fig. 7. For each training image, a series of linked 32 × 32 patch images centered on the keypoints is created. The patch image series is projected into the intensity feature space and clustered by k-means clustering.

Fig. 7  Viewpoint clustering.
Fig. 8  Spherical display of viewpoint clustering results.
Fig. 9  Clustering result.

The clustering results are presented in Fig. 8 for the number of viewpoint classes K = 30, with clusters represented by color coding. From Fig. 8, we see that images whose viewpoints are close can be clustered. The training images included in each class are shown in Fig. 9, from which we see that images whose viewpoints are near each other can be clustered into a single class.

3.2 Two-Stage Randomized Trees Training

The proposed method deals with changes in template viewpoint and the keypoint classification problem using two-stage RTs training. In the first stage, viewpoint class frequencies are learned by RTs using the relation of patch intensity magnitudes. In the second stage, RTs are trained for each viewpoint class. Accordingly, RTs are created for each of the viewpoint classes indicated by a color in Fig. 8. The

two-stage RTs training method is described in detail below.

3.2.1 First Stage: Training the Viewpoint Classification Randomized Trees

The decision tree set T^1 = {T^1_1, …, T^1_M} for classifying the input image viewpoints is trained. M is the number of decision trees. Decision tree set T^1 is trained by dividing the patches into M subsets. Node branching is determined by the intensity relationship of the keypoint patches in the same way as for Eq. (2). Then, the viewpoint class probability distributions of the leaf nodes are obtained. It is thus possible to classify the viewpoints using the probability distribution of the leaf node arrived at in classification. In the research reported here, the number of patches handled in the first stage is 37 million when the number of training images is m = 93,312 and the number of keypoint classes is c = 400.

3.2.2 Second Stage: Training Randomized Trees for Keypoint Classification

Fig. 10  Keypoint classification with two-stage decision trees.

The decision tree sets for keypoint classification are trained for each viewpoint class. The second stage comprises the decision tree sets of the K viewpoint classes, T^2 = {T^2_1, …, T^2_K}. The decision tree set for viewpoint class k (k ∈ K), T^2_k = {T^2_{k,1}, …, T^2_{k,N}}, is trained by dividing the patches that belong to viewpoint class k into N subsets. The node branching is determined by the relationship of the intensity magnitudes of the keypoint patches in the same way as Eq. (2). Then, the probability distributions of the leaf node keypoint classes are obtained. In the research reported here, the number of patches handled by the second-stage RTs T^2_k is 310,000 when the number of training images is m = 93,312, the number of keypoint classes is c = 400, and the number of viewpoint classes is K = 30.

3.3 Keypoint Classification Using Two-Stage Randomized Trees

The flow of keypoint classification is shown in Fig. 10.
Viewpoint class k of the input image is classified with decision tree set T^1 from the first stage. Next, the keypoints are classified with the decision tree set T^2_k from the second-stage decision tree set T^2 that corresponds to the viewpoint class k obtained in the first stage.

3.3.1 First Stage: Viewpoint Classification

The classification of viewpoint class k involves obtaining the mean of the probability distributions of the leaf nodes in the decision trees T^1 = {T^1_1, …, T^1_M} at which input patch P arrived, P_{η(T^1_m, P)}(Y(P) = k), using all of the keypoints on the template, as indicated in Eq. (5). If the value exceeds the threshold th, the viewpoint class is judged to be k:

    G(k) = { 1, if (1/M) Σ_{m=1}^{M} P_{η(T^1_m, P)}(Y(P) = k) > th
           { 0, otherwise.   (5)

The threshold value th is computed using Eq. (6):

    th = 0.7 max_k (1/M) Σ_{m=1}^{M} P_{η(T^1_m, P)}(Y(P) = k).   (6)

Fig. 11  Example of viewpoint class classification.

An example of viewpoint classification is shown in Fig. 11. We can see that the input images are similar to the centroid images of the highest-frequency viewpoint classes.

3.3.2 Second Stage: Keypoint Classification

The first stage of viewpoint classification may result in multiple classes for which G(k) = 1, so we obtain the mean of the leaf node probability distributions P_{η(T^2_{k,n}, P)}(Y(P) = c) from the sets T^2_k of the multiple decision trees for which G(k) = 1 and use Eq. (7) to assign the keypoints of high probability to class c:

    Ŷ(P) = argmax_c (1/N) Σ_{n=1}^{N} Σ_{k=1}^{K} G(k) P_{η(T^2_{k,n}, P)}(Y(P) = c).   (7)

The proposed method classifies the input image viewpoints with the first-stage RTs, as shown in Fig. 12 (a), and then uses the RTs that have been trained with images whose viewpoints are close to the class 4 centroid image to achieve highly accurate corresponding point matching in the second stage.
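Putting Eqs. (5)-(7) together, the two-stage classification of one input patch can be sketched as follows. This is a hedged sketch with names of our own (`stage1` is the list of M first-stage trees, `stage2[k]` the list of N trees for viewpoint class k); `leaf_dist` is a minimal stand-in that walks a tree of Eq. (2)-style pixel tests down to a leaf distribution.

```python
def leaf_dist(tree, patch):
    """Walk to the leaf reached by patch; a 'tree' here is either a leaf
    {'dist': [...]} or an internal node {'m1','m2','left','right'} whose
    pixel-pair test follows Eq. (2)."""
    while 'dist' not in tree:
        (x1, y1), (x2, y2) = tree['m1'], tree['m2']
        tree = tree['left'] if patch[y1][x1] <= patch[y2][x2] else tree['right']
    return tree['dist']

def classify_two_stage(patch, stage1, stage2, n_keypoint_classes):
    """First stage: mean viewpoint-class distribution over the M trees;
    G(k) = 1 where the mean exceeds th = 0.7 * max_k mean_k (Eqs. 5, 6).
    Second stage: average the keypoint-class distributions over the selected
    viewpoint classes' trees and take the argmax (Eq. 7)."""
    M, K = len(stage1), len(stage2)
    view_mean = [0.0] * K
    for tree in stage1:
        d = leaf_dist(tree, patch)
        for k in range(K):
            view_mean[k] += d[k] / M
    th = 0.7 * max(view_mean)                              # Eq. (6)
    G = [1 if view_mean[k] > th else 0 for k in range(K)]  # Eq. (5)
    score = [0.0] * n_keypoint_classes
    for k in range(K):
        if G[k]:
            N = len(stage2[k])
            for tree in stage2[k]:
                d = leaf_dist(tree, patch)
                for c in range(n_keypoint_classes):
                    score[c] += d[c] / N
    return max(range(n_keypoint_classes), key=score.__getitem__)  # Eq. (7)
```

Note how classes whose first-stage mean probability falls below the relative threshold contribute nothing to the second stage, which is what restricts the keypoint classification to viewpoints near the input image.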

Fig. 12  Example of corresponding point matching.
Fig. 13  Data Sets in the experiment.

4. Evaluation Experiment

The matching performance of the proposed method is related to the number and sizes of the viewpoint classes, so we first determine the relationship between the number of viewpoint classes and the matching accuracy. The viewpoint classes can also be assigned manually, so we compare the matching accuracy of clustering by the proposed method with the result of manual clustering. Finally, to show the effectiveness of the proposed method, we present the results of an experimental comparison of template corresponding point matching with results from the conventional methods, as well as the results of processing time experiments.

4.1 Database

We used the Morel database and the Mikolajczyk database. The image data from the Morel database included seven images of an object rotated in the range from 10 degrees to 70 degrees (Data Set A) and four images of an object rotated in the range from 45 degrees to 80 degrees (Data Set B). The image data from the Mikolajczyk database included three images of an object rotated in the range from 10 degrees to 40 degrees (Data Set C). In the experiments, we used one template image and three input images for each set of data (Fig. 13).

4.2 Experiment Overview

We compared SIFT [3], SURF [8], Randomized Trees (RTs) [10], and ASIFT [7] regarding the matching rate obtained from Eq. (8):

    Matching rate = (Number of matching successes) / (Number of matching attempts).   (8)

Fig. 14  Performance in relation to number of viewpoint classes.

The RTs were trained with 93,312 images. The number of first-stage RTs was M = 30, and the number of second-stage RTs was N = 30.
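Eq. (8) is a simple ratio; expressed in code (naming ours), with the result in percent to match the units of the tables:

```python
def matching_rate(n_success, n_attempted):
    """Eq. (8): number of matching successes / number of matching attempts,
    reported here as a percentage as in Tables 2-4."""
    if n_attempted == 0:
        return 0.0
    return 100.0 * n_success / n_attempted
```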
The depth of each decision tree was set in advance.

4.3 Viewpoint Clustering Performance

The performance of the proposed method varies with the number of viewpoint classes, so we varied the number of classes from 5 to 40 in steps of 5 for comparison of matching accuracy. The results (Fig. 14) show that accuracy increases with the number of viewpoint classes, but the improvement saturates at 30 classes. From this result we conclude that 30 is the optimum number of viewpoint classes.

Next, we compared matching performance for equal-width Euler-angle clustering and clustering by the proposed method. Thirty classes are defined with Euler angles (Fig. 15). The results of viewpoint clustering for each template by the proposed method are presented in Fig. 16. The corresponding point matching accuracies are listed in Table 1, from which we conclude that the proposed method improved performance for all templates.

yu/research/asift/
vgg/research/affine/

Because the clustering is by intensity, it is possible to generate viewpoint

classes that take into account large changes in intensity that depend on template texture as well as small changes.

Fig. 15  Euler-angle clustering result.
Fig. 16  Viewpoint clustering results for each template.

Table 1  Accuracy of corresponding point matching by clustering [%]. Rows: Euler-angle clustering, automatic clustering; columns: Data Set A, Data Set B, Data Set C, Avg.
Table 2  Data Set A matching rate [%]. Rows: 2RTs, SIFT, SURF, RTs, ASIFT; column: Avg.
Table 3  Data Set B matching rate [%]. Rows: 2RTs, SIFT, SURF, RTs, ASIFT; column: Avg.
Table 4  Data Set C matching rate [%]. Rows: 2RTs, SIFT, SURF, RTs, ASIFT; column: Avg.

4.4 Comparison with the Conventional Methods

We conducted comparison experiments to test the effectiveness of the proposed method. We compared SIFT, SURF, RTs, ASIFT, and the proposed method (2RTs) regarding the matching rate and processing time. The personal computer used in the experiment had a Xeon@2.66 GHz processor. The matching accuracy results are presented in Table 2 through Table 4 for the various sets of image data, with Data Set A in Table 2, Data Set B in Table 3, and Data Set C in Table 4. The corresponding point matching result image for each image data set is shown in Fig. 17. The processing times for keypoint detection and keypoint matching (including keypoint description) are presented in Table 5. Because the keypoint matching for 2RTs and RTs includes both the keypoint description processing and the keypoint matching processing, the processing time calculations count the keypoint description time for SIFT, SURF and ASIFT as matching time. A brute-force approach was used for the matching.

The most accurate was ASIFT, followed by the proposed method, RTs, SURF, and finally SIFT. The proposed method is less accurate than ASIFT; however, it is about 600 times faster. Compared to the conventional RTs method, the proposed method is more robust to changes in the image. The reason for this improvement is that, by training the two-stage RTs, the many different viewpoints of the template can be limited in the first stage, simplifying the keypoint classification problem for the second-stage RTs and thus improving accuracy. The conventional RTs method has the shortest processing time, followed by the proposed method, SURF, SIFT, and then ASIFT. The proposed method uses two-stage RTs for matching, which increases the processing time by a factor of 1.7 relative to the conventional RTs method. Nevertheless, it is still capable of real-time processing at 12 fps. Figure 18 shows the relation between processing time and accuracy for each method. The proposed method appears in the upper left corner of the graph, which indicates that it is effective for real-time processing in scenes that include affine transforms. Which method provides the best performance depends on the application.

5. Conclusion

We have proposed a keypoint classification method that uses two-stage Randomized Trees. The method addresses the two problems of changes in template viewpoint and keypoint classification by using two-stage Randomized Trees, which simplifies the classification problem compared to the conventional RTs method. The result is that even if the viewpoint of the target object is rotated by 70 degrees in the input image, an improvement in accuracy of 88.5% relative to SIFT and 63.5% relative to RTs is achieved. We confirmed that the proposed method is capable of corresponding point matching at 12 fps. In future work, we will investigate techniques for training RTs with less memory and on-line training methods.

Fig. 17  Corresponding point matching results.

Table 5  Processing time [ms]. Columns: Keypoint detection, Keypoint matching, Total; rows: 2RTs, SIFT, SURF, RTs, ASIFT.

Fig. 18  Relation between processing time and accuracy.

References

[1] A. Ihara, H. Fujiyoshi, M. Takagi, H. Kumon, and Y. Tamatsu, Improved matching accuracy in traffic sign recognition by using different feature subspaces, Machine Vision Applications 2009 (MVA2009).
[2] K. Kise, K. Noguchi, and M. Iwamura, Robust and efficient recognition of low-quality images by cascaded recognizers with massive local features, Proc. 1st International Workshop on Emergent Issues in Large Amount of Visual Data (WS-LAVD2009), Oct. 2009.
[3] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis., vol.60.
[4] Y. Ke and R. Sukthankar, PCA-SIFT: A more distinctive representation for local image descriptors, Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol.2.
[5] K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell., vol.27, no.10.
[6] S. Belongie, J. Malik, and J. Puzicha, Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell., vol.24, no.4.
[7] J.M. Morel and G. Yu, ASIFT: A new framework for fully affine invariant image comparison, SIAM Journal on Imaging Sciences, vol.2, no.2.
[8] H. Bay, T. Tuytelaars, and L.V. Gool, SURF: Speeded-up robust features, ECCV.
[9] M. Grabner, H. Grabner, and H. Bischof, Fast approximated SIFT, Proc. of ACCV.
[10] V. Lepetit and P. Fua, Keypoint recognition using randomized trees, IEEE Trans. Pattern Anal. Mach. Intell., vol.28, no.9.
[11] L. Breiman, Random forests, Mach. Learn., vol.45, no.1, pp.5-32.
[12] M. Ozuysal, M. Calonder, V. Lepetit, and P. Fua, Fast keypoint recognition using random ferns, IEEE Trans. Pattern Anal. Mach. Intell., vol.32, no.3.
[13] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg, Pose tracking from natural features on mobile phones, Proc. ISMAR 2008, 2008.

Shoichi Shimizu received his Ph.D. in Computer Science from Chubu University, Japan. From 2009 to 2010 he was a postdoctoral fellow in Computer Science at Chubu University. He is now working at the Advanced Technology R&D Center of the Mitsubishi Electric Corporation. His research interests include computer vision and ITS. He is a member of the IPSJ and JSPE.

Hironobu Fujiyoshi received his Ph.D. in Electrical Engineering from Chubu University, Japan. From 1997 to 2000 he was a postdoctoral fellow at the Robotics Institute of Carnegie Mellon University, Pittsburgh, PA, USA, working on the DARPA Video Surveillance and Monitoring (VSAM) effort and the humanoid vision project for the HONDA Humanoid Robot. He is now a professor in the Department of Computer Science, Chubu University, Japan. From 2005 to 2006, he was a visiting researcher at the Robotics Institute, Carnegie Mellon University. His research interests include computer vision, video understanding and pattern recognition. He is a member of the IEEE, the IPSJ, and the IEE.


Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds 9 1th International Conference on Document Analysis and Recognition Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds Weihan Sun, Koichi Kise Graduate School

More information

A Novel Extreme Point Selection Algorithm in SIFT

A Novel Extreme Point Selection Algorithm in SIFT A Novel Extreme Point Selection Algorithm in SIFT Ding Zuchun School of Electronic and Communication, South China University of Technolog Guangzhou, China zucding@gmail.com Abstract. This paper proposes

More information

Local Pixel Class Pattern Based on Fuzzy Reasoning for Feature Description

Local Pixel Class Pattern Based on Fuzzy Reasoning for Feature Description Local Pixel Class Pattern Based on Fuzzy Reasoning for Feature Description WEIREN SHI *, SHUHAN CHEN, LI FANG College of Automation Chongqing University No74, Shazheng Street, Shapingba District, Chongqing

More information

Robust and Accurate Detection of Object Orientation and ID without Color Segmentation

Robust and Accurate Detection of Object Orientation and ID without Color Segmentation 0 Robust and Accurate Detection of Object Orientation and ID without Color Segmentation Hironobu Fujiyoshi, Tomoyuki Nagahashi and Shoichi Shimizu Chubu University Japan Open Access Database www.i-techonline.com

More information

Implementing the Scale Invariant Feature Transform(SIFT) Method

Implementing the Scale Invariant Feature Transform(SIFT) Method Implementing the Scale Invariant Feature Transform(SIFT) Method YU MENG and Dr. Bernard Tiddeman(supervisor) Department of Computer Science University of St. Andrews yumeng@dcs.st-and.ac.uk Abstract The

More information

Fast Image Matching Using Multi-level Texture Descriptor

Fast Image Matching Using Multi-level Texture Descriptor Fast Image Matching Using Multi-level Texture Descriptor Hui-Fuang Ng *, Chih-Yang Lin #, and Tatenda Muindisi * Department of Computer Science, Universiti Tunku Abdul Rahman, Malaysia. E-mail: nghf@utar.edu.my

More information

Stereoscopic Images Generation By Monocular Camera

Stereoscopic Images Generation By Monocular Camera Stereoscopic Images Generation By Monocular Camera Swapnil Lonare M. tech Student Department of Electronics Engineering (Communication) Abha Gaikwad - Patil College of Engineering. Nagpur, India 440016

More information

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors Shao-Tzu Huang, Chen-Chien Hsu, Wei-Yen Wang International Science Index, Electrical and Computer Engineering waset.org/publication/0007607

More information

Department of Electrical and Electronic Engineering, University of Peradeniya, KY 20400, Sri Lanka

Department of Electrical and Electronic Engineering, University of Peradeniya, KY 20400, Sri Lanka WIT: Window Intensity Test Detector and Descriptor T.W.U.Madhushani, D.H.S.Maithripala, J.V.Wijayakulasooriya Postgraduate and Research Unit, Sri Lanka Technological Campus, CO 10500, Sri Lanka. Department

More information

Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing

Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing Tomoyuki Nagahashi 1, Hironobu Fujiyoshi 1, and Takeo Kanade 2 1 Dept. of Computer Science, Chubu University. Matsumoto 1200,

More information

Local invariant features

Local invariant features Local invariant features Tuesday, Oct 28 Kristen Grauman UT-Austin Today Some more Pset 2 results Pset 2 returned, pick up solutions Pset 3 is posted, due 11/11 Local invariant features Detection of interest

More information

Yudistira Pictures; Universitas Brawijaya

Yudistira Pictures; Universitas Brawijaya Evaluation of Feature Detector-Descriptor for Real Object Matching under Various Conditions of Ilumination and Affine Transformation Novanto Yudistira1, Achmad Ridok2, Moch Ali Fauzi3 1) Yudistira Pictures;

More information

Implementation and Comparison of Feature Detection Methods in Image Mosaicing

Implementation and Comparison of Feature Detection Methods in Image Mosaicing IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p-ISSN: 2278-8735 PP 07-11 www.iosrjournals.org Implementation and Comparison of Feature Detection Methods in Image

More information

Building a Panorama. Matching features. Matching with Features. How do we build a panorama? Computational Photography, 6.882

Building a Panorama. Matching features. Matching with Features. How do we build a panorama? Computational Photography, 6.882 Matching features Building a Panorama Computational Photography, 6.88 Prof. Bill Freeman April 11, 006 Image and shape descriptors: Harris corner detectors and SIFT features. Suggested readings: Mikolajczyk

More information

Feature Based Registration - Image Alignment

Feature Based Registration - Image Alignment Feature Based Registration - Image Alignment Image Registration Image registration is the process of estimating an optimal transformation between two or more images. Many slides from Alexei Efros http://graphics.cs.cmu.edu/courses/15-463/2007_fall/463.html

More information

Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model

Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model Illumination-Robust Face Recognition based on Gabor Feature Face Intrinsic Identity PCA Model TAE IN SEOL*, SUN-TAE CHUNG*, SUNHO KI**, SEONGWON CHO**, YUN-KWANG HONG*** *School of Electronic Engineering

More information

Logo Matching and Recognition for Avoiding Duplicate Logos

Logo Matching and Recognition for Avoiding Duplicate Logos Logo Matching and Recognition for Avoiding Duplicate Logos Lalsawmliani Fanchun 1, Rishma Mary George 2 PG Student, Electronics & Ccommunication Department, Mangalore Institute of Technology and Engineering

More information

Combining Harris Interest Points and the SIFT Descriptor for Fast Scale-Invariant Object Recognition

Combining Harris Interest Points and the SIFT Descriptor for Fast Scale-Invariant Object Recognition Combining Harris Interest Points and the SIFT Descriptor for Fast Scale-Invariant Object Recognition Pedram Azad, Tamim Asfour, Rüdiger Dillmann Institute for Anthropomatics, University of Karlsruhe, Germany

More information

Determinant of homography-matrix-based multiple-object recognition

Determinant of homography-matrix-based multiple-object recognition Determinant of homography-matrix-based multiple-object recognition 1 Nagachetan Bangalore, Madhu Kiran, Anil Suryaprakash Visio Ingenii Limited F2-F3 Maxet House Liverpool Road Luton, LU1 1RS United Kingdom

More information

Ensemble of Bayesian Filters for Loop Closure Detection

Ensemble of Bayesian Filters for Loop Closure Detection Ensemble of Bayesian Filters for Loop Closure Detection Mohammad Omar Salameh, Azizi Abdullah, Shahnorbanun Sahran Pattern Recognition Research Group Center for Artificial Intelligence Faculty of Information

More information

Outline. Introduction System Overview Camera Calibration Marker Tracking Pose Estimation of Markers Conclusion. Media IC & System Lab Po-Chen Wu 2

Outline. Introduction System Overview Camera Calibration Marker Tracking Pose Estimation of Markers Conclusion. Media IC & System Lab Po-Chen Wu 2 Outline Introduction System Overview Camera Calibration Marker Tracking Pose Estimation of Markers Conclusion Media IC & System Lab Po-Chen Wu 2 Outline Introduction System Overview Camera Calibration

More information

Automatic Logo Detection and Removal

Automatic Logo Detection and Removal Automatic Logo Detection and Removal Miriam Cha, Pooya Khorrami and Matthew Wagner Electrical and Computer Engineering Carnegie Mellon University Pittsburgh, PA 15213 {mcha,pkhorrami,mwagner}@ece.cmu.edu

More information

Image Segmentation Using Iterated Graph Cuts BasedonMulti-scaleSmoothing

Image Segmentation Using Iterated Graph Cuts BasedonMulti-scaleSmoothing Image Segmentation Using Iterated Graph Cuts BasedonMulti-scaleSmoothing Tomoyuki Nagahashi 1, Hironobu Fujiyoshi 1, and Takeo Kanade 2 1 Dept. of Computer Science, Chubu University. Matsumoto 1200, Kasugai,

More information

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tan Vo 1 and Dat Tran 1 and Wanli Ma 1 1- Faculty of Education, Science, Technology and Mathematics University of Canberra, Australia

More information

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1 Feature Detection Raul Queiroz Feitosa 3/30/2017 Feature Detection 1 Objetive This chapter discusses the correspondence problem and presents approaches to solve it. 3/30/2017 Feature Detection 2 Outline

More information

A Novel Algorithm for Color Image matching using Wavelet-SIFT

A Novel Algorithm for Color Image matching using Wavelet-SIFT International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 A Novel Algorithm for Color Image matching using Wavelet-SIFT Mupuri Prasanth Babu *, P. Ravi Shankar **

More information

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information

On-line Document Registering and Retrieving System for AR Annotation Overlay

On-line Document Registering and Retrieving System for AR Annotation Overlay On-line Document Registering and Retrieving System for AR Annotation Overlay Hideaki Uchiyama, Julien Pilet and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku Yokohama, Japan {uchiyama,julien,saito}@hvrl.ics.keio.ac.jp

More information

Real-Time Human Detection using Relational Depth Similarity Features

Real-Time Human Detection using Relational Depth Similarity Features Real-Time Human Detection using Relational Depth Similarity Features Sho Ikemura, Hironobu Fujiyoshi Dept. of Computer Science, Chubu University. Matsumoto 1200, Kasugai, Aichi, 487-8501 Japan. si@vision.cs.chubu.ac.jp,

More information

A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION

A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM Karthik Krish Stuart Heinrich Wesley E. Snyder Halil Cakir Siamak Khorram North Carolina State University Raleigh, 27695 kkrish@ncsu.edu sbheinri@ncsu.edu

More information

IMPROVING DISTINCTIVENESS OF BRISK FEATURES USING DEPTH MAPS. Maxim Karpushin, Giuseppe Valenzise, Frédéric Dufaux

IMPROVING DISTINCTIVENESS OF BRISK FEATURES USING DEPTH MAPS. Maxim Karpushin, Giuseppe Valenzise, Frédéric Dufaux IMPROVING DISTINCTIVENESS OF FEATURES USING DEPTH MAPS Maxim Karpushin, Giuseppe Valenzise, Frédéric Dufaux Institut Mines-Télécom; Télécom ParisTech; CNRS LTCI ABSTRACT Binary local descriptors are widely

More information

A Comparison of SIFT and SURF

A Comparison of SIFT and SURF A Comparison of SIFT and SURF P M Panchal 1, S R Panchal 2, S K Shah 3 PG Student, Department of Electronics & Communication Engineering, SVIT, Vasad-388306, India 1 Research Scholar, Department of Electronics

More information

Chapter 3 Image Registration. Chapter 3 Image Registration

Chapter 3 Image Registration. Chapter 3 Image Registration Chapter 3 Image Registration Distributed Algorithms for Introduction (1) Definition: Image Registration Input: 2 images of the same scene but taken from different perspectives Goal: Identify transformation

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

Lecture 10 Detectors and descriptors

Lecture 10 Detectors and descriptors Lecture 10 Detectors and descriptors Properties of detectors Edge detectors Harris DoG Properties of detectors SIFT Shape context Silvio Savarese Lecture 10-26-Feb-14 From the 3D to 2D & vice versa P =

More information

III. VERVIEW OF THE METHODS

III. VERVIEW OF THE METHODS An Analytical Study of SIFT and SURF in Image Registration Vivek Kumar Gupta, Kanchan Cecil Department of Electronics & Telecommunication, Jabalpur engineering college, Jabalpur, India comparing the distance

More information

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT SIFT: Scale Invariant Feature Transform; transform image

More information

Conspicuous Character Patterns

Conspicuous Character Patterns Conspicuous Character Patterns Seiichi Uchida Kyushu Univ., Japan Ryoji Hattori Masakazu Iwamura Kyushu Univ., Japan Osaka Pref. Univ., Japan Koichi Kise Osaka Pref. Univ., Japan Shinichiro Omachi Tohoku

More information

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.11, November 2013 1 Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial

More information

Camera tracking by online learning of keypoint arrangements using LLAH in augmented reality applications

Camera tracking by online learning of keypoint arrangements using LLAH in augmented reality applications Virtual Reality (2011) 15:109 117 DOI 10.1007/s10055-010-0173-7 SI: AUGMENTED REALITY Camera tracking by online learning of keypoint arrangements using LLAH in augmented reality applications Hideaki Uchiyama

More information

SURF. Lecture6: SURF and HOG. Integral Image. Feature Evaluation with Integral Image

SURF. Lecture6: SURF and HOG. Integral Image. Feature Evaluation with Integral Image SURF CSED441:Introduction to Computer Vision (2015S) Lecture6: SURF and HOG Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Speed Up Robust Features (SURF) Simplified version of SIFT Faster computation but

More information

A hardware design of optimized ORB algorithm with reduced hardware cost

A hardware design of optimized ORB algorithm with reduced hardware cost , pp.58-62 http://dx.doi.org/10.14257/astl.2013 A hardware design of optimized ORB algorithm with reduced hardware cost Kwang-yeob Lee 1, Kyung-jin Byun 2 1 Dept. of Computer Engineering, Seokyenog University,

More information

LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS

LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS 8th International DAAAM Baltic Conference "INDUSTRIAL ENGINEERING - 19-21 April 2012, Tallinn, Estonia LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS Shvarts, D. & Tamre, M. Abstract: The

More information

VEHICLE MAKE AND MODEL RECOGNITION BY KEYPOINT MATCHING OF PSEUDO FRONTAL VIEW

VEHICLE MAKE AND MODEL RECOGNITION BY KEYPOINT MATCHING OF PSEUDO FRONTAL VIEW VEHICLE MAKE AND MODEL RECOGNITION BY KEYPOINT MATCHING OF PSEUDO FRONTAL VIEW Yukiko Shinozuka, Ruiko Miyano, Takuya Minagawa and Hideo Saito Department of Information and Computer Science, Keio University

More information

Transfer Forest Based on Covariate Shift

Transfer Forest Based on Covariate Shift Transfer Forest Based on Covariate Shift Masamitsu Tsuchiya SECURE, INC. tsuchiya@secureinc.co.jp Yuji Yamauchi, Takayoshi Yamashita, Hironobu Fujiyoshi Chubu University yuu@vision.cs.chubu.ac.jp, {yamashita,

More information

Faster-than-SIFT Object Detection

Faster-than-SIFT Object Detection Faster-than-SIFT Object Detection Borja Peleato and Matt Jones peleato@stanford.edu, mkjones@cs.stanford.edu March 14, 2009 1 Introduction As computers become more powerful and video processing schemes

More information

Instance Detection by Keypoint Matching Beyond the Nearest Neighbor

Instance Detection by Keypoint Matching Beyond the Nearest Neighbor Noname manuscript No. (will be inserted by the editor) Instance Detection by Keypoint Matching Beyond the Nearest Neighbor Furkan Eren Uzyıldırım Mustafa Özuysal Received: date / Accepted: date Abstract

More information

Part-Based Skew Estimation for Mathematical Expressions

Part-Based Skew Estimation for Mathematical Expressions Soma Shiraishi, Yaokai Feng, and Seiichi Uchida shiraishi@human.ait.kyushu-u.ac.jp {fengyk,uchida}@ait.kyushu-u.ac.jp Abstract We propose a novel method for the skew estimation on text images containing

More information

Features Points. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE)

Features Points. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE) Features Points Andrea Torsello DAIS Università Ca Foscari via Torino 155, 30172 Mestre (VE) Finding Corners Edge detectors perform poorly at corners. Corners provide repeatable points for matching, so

More information

Fuzzy based Multiple Dictionary Bag of Words for Image Classification

Fuzzy based Multiple Dictionary Bag of Words for Image Classification Available online at www.sciencedirect.com Procedia Engineering 38 (2012 ) 2196 2206 International Conference on Modeling Optimisation and Computing Fuzzy based Multiple Dictionary Bag of Words for Image

More information

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale.

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale. Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe presented by, Sudheendra Invariance Intensity Scale Rotation Affine View point Introduction Introduction SIFT (Scale Invariant Feature

More information

Speeding up the Detection of Line Drawings Using a Hash Table

Speeding up the Detection of Line Drawings Using a Hash Table Speeding up the Detection of Line Drawings Using a Hash Table Weihan Sun, Koichi Kise 2 Graduate School of Engineering, Osaka Prefecture University, Japan sunweihan@m.cs.osakafu-u.ac.jp, 2 kise@cs.osakafu-u.ac.jp

More information

FAST AFFINE-INVARIANT IMAGE MATCHING BASED ON GLOBAL BHATTACHARYYA MEASURE WITH ADAPTIVE TREE. Jongin Son, Seungryong Kim, and Kwanghoon Sohn

FAST AFFINE-INVARIANT IMAGE MATCHING BASED ON GLOBAL BHATTACHARYYA MEASURE WITH ADAPTIVE TREE. Jongin Son, Seungryong Kim, and Kwanghoon Sohn FAST AFFINE-INVARIANT IMAGE MATCHING BASED ON GLOBAL BHATTACHARYYA MEASURE WITH ADAPTIVE TREE Jongin Son, Seungryong Kim, and Kwanghoon Sohn Digital Image Media Laboratory (DIML), School of Electrical

More information

Object Recognition with Invariant Features

Object Recognition with Invariant Features Object Recognition with Invariant Features Definition: Identify objects or scenes and determine their pose and model parameters Applications Industrial automation and inspection Mobile robots, toys, user

More information

Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit

Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit Augmented Reality VU Computer Vision 3D Registration (2) Prof. Vincent Lepetit Feature Point-Based 3D Tracking Feature Points for 3D Tracking Much less ambiguous than edges; Point-to-point reprojection

More information

Image Features: Local Descriptors. Sanja Fidler CSC420: Intro to Image Understanding 1/ 58

Image Features: Local Descriptors. Sanja Fidler CSC420: Intro to Image Understanding 1/ 58 Image Features: Local Descriptors Sanja Fidler CSC420: Intro to Image Understanding 1/ 58 [Source: K. Grauman] Sanja Fidler CSC420: Intro to Image Understanding 2/ 58 Local Features Detection: Identify

More information

3D Object Recognition using Multiclass SVM-KNN

3D Object Recognition using Multiclass SVM-KNN 3D Object Recognition using Multiclass SVM-KNN R. Muralidharan, C. Chandradekar April 29, 2014 Presented by: Tasadduk Chowdhury Problem We address the problem of recognizing 3D objects based on various

More information

The SIFT (Scale Invariant Feature

The SIFT (Scale Invariant Feature The SIFT (Scale Invariant Feature Transform) Detector and Descriptor developed by David Lowe University of British Columbia Initial paper ICCV 1999 Newer journal paper IJCV 2004 Review: Matt Brown s Canonical

More information

Wide Baseline Matching using Triplet Vector Descriptor

Wide Baseline Matching using Triplet Vector Descriptor 1 Wide Baseline Matching using Triplet Vector Descriptor Yasushi Kanazawa Koki Uemura Department of Knowledge-based Information Engineering Toyohashi University of Technology, Toyohashi 441-8580, JAPAN

More information

Detecting Multiple Symmetries with Extended SIFT

Detecting Multiple Symmetries with Extended SIFT 1 Detecting Multiple Symmetries with Extended SIFT 2 3 Anonymous ACCV submission Paper ID 388 4 5 6 7 8 9 10 11 12 13 14 15 16 Abstract. This paper describes an effective method for detecting multiple

More information

An Approach for Reduction of Rain Streaks from a Single Image

An Approach for Reduction of Rain Streaks from a Single Image An Approach for Reduction of Rain Streaks from a Single Image Vijayakumar Majjagi 1, Netravati U M 2 1 4 th Semester, M. Tech, Digital Electronics, Department of Electronics and Communication G M Institute

More information

Local Features Tutorial: Nov. 8, 04

Local Features Tutorial: Nov. 8, 04 Local Features Tutorial: Nov. 8, 04 Local Features Tutorial References: Matlab SIFT tutorial (from course webpage) Lowe, David G. Distinctive Image Features from Scale Invariant Features, International

More information

Partial Face Recognition

Partial Face Recognition Partial Face Recognition Shengcai Liao NLPR, CASIA April 29, 2015 Background Cooperated face recognition People are asked to stand in front of a camera with good illumination conditions Border pass, access

More information

ROBUST OBJECT TRACKING BY SIMULTANEOUS GENERATION OF AN OBJECT MODEL

ROBUST OBJECT TRACKING BY SIMULTANEOUS GENERATION OF AN OBJECT MODEL ROBUST OBJECT TRACKING BY SIMULTANEOUS GENERATION OF AN OBJECT MODEL Maria Sagrebin, Daniel Caparròs Lorca, Daniel Stroh, Josef Pauli Fakultät für Ingenieurwissenschaften Abteilung für Informatik und Angewandte

More information

Prof. Feng Liu. Spring /26/2017

Prof. Feng Liu. Spring /26/2017 Prof. Feng Liu Spring 2017 http://www.cs.pdx.edu/~fliu/courses/cs510/ 04/26/2017 Last Time Re-lighting HDR 2 Today Panorama Overview Feature detection Mid-term project presentation Not real mid-term 6

More information

Compressed Representation of Feature Vectors Using a Bloomier Filter and Its Application to Specific Object Recognition

Compressed Representation of Feature Vectors Using a Bloomier Filter and Its Application to Specific Object Recognition Compressed Representation of Feature Vectors Using a Bloomier Filter and Its Application to Specific Object Recognition Katsufumi Inoue and Koichi Kise Graduate School of Engineering, Osaka Prefecture

More information

An Image Based 3D Reconstruction System for Large Indoor Scenes

An Image Based 3D Reconstruction System for Large Indoor Scenes 36 5 Vol. 36, No. 5 2010 5 ACTA AUTOMATICA SINICA May, 2010 1 1 2 1,,,..,,,,. : 1), ; 2), ; 3),.,,. DOI,,, 10.3724/SP.J.1004.2010.00625 An Image Based 3D Reconstruction System for Large Indoor Scenes ZHANG

More information

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION Maral Mesmakhosroshahi, Joohee Kim Department of Electrical and Computer Engineering Illinois Institute

More information

Recognition of Road Contours Based on Extraction of 3D Positions of Delineators

Recognition of Road Contours Based on Extraction of 3D Positions of Delineators Proceedings of the 2007 IEEE Intelligent Transportation Systems Conference Seattle, WA, USA, Sept. 30 - Oct. 3, 2007 TuD5.2 Recognition of Road Contours Based on Extraction of 3D Positions of Delineators

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

TA Section 7 Problem Set 3. SIFT (Lowe 2004) Shape Context (Belongie et al. 2002) Voxel Coloring (Seitz and Dyer 1999)

TA Section 7 Problem Set 3. SIFT (Lowe 2004) Shape Context (Belongie et al. 2002) Voxel Coloring (Seitz and Dyer 1999) TA Section 7 Problem Set 3 SIFT (Lowe 2004) Shape Context (Belongie et al. 2002) Voxel Coloring (Seitz and Dyer 1999) Sam Corbett-Davies TA Section 7 02-13-2014 Distinctive Image Features from Scale-Invariant

More information

Local Patch Descriptors

Local Patch Descriptors Local Patch Descriptors Slides courtesy of Steve Seitz and Larry Zitnick CSE 803 1 How do we describe an image patch? How do we describe an image patch? Patches with similar content should have similar

More information

Image Features: Detection, Description, and Matching and their Applications

Image Features: Detection, Description, and Matching and their Applications Image Features: Detection, Description, and Matching and their Applications Image Representation: Global Versus Local Features Features/ keypoints/ interset points are interesting locations in the image.

More information

SIFT: Scale Invariant Feature Transform

SIFT: Scale Invariant Feature Transform 1 / 25 SIFT: Scale Invariant Feature Transform Ahmed Othman Systems Design Department University of Waterloo, Canada October, 23, 2012 2 / 25 1 SIFT Introduction Scale-space extrema detection Keypoint

More information

A Framework for Multiple Radar and Multiple 2D/3D Camera Fusion

A Framework for Multiple Radar and Multiple 2D/3D Camera Fusion A Framework for Multiple Radar and Multiple 2D/3D Camera Fusion Marek Schikora 1 and Benedikt Romba 2 1 FGAN-FKIE, Germany 2 Bonn University, Germany schikora@fgan.de, romba@uni-bonn.de Abstract: In this

More information

Pixel-Pair Features Selection for Vehicle Tracking

Pixel-Pair Features Selection for Vehicle Tracking 2013 Second IAPR Asian Conference on Pattern Recognition Pixel-Pair Features Selection for Vehicle Tracking Zhibin Zhang, Xuezhen Li, Takio Kurita Graduate School of Engineering Hiroshima University Higashihiroshima,

More information

CS 378: Autonomous Intelligent Robotics. Instructor: Jivko Sinapov

CS 378: Autonomous Intelligent Robotics. Instructor: Jivko Sinapov CS 378: Autonomous Intelligent Robotics Instructor: Jivko Sinapov http://www.cs.utexas.edu/~jsinapov/teaching/cs378/ Visual Registration and Recognition Announcements Homework 6 is out, due 4/5 4/7 Installing

More information

Local Image Features

Local Image Features Local Image Features Ali Borji UWM Many slides from James Hayes, Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial Overview of Keypoint Matching 1. Find a set of distinctive key- points A 1 A 2 A 3 B 3

More information

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt.

CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt. CEE598 - Visual Sensing for Civil Infrastructure Eng. & Mgmt. Section 10 - Detectors part II Descriptors Mani Golparvar-Fard Department of Civil and Environmental Engineering 3129D, Newmark Civil Engineering

More information

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Akitsugu Noguchi and Keiji Yanai Department of Computer Science, The University of Electro-Communications, 1-5-1 Chofugaoka,

More information

Generic Face Alignment Using an Improved Active Shape Model

Generic Face Alignment Using an Improved Active Shape Model Generic Face Alignment Using an Improved Active Shape Model Liting Wang, Xiaoqing Ding, Chi Fang Electronic Engineering Department, Tsinghua University, Beijing, China {wanglt, dxq, fangchi} @ocrserv.ee.tsinghua.edu.cn

More information

Local features: detection and description. Local invariant features

Local features: detection and description. Local invariant features Local features: detection and description Local invariant features Detection of interest points Harris corner detection Scale invariant blob detection: LoG Description of local patches SIFT : Histograms

More information

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi hrazvi@stanford.edu 1 Introduction: We present a method for discovering visual hierarchy in a set of images. Automatically grouping

More information