Adaptive Feature Selection via Boosting-like Sparsity Regularization

Libin Wang, Zhenan Sun, Tieniu Tan
Center for Research on Intelligent Perception and Computing, NLPR, Beijing, China
Email: {lbwang, znsun, tnt}@nlpr.ia.ac.cn

Abstract: In order to efficiently select a discriminative and complementary subset from a large feature pool, we propose a two-stage learning strategy that considers both samples and their features, namely sample selection and feature selection. The objective functions of both stages are consistent with a large margin loss. At the first stage, the support samples are selected by a Support Vector Machine (SVM). At the second stage, a Boosting-like Sparsity Regularization (SRBoost) algorithm is presented to select a small number of complementary features. In detail, a weak learner is composed of a few features, which are selected by a sparsity-enforcing model, and an intermediate variable is used to reweight the corresponding samples. Extensive experimental results on the CASIA-Iris-V4.0 database demonstrate that our method outperforms the state-of-the-art methods.

Keywords: feature selection; Boosting; sparse

Figure 1: Flowchart of the proposed method. (1) Sample selection (steps 1-2): the points inside the dashed ellipse are the selected samples. (2) Feature selection (steps 3-4) by SRBoost: the bold dashed lines (S1, S2) are the features selected by the Simplex algorithm (the polyhedron), respectively.

I. INTRODUCTION

Feature selection aims to select a small subset of compact and discriminative features. In biometrics, object classification and recognition, an image is usually represented by local feature descriptors, such as SIFT [10] and Ordinal Measures (OM) [13], which are extracted at every pixel by certain filters. A large feature pool is therefore generated, which is overcomplete for describing the image itself. In this case, feature selection is introduced to deal with the high-dimensional data.

Many related algorithms have been presented during the past decades. Among them, AdaBoost [6], [14] is a class of successful methods, which heuristically select a new feature (weak learner) on the reweighted samples. Recently, sparsity-enforcing models [7], [15], [11] have attracted great attention and achieved competitive performance, especially when the number of training samples is small. These sparse models commonly formulate feature selection as an l0- or l1-regularized optimization problem, which enforces sparsity on the feature weights. Besides the regularization term, the loss function is another significant element. Destrero et al. [5] directly adopt a Least Squares (LS) loss with application to face detection. He et al. [7] propose a correntropy-based robust estimation loss to tackle non-Gaussian noise. Wang et al. [15] formulate feature selection as a Linear Programming (LP) model with a large margin loss, which is robust to noise and outliers as well. In the above models, complementary features are not explicitly taken into consideration, so similar features may share large weights simultaneously. Moreover, the optimization of a sparse model usually involves matrix computations [11], [7], which are time-consuming in general. In summary, although sparsity regularization methods have achieved promising performance, they still have some limitations; e.g., they do not take the distribution of samples into account.
Moreover, the complementarity of features is not explicitly taken into account. To address the above problems, in this paper we propose a two-stage learning strategy consisting of sample selection and Boosting-like learning. The loss functions of the two stages are consistent so as to promote the overall performance. At the first stage, the support samples are selected by an SVM, whose Hinge loss follows the large margin principle. At the second stage, a Boosting-like sparsity regularization (SRBoost) algorithm is designed to select complementary features. In detail, SRBoost iteratively selects a small number of features with a sparsity-enforcing model to which a large margin loss is also added. The selected features are complementary because the features selected in different iterations classify the training samples under different weights.

In general, the proposed model with the large margin loss can be formulated as a linear programming problem, which can be efficiently solved by an iterative Simplex algorithm. Figure 1 illustrates the flowchart of the proposed method.

II. BOOSTING-LIKE SPARSITY REGULARIZATION MODEL

A. Notations and primary settings

Without loss of generality, we consider a binary classification problem, because a multi-class problem can be transformed into intra-class matching versus inter-class matching for feature selection. Assuming that the class labels are linear mappings of the feature space, we learn the linear function by minimizing the mean squared error. Here y = {y_j}, y_j \in {+1, -1}, denotes the class labels, and X = {x}, x \in {x^+, x^-}, denotes a data set of D-dimensional features, where x^+ and x^- represent the positive and negative samples respectively. The linear decision hyperplane is y - Xw = 0, where w is the weight vector.

B. Sample selection

In the first stage, a preprocessing step of sample selection is applied to reduce the scale of the training set while maintaining the distribution of samples for classification. Sampling techniques may appear to be an off-the-shelf solution: the classic statistical bootstrap and the m-out-of-n bootstrap are important general resampling approaches. However, in order to preserve the distribution of the original data, a sufficient number of sampling rounds has to be carried out. From another perspective, in this paper we take advantage of the Support Vector Machine (SVM). Since the output of an SVM is a function of the selected support vectors, sample selection is implemented here in this supervised way. Considering efficiency, we use a linear SVM without kernels to select the samples. The objective function takes the form [2]:

L(w, b, a) = \frac{1}{2} \|w\|_2^2 - \sum_{n=1}^{N} a_n \{ y_n (w^T x_n + b) - 1 \}    (1)

The linear SVM can be efficiently solved by the Sequential Minimal Optimization (SMO) algorithm [4], [3]. In addition, the support vectors have the useful property that they lie close to the decision boundary, and they reflect the distribution of the samples that are relatively hard to classify. It is therefore reasonable to deploy the subsequent feature selection only on the selected samples. Figure 1 (steps 1-2) shows the sample selection process. Generally, the time spent on sample selection is worthwhile compared with that of the subsequent selection step. It is worth mentioning that the Hinge loss of the SVM embodies the large margin criterion, which has a close relationship with the subsequent feature selection. Furthermore, the remaining training samples can serve as a validation set for cross validation.

C. SRBoost

1) Learning weak classifiers: This step constructs weak learners as in AdaBoost; the difference is that they are learned rather than hand-crafted. Specifically, the first step of the second learning stage is to select features by sparsity regularization, and the few learnt features constitute one weak classifier. As previously mentioned, the linear decision hyperplane is y - Xw = 0. The original sparsity-enforcing methods [5], [9] can be summarized as:

w^* = \arg\min_w \|y - Xw\|_2^2 + \lambda \|w\|_1    (2)

To further improve the performance, a robust estimator \phi(\cdot) is introduced to deal with non-Gaussian noise [7]:

w^* = \arg\min_w \sum_{i=1}^{N} \phi(y_i - X_i w) + \lambda \|w\|_1    (3)

Robust functions have the property that \phi(x) remains stable even if the independent variable x is very large, e.g., \phi(x) = 1 - \exp(-x^2) [7], which differs from the original least squares loss.
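
As a point of reference for the discussion above, the following is a minimal sketch of the baseline l1 model in Equation (2) using scikit-learn's Lasso; the function name, the default parameter values, and the ranking-by-weight step are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def l1_feature_selection(X, y, lam=0.01, n_features=15):
    """Baseline sparsity-enforcing selection in the spirit of Eq. (2):
    min_w ||y - Xw||_2^2 + lam * ||w||_1 (Lasso rescales the data-fit
    term by 1/(2N), which only rescales the effective lam)."""
    model = Lasso(alpha=lam, fit_intercept=False)
    model.fit(X, y)
    w = model.coef_
    # Keep the features with the largest absolute weights, as done for
    # the l1 baseline in the experiments below.
    selected = np.argsort(-np.abs(w))[:n_features]
    return selected, w
```
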
In order to be consistent with the objective function of sample selection (Equation (1)), we still employ a large margin loss function [15] for learning the weak classifiers. Considering the Boosting-like strategy, a weighting term k is introduced to update the samples. Therefore, in this paper we present the following sparsity regularization model:

\min_{w, \xi} \; w^T \mathbf{1} + \lambda (k^T \xi)
s.t. \; w^T x_j^+ \le C_+ + \xi_j,  j = 1, ..., N_+
     \; w^T x_j^- \ge C_- - \xi_j,  j = 1, ..., N_-
     \; w_i \ge 0, \; \xi_j \ge 0,  i = 1, ..., D,  j = 1, ..., N    (4)

where \mathbf{1} is a vector whose elements are all 1, C_+ and C_- are empirically determined constants, and \lambda is a regularization parameter balancing the two parts of the objective function. The weighting term k controls the variation of the learnt \xi. The objective function is an l1 minimization with non-negativity constraints, so the weight vector w is sparse. The constraints classify the samples in a supervised way; moreover, the loss induced by the constraints also follows the large margin principle [15], which is consistent with that of the sample selection step above. Finally, the features are selected according to the weights.

An important term in our model is the slack variable \xi. It has almost the same effect as the robust estimator \phi, but it is learned adaptively. For example, if a sample is corrupted by large noise, the corresponding slack variable \xi automatically becomes large. In this case, the response values of noisy samples are suppressed to ensure the learning performance.
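
Since Equation (4) is a linear program, one way to picture a single weak-learner step is the following sketch using SciPy's linprog (whose "highs" backend includes a dual simplex solver). The function name, the sample ordering (positives first, matching the weight vector k), and the default values of lam, C_pos and C_neg are assumptions for illustration, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog

def srboost_weak_learner(Xp, Xn, k, lam=1.0, C_pos=0.4, C_neg=0.8):
    """One weak-learner step, i.e. the weighted LP of Eq. (4).
    Xp: (Np, D) intra-class score vectors, Xn: (Nn, D) inter-class
    score vectors, k: (Np+Nn,) sample weights, positives first."""
    Np, D = Xp.shape
    Nn = Xn.shape[0]
    N = Np + Nn
    # Decision variables z = [w (D entries), xi (N entries)], all >= 0.
    c = np.concatenate([np.ones(D), lam * k])   # 1^T w + lam * k^T xi
    I = np.eye(N)
    # Positive (intra-class) samples:   w^T x_j^+ - xi_j <= C_pos
    A_pos = np.hstack([Xp, -I[:Np]])
    b_pos = np.full(Np, C_pos)
    # Negative (inter-class) samples:  -w^T x_j^- - xi_j <= -C_neg
    A_neg = np.hstack([-Xn, -I[Np:]])
    b_neg = np.full(Nn, -C_neg)
    res = linprog(c,
                  A_ub=np.vstack([A_pos, A_neg]),
                  b_ub=np.concatenate([b_pos, b_neg]),
                  bounds=(0, None),             # w_i >= 0, xi_j >= 0
                  method="highs")
    w, xi = res.x[:D], res.x[D:]
    return w, xi
```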

Algorithm 1 Boosting-like Sparsity Regularized Feature Selection (SRBoost)
1: Input: data X = {X^+ \in R^{N_+ x D}, X^- \in R^{N_- x D}}. Output: weight vector W \in R^D. Initialization: k = 1/N.
2: Sample selection: \hat{X} <- solving Eqn. (1);
3: for m = 1 : M do
4:   w^(m) <- solving Eqn. (4) on \hat{X};
5:   k^(m+1) <- solving Eqn. (5) or (6);
6: end for
7: W = \sum_{m=1}^{M} w^(m)

From another perspective, the slack variable can be viewed as a measure of how well a training sample is classified: the smaller the slack variable, the more confident we are that the corresponding sample is classified correctly. This is the key to the following Boosting-like strategy.

2) Sample reweighting: The goal of this step is to reweight the training samples, as in AdaBoost. Specifically, the second step of the second learning stage boosts the selected features. Complementary features are not explicitly taken into consideration in sparsity-enforcing selection methods, so some similar features end up sharing almost the same large weights. Therefore, complementarity analysis needs to be added explicitly to reduce this redundancy. Generally, a pair of complementary features can classify different training samples. To implement this idea, we adopt a sample reweighting strategy inspired by the success of AdaBoost [6]. We deploy two different functions to update the sample weights, i.e., a linear and an exponential penalty:

k^{(m+1)} = \xi^{(m)} / (\mathbf{1}^T \xi^{(m)})    (5)
k^{(m+1)} = \exp(\rho \, \xi^{(m)}) / Z    (6)

where \rho is a learning rate and Z is a normalization factor. From the objective function of (4), which minimizes \sum_j k_j \xi_j, we can see that the larger k_j is, the smaller the learned \xi_j will be, which means that samples with large weights should be classified correctly. Under Equation (5) or (6), the samples with large classification error \xi_j^(m) in the previous iteration receive a large k_j^(m+1), so larger weights are placed on these samples in the next round and they are forced to be classified with smaller error \xi_j^(m+1) in the current iteration. In other words, the features selected in different iterations are complementary, dealing with different samples.

In the linear case, the update rate of the samples is fixed, without extra parameters. The exponential penalty is more flexible, with a tunable learning ratio: different values of the update ratio \rho give different levels of penalty. If \rho is small, the weights are updated more gently than in the linear case; if \rho is large, they are updated more severely. Figure 1 (steps 3-4) shows the flowchart of SRBoost. Finally, the selected features are the union of the results of the M iterations. The entire model is an LP problem, so it can be efficiently solved by the Simplex algorithm. Algorithm 1 summarizes the proposed SRBoost.

In summary, the proposed two-stage learning strategy considers both efficiency and effectiveness, as shown in Algorithm 1. Sample selection is introduced to reduce the computational complexity of feature selection while preserving the local distribution of samples close to the decision boundary. In particular, the second stage, SRBoost, combines the advantages of sparsity regularization and AdaBoost-like methods. In addition, the two stages are closely related, sharing a consistent large margin loss function.
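
To tie Equations (5) and (6) to Algorithm 1, here is a small sketch of the outer Boosting-like loop; it reuses the hypothetical srboost_weak_learner() from the previous sketch, and the default values of M and rho simply mirror the two-iteration, rho = 3 setting discussed in the experiments.

```python
import numpy as np

def reweight_linear(xi):
    # Eq. (5): k^(m+1) = xi^(m) / (1^T xi^(m))
    return xi / (xi.sum() + 1e-12)

def reweight_exponential(xi, rho=3.0):
    # Eq. (6): k^(m+1) = exp(rho * xi^(m)) / Z, with Z a normalizer
    k = np.exp(rho * xi)
    return k / k.sum()

def srboost(Xp, Xn, M=2, rho=3.0, lam=1.0):
    """Outer loop of Algorithm 1, run on the SVM-selected support samples.
    Requires srboost_weak_learner() from the previous sketch."""
    N = Xp.shape[0] + Xn.shape[0]
    k = np.full(N, 1.0 / N)                  # initialization: k = 1/N
    W = np.zeros(Xp.shape[1])
    for _ in range(M):
        w, xi = srboost_weak_learner(Xp, Xn, k, lam=lam)
        W += w                               # W = sum_m w^(m)
        k = reweight_exponential(xi, rho)    # or reweight_linear(xi)
    return W                                 # nonzero entries = selected features
```
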
III. EXPERIMENTAL RESULTS

In order to verify the performance of our method, we conduct experiments on feature selection for iris images [12], [15], because the local feature descriptors used in biometrics are typically high dimensional.

A. Datasets

We evaluate our method on two subsets of the CASIA-Iris-V4.0 database [1]. CASIA-Iris-Thousand (Thousand) contains 20,000 iris images from 1,000 subjects. We use the Distance subset to verify the generalization of the methods. Both are challenging databases.

B. Settings

We use the same settings as in [15]. The iris images are all normalized to the size of 70 x 540 without further preprocessing. 500 iris images from 25 subjects (10 images per eye) in the Thousand database are used for training. We generate 2,250 intra-class matching scores as positive samples and 4,900 inter-class matching scores as negative samples; the rest of the Thousand subset serves as the test set. We adopt regional OM [13], [8] as our local feature, and the matching scores are computed by Hamming distance. 47,420 regional OM features are extracted for selection. We select 15 features for comparison, which are enough for competitive performance. C_+ and C_- are set to 0.4 and 0.8 respectively. The algorithms involved in the comparison are GentleBoost [6], the traditional l1-regularized sparse method [5], and RRLP [15].

C. Evaluations

We analyze the experimental results from two aspects: the learning stage and the analysis of the results, including parameter selection, accuracy and efficiency.
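
For concreteness, the following sketch shows one way the matching-score samples described above could be formed: each image pair yields one D-dimensional vector of per-region Hamming distances between binarized regional features. The shapes, the function name, and the normalization are illustrative assumptions; the regional OM extraction itself [13], [8] is not shown.

```python
import numpy as np

def matching_score_vector(codes_a, codes_b):
    """codes_a, codes_b: boolean arrays of shape (D, L), the L-bit binary
    code of each of the D regional features for two iris images.
    Returns the D-dimensional vector of per-region normalized Hamming
    distances that forms one training/testing sample."""
    return np.mean(codes_a != codes_b, axis=1)

# Intra-class pairs (same eye) give positive samples (label +1),
# inter-class pairs give negative samples (label -1).
```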

Figure 2: The results of feature selection at the first two iterations (feature weights versus feature index).

Table I: Comparative results on the Thousand database

Methods            EER     FRR @ FAR
GentleBoost [6]    0.96    0.297
l1 [5]             0.90    0.285
RRLP [15]          0.85    0.244
Proposed SRBoost   0.42    0.077

Table II: Comparative results on the Distance database

Methods            EER     FRR @ FAR
GentleBoost [6]    0.756   275
l1 [5]             0.689   291
RRLP [15]          0.641   197
Proposed SRBoost   0.613   898

1) Learning stage: We train the models to select several numbers of features on the Thousand database for iris recognition.

Sample selection: The first step is sample selection. In total, the number of training samples N is 7,150, with N_+ = 2,250 and N_- = 4,900. The support samples are selected by a linear SVM [3] with default parameters; here we only focus on the samples rather than on the performance of the SVM. 147 support vectors are extracted, which amount to only about 2% of the original training samples. The other samples, distributed away from the decision boundary, are not so crucial for feature selection; intuitively, they are used as a validation set in the following stage to select the optimal parameters of SRBoost.

SRBoost learning: In the inner loop, sparsity-regularized feature selection via LP is carried out with fixed C_+ and C_-. Initially, the sample weights are set to 1/N. The sample weights are then updated at each iteration according to the learned slack variable \xi via Equation (5) or (6). The linear update function is convenient and has no extra parameters, whereas the exponential function is more flexible with its tunable update ratio \rho; a large \rho is suitable when only a small number of Boosting-like iterations are performed. For computational convenience, we carry out two iterations, which is also enough for competitive results. For example, if \rho = 3, the numbers of features selected in the two iterations are 27 and 19 respectively. Figure 2 illustrates the feature selection results at the first two iterations; as shown there, the selected features are different and complementary to some extent. Finally, to compare performance fairly, we select 15 features for all algorithms. For Lasso and RRLP, we select the top 15 features by the absolute value of the weights; in our SRBoost algorithm, we select 8 and 7 features in the two iterations respectively. An SVM is applied as the classifier for iris recognition.

2) Performance analysis: In biometrics, ROC curves and the Equal Error Rate (EER) are usually employed as performance measures. The EER is the rate at which the False Accept Rate (FAR) and the False Reject Rate (FRR) are equal on the ROC curve; the smaller the EER, the better the performance.

Parameter selection: First, we study the impact of the different update functions on the performance. Three models are trained, with the linear update and the exponential update (\rho = 1 and \rho = 3) respectively. For simplicity, the first 50 classes (left and right eyes of 25 subjects) in the Thousand database, excluding the training data, are used to test the performance. As shown in Figure 3(a), the exponential update function generally performs better than the linear one, and the large update ratio obtains the best results. This is because only two Boosting iterations are carried out, so the more severe sample update ensures better complementarity of the features. The features selected at the second iteration are more inclined to classify the hard samples misclassified at the previous iteration.
To further support this explanation, the same three models are evaluated on the Distance database, which is different from the training data. From Figure 3(b) we can see similar results. The performance gap between the three models is smaller than on the Thousand database because of the generalization setting, i.e., the capacity of the trained models is not as strong, which suggests that the quality of the two subsets differs considerably.

Comparative results: Second, we compare the proposed method with three other state-of-the-art algorithms. The remaining images of the Thousand database are all used as the test set, which yields 8,775 intra-class matchings and 19,275 inter-class matchings; this number of samples is sufficient for testing the algorithms. We adopt the exponential update function (\rho = 3) based on the analysis above. As shown in Figure 3(c), RRLP obtains better results than the l1 sparse method because it deploys a more robust, large margin based loss function. The proposed SRBoost performs best, which shows that the Boosting strategy works better and that it is necessary to explicitly consider the complementarity of features. The EER and the FRR at the fixed FAR are listed in Table I. The EER of our method is improved by nearly 50% compared with the other methods. Considering practical applicability, the FRR at the fixed FAR is 0.077, which is much lower than that of the classical methods.
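
As a companion to the measurements reported above, here is a small sketch of how EER and the FRR at a fixed FAR could be computed from arrays of genuine and impostor scores. It assumes higher scores indicate genuine comparisons, and the target_far value is only an illustrative placeholder for the operating point.

```python
import numpy as np

def far_frr(genuine, impostor, thresholds):
    """FAR/FRR over a grid of decision thresholds (accept if score >= t)."""
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    return far, frr

def _threshold_grid(genuine, impostor, n_thr):
    lo = min(genuine.min(), impostor.min())
    hi = max(genuine.max(), impostor.max())
    return np.linspace(lo, hi, n_thr)

def eer(genuine, impostor, n_thr=1000):
    """Equal Error Rate: the point where FAR and FRR coincide."""
    thr = _threshold_grid(genuine, impostor, n_thr)
    far, frr = far_frr(genuine, impostor, thr)
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

def frr_at_far(genuine, impostor, target_far=1e-3, n_thr=10000):
    """FRR at the smallest threshold whose FAR does not exceed target_far."""
    thr = _threshold_grid(genuine, impostor, n_thr)
    far, frr = far_frr(genuine, impostor, thr)
    ok = far <= target_far
    return frr[ok].min() if ok.any() else 1.0
```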

Figure 3: ROC curves of feature selection under the two kinds of update functions (a) on the Thousand database and (b) on the Distance database, and ROC curves of feature selection compared with other methods (c) on the Thousand database and (d) on the Distance database. The training data are from the Thousand database.

In order to verify the generalization, we also conduct the same experiments on the Distance database, on which we generate 4,766 intra-class matchings and 32,896 inter-class matchings. The ROC curves are shown in Figure 3(d). The recognition rates are consistent with the results on the Thousand database, and SRBoost again obtains the best performance. Although the results on the Distance database are not as good as those on the Thousand database, our algorithm still shows good potential for generalization.

IV. CONCLUSION

In this paper, we have proposed a two-stage learning strategy, consisting of sample selection and feature selection, to select features. Our method considers the samples and their high-dimensional features simultaneously; additionally, the loss functions of both stages are consistent, being based on the large margin principle. At the first stage, the support samples are selected by an SVM with regard to the distribution of the training samples. At the second stage, a Boosting-like sparsity regularization (SRBoost) algorithm is presented to select a small number of complementary features. The experimental results on the CASIA-Iris-V4.0 database demonstrate that our method outperforms the state-of-the-art methods.

ACKNOWLEDGMENT

This work is funded by the National Basic Research Program of China (212CB3163), the National Natural Science Foundation of China (Grant No. 61273272, 6113155), the International S&T Cooperation Program of China (Grant No. 21DFB1411) and the Instrument Developing Project of the Chinese Academy of Sciences (Grant No. YZ21266).

REFERENCES

[1] CASIA Iris-V4.0 Database, http://biometrics.idealtest.org/.
[2] C. M. Bishop, Pattern Recognition and Machine Learning, 2006.
[3] C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, ACM TIST, vol. 2, no. 3, p. 27, 2011.
[4] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[5] A. Destrero, C. De Mol, F. Odone, and A. Verri, A regularized framework for feature selection in face detection and authentication, IJCV, vol. 83, pp. 164-177, 2009.
[6] J. Friedman, T. Hastie, and R. Tibshirani, Additive logistic regression: a statistical view of boosting, Annals of Statistics, vol. 28, no. 2, pp. 337-407, 2000.
[7] R. He, T. Tan, L. Wang, and W.-S. Zheng, l2,1 regularized correntropy for robust feature selection, in CVPR, 2012, pp. 2504-2511.
[8] Z. He, Z. Sun, T. Tan, X. Qiu, C. Zhong, and W. Dong, Boosting ordinal features for accurate and fast iris recognition, in CVPR, June 2008, pp. 1-8.
[9] Y. Liang, S. Liao, L. Wang, and B. Zou, Exploring regularized feature selection for person specific face verification, in ICCV, Nov. 2011, pp. 1676-1683.
[10] D. G. Lowe, Distinctive image features from scale-invariant keypoints, IJCV, vol. 60, no. 2, pp. 91-110, 2004.
[11] F. Nie, H. Huang, X. Cai, and C. H. Q. Ding, Efficient and robust feature selection via joint l2,1-norms minimization, in NIPS, 2010, pp. 1813-1821.
[12] J. Pillai, V. Patel, R. Chellappa, and N. Ratha, Secure and robust iris recognition using random projections and sparse representations, TPAMI, vol. 33, no. 9, pp. 1877-1893, 2011.
[13] Z. Sun and T. Tan, Ordinal measures for iris recognition, TPAMI, vol. 31, no. 12, pp. 2211-2226, Dec. 2009.
[14] P. A. Viola, M. J. Jones, and D. Snow, Detecting pedestrians using patterns of motion and appearance, IJCV, vol. 63, no. 2, pp. 153-161, 2005.
[15] L. Wang, Z. Sun, and T. Tan, Robust regularized feature selection for iris recognition via linear programming, in ICPR, Nov. 2012, pp. 3358-3361.