Distortion-invariant Kernel Correlation Filters for General Object Recognition

Dissertation by Rohit Patnaik

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering

Electrical & Computer Engineering Department
Carnegie Institute of Technology
Carnegie Mellon University
Pittsburgh, Pennsylvania

July, 2009

Abstract

General object recognition is a specific application of pattern recognition, in which an object in a background must be classified in the presence of several distortions such as aspect-view differences, scale differences, and depression-angle differences. Since the object can be present at different locations in the test input, a classification algorithm must be applied to all possible object locations in the test input. We emphasize one type of classifier, the distortion-invariant filter (DIF), for fast object recognition, since it can be applied to all possible object locations using a fast Fourier transform (FFT) correlation. We refer to distortion-invariant correlation filters simply as DIFs. DIFs all use a combination of training-set images that are representative of the expected distortions in the test set. In this dissertation, we consider a new approach that combines DIFs and the higher-order kernel technique; these form what we refer to as kernel DIFs. Our objective is to develop higher-order classifiers that can be applied (efficiently and fast) to all possible locations of the object in the test input. All prior kernel DIFs ignored the issue of efficient filter shifts. We detail which kernel DIF formulations are computationally realistic to use and why. We discuss the proper way to synthesize DIFs and kernel DIFs for the wide area search case (i.e., when a small filter must be applied to a much larger test input) and the preferable way to perform wide area search with these filters; this is new. We use computer-aided design (CAD) simulated infrared (IR) object imagery and real IR clutter imagery to obtain test results. Our test results on IR data show that a particular kernel DIF, the kernel SDF filter (together with its new preprocessed version), is promising in terms of both test-set performance and on-line calculations; it is emphasized in this dissertation. We examine the recognition of object variants. We also quantify the effect of different constant-valued object backgrounds in training and tests and the effect of non-constant clutter near the test objects on performance scores; these affect the target-to-background contrast ratio and have not been addressed in any prior DIF IR tests.

Acknowledgments

I am extremely grateful to my parents for financing my undergraduate education at Carnegie Mellon University and for helping me in many other ways during my undergraduate and graduate studies. I am thankful to my Ph.D. advisor, Prof. David Casasent, for his continual guidance and support during the Ph.D. program and for his research help in the year preceding that. I appreciate the valuable suggestions for improving this dissertation made by the other members of my dissertation committee: Prof. Vijayakumar Bhagavatula, Prof. Tsuhan Chen, and Dr. Abhijit Mahalanobis. I would also like to thank a fellow graduate student, Yu-Chiang Wang, for many helpful technical (and non-technical) discussions. Support for this work by Raytheon Missile Systems (Tucson, Arizona) is gratefully acknowledged.

Table of Contents

1. Introduction
    Overview of Issues Addressed
    Prior Work
    SDF filter
    Mace filter
    Minace filter
    Prior kernel DIFs
    Contributions of this Dissertation
    Organization of this Dissertation
    Chapter 2
    Chapter 3
    Chapter 4
    Chapter 5
    Chapter 6 and Appendix A

2. New Minace Filter Developments
    Introduction
    Remarks on Minace filter formulation
    Choice of Correlation Plane Metric (PCE or Peak Value)
    When and Why Linear versus Circular Correlation DIFs are Best
    Differences between linear and circular correlation DIFs
    Data resolution affects whether linear or circular correlation DIFs are best
    Automated Synthesis of DIFs
    Selection of parameters
    Minace automated filter-synthesis procedure
    Summary of Minace Filter Results and Trends
    DIF Wide Area Search Case
    Different filter designs considered
    Filter design and implementation for the fewest number of on-line calculations
    Filter design and implementation for best performance (P_C, P_FA)
    Conclusions

3. Distortion-invariant Kernel Correlation Filters
    Introduction
    Kernel SDF filter formulation
    Important Issues Concerning the Use of Kernel DIFs
    Need for fast kernel DIF shifts for general object recognition
    Significantly larger on-line computation and storage requirements for kernel DIFs
    3.2.3 Proper minimization of correlation plane energy with kernel DIFs
    Comparison of the on-line computational complexity of different kernel DIFs
    Different Kernel SDF Filter Formulations
    Vector-based versus pixel-based image-domain kernel SDF filters
    FT-domain kernel SDF filters do not allow fast implementation of filter shifts
    Fast test-set FFT correlation method for Gaussian kernel SDF filters
    Minace-preprocessed kernel SDF filters
    Different Formulations of Energy-minimizing Kernel DIFs
    FT-domain energy-minimizing kernel DIFs
    Image-domain energy-minimizing kernel DIFs
    Kernel SDF Wide Area Search Case
    Kernel SDF filter design and implementation for the chip case
    Standard kernel SDF filter implementation for wide area search
    Preprocessed kernel SDF filter design and implementation for wide area search
    Conclusions

4. Database and Training and Test Procedures
    Introduction
    IR Database
    CAD objects
    Object variants
    Different sets of true-class and unseen confuser-class objects used
    Real clutter
    Training and Test Procedures for the Chip Case
    Training Procedures
    Number of training-set aspect views needed
    Test Procedures
    Filter-synthesis procedure to determine when and how to add training-set data at new scales (ranges) to a filter
    Automated Synthesis of Kernel SDF Filters
    Selection of parameters
    Automated standard kernel SDF filter synthesis
    Automated preprocessed kernel SDF filter synthesis

5. Test Results
    Introduction
    Test Results for the Standard Minace Filter
    Range separation needed between range indices
    Test results on the baseline 3-class database
    Test results on the different 3-class databases
    Test results on the difficult 3-class database
    Depression-angle tolerance test results
    Clutter rejection results
    5.3 Test Results for Standard and Preprocessed Kernel SDF Filters
    Range separation used between range indices
    Test results on the baseline 3-class database
    Test results on the different 3-class databases
    Test results on the difficult 3-class database
    Test results on the 4-class database
    Clutter rejection results
    Kernel SDF filter design to reduce the number of training-set images included in a filter
    Kernel SDF Test Results with Depression-angle Test-set Differences and for Recognition of Variants
    Depression-angle tolerance test results
    Test results for recognition of variants of the T72 object
    Use of the PCE Metric with Kernel DIFs
    Kernel SDF Test Results using the PCE Metric
    Proper versus Improper Energy-minimizing Kernel Mace PCE Results
    Kernel SDF Wide Area Search Test Results with Different Input Sizes and Backgrounds
    Use of larger 68x36 pixel training-set chips for the wide area search case
    Use of even larger 132x68 pixel training-set chips for the wide area search case
    Use of a higher constant 100-valued background in training and in tests
    Selecting a useful c value for the chip case
    Chip test results with different c selection methods
    Test Results for Different Target-to-Background Contrast Ratio Cases
    Training and testing with different constant-valued backgrounds
    Test results for the case of background clutter present near the test objects
    Conclusions

6. Conclusions and Future Work
    Contributions of this Dissertation
    Conclusions of this Dissertation
    Chapter 2 conclusions
    Chapter 3 conclusions
    Chapter 5 conclusions
    Use of kernel SDF filters with visible imagery and other applications
    Future Work

A. Computational Complexity of Different Distortion-invariant Correlation Filters
    A.1 Standard Mace filter
    A.2 Kernel SDF filter
    A.3 FT-domain kernel Mace filter
    A.4 Image-domain kernel Mace filter
    A.5 Calculation of image-domain kernel Mace test run-time necessary to obtain ROC (P_C, P_FA) data

References

List of Tables

Table 2.1 Percentage of the total energy present in higher spatial frequencies averaged over all 120 (72) aspect views for the CAD initial (Comanche) true-class objects

Table 2.2 Minace correlation type, PCE metric details, and filter-synthesis parameters we have found best for different databases using the PCE metric

Table 2.3 Zero-mean and energy-normalization regions and the value and amount of the background pixel padding used in filter synthesis and chip tests in the three filter designs considered. The regions AA and BB noted are those in Figure

Table 2.4 Number of block correlations (N_BLK) and the total number of real multiplications (B) required for the overlap-add block FFT correlation algorithm for different block FFT sizes for a 32x32 pixel filter and a 16353x16353 pixel wide area test input

Table 2.5 Number of block correlations (N_BLK) and the total number of real multiplications (B) required for the overlap-add block FFT correlation algorithm for different block FFT sizes for a 16x16 pixel filter and a 16353x16353 pixel wide area test input

Table 3.1 Comparison of the number of on-line real multiplications required in chip tests to calculate the correlation plane output only in a small M = 11x11 pixel correlation peak search region. The training-set chips are of size d = 64x32 pixels and N = 20 training-set images (true-class) are included in each filter

Table 3.2 Comparison of the number of on-line real multiplications required in wide area search tests to calculate the full M = 4d pixel linear correlation plane output for different filters. The training-set chips are of size d = 64x32 pixels and N = 20 training-set images (true-class) are included in each filter

Table 4.1 Different sets of true-class and unseen confuser-class objects that we use for the 3-class problem

Table 4.2 Different training-set and test-set range cases for the chip case and how we refer to them (close range and far range and number of range indices)

Table 5.1 Standard Minace test results on the baseline 3-class database for our three different close-range and three different far-range cases. At the overall P_C = 90% ROC operating point, P_C for each of the two true-class objects is separately shown and P_FA for each of the three unseen confuser-class objects is separately shown

Table 5.2 Standard Minace test results on the three different 3-class databases (filter pairs) for our three different close-range cases. P_FA scores at two ROC operating points (P_C = 90% and EER) are shown and are for rejection of only the BM21 and ZIL131 objects

Table 5.3 Standard Minace test results with depression-angle test-set differences (in addition to aspect-view and scale differences) for the modified baseline 3-class database for our five close-range and six far-range cases. The filters are synthesized using training-set data at only a 17° depression angle. P_FA is for rejection of only the ZIL131 object

Table 5.4 Standard Minace clutter rejection results on the baseline 3-class database for three different close-range and three different far-range cases. Clutter false alarm rate (P_CFA) scores for the different sets of clutter chips at the P_C = 90% ROC operating point are shown

Table 5.5 Standard and preprocessed polynomial kernel SDF test results on the baseline 3-class database for our three different close-range and three different far-range cases. At the overall P_C = 90% ROC operating point, P_C for each of the two true-class objects is separately shown and P_FA for each of the three unseen confuser-class objects is separately shown

Table 5.6 Standard and preprocessed polynomial kernel SDF filter-synthesis parameter choices for the baseline 3-class database ((T72, M60) filter pair) in Table 5.5 and for the same six different close-range and far-range cases. The final value of p (p_final) chosen and the number of training-set images included in the filter (N) (the maximum allowed is N_MAX = 90% of the training set) are shown

Table 5.7 Standard polynomial and Gaussian kernel SDF test results on the modified baseline 3-class database for our six different close-range and far-range cases. P_FA is for rejection of only the BM21 and ZIL131 objects

Table 5.8 Preprocessed polynomial and Gaussian kernel SDF test results on the modified baseline 3-class database for our three different far-range cases. P_FA is for rejection of only the BM21 and ZIL131 objects

Table 5.9 Standard and preprocessed polynomial kernel SDF test results on the three different 3-class databases (filter pairs) for two different close-range and three different far-range cases. P_FA scores at two ROC operating points (P_C = 90% and EER) are shown and are for rejection of only the BM21 and ZIL131 objects

Table 5.10 Standard and preprocessed polynomial kernel SDF test results on the difficult 3-class database for our six different close-range and far-range cases

Table 5.11 Preprocessed polynomial kernel SDF test results on the 4-class database ((T72, M60, BMP) filters) and the modified baseline 3-class database ((T72, M60) filter pair) for our three different far-range cases. P_FA is for rejection of the BM21 and ZIL131 objects

Table 5.12 Standard and preprocessed polynomial kernel SDF clutter rejection results on the baseline 3-class database for our six different close-range and far-range cases. Clutter false alarm rate (P_CFA) scores for the different sets of clutter chips at the P_C = 90% ROC operating point are shown

Table 5.13 Standard polynomial kernel SDF test results for two different values of N_MAX (50% and 90%) on the modified baseline 3-class database for our three close-range case. P_FA is for rejection of only the BM21 and ZIL131 objects

Table 5.14 Standard and preprocessed polynomial kernel SDF test results with depression-angle test-set differences (in addition to aspect-view and scale differences) for the modified baseline 3-class database for our six far-range case. The filters are synthesized using training-set data at only a 17° depression angle. P_FA is for rejection of only the BM21 and ZIL131 objects

Table 5.15 Standard and preprocessed polynomial kernel SDF test results for recognition of variants of the T72 object on the modified baseline 3-class database for our six far-range case; a different T72 reference object (T72 without tread skirts or oil drums present) and filter are now used. At the P_FA = 10% ROC operating point, P_C for the T72 reference object and its three variants and P_C for the M60 object are separately shown. P_FA is for rejection of only the BM21 and ZIL131 objects

Table 5.16 Standard polynomial and Gaussian kernel SDF test results on the difficult 3-class database for our four far-range case. The PCE metric is now used both in filter synthesis and in tests

Table 5.17 Energy minimization comparison of proper (image-domain) and improper (FT-domain) polynomial kernel Mace filters. Largest (best) average PCE values for the best p choice for five training-set and five test-set aspect views

Table 5.18 Preprocessed polynomial kernel SDF test results on the difficult 3-class database for our four far-range case for the 36x20 pixel (chip) case and the 68x36 pixel (wide area search) case. A c value of 3×10⁻³ is used for both pixel cases

Table 5.19 Preprocessed polynomial kernel SDF test results on the difficult 3-class database for our four far-range case for the 36x20 pixel (chip) case and the 68x36 pixel (wide area search) case. A c value of 3×10⁻³ is used for both the BM21 and ZIL131 filters for the chip case, and c values of 4.3×10⁻⁴ and 3.5×10⁻⁴ are used for the BM21 and ZIL131 filters for the wide area search case

Table 5.20 Preprocessed polynomial kernel SDF test results on the difficult 3-class database for our four far-range case for the 36x20 pixel (chip) case and the 132x68 pixel (wide area search) case. A c value of 3×10⁻³ is used for both the BM21 and ZIL131 filters for the chip case, and c values of 3.9×10⁻⁵ and 2.9×10⁻⁵ are used for the BM21 and ZIL131 filters for the wide area search case

Table 5.21 Preprocessed polynomial kernel SDF test results on the difficult 3-class database for our four far-range case for the 36x20 pixel (chip) case and the 68x36 pixel (wide area search) case. A c value of 3×10⁻³ is used for both the BM21 and ZIL131 filters for the chip case, and c values of 3.1×10⁻⁴ and 9.3×10⁻⁴ are used for the BM21 and ZIL131 filters for the wide area search case. A higher constant 100-valued background is now present in training and in tests

Table 5.22 Standard and preprocessed polynomial kernel SDF chip test results on the difficult 3-class database for our four far-range case. A constant 60-valued background is present in training and in tests. For both the BM21 and ZIL131 preprocessed kernel SDF filters, the ad hoc 3×10⁻³ value of c was the same as the one selected using our average MSE rule

Table 5.23 Standard and preprocessed polynomial kernel SDF chip test results on the difficult 3-class database for our four far-range case. A higher constant 100-valued background is now present in training and in tests. For the BM21 and ZIL131 preprocessed kernel SDF filters, we show test results for the c chosen ad hoc (3×10⁻³) and using our average MSE rule (c = 3×10⁻³ and 3×10⁻⁴, respectively)

Table 5.24 Statistics (minimum, maximum, and average) of the CR values for the training-set aspect views over all four training-set range indices for the T72 and M60 objects for our four far-range case for three different constant-valued (60, 80, and 100) background cases. The statistics (minimum, maximum, and average) of the target pixel values, µ, and σ are also shown; these do not change with the level of the constant-valued background used

Table 5.25 Standard Minace and standard and preprocessed polynomial kernel SDF test results on the modified baseline 3-class database for our four far-range case for three different constant-valued (60, 80, and 100) test backgrounds. A constant 60-valued background is present in synthesis (training and validation sets). P_FA scores at two ROC operating points (P_C = 90% and EER) are shown and are for rejection of only the BM21 and ZIL131 objects

Table 5.26 Standard Minace and standard and preprocessed polynomial kernel SDF test results on the modified baseline 3-class database for our four far-range case for three different constant-valued (60, 80, and 100) test backgrounds. A constant 80-valued background is present in synthesis (training and validation sets). P_FA scores at two ROC operating points (P_C = 90% and EER) are shown and are for rejection of only the BM21 and ZIL131 objects

Table 5.27 Standard Minace and standard and preprocessed polynomial kernel SDF test results on the modified baseline 3-class database for our four far-range case for three different constant-valued (60, 80, and 100) test backgrounds. A constant 100-valued background is present in synthesis (training and validation sets). P_FA scores at two ROC operating points (P_C = 90% and EER) are shown and are for rejection of only the BM21 and ZIL131 objects

Table 5.28 Statistics (minimum, maximum, and average) of the CR values for the training-set aspect views over all four training-set range indices for the T72 and M60 objects for our four far-range case for the 10 different real clutter backgrounds with the σ values noted (10, 15, and 20) and a background µ of 60. The statistics (minimum, maximum, and average) of the target pixel values, µ, and σ are also shown; these do not change with the µ or σ of the clutter background used

Table 5.29 Standard Minace and standard and preprocessed polynomial kernel SDF test results on the modified baseline 3-class database for our four far-range case for test clutter backgrounds with a σ of 10. A constant 60-valued background is present in the training and validation sets and µ = 60 for all test clutter backgrounds. P_C and P_FA scores are shown with a 95% confidence interval. P_FA scores are for rejection of only the BM21 and ZIL131 objects

Table 5.30 Standard Minace and standard and preprocessed polynomial kernel SDF test results on the modified baseline 3-class database for our four far-range case for test clutter backgrounds with σ values of 15 and 20. A constant 60-valued background is present in the training and validation sets and µ = 60 for all test clutter backgrounds. P_C and P_FA scores are shown with a 95% confidence interval. P_FA scores are for rejection of only the BM21 and ZIL131 objects

List of Figures

Figure 2.1 Images of the M60 tank at different aspect views in the (a) CAD database, and (b) Comanche database

Figure 2.2 Different regions (referred to in the text) over which we zero-mean, energy-normalize, and pad with zero or constant non-zero background pixels. We do this for training-set and validation-set chips (in filter synthesis) and test-set chips for the different filter designs (for both linear and circular correlation cases). This figure only applies to filter synthesis and chip tests

Figure 4.1 CAD images of the (a) T72 tank, (b) M60 tank, (c) BMP APC, (d) BM21 missile launcher, and (e) ZIL131 truck, at different aspect views (0°, 45°, 90°, 135°, and 180°) and at a 17° depression angle

Figure 4.2 CAD images of the T72 tank at depression angles of (a) 15°, (b) 17°, and (c) 19°, and at different aspect views (0°, 45°, 90°, 135°, and 180°)

Figure 4.3 x32 pixel chip images of some of the scaled versions of the T72 CAD object at different ranges and at a 17° depression angle

Figure 4.4 CAD images of the new T72 reference object and its variants (used in tests with variants) at different aspect views (0°, 45°, 90°, 135°, and 180°) and at a 17° depression angle

Figure 4.5 Examples of the 64x32 pixel MWIR clutter chips

Figure 4.6 Examples of the 64x32 pixel UCIR bush-like clutter chips

Figure 4.7 Examples of the 64x32 pixel UCIR blob clutter chips

Figure 5.1 CAD images of the T72 tank near the front (0°) and rear (180°) aspect views and at a 17° depression angle

Figure 5.2 CAD images of the M60 tank near the front (0°) and rear (180°) aspect views and at a 17° depression angle

Figure 5.3 Standard polynomial kernel SDF correlation plane patterns for a true-class test chip for three different values of p. As p increases, the correlation peak becomes sharper and PCE increases

Figure 5.4 Standard Gaussian kernel SDF correlation plane patterns for a true-class test chip for three different values of σ. As σ decreases, the correlation peak becomes sharper and PCE increases

Figure 5.5 Standard Minace correlation plane patterns for a true-class test chip for three different values of c. As c decreases, the correlation peak becomes sharper and PCE increases

Figure 5.6 Average MSE between the central 32x16 pixel chip area for the training-set chips for the 68x36 pixel (wide area search) case and the 36x20 pixel (chip) case for the (a) BM21, and (b) ZIL131, for the four close-range case. A c value of 3×10⁻³ is used for the chip case and the average MSE is calculated for different values of c for the wide area search case

Figure 5.7 CAD images of the (a) BM21 missile launcher, and (b) ZIL131 truck, at different aspect views (0°, 45°, 90°, 135°, and 180°) and at a 17° depression angle. All objects are now present in a higher constant 100-valued background

Figure 5.8 CAD images of the (a) T72 tank, and (b) M60 tank, at different aspect views (0°, 45°, 90°, 135°, and 180°). All objects are present in a higher constant 80-valued background

Figure 5.9 CAD images of the (a) T72 tank, and (b) M60 tank, at different aspect views (0°, 45°, 90°, 135°, and 180°). All objects are present in an even higher constant 100-valued background

Figure 5.10 Ten different MWIR backgrounds (32x16 pixels) that we use for the case of background clutter present near the test objects

Figure 5.11 x16 pixel chip images of the T72 CAD object (at a 90° aspect view, a 2.1 km range, and at a 17° depression angle) embedded in five different MWIR clutter backgrounds (images 2, 4, 6, 8, and 10 in Figure 5.10) with background σ values of (a) 10, (b) 15, and (c) 20. The CR value (Eq. (5.2)) is noted for each clutter background σ value

1. Introduction

1.1 Overview of Issues Addressed

Pattern recognition is a field concerned with recognizing a known pattern of interest, such as a person's voice in an audio recording or a person's face in a picture. In these scenarios, the signal (i.e., the person's voice or face) must be recognized in the presence of distortions such as the rate of speech or the orientation and scale of the face. Additionally, background sounds in the recording or other objects in the picture make these tasks challenging.

General object recognition is a specific application of pattern recognition, in which an object in a background must be classified in the presence of several distortions. One category of distortion is aspect-view differences due to different object orientations. Another category is scale differences caused by different ranges. Yet another category is depression-angle differences. Finally, the object can be present at different locations in the test input. Since the object location is unknown, a classification algorithm must be applied to all possible object locations in the test input.

We emphasize one type of classifier, the distortion-invariant filter (DIF), for fast object recognition, since it can be applied to all possible object locations using a fast Fourier transform (FFT) correlation. We refer to distortion-invariant correlation filters simply as DIFs. DIFs all use a combination of training-set images that are representative of the expected distortions in the test set. Many different DIFs have been developed [1] to handle the various distortion problems. A single DIF [2] handles all aspect-view distortions, a range of scale distortions, and a ±2° range of depression-angle distortions. Our objective is to develop higher-order classifiers that can be applied (efficiently and fast) to all possible locations of the object in the test input.
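The FFT correlation mentioned above can be illustrated with a minimal sketch. It assumes NumPy, a real-valued 2-D filter that is already synthesized, and a hypothetical function name `fft_correlate_valid`; it is not the implementation used in this dissertation. The sketch zero-pads both arrays so that the circular correlation computed by the FFT equals the linear correlation, then keeps only the shifts at which the filter lies fully inside the test input.

```python
import numpy as np

def fft_correlate_valid(test_input, filt):
    """Correlate a small 2-D filter with a larger 2-D input using FFTs.

    Returns the correlation value for every shift at which the filter
    lies entirely inside the input (hypothetical helper, illustration only).
    """
    in_rows, in_cols = test_input.shape
    f_rows, f_cols = filt.shape
    # Pad to the full linear-correlation size to avoid wrap-around.
    rows, cols = in_rows + f_rows - 1, in_cols + f_cols - 1
    F_in = np.fft.rfft2(test_input, s=(rows, cols))
    F_h = np.fft.rfft2(filt, s=(rows, cols))
    # Correlation theorem: multiply by the conjugate of the filter spectrum.
    plane = np.fft.irfft2(F_in * np.conj(F_h), s=(rows, cols))
    # Non-negative shifts with full overlap occupy the top-left corner.
    return plane[: in_rows - f_rows + 1, : in_cols - f_cols + 1]

# Example: locate a known patch inside a larger, otherwise empty scene.
rng = np.random.default_rng(0)
patch = rng.random((8, 6))
scene = np.zeros((40, 50))
scene[12:20, 30:36] = patch
plane = fft_correlate_valid(scene, patch)
peak = np.unravel_index(np.argmax(plane), plane.shape)
print("correlation peak at", tuple(int(i) for i in peak))
```

One pair of forward FFTs and one inverse FFT evaluates the filter output at every shift at once, which is why a classifier that can be written as a correlation is attractive when the object location is unknown.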
In this dissertation, we consider a new approach [3]-[6] that combines DIFs and the higher-order kernel technique [7]; these form what we refer to as kernel DIFs. In kernel-based versions of a classification algorithm, the algorithm is written in terms of vector inner products (VIPs). The VIP of samples x and y is written as a kernel function K as

$$K(\mathbf{x}, \mathbf{y}) = \Phi^T(\mathbf{x})\,\Phi(\mathbf{y}), \tag{1.1}$$

where Φ is some non-linear mapping to a higher-order space; Φ is unknown. To calculate Φ^T(x) Φ(y), we use either a polynomial kernel with parameter p (Eq. (1.2)) or a Gaussian kernel with parameter σ (Eq. (1.3)), and can thus evaluate the kernel using the original data x and y as

$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T \mathbf{y} + 1)^p, \quad \text{or} \tag{1.2}$$

$$K(\mathbf{x}, \mathbf{y}) = \exp\!\left(-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2\right). \tag{1.3}$$

Thus, we can use data in the higher-dimensional space (to take advantage of the higher-order data correlations present) without knowing Φ. We use this kernel technique to form higher-order kernel DIFs.

In our tests, we assume that a small filter must be applied to all possible locations of an object in a test input that is larger than the filter. We use target chips (with only a small background region around the target) in most tests (all chips are slightly larger than the filter); we refer to this as the chip case (Sec. 4.3). Large test inputs (much larger than the filter) are considered in some cases; we refer to this as the wide area search case. We discuss the proper way to synthesize DIFs and kernel DIFs for the wide area search case and the preferable way to perform wide area search with these filters (Secs. 2.6 and 3.5); this is new.

We address the need for fast on-line shifts of kernel DIFs. We have shown [4],[5], and we will detail in this dissertation, that fast FFT on-line implementations of polynomial and Gaussian kernel filters are possible (since they contain x^T y VIPs) and that an efficient implementation is possible only for the kernel synthetic discriminant function (SDF) filter. Hence, we consider only polynomial and Gaussian kernel functions and we emphasize the kernel SDF filter in this dissertation.

We have summarized [8] test results for different classifiers on synthetic aperture radar (SAR) imagery. As we will discuss in Ch. 2, in Ref. [8], the performance was better for DIFs than for other classifiers.
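As a brief aside on Eqs. (1.2) and (1.3), the sketch below evaluates both kernels directly from the original vectors. It assumes NumPy; the function names are hypothetical and chosen only for illustration. For the special case of p = 2 and two-element vectors, one explicit mapping Φ can be written out by hand, so the sketch also checks that the polynomial kernel value matches the inner product of Eq. (1.1) in the mapped space. In general Φ is not needed, which is the point of the kernel technique.

```python
import numpy as np

def polynomial_kernel(x, y, p):
    """Polynomial kernel of Eq. (1.2): (x^T y + 1)^p."""
    return (np.dot(x, y) + 1.0) ** p

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel of Eq. (1.3): exp(-||x - y||^2 / (2 sigma^2))."""
    diff = x - y
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def phi_poly2(x):
    """One explicit mapping for p = 2 and 2-element inputs (illustration only)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
# Both lines print the same value: the kernel equals the mapped-space VIP.
print(polynomial_kernel(x, y, 2))
print(np.dot(phi_poly2(x), phi_poly2(y)))
print(gaussian_kernel(x, y, sigma=2.0))
```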
Given the Ref. [8] results, we consider kernel versions of only DIFs in this dissertation; we emphasize the kernel SDF filter and compare its performance to that of only the best DIF (not to feature extraction methods). We only consider infrared (IR) data in this dissertation. (We discuss other potential applications of kernel SDF filters, e.g., in applications that use visible imagery, in Sec ) We address aspect-view, scale, and depression-angle distortions; the combination of all these distortions has never been addressed, except in our prior work [2]. We also quantify the effect of different constant-valued object backgrounds in training and tests and the effect of non-constant clutter near the test objects on performance scores (Sec. 5.7); these affect the target-to-background contrast ratio and have not been addressed in any prior DIF IR tests. Our work is relevant to tracking, but we do not address tracking, since we do not use prior-frame target location information in our analysis.

We use computer-aided design (CAD) simulated IR object imagery and real IR clutter imagery. There are five objects in our CAD database. In all cases, we consider the classification of different sets of two or three true-class objects and the rejection of unseen clutter and several different unseen confuser-class objects. The rejection of unseen confuser-class objects has been ignored in most prior work. We note that in the field of computer vision, the term rejection typically means not making a decision. In this dissertation, we use the term rejection to indicate that a test input is declared not to belong to any of the true-object classes. We also address the classification of variants (Sec ); these are different unseen versions of true-class objects.

We use one or more filters for each true-class object to handle the various distortions. In all tests, we correlate each test input image with the filters for all true-class objects. Only the highest filter output is considered for each test input (true-class, confuser-class, and clutter). Test inputs for true-class objects are scored as follows. If the input for a true-class object produces the highest filter output with one of the filters for the correct true-class object, and the output is ≥ T (a threshold), that input contributes to P_C (correct classification rate); otherwise, that input is rejected and does not contribute to P_C. Test inputs for unseen confuser-class objects and clutter are scored as follows.
If the highest output for any filter for an unseen confuser-class or clutter input is ≥ T, that input contributes to P_FA (confuser false alarm rate) or P_CFA (clutter false alarm rate); otherwise, that input is correctly rejected and does not contribute to P_FA or P_CFA. In all tests, we vary the threshold T to obtain receiver operating characteristic (ROC) data for P_C vs. P_FA or P_C vs. P_CFA. Our objectives are P_C ≥ 90%, P_FA ≤ 10%, and a low P_CFA (e.g., 1%). We show test results at the P_C = 90% and P_FA = 10% ROC operating points; we also show test results at the equal error rate (EER) ROC operating point, where the percentage of true-class errors (100% − P_C) and the percentage of confuser-class errors (P_FA) are equal, i.e., where 100% − P_C = P_FA. We also consider other ROC operating points as necessary.

We now briefly discuss how the DIF algorithm differs from other approaches to object recognition in computer vision. In many approaches in computer vision, several features (such as points and edges) are extracted from the test image. These features are then matched to those for the training images, and if the match score is above some threshold, the test input is classified as belonging to the associated true-class object. We note that to obtain a good match score in such approaches, many features typically must be extracted, and higher-resolution imagery is typically required. In DIF algorithms, the matching of the different object parts is implicitly done in one step, and even if some parts of the target are occluded, the DIF algorithm is expected to be able to recognize the object well, since the overall structure of the object is taken into account. For this reason, we expect the DIF algorithm to also work better for lower-resolution imagery.

1.2 Prior Work

In this section, we discuss relevant prior DIF algorithms. This is a review of only filter design types, not of test or application results. Reference [1] contains details of many standard DIFs. We now discuss three of these DIFs: synthetic discriminant function (SDF), minimum average correlation energy (MACE), and minimum noise and correlation energy (MINACE); we consider different kernel versions of these DIFs in this dissertation. We selected these three DIFs because we emphasize kernel SDF filters (because of their fast implementation in tests) and because kernel versions of these three DIFs have been proposed in prior kernel DIF work. Additionally, we use Minace-preprocessing (Sec ) with kernel SDF filters to improve performance. In the following discussion, vectors (matrices) are denoted as lower (upper) case bold letters.

1.2.1 SDF filter

All data are in the image domain. The filter solution is h; it is required to give specified correlation peak values (one) for each training-set image x_i included in the filter; these values are specified by the elements of a column vector u.
These peak constraints are described by

$$\mathbf{X}^{T}\mathbf{h} = \mathbf{u} = [1\ 1\ \cdots\ 1]^{T}, \tag{1.4}$$

where the columns of the data matrix X are the training-set images x_i included in the filter. h is also required to be a linear combination of the training-set images x_i in X. The solution for the filter h is [1]

$$\mathbf{h} = \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{u}. \tag{1.5}$$
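As an illustration of Eqs. (1.4) and (1.5), the following is a minimal NumPy sketch, not the dissertation's implementation. The function name `sdf_filter`, the random 8x8 "training images", and the use of `np.linalg.solve` in place of an explicit matrix inverse are all assumptions made for this example.

```python
import numpy as np


def sdf_filter(X, u=None):
    """Return h = X (X^T X)^{-1} u for a data matrix X whose columns are
    lexicographically ordered training-set images (Eq. (1.5))."""
    X = np.asarray(X, dtype=float)
    if u is None:
        u = np.ones(X.shape[1])          # peak value of one per training image
    gram = X.T @ X                       # N x N matrix of training-image inner products
    coeffs = np.linalg.solve(gram, u)    # (X^T X)^{-1} u without forming the inverse
    return X @ coeffs                    # h is a linear combination of the columns of X


# Hypothetical data: three random 8x8 "training images", one per column.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((64, 3))
h_sdf = sdf_filter(X_train)
peaks = X_train.T @ h_sdf                # left-hand side of Eq. (1.4)
```

With linearly independent training images, `peaks` equals the vector of ones up to rounding error. The sketch only checks the constrained central values; it says nothing about the rest of the correlation plane.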

If the training set contains images of the object with different distortions such as aspect view and / or scale, a DIF results. The SDF filter controls the filter's response to each of the training-set images only at one point (the center of the output correlation plane). In tests, the SDF filter's output can be higher at other points; this results in the misidentification of the object's location. Additionally, the SDF filter can give high correlation outputs for unseen confuser-class and clutter inputs; this results in false alarms. These problems are reduced by the Mace filter (Sec ).

1.2.2 Mace filter

All data (x_i, X, h) are now in the Fourier transform (FT)-domain. The peak constraints are now

$$\mathbf{X}^{H}\mathbf{h} = \mathbf{u} = [1\ 1\ \cdots\ 1]^{T}. \tag{1.6}$$

To improve performance, the filter h is now also required to minimize the average correlation plane energy of all N training-set images included in the filter (this produces sharp correlation peaks with low sidelobes). This is done as follows. The correlation plane energy of the i-th training-set image for a filter h is h^H S_i h, where S_i is a diagonal matrix whose entries are the elements of the lexicographically ordered 2-D power spectrum (|FT|^2) of training-set image i. The filter h is required to minimize

$$E = \mathbf{h}^{H}\mathbf{S}_{avg}\mathbf{h}, \tag{1.7}$$

where

$$\mathbf{S}_{avg}(k,k) = \mathrm{mean}[\mathbf{S}_{1}(k,k), \mathbf{S}_{2}(k,k), \ldots, \mathbf{S}_{N}(k,k)]. \tag{1.8}$$

S_avg is a diagonal matrix, since all S_i are diagonal matrices. Its entries are the average power spectrum of the training-set images included in the filter. The solution for the filter h that minimizes Eq. (1.7) subject to the peak constraints in Eq. (1.6), obtained by Lagrange multiplier techniques, is [1]

$$\mathbf{h} = \mathbf{S}_{avg}^{-1}\mathbf{X}\,[\mathbf{X}^{H}\mathbf{S}_{avg}^{-1}\mathbf{X}]^{-1}\mathbf{u}, \tag{1.9}$$

or

$$\mathbf{h} = \mathbf{S}_{avg}^{-1/2}\,\{\mathbf{S}_{avg}^{-1/2}\mathbf{X}\}\,[\{\mathbf{S}_{avg}^{-1/2}\mathbf{X}\}^{H}\{\mathbf{S}_{avg}^{-1/2}\mathbf{X}\}]^{-1}\mathbf{u}. \tag{1.10}$$

In the form in Eq. (1.10), we see that the solution is equivalent to forming the solution using preprocessed training-set data x_i and then applying a second (S_avg)^{-1/2} preprocessing step (the first term in Eq. (1.10)). The matrix (S_avg)^{-1/2} emphasizes higher spatial frequencies and attenuates lower spatial frequencies. At higher spatial frequencies, noise dominates image energy and the distortion differences between the training-set images (the fine details with low energy) are also more pronounced. Thus, the Mace filter is sensitive to noise and object distortions. This sensitivity is reduced by the Minace filter (Sec ).
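The Mace solution in Eqs. (1.6) to (1.10) can be checked numerically with a short NumPy sketch. This is an illustration under stated assumptions, not the implementation used in this work: the function name `mace_filter`, the random 8x8 images, and the unnormalized `np.fft.fft2` convention are all choices made for the example. Because S_avg is diagonal, it is stored as a vector and applied by elementwise division.

```python
import numpy as np


def mace_filter(images, u=None):
    """FT-domain Mace filter of Eq. (1.9) for an (N, rows, cols) image stack."""
    images = np.asarray(images, dtype=float)
    n = images.shape[0]
    # Columns of X are the lexicographically ordered 2-D FTs of the images.
    X = np.fft.fft2(images).reshape(n, -1).T
    s_avg = np.mean(np.abs(X) ** 2, axis=1)        # diagonal of S_avg, Eq. (1.8)
    if u is None:
        u = np.ones(n)
    Xw = X / s_avg[:, None]                        # S_avg^{-1} X
    h = Xw @ np.linalg.solve(X.conj().T @ Xw, u)   # Eq. (1.9)
    return X, s_avg, h


def plane_energy(h, s_diag):
    """Correlation plane energy h^H S h for a diagonal S stored as a vector."""
    return float(np.real(h.conj() @ (s_diag * h)))


# Hypothetical data: three random 8x8 "training images".
rng = np.random.default_rng(0)
imgs = rng.standard_normal((3, 8, 8))
X_ft, s_avg, h_mace = mace_filter(imgs)

# Eq. (1.10): the same filter built from data preprocessed by S_avg^{-1/2}.
Y = X_ft / np.sqrt(s_avg)[:, None]
h_alt = (Y @ np.linalg.solve(Y.conj().T @ Y, np.ones(3))) / np.sqrt(s_avg)

# An FT-domain filter of SDF form that meets the same peak constraints.
h_ref = X_ft @ np.linalg.solve(X_ft.conj().T @ X_ft, np.ones(3))
```

Both `h_mace` and `h_ref` satisfy Eq. (1.6), but `plane_energy(h_mace, s_avg)` cannot exceed `plane_energy(h_ref, s_avg)`, since the Mace filter minimizes Eq. (1.7) over all filters that meet the constraints.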

1.2.3 Minace filter

We now briefly describe the version [9] of the Minace filter [10] that we use. All data (x_i, X, h) are again in the FT-domain. To reduce the sensitivity of the Mace filter (Sec ) to noise and object distortions, the Minace filter minimizes a combination of the correlation plane energies of all N training-set images (this produces sharp correlation peaks with low sidelobes) and of distortions in the input (this improves recognition and reduces noise sensitivity). We now discuss how this is achieved. To handle distortions, we use zero-mean white Gaussian noise to model the expected distortion power spectrum; the correlation plane energy due to input distortions is thus h^H N h, where the identity matrix N models distortions. We minimize an upper bound T on these two correlation plane energies, given by the spectral envelope of N and all S_i at each spatial frequency (i.e., the maximum value at each spatial frequency). We minimize the correlation plane energy due to this power spectrum T by minimizing

$$E = \mathbf{h}^{H}\mathbf{T}\mathbf{h}, \tag{1.11}$$

where

$$\mathbf{T}(k,k) = \max[\mathbf{S}_{1}(k,k), \mathbf{S}_{2}(k,k), \ldots, \mathbf{S}_{N}(k,k), c\,\mathbf{N}(k,k)]; \tag{1.12}$$

in Eq. (1.12), the parameter c (0 ≤ c ≤ 1) controls the emphasis on distortion-tolerance or discrimination, as noted below. The solution for the filter h that minimizes Eq. (1.11) subject to the peak constraints in Eq. (1.6), obtained by Lagrange multiplier techniques, is [10]

$$\mathbf{h} = \mathbf{T}^{-1}\mathbf{X}\,[\mathbf{X}^{H}\mathbf{T}^{-1}\mathbf{X}]^{-1}\mathbf{u}, \tag{1.13}$$

or

$$\mathbf{h} = \mathbf{T}^{-1/2}\,(\mathbf{T}^{-1/2}\mathbf{X})\,[(\mathbf{T}^{-1/2}\mathbf{X})^{H}(\mathbf{T}^{-1/2}\mathbf{X})]^{-1}\mathbf{u}. \tag{1.14}$$

In the form in Eq. (1.14), we see that the solution is equivalent to forming the solution using preprocessed training-set data x_i and then applying a second T^{-1/2} preprocessing step (the first term in Eq. (1.14)). The matrix T^{-1/2} performs high-pass filtering of the data. The matrix T^{-1/2} is similar to the matrix (S_avg)^{-1/2} in the Mace filter (Eq. (1.10)), and a similar term exists in all energy-minimizing DIFs. The choice of the parameter c affects the preprocessing. A higher value of c makes the filter emphasize lower spatial frequencies; this improves its distortion-tolerance (recognition). A lower value of c makes the filter emphasize higher spatial frequencies and improves discrimination (rejection). Thus, selecting c trades off distortion-tolerance versus discrimination. The Minace preprocessor matrix T^{-1/2} is used in preprocessed kernel SDF filters (Sec ) to improve performance.
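The envelope in Eq. (1.12) and the solution in Eq. (1.13) can be sketched in NumPy as follows. This is a hedged illustration rather than the synthesis procedure used in this work: the function name `minace_filter`, the random images, their scaling to unit energy (which sets how the spectra compare with c), and the two example values of c are assumptions. With N equal to the identity, the term cN(k, k) reduces to the scalar c.

```python
import numpy as np


def minace_filter(images, c, u=None):
    """FT-domain Minace filter of Eq. (1.13) with N taken as the identity."""
    images = np.asarray(images, dtype=float)
    n = images.shape[0]
    X = np.fft.fft2(images).reshape(n, -1).T       # columns: FTs of the images
    power = np.abs(X) ** 2                         # column i holds the diagonal of S_i
    # Eq. (1.12): envelope of all S_i and the white-noise level c at each frequency.
    t = np.maximum(power.max(axis=1), c)
    if u is None:
        u = np.ones(n)
    Xw = X / t[:, None]                            # T^{-1} X
    h = Xw @ np.linalg.solve(X.conj().T @ Xw, u)   # Eq. (1.13)
    return X, power, t, h


# Hypothetical data: three random 8x8 images, each scaled to unit energy
# (an assumption; the scaling determines where c exceeds the image spectra).
rng = np.random.default_rng(0)
imgs_m = rng.standard_normal((3, 8, 8))
imgs_m /= np.linalg.norm(imgs_m.reshape(3, -1), axis=1)[:, None, None]

X_m, power_m, t_lo, h_lo = minace_filter(imgs_m, c=0.1)
_, _, t_hi, h_hi = minace_filter(imgs_m, c=1.0)
```

Raising c can only raise the envelope (`t_hi >= t_lo` at every frequency). Wherever c exceeds the image spectra, T is flat, so T^{-1/2} no longer boosts those frequencies relative to one another, which is the mechanism behind the trade-off described above.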

We note that the formulations of the Minace and OTF (optimal trade-off filter) [11] filters are similar: for the Minace filter, T is the envelope of the power spectra of the training-set images and the noise level, while for the OTF filter, T is a weighted combination (sum) of the average power spectrum of the training-set images and the noise level. In Ch. 2, we discuss why we use the Minace rather than the OTF filter. As we will discuss in Ch. 2, the Minace filter performs better than all other DIFs [8]; thus, we use the performance of the Minace filter as the baseline against which to compare kernel SDF test results. In another comparison of DIFs [12], the filters were not properly synthesized (as we will detail in Sec. 2.1), and thus the DIF comparison results were not fair. Our Minace test results [2],[13] are much better than all cases in Ref. [12] for aspect-view, scale, and depression-angle distortions.

In Ch. 2, we also summarize the main conclusions of our new three-year Minace filter research [2],[8],[9],[13],[14]. These are: we note that use of the peak-to-correlation-energy (PCE) ratio rather than the correlation peak as the correlation plane metric gives significantly better test results; we discuss when and why linear versus circular correlation DIFs are best; we develop an automated DIF filter-synthesis algorithm; we tabulate the correlation type, PCE metric details, and filter-synthesis parameters we have found best for different databases using the PCE metric; and we discuss the proper way to synthesize DIFs for the wide area search case and the preferable way to perform wide area search with these filters. The details in Ch. 2 only concern theory and algorithms (test and application results are in the references noted). The Minace filter baseline test results (against which to compare kernel SDF test results) are included in Ch. 5.

1.2.4 Prior kernel DIFs

In Ch. 3, we note the errors and shortcomings in all prior [15]–[17] kernel DIF formulations and advance solutions.
We discuss the need for fast kernel DIF shifts for general object recognition; we note the significantly larger on-line computation and storage requirements for kernel DIFs; we discuss how to properly minimize correlation plane energy with kernel DIFs; and we note which kernel DIFs are computationally realistic to use and why.

For DIFs, the Minace filter outperforms the SDF filter because the Minace filter preprocesses the data using the preprocessor matrix T^{-1/2} (Eq. (1.14)). This preprocessing emphasizes higher spatial frequencies and is a result of the fact that the Minace filter minimizes correlation plane energy. We thus expect kernel Minace (energy-minimizing) filters to outperform kernel SDF filters. However, kernel Minace DIFs and all energy-minimizing kernel DIFs require a very large number of on-line computations (see Appendix A for details and Sec for a summary). Thus, only kernel SDF filters are computationally realistic to use. We therefore consider a new approach (preprocessed kernel SDF) [4] that combines the advantages of energy minimization and kernel methods without high on-line computation requirements. In this method, we synthesize kernel SDF filters using training-set images that have been preprocessed by the Minace preprocessor matrix T^{-1/2}. This algorithm is computationally efficient (it avoids the computationally expensive on-line calculations necessary in energy-minimizing kernel DIFs) and should achieve the advantages of both energy minimization and kernel DIFs [4].

1.3 Contributions of this Dissertation

This dissertation is the first extensive work on combining DIFs and the kernel method to form kernel DIFs for general object recognition, in which fast filter shifts and several distortions are addressed. The main contributions of this dissertation follow.

- We detail which kernel DIF formulations are computationally realistic to use and why, and which ones are not and why; we quantify the trade-off among different kernel DIFs (Ch. 3 and Appendix A).
- We advance our new Minace-preprocessed kernel SDF filters (Sec ); these combine the advantages of energy minimization and kernel DIFs.
- We discuss the proper way to synthesize DIFs and kernel DIFs for the wide area search case (i.e., when a small filter must be applied to a much larger test input) and the preferable way to perform wide area search with these filters (Secs. 2.6 and 3.5); this is new.
- We quantify the effect of different constant-valued object backgrounds in training and tests and the effect of non-constant clutter near the test objects on performance scores (Sec. 5.7); these affect the target-to-background contrast ratio and have not been addressed in any prior DIF IR tests.
- We develop an automated algorithm to synthesize standard and preprocessed polynomial and Gaussian kernel SDF filters (Sec. 4.4).

- We quantify the improvement in performance of kernel SDF filters over that of Minace filters for different easy and difficult discrimination problems (Secs. 5.3 and 5.4.1).

1.4 Organization of this Dissertation

This dissertation is divided into chapters and appendices, as noted in the following subsections.

1.4.1 Chapter 2

Chapter 2 summarizes the main conclusions of our new three-year Minace filter research [2],[8],[9],[13],[14]. The details in Ch. 2 only concern theory and algorithms (test and application results are in the references noted). These conclusions are:

- We note that use of the peak-to-correlation-energy (PCE) ratio rather than the correlation peak as the correlation plane metric gives significantly better test results (Sec. 2.2).
- We discuss when and why linear versus circular correlation DIFs are best (Sec. 2.3).
- We develop an automated DIF filter-synthesis algorithm (Sec. 2.4).
- We tabulate the correlation type, PCE metric details, and filter-synthesis parameters we have found best for different databases using the PCE metric (Sec. 2.5).
- We discuss the proper way to synthesize DIFs for the wide area search case and the preferable way to perform wide area search with these filters (Sec. 2.6).

1.4.2 Chapter 3

Chapter 3 contains details of kernel DIF theory. In Ch. 3, we do the following:

- We discuss the need for fast kernel DIF shifts (i.e., FFT correlations) for general object recognition (Sec ).
- We note the significantly larger on-line computation and storage requirements for kernel DIFs (Sec ). We think only the kernel SDF filter is computationally realistic to use (see Appendix A for details and Sec for a summary).
- We discuss how to properly minimize correlation plane energy with kernel DIFs (see Sec. 3.4 for details and Sec for a summary).

- We detail several different formulations of the kernel SDF filter and discuss which ones are realistic to use and why (Sec. 3.3). Only the image-domain vector-based kernel SDF filter allows fast FFT correlations (Secs and 3.3.2).
- We show how polynomial and Gaussian kernel SDF filters can use fast test-set FFT correlations (Secs and 3.3.3).
- We advance our new Minace-preprocessed kernel SDF filters (Sec ); these combine the advantages of energy minimization and kernel DIFs.
- We discuss the proper way to synthesize kernel DIFs for the wide area search case and the preferable way to perform wide area search with these filters (Sec. 3.5).

All kernel DIF test results are in Ch. 5.

1.4.3 Chapter 4

Chapter 4 contains details of our database and training and test procedures. In Ch. 4, we do the following:

- We describe the infrared (IR) database used (Sec. 4.2).
- We discuss the training and test procedures we use for the chip case (Secs and 4.3.3).
- We discuss the general filter-synthesis procedure to determine when and how to add training-set data at new scales (ranges) to a filter (Sec ).
- We detail our automated procedure to synthesize standard and preprocessed polynomial and Gaussian kernel SDF filters (Sec. 4.4).

1.4.4 Chapter 5

Chapter 5 contains detailed test results for the standard Minace filter and for kernel SDF filters. In Ch. 5, we do the following:

- We present test results for the standard Minace filter (Sec. 5.2); this is the baseline against which to compare kernel SDF test results.
- We compare test results for the standard and preprocessed kernel SDF filters (Sec. 5.3) and with use of the polynomial and Gaussian kernels (Secs and ).
- We quantify the improvement in performance of kernel SDF filters over that of standard Minace filters for different easy and difficult discrimination problems (Secs. 5.3 and 5.4.1).

- We present depression-angle tolerance test results for the standard Minace filter (Sec ) and for the standard and preprocessed kernel SDF filters (Sec ).
- We present test results for recognition of variants (Sec ).
- We discuss kernel SDF test results using the PCE metric (Sec ).
- We present data for the kernel Mace filter and demonstrate that use of the proper energy minimization (image-domain formulation) gives much lower correlation plane energy (and thus a higher PCE) than use of the improper energy minimization (FT-domain formulation) does (Sec ).
- We present kernel SDF wide area search test results with different input sizes and backgrounds (Sec. 5.6).
- We quantify the effect of different constant-valued object backgrounds in training and tests and the effect of non-constant clutter near the test objects on performance scores (Sec. 5.7); these affect the target-to-background contrast ratio and have not been addressed in any prior DIF IR tests.

We note that some of these test results are in various conference papers [2],[5],[6],[13], but this is the first time all have been detailed in one place. Both the polynomial and Gaussian kernels are used in Secs , , and . In all other kernel DIF tests, only the polynomial kernel is used.

1.4.5 Chapter 6 and Appendix A

Chapter 6 presents the conclusions of this dissertation and discusses possible directions for future research. Appendix A.1 contains the derivation of the computational complexity of the standard Mace filter; Appendices A.2–A.4 contain the derivation of the computational complexity of different kernel DIFs. This information is used in determining which kernel DIFs are computationally realistic to use.
