Automatic Target Recognition of Synthetic Aperture Radar Images using Elliptical Fourier Descriptors


Automatic Target Recognition of Synthetic Aperture Radar Images using Elliptical Fourier Descriptors

by

Louis Patrick Nicoli
Bachelor of Science, Electrical Engineering, Florida Institute of Technology, 2000

A thesis submitted to Florida Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering

Melbourne, Florida
August, 2007

We the undersigned committee hereby recommend that the attached document be accepted as fulfilling in part the requirements for the degree of Master of Science in Electrical Engineering.

Automatic Target Recognition of Synthetic Aperture Radar Images Using Elliptical Fourier Descriptors
by Louis Patrick Nicoli

Georgios C. Anagnostopoulos, Ph.D., Assistant Professor, Electrical and Computer Engineering, Thesis Advisor
Samuel P. Kozaitis, Ph.D., Professor, Electrical and Computer Engineering
Gnana Bhaskar Tenali, Professor of Mathematics, Mathematical Sciences
Robert L. Sullivan, Professor and Department Head, Electrical and Computer Engineering

Abstract

Title: Automatic Target Recognition of Synthetic Aperture Radar Images Using Elliptical Fourier Descriptors
Author: Louis Patrick Nicoli
Principal Advisor: Georgios C. Anagnostopoulos, Ph.D.

This thesis primarily investigates the use of shape-based features by an Automatic Target Recognition (ATR) system to classify various types of targets in Synthetic Aperture Radar (SAR) images. Specifically, shapes of target outlines are represented via Elliptical Fourier Descriptors (EFDs), which, in turn, are utilized as recognition features. According to the proposed ATR approach, a segmentation stage first isolates the target region from shadow and ground clutter via a sequence of thresholding and morphological operations. Next, a number of EFDs are computed that sufficiently describe the salient characteristics of the target outline. Finally, a classification stage based on an ensemble of Support Vector Machines identifies the target with the appropriate class label. In order to experimentally illustrate the merit of the proposed approach, SAR intensity images from the well-known Moving and Stationary Target

Acquisition and Recognition (MSTAR) dataset were used in a 10-class and a 3-class recognition problem. Furthermore, comparisons were drawn in terms of classification performance and computation time to other successful methods discussed in the literature, such as template matching methods. The obtained results show that only a very limited number of EFDs are required to achieve recognition rates that are competitive with well-established approaches. It is illustrated that, unlike other prior approaches, target recognition via EFDs eliminates the need to estimate target pose, which is a severe functional limitation and computational burden of other approaches. Moreover, it is shown that the outline of the target, which is a natural high-level heuristic feature used frequently in optical image recognition, can be used successfully in recognition of images generated from Synthetic Aperture Radar. Although previous studies have used other high-level image features (such as magnitude peak location, length and width of target region, location of edges, outline length to target area ratio, etc.), to the best of our knowledge, no other study has investigated the use of the shape of the target outline in an ATR system for the MSTAR dataset.

Table of Contents

Chapter 1: Introduction
1.1 Synthetic Aperture Radar
1.2 Segmentation
1.3 Automatic Target Recognition...9
Chapter 2: Related Work
2.1 Template Matching
2.1.1 Filter Template
2.1.2 Minimum Squared Error
2.2 High-Level Feature-Based Recognition
2.3 Large Margin Classifiers...28
Chapter 3: Segmentation
3.1 Previously Implemented Methods
3.2 Proposed Method of Segmentation
3.3 Morphological Operations
3.4 Combining Disconnected Target Region...39
Chapter 4: Elliptical Fourier Descriptors
4.1 Elliptical Fourier Descriptors
4.2 Classification
4.3 Classification Performance
4.4 Comparison...60
Chapter 5: Summary & Conclusions
References

List of Figures

Figure 1.1-1: Synthetic Aperture Radar...2
Figure 1.1-2: Target Azimuth Angle...6
Figure 1.1-3: Azimuth Pose between 171.5 and 191.5 degrees...7
Figure 1.1-4: Azimuth Pose between 1.5 and 301.5 degrees...8
Figure 1.1-5: Compression Angle...8
Figure 1.1-6: Image Difference Due to Compression Angle...9
Figure: Results of 3 Model Study (Taken from [4])...18
Figure 3-1: MSTAR Vehicle Types...32
Figure 3-2: Shadow, Clutter, and Target Regions...32
Figure 3.2-1: Original Image...36
Figure 3.2-2: Peaks...36
Figure 3.2-3: Target Region...36
Figure 3.3-1: Close Target Region...38
Figure 3.3-2: Close Target Region...39
Figure 3.4-1: Original Image with Target Outlined In White...41
Figure 4-1: Outline of MSTAR image...45
Figure 4-2: Outline represented by 20 Fourier Descriptors
Figure 4-3: Outline represented by 114 Fourier Descriptors...46
Figure 4.1-1: Contour with 4 Elliptical Fourier Descriptors...49
Figure 4.1-2: Reconstruction of Contour...49
Figure 4.1-3: Side by Side Comparison...50
Figure 4.1-4: First Ellipse of EFD...51
Figure 4.1-5: Axes of First Ellipse...52
Figure 4.2-1: Ten Class Directed Acyclic Graph...55
Figure 4.3-1: Performance of Elliptical Fourier Descriptors...57
Figure 4.4-1: 10 Class ATR Performance Comparison...62
Figure 4.4-2: 3 Class ATR Performance Comparison...64

List of Tables

Table 1.1-1: Target Images...5
Table 2.1.2-1: Class Training Set Size...19
Table 2.1.2-2: Testing Set Size...22
Table 2.1.2-3: 128x128 Image Size Quarter Power Confusion Matrix...22
Table 2.1.2-4: 96x96 Image Size Quarter Power Confusion Matrix...23
Table 2.1.2-5: 48x48 Image Size Quarter Power Confusion Matrix...23
Table 3.4-1: 48x48 Quarter Power Template with Target Region Segmentation...42
Table 3.4-2: 96x96 Quarter Power Template with Target Region Segmentation...43
Table 3.4-3: 128x128 Quarter Power Template with Target Region Segmentation...43
Table 4.3-1: Max Wins SVM with 2 Elliptical Fourier Descriptors (6 features)...58
Table 4.3-2: Max Wins SVM with 3 Elliptical Fourier Descriptors (10 features)...59
Table 4.3-3: Max Wins SVM with 5 Elliptical Fourier Descriptors (18 features)...59
Table 4.3-4: Max Wins SVM with 7 Elliptical Fourier Descriptors (26 features)...60
Table 4.3-5: Max Wins SVM with 20 Elliptical Fourier Descriptors (78 features)...60
Table 4.4-1: 10 Class ATR Performance Comparison...62
Table 4.4-2: 3 Class ATR Performance Comparison...63

List of Abbreviations

DAG - Directed Acyclic Graph
DCCF - Distance Classifier Correlation Filter
EFD - Elliptical Fourier Descriptor
EMACH - Extended Maximum Average Correlation Height
MACH - Maximum Average Correlation Height
MINACE - Minimum Noise And Correlation Energy
MSTAR - Moving and Stationary Target Acquisition and Recognition
PDCCF - Polynomial Distance Correlation Classifier Filter
SAR - Synthetic Aperture Radar
SVM - Support Vector Machine
QP - Quarter Power

Acknowledgement

This material is based upon work/research supported in part by the National Science Foundation under Grant No and Grant No. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Chapter 1: Introduction

The purpose of this thesis is to investigate the explicit use of target outlines in an automatic target recognition system for synthetic aperture radar (SAR) images. A parametric representation of the target outline is used as a feature set for classifying the targets. Additionally, the particular parameters used in this representation are elliptical Fourier descriptors (EFDs). Before venturing into the specifics of the proposed approach, some background information is in order.

1.1 Synthetic Aperture Radar

SAR is an imaging technology that produces a high resolution 2D image of target areas [1][2]. SAR is used on airborne and space based vehicles for high resolution radar imagery. Using advanced signal processing techniques, detailed images are formed through integration of successive radar reflections of the same target area. Since SAR operates in the microwave range, images of ground targets can be made at night and during inclement weather, when optically based imagery systems are unusable. A diagram of a SAR system is shown in Figure 1.1-1.

Figure 1.1-1: Synthetic Aperture Radar

A SAR system employs a linear antenna that is mounted in the direction of the heading of the aircraft. The aircraft heading is called the cross range direction. The antenna has a main lobe that points in a direction perpendicular to the azimuth direction. This direction is called the range direction. A SAR system can be viewed as a linear array of antennas. The received signals of an array of antennas can be phase combined to produce a finer beam width than that of the individual antennas in the array. Antenna systems that exploit this characteristic are called phased array antennas [1]. The fine beam width achieved through the use of arrays is what allows SAR to have a high azimuth resolution. SAR differs from a physical phased array in that all elements of a physical phased array are energized simultaneously. Conversely, the elements in a SAR system are actually the same antenna. A multiple element array is synthesized from the same antenna at different locations. As the aircraft moves, the SAR system stores pulse return

information in memory until the pulses from all elements in the array have been returned. Then, an advanced processor integrates the information in order to produce a fine resolution image. The SAR images considered in this thesis have a resolution of one sample per foot. The one-foot resolution of the radar images allows the radar to easily discriminate small to large vehicles from the surrounding terrain. This makes it particularly useful for military applications. With the high resolution, a vehicle produces a magnitude image consisting of a large number of pixels rather than the single magnitude return that is characteristic of other types of radar [1]. So, a SAR image conveys more information than just target location. Being able to discriminate targets into separate classes of vehicles provides a greater level of battlefield awareness to the radar operator. The benefit of being able to accurately discriminate between enemy, friendly, and non-military targets is obvious.

The Moving and Stationary Target Acquisition and Recognition (MSTAR) data set is a collection of SAR images taken of Soviet-made military vehicles [3]. The United States Air Force released the collection to the public for research purposes. Each target image in the dataset is of a single target vehicle. The images consist of complex valued pixels representing the magnitude and phase of the radar return within one foot by one foot range bins. Each image is contained in a separate file. The files contain a header that lists information about the target parameters, including: target model number; type of vehicle (tank, transport, truck, etc.); serial number of the target; pose (azimuth heading); pitch; roll; yaw; radar

depression angle; radar ground squint angle; range; and several other parameters. The types of vehicles, visual pictures of the targets, and radar images of each target class are shown in Table 1.1-1. In the radar images, the pixel intensity is mapped to a fixed range of grey levels. Pixel intensities that are outside the range of the grey levels are shown as pure white. This method of display allows the viewer to clearly see the shape of the target vehicles in the radar image. This thesis only involves the ten targets shown in Table 1.1-1. These images are only a subset of the entire MSTAR database. The MSTAR dataset also includes images of a confuser target called Slicy, which is not considered in this thesis. There are also several images with no targets at all that are called clutter. The clutter images are not considered in this thesis. There are two reasons for the exclusion of Slicy and clutter images from this thesis. The first is that other papers only considered the ten classes in Table 1.1-1 [4]. The same ten targets were used so that the results of this thesis can be compared to those of previous work. The second reason is that detection of SAR targets among clutter is considered a separate problem from target recognition. Several noteworthy studies have investigated methods for target detection [5][6][7][8][9]. For this thesis, it is assumed that all images belong to one of the ten classes and that the target is centered in the image.

Table 1.1-1: Target Images

Vehicle Class   Vehicle Type
2S1             Gun
BMP2            Tank
BRDM2           Truck
BTR60           Transport
BTR70           Transport (no picture available)
D7              Bulldozer
T62             Tank
T72             Tank
ZIL131          Truck
ZSU23-4         Gun

(The original table also shows an optical image and a SAR image of each vehicle.)

Several of the image parameters affect the appearance of the target in the MSTAR images. The most significant parameter is the target pose, or azimuth.

The azimuth angle is defined as the angle between the forward direction of the vehicle and the direction of incidence of the radar pulse. Figure 1.1-2 shows a diagram of the target pose.

Figure 1.1-2: Target Azimuth Angle

The reflected magnitude image from the same target can be very different at different azimuth angles because of the nature of radar scatter [10]. Generally, peaks do not persist over a rotation of 5 degrees. This characteristic poses a significant challenge in classification of the images. Due to the variance of scatterers, one class of targets is typically considered as an ensemble of multiple subclasses representing the target at various azimuth angles. Figure 1.1-3 shows several images of the T62 tank with an azimuth pose between 171.5 and 191.5 degrees. The targets are displayed as scaled intensity images using a multi-color transform. The color transform (colormap) represents smaller values by shades of blue, medium values as green and yellow, and high values as red. The images are shown in this format to highlight the distribution of pixel intensities. The majority of the target area has a relatively low intensity. There are

discrete points on the target that have very high intensities. These are referred to as the scatterer centers. They are locations on the target vehicle that reflect a large amount of power back to the radar. These scatterers are a dominant feature of the SAR images. However, the reflectivity of a scatterer is highly dependent on the pose of the vehicle.

Figure 1.1-3: Azimuth Pose between 171.5 and 191.5 degrees

The variation of image features with respect to pose is even more noticeable when viewing images of the same vehicle taken at azimuths 60 degrees apart. Figure 1.1-4 shows the T62 tank at poses from 1.5 to 301.5 degrees in increments of sixty degrees. It is evident that images of the same vehicle at different azimuths bear little resemblance to each other in terms of magnitude and location of their scatterers.

Figure 1.1-4: Azimuth Pose between 1.5 and 301.5 degrees

Another parameter that affects the SAR image appearance is the compression angle. The compression angle is the angle between the ground plane and a line between the radar and the target that is perpendicular to the flight path of the radar. Figure 1.1-5 shows a diagram of the compression angle.

Figure 1.1-5: Compression Angle

An example of the effect is shown in Figure 1.1-6. As the compression angle gets smaller, the shadow cast by the target grows longer. Also, some scatterers on the

target that are not facing the radar may be occluded in the shadow. Figure 1.1-6 shows a SAR image of the same vehicle at 45º and at 15º compression.

Figure 1.1-6: Image Difference Due to Compression Angle (2S1 at 45º compression and 2S1 at 15º compression)

1.2 Segmentation

Segmentation is the process of dividing an image into regions. Each region consists of a group of pixels that are heuristically grouped together. In this thesis, segmentation is used to differentiate between pixels that potentially belong to the target as opposed to pixels that are part of the clutter or shadow. Segmentation of the SAR images was accomplished with a combination of thresholding and morphological operations. Its specific implementation is discussed in detail in Chapter 3.

1.3 Automatic Target Recognition

Target recognition is the process of identifying an image as belonging to a specific type of vehicle. Using a computer to classify a target without user intervention is called automatic target recognition (ATR). The most successful

methods for ATR on the MSTAR dataset use a large set of target templates [4][11][12][13]. Although these methods produce very accurate recognitions, they are computationally complex, as they require correlation of the target with all subclasses of all target classes before a classification decision is made. We will touch upon the fundamentals of some of these methods in Chapter 2. In this thesis, the segmented outline of the target is used to classify the type of vehicle in the SAR images. The outline is represented parametrically by a relatively small set of coefficients called elliptical Fourier descriptors. An important characteristic of elliptical Fourier descriptors is that they can describe any arbitrary closed contour. By using them, the computational complexity of the proposed ATR system is far lower than that of template based methods, as we will show.

Chapter 2: Related Work

Several methods of automatic target recognition of the MSTAR dataset have been investigated. In particular, two noteworthy categories are template matching and high-level feature-based recognition.

2.1 Template Matching

In the context of this thesis, templates are any reference images in either the spatial or frequency domain that are constructed from the training set. These templates typically have the same number of pixels as the training images. In the case of frequency domain templates, there are an equal number of harmonic coefficients as there are pixels in the training images. Typically, several templates are constructed for each target class to account for intra-class distortion due to variations in pose and illumination. A test image is classified depending on the template that most closely resembles it. Here, resemblance is usually quantified by an appropriately chosen distance metric. Templates are straightforward to

implement and have produced very high rates of classification accuracy [4][11][14], as is shown in Section 2.1.2. One disadvantage of template matching is that template models tend to be very large and computationally intensive. Additionally, the SAR image of each target varies with azimuth, making the images non-invariant to rotation; therefore, each target class must be composed of an ensemble of sub-classes (the components of a mixture distribution) that include models of an individual target at several orientations. For optimum classification utilizing template matching, up to 72 orientation sub-classes have been used [4]. With 72 orientation subclasses, a 10-target classifier requires 720 templates. Since there are a large number of subclasses that need to be accounted for, the quantity of samples available for training is rather insufficient. When divided into 720 subclasses, the portion of samples that belong to a particular vehicle and pose subclass is small. In the Maximum Likelihood model of [4], there is an average of only 10.2 sample images used in training each subclass.

Another disadvantage of template matching is that the entire scene is considered in the model. This includes the clutter region of the MSTAR chips. The clutter region contains no information about the target and should, theoretically, only add noise to the distribution of the data. Almost all of the template matching methods use each pixel in the images as a feature. Therefore, a 128x128 image would contain 16,384 features. Considering the number of templates needed and the number of features used, the memory requirements for an ATR system that uses template matching are relatively large. Additionally, the computational complexity is also large, because each image to be classified consists of 16,384 individual features and needs to be compared with up to 720 templates.
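To make these sizes concrete, the counts just quoted multiply out as follows (a worked example; it assumes one stored value per pixel per template):

$$128 \times 128 = 16{,}384 \ \text{features per image}, \qquad 10 \ \text{classes} \times 72 \ \text{poses} = 720 \ \text{templates}$$

$$720 \times 16{,}384 = 11{,}796{,}480 \ \text{stored values for the full template bank}$$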

Template matching has been accomplished using filter templates [14][15][16] and minimum squared error templates [4][11][12][13], both of which are the subject of the next two subsections.

2.1.1 Filter Template

Some success has been achieved with templates that act as filters on a test image. Some of the types of filter templates that have been studied with the MSTAR set are: Maximum Average Correlation Height (MACH) combined with Distance Classifier Correlation Filter (DCCF) [15], Extended Maximum Average Correlation Height [16], Polynomial Distance Correlation Classifier Filter (PDCCF) [16], and Minimum Noise and Correlation Energy (MINACE) [14]. The MACH/DCCF combination was the first study to propose a filter template method. It investigated the use of the filters to classify three target classes: BTR70, T72, and BMP2. In the MACH/DCCF method, averaging all training images that belong to a class creates a class template. A DCCF filter is then created that maximizes the interclass distance. This DCCF filter is then applied to all the templates. First, for classification, the MACH and DCCF filters are applied to a test image. Then, an estimate of the target pose is calculated using a morphological segmentation process. A set of templates that correspond to the target pose is then selected. Next, the mean squared error between the filtered

image and each of the templates is calculated. Finally, the test image is classified as the class of the template that produced the minimum mean squared error. The percentage of test images that were correctly classified, P_C, was 98.2%. Thus, for the three class problem, the MACH/DCCF approach was very accurate. Extensions to the MACH/DCCF were also studied. The extended maximum average correlation height filter (EMACH) was an improved version of the MACH filter [1]. The EMACH approach was applied to a ten class set instead of three classes, so the improvement due to the extension cannot be directly seen from the results. For the ten classes, the EMACH approach produced a P_C of 95%. An extension to the DCCF called polynomial distance correlation classifier (PDCCF) filters was developed as well [16]. The PDCCF extended the DCCF method by first raising the individual features to an integer power before applying the distance correlation operation. The PDCCF was applied to a three target class set. The P_C reported for it was 99.1%. The azimuth was assumed to be known, so the results cannot be directly compared with other studies.

MINACE filters were also investigated as a template for a three class ATR system [14]. The construction of the MINACE filters is very different from the MACH/DCCF approach. MINACE filters are synthesized iteratively from a training set until a threshold is met. In the first iteration, only one training image is used to construct the filter. The filter is then used to calculate a correlation score for each of the images in the training set. If all correlation scores are above a predetermined threshold, filter synthesis is completed. If at least one correlation

score is below the threshold, the filter is updated. The filter is updated by using a linear combination of the previous images used to construct the filter and the training image with the lowest correlation score. Once the filter has been updated, the correlation scores are computed again. The process is repeated until all images in the training set have correlation scores above the threshold. One property of the MINACE filter synthesis process is that if there is too much variation among images in the training set, no filter will be produced. In such a case, it is necessary to divide the training set into smaller subclasses so that a filter can be synthesized. This property can be used to detect how many MINACE filters are required to represent all azimuth-range-defined subclasses of a single target class. Using the minimum acceptable number of subclasses keeps the number of filter operations, and therefore the overall computational complexity of the recognition operation, low. To determine the minimum number of target subclasses, a training set is first selected using images of the same target class over a large range of azimuths. If a filter cannot be synthesized, i.e., the minimum correlation score never exceeds the threshold, the training set is divided into smaller azimuth subranges. The process continues until filters are successfully synthesized for each azimuth subrange. For the MSTAR three class set, the minimum number of filters needed for each class was four. However, to improve the results, six filters per class were used in the final evaluation.
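The iterative synthesis loop described above can be sketched in a few lines. This is a minimal illustration only, not the construction from [14]: make_filter and correlation_score are hypothetical stand-ins for the MINACE-specific filter synthesis and correlation steps, and the iteration cap is an added safeguard.

```python
import numpy as np

def synthesize_filter(train_images, threshold, make_filter, correlation_score,
                      max_iters=100):
    """Iterative filter synthesis in the style described above (a sketch).

    make_filter(images) -> filter and correlation_score(filter, image) -> float
    are hypothetical stand-ins. Returns a filter, or None when the training
    set is too varied and must be split into smaller azimuth subranges.
    """
    used = [train_images[0]]                  # first iteration: one training image
    for _ in range(max_iters):
        filt = make_filter(used)
        scores = [correlation_score(filt, img) for img in train_images]
        if min(scores) >= threshold:
            return filt                       # all images clear the threshold: done
        worst = train_images[int(np.argmin(scores))]
        if any(worst is u for u in used):
            return None                       # no progress: declare synthesis failed
        used.append(worst)                    # fold worst-scoring image into the filter
    return None

def filters_per_class(images_sorted_by_azimuth, threshold, make_filter, score):
    """Split the azimuth range into ever-smaller subranges until filters synthesize."""
    n_splits = 1
    while True:
        subranges = np.array_split(images_sorted_by_azimuth, n_splits)
        filters = [synthesize_filter(list(s), threshold, make_filter, score)
                   for s in subranges]
        if all(f is not None for f in filters):
            return filters                    # e.g. four subranges per class here
        n_splits += 1
```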

Additionally, three target classes were considered for the final evaluation of the MINACE filter approach. Clutter images as well as confuser images were included in the set of test images. Clutter images are SAR images of bare terrain with no target included. Confuser images are SAR images of objects that are not part of the set of target classes of the ATR system. Including the confuser and clutter images in the test set measures the ability of the ATR to reject images not belonging to any of the target classes. The MINACE approach had a best P_C of 86.4% over the threshold values tested.

2.1.2 Minimum Squared Error

Minimum Squared Error (MSE) approaches also treat each pixel as a feature. Two successful MSE models are Quarter Power and Log Magnitude. Another model, called Conditionally Gaussian, is investigated in the same paper as the log magnitude and quarter power models. The Conditionally Gaussian model uses every pixel as a feature like the other two; however, its discrimination function is a probabilistic likelihood estimate rather than a squared-error distance. Each of these three approaches models every pixel in the image as an independent, identically distributed random variable. The quarter power model assumes a gamma distribution, the log magnitude model assumes a log-normal distribution, and the conditionally Gaussian model assumes a complex normal distribution. The feature vector of a sample image can be formed by concatenating the columns of the n by m complex-valued image matrix into a vector r of length mn. The number of templates used is the product of the number of target classes and the number of orientations used. As discussed earlier, the total number of templates needed was 720 for the most complex model. To study the effect of

image size on complexity and performance, image sizes of 48x48, 64x64, 80x80, 96x96, 112x112, and 128x128 pixels were used [4].

The quarter power model is shown to have the best fit to the distribution of pixel magnitudes in the MSTAR data set [11][17]. The goodness of fit of each of the models has been ranked in order from best to worst as Quarter Power, Complex Gaussian, and Log Magnitude [11]. The clutter region distribution is not well represented by the log magnitude model, which corresponds to its being ranked worst. Log normal and quarter power models are shown to represent the distribution inside the target region equally well [11]. This result showed that a statistics based ATR that considered pixels only in the target region would work equally well using the log normal or quarter power model. However, an ATR system that included pixels from the clutter region would perform better using the quarter power model rather than the log normal. The results of the three model study [11] are tabulated in Figure 2(b) of that document. A reproduction of that figure is shown below.

Figure: Results of 3 Model Study (Taken from [4])

The figure shows that the Normalized Conditionally Gaussian model with a template size of 96x96 using 72 azimuth windows performed the best, with an accuracy of about 97.5%. The accuracy using the same statistical model is above 96.5% for all other window sizes. In the MSTAR data set, the target regions of the images are almost always wholly located within the center 48x48 pixels of the image. Additionally, the shadow region can be partially occluded at this size. Since the recognition of the model at a window size of 48x48 is comparable to the result at all other

window sizes, it is evident that the target region contains the majority of the discriminating features for target recognition. The log magnitude approach also demonstrates that the target region pixels are contained inside the center 48x48 pixels of the MSTAR images. The accuracy of the Normalized Log-Magnitude model at that image size is about 95%. Accuracy of the model continues to decrease monotonically with increases in the size of the image templates. At 128x128, the accuracy has decreased to around 80%. The result is due to the increased number of pixels that belong to the clutter region as the template size is increased. This confirms that the log normal model does not fit the distribution of the clutter pixels [17].

The Quarter Power approach's accuracy is similar to that of the Conditionally Gaussian approach, and it provides a good fit to the clutter pixel distribution [17]. Taking the square root of the image transforms the approximately gamma distributed pixel magnitudes of the clutter region into a normal distribution. For this study, the quarter power approach was implemented in order to establish a baseline performance for comparison with the proposed classifier. The set of training samples used to construct the quarter power templates is listed in Table 2.1.2-1. All images used in the training set are at 17 degrees compression.

Table 2.1.2-1: Class Training Set Size

Vehicle Class   Number of Training Images   Average Number of Training Images Per Azimuth Window
2S1             299                         8.3
BMP2            698                         19.4
BRDM2           298                         8.3
BTR60           256                         7.1
BTR70           233                         6.5
D7              299                         8.3
T62             299                         8.3
T72             691                         19.2
ZIL131          299                         8.3
ZSU23-4         299                         8.3
Entire Set      3671                        10.2

Each template vector $\hat{\mu}(\theta_k, a_l)$ is generated using the following equation:

$$\hat{\mu}(\theta_k, a_l) = \frac{1}{|\chi_{k,a_l}|} \sum_{r_i \in \chi_{k,a_l}} r_i^{1/2}, \qquad 1 \le k \le 72, \quad 1 \le l \le 10 \qquad (1)$$

where $\theta_k$ is the k-th azimuth window, $a_l$ is the l-th target class, $\chi_{k,a_l}$ is the set of training images that belong to target class $a_l$ and azimuth window $\theta_k$, and $|\chi_{k,a_l}|$ is

the number of training images in that set. In this case, 72 azimuth windows were used. A training image belongs to the k-th azimuth window if its azimuth $\theta$ in degrees satisfies:

$$5(k-1) \le \theta \le 5(k+1), \qquad 1 \le k \le 72 \qquad (2)$$

Note that the condition in (2) allows one image to belong to more than one azimuth window and guarantees that all images within 5 degrees of azimuth from the center of the azimuth window are included in the construction of the template. The azimuth window size was chosen because scatterers typically don't persist past 5 degrees of rotation [10]. Note that, because each training image was used in the construction of two templates, the actual number of available training images per azimuth window and target class is half of the number listed in Table 2.1.2-1. Since the images are assumed to be persistent over an angle of about 5 degrees, there are only 5.1 unique images per azimuth window and target class on average.

The next step is to classify a test vector r. The class is estimated by finding the template that has the smallest squared distance to r. This can be stated as:

$$\hat{a} = \arg\min_{a_l} \min_{\theta_k} \left\| r - \hat{\mu}(\theta_k, a_l) \right\|^2 \qquad (3)$$
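Equations (1) through (3) translate almost directly into code. The following is a minimal sketch, assuming the images arrive as complex-valued NumPy arrays already trimmed to a common size and that azimuths are given in degrees; it is not the exact implementation of [4].

```python
import numpy as np

def build_qp_templates(images, azimuths, labels, n_classes=10, n_windows=72):
    """Quarter power templates per equation (1)."""
    qp = [np.sqrt(np.abs(im)).ravel() for im in images]    # quarter power: |pixel|^(1/2)
    templates = {}
    for k in range(1, n_windows + 1):
        lo, hi = 5 * (k - 1), 5 * (k + 1)                  # condition (2), overlapping
        for a in range(n_classes):
            members = [q for q, az, lab in zip(qp, azimuths, labels)
                       if lab == a and lo <= az <= hi]
            if members:
                templates[(k, a)] = np.mean(members, axis=0)
    return templates

def classify_qp(image, templates):
    """Equation (3): pick the class of the nearest template in squared distance."""
    r = np.sqrt(np.abs(image)).ravel()
    best = min(templates, key=lambda key: np.sum((r - templates[key]) ** 2))
    return best[1]                                         # the class index a
```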

It was stated previously that the Quarter Power model was the best fit to the distribution of the pixels [17]. However, the Conditionally Gaussian model performs better than the quarter power model in the results [11]. This unexpected result may be explained by the classifying function used in the conditionally Gaussian case. The conditionally Gaussian classifier treats each pixel as a zero mean complex Gaussian random variable. The variance of each pixel is estimated from the training data using equation (4):

$$\hat{\sigma}_i^2(\theta_k, a_l) = \frac{1}{|\chi_{k,a_l}|} \sum_{r \in \chi_{k,a_l}} |r_i|^2, \qquad 1 \le k \le 72, \quad 1 \le l \le 10 \qquad (4)$$

The log likelihood conditioned on a given class and pose is:

$$l(r \mid \theta, a) = -\sum_i \left[ \ln\!\left(\sigma_i^2(\theta, a)\right) + \frac{|r_i|^2}{\sigma_i^2(\theta, a)} \right] \qquad (5)$$

For determining the class of a sample image, the log likelihoods of all poses of a given class are combined to make a decision by (6):

$$\hat{a} = \arg\max_{a} \sum_k e^{\,l(r \mid \theta_k, a)} \qquad (6)$$

Since scatterer centers may persist across two windows, more than one template may assist in positive identification of a sample image. However, the advantage is small and almost completely disappears at an image size of 128x128.
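The decision rule of equations (4) through (6) can be sketched in the same style; the variance floor below is a numerical safeguard added here, and in practice a log-sum-exp formulation would be used to keep the exponentials in equation (6) from underflowing.

```python
import numpy as np

def build_variances(images, azimuths, labels, n_classes=10, n_windows=72):
    """Per-pixel variance estimates per equation (4)."""
    sq = [np.abs(im).ravel() ** 2 for im in images]        # |r_i|^2 for each pixel
    sigma2 = {}
    for k in range(1, n_windows + 1):
        lo, hi = 5 * (k - 1), 5 * (k + 1)
        for a in range(n_classes):
            members = [s for s, az, lab in zip(sq, azimuths, labels)
                       if lab == a and lo <= az <= hi]
            if members:
                sigma2[(k, a)] = np.maximum(np.mean(members, axis=0), 1e-12)
    return sigma2

def classify_gaussian(image, sigma2, n_classes=10):
    """Equations (5) and (6): per-pose log likelihoods combined per class."""
    p = np.abs(image).ravel() ** 2
    scores = np.zeros(n_classes)
    for (k, a), s in sigma2.items():
        loglik = -np.sum(np.log(s) + p / s)                # equation (5)
        scores[a] += np.exp(loglik)                        # equation (6)
    return int(np.argmax(scores))
```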

For this study, the quarter power approach was used to confirm the results stated in [4] and to serve as a base for comparison. Table 2.1.2-3, Table 2.1.2-4, and Table 2.1.2-5 show the confusion matrices of the experiment for image sizes of 128x128, 96x96, and 48x48 respectively. The number of samples used for testing the Quarter Power classifier is shown in Table 2.1.2-2.

Table 2.1.2-2: Testing Set Size

Vehicle Class   Number of Testing Images
2S1             274
BMP2            587
BRDM2           274
BTR60           195
BTR70           196
D7              274
T62             273
T72             582
ZIL131          274
ZSU23-4         274
Entire Set      3203

All images used in the testing set are at 15 degrees compression. This is slightly different from the 17 degrees compression of the training images; however, the difference is assumed to be negligible [4]. Before they can be vectorized, the images must be trimmed so that they all have the same number of pixels. For the set of images in this study, the maximum image size that can be used is 128x128 pixels. Table 2.1.2-3 shows the result of the classification using a 128x128 image size.

Table 2.1.2-3: 128x128 Image Size Quarter Power Confusion Matrix
Estimate Confidence: 2S1 99.2%, BMP2 92.7%, BRDM2 99.6%, BTR60 99.5%, BTR70 100.0%, D7 99.6%, T62 100.0%, T72 94.5%, ZIL131 99.6%, ZSU23-4 98.2%
Total Classification Accuracy: 97.16%

From the confusion matrix, the accuracy of the classifier for each target class can be read. The rows of the table indicate the actual class of the test images. The columns indicate the estimated class of the test images produced by the classifier. The cell in the table that has the same actual and estimated class shows the number of test images that were estimated correctly. The other cells show the number of test images that were estimated incorrectly. Table 2.1.2-4 and Table 2.1.2-5 show the results for image sizes of 96x96 and 48x48.

Table 2.1.2-4: 96x96 Image Size Quarter Power Confusion Matrix
Estimate Accuracy: 2S1 98.5%, BMP2 94.7%, BRDM2 98.8%, BTR60 98.5%, BTR70 99.5%, D7 99.3%, T62 98.8%, T72 94.0%, ZIL131 98.9%, ZSU23-4 97.8%
Total Classification Accuracy: 97.07%

Table 2.1.2-5: 48x48 Image Size Quarter Power Confusion Matrix
Estimate Accuracy: 2S1 98.1%, BMP2 95.6%, BRDM2 97.6%, BTR60 95.5%, BTR70 93.8%, D7 98.2%, T62 94.4%, T72 94.0%, ZIL131 92.7%, ZSU23-4 96.7%
Total Classification Accuracy: 95.50%

The use of each pixel as a feature has typically produced the best classification results on the MSTAR dataset. Other studies have also used raw pixel features as input to template construction. In [18], a three target classifier was proposed that uses pixel magnitudes as part of its feature set. It also uses pose corrected DFT coefficients. A pose estimation algorithm based on segmentation of the target region is used. The pose estimate is used to rotate the image to a standard aspect angle of 180 degrees before calculating the DFT. In this way, the feature set is translationally and rotationally invariant.

In order to reduce the complexity of the target recognition problem, several methods for feature extraction have been suggested and investigated for the MSTAR dataset. While template-based methods have been shown to produce near 100% accuracy, these latter approaches attempt to maintain high accuracy and, simultaneously, to control the high input space dimensionality, which in turn greatly reduces the required complexity of the classifier. Several feature sets have been used for classification of the MSTAR dataset. The most studied is the topography of scatterer peaks within the image. In [19], the magnitude, location, size, and frequency dependence of the peaks in an image are used as features. A maximum likelihood classifier is developed to determine the target class from a variable length collection of peak locations. The classification rate in that research was 96.17%. However, the method used in [19] included an indexing step in which the test image was correlated to

all the training images. The correlation score was used to determine the top 50 most likely class membership candidates for a test image. Then, from the indexed candidate classes, the peak feature classification was determined. Thus, the classifier used both template correlation and scatterer topology. Additionally, the test image set only included 2747 images instead of the full 3203 available in the MSTAR dataset. Other studies that use features derived from the peaks [20] had similar difficulty achieving good classification rates using only peak scatterer topology as the input feature set. The location of peaks is considered a very low-level feature. Another attempt focused on locating not only peaks, but edges and corners as well [21]. The latter work only considered 3 targets in the recognition problem, so the results cannot be directly compared to using peaks only. However, the results were fairly similar. Other researchers consider higher level features as well [22][23]. However, none of the studies reviewed had performance comparable to that achieved using template matching techniques.

2.2 High-Level Feature-Based Recognition

A variety of high-level features (other than the raw image intensity values) are capable of conveying important information about a target class. For instance, the pose of the target has already been shown to be a feature that has a large impact on the classification accuracy of an ATR system [4][15][18]. Even though the estimates of the pose are imperfect, their use in ATR systems reduces

complexity and can improve performance [18]. Other heuristic features from the problem domain have been investigated for use in SAR ATR. Among the most obvious choices for high level features are the length and width of the target. Other high level features specific to radar are the average radar cross section (RCS) and log standard deviation (LSD) [24]. The radar cross section is a measure of how well a target reflects radar waves. The average radar cross section is therefore a feature that is related to the overall reflectivity of the target. The LSD measures the pixel-to-pixel variation in intensity within the target region. Using only the RCS, LSD, length, and width, a classification accuracy of 92% was achieved [4]. Nevertheless, the pose was assumed known beforehand, which inherently simplifies the ATR problem. Still, the use of high level features was shown to be a useful tool in ATR, considering that the smallest number of features used in the template approaches was 2304 (48x48 pixels) [4], as opposed to the four in the high level feature approach [24]. Another common radar-specific feature is the location and magnitude of peaks [21][22][24][25][26].
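To illustrate how small this feature vector is, the four features named above might be computed from a segmented target region roughly as follows. This is a sketch under stated assumptions, not the definitions of [24]: average RCS is taken here as the mean pixel power over the target region, LSD as the standard deviation of the pixel intensities in dB, and length and width as bounding-box extents rather than pose-aligned dimensions.

```python
import numpy as np

def high_level_features(image, target_mask):
    """Four-element feature vector: average RCS, LSD, length, width (a sketch)."""
    power = np.abs(image) ** 2
    target = power[target_mask]                       # pixels inside the target region
    avg_rcs = target.mean()                           # overall reflectivity
    lsd = np.std(10.0 * np.log10(target + 1e-12))     # spread of intensities in dB
    rows, cols = np.nonzero(target_mask)
    length = rows.max() - rows.min() + 1              # bounding-box extents in pixels
    width = cols.max() - cols.min() + 1
    return np.array([avg_rcs, lsd, float(length), float(width)])
```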

In addition to the aforementioned heuristic features, additional ones common to general image processing are also used in ATR systems. Simple optical features such as edges and corners have been investigated [21][27]. One of the studies focused on deriving features from the shape of the target region. A set of features that can be derived from the shape of a region is Hu moments [28]. Hu moments have the advantage that they are rotation and scale invariant. However, since the target images are not rotationally invariant, this advantage is nullified. Using a 3 nearest neighbor classifier and Hu moments, a classification accuracy of 76.85% on a set of 7 vehicle classes was achieved.

Also in the same paper, principal component analysis (PCA) and independent component analysis (ICA) were considered [28]. PCA is a method of reducing the number of features in a dataset while still retaining the majority of the variation between elements of the dataset [29]. The complexity problem caused by a high-dimensional feature space is fundamental in pattern recognition and is called the curse of dimensionality [29]. PCA does not have a direct visual interpretation; rather, it is a statistical feature. The PCA transform matrix for the first 200 principal components was estimated from a data set of 65x65 pixel images with 7 vehicle classes. A 3 nearest-neighbor (3-NN) classification accuracy of 96.47% was achieved on a test set of 679 images. It is not stated explicitly, but a graph presented in the same study shows that using only 25 principal components achieves similar performance [28]. The result of PCA is very good. However, it requires that the covariance matrix of the training set be calculated, and then the eigenvector decomposition of the covariance must be determined. For the 7 target class set, the training data form a 679x4225 matrix, whose covariance decomposition is quite computationally demanding. Nevertheless, this is only performed once during training and, subsequently, classifying a test pattern is relatively quick and inexpensive.
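The PCA computation just described is compact enough to sketch directly; the 25-component figure is taken from the observation above, and the eigendecomposition route shown here is one of several equivalent ways to obtain the transform (an SVD of the centered data matrix is another).

```python
import numpy as np

def fit_pca(train_matrix, n_components=25):
    """train_matrix holds one vectorized image per row (e.g. 679 x 4225)."""
    mean = train_matrix.mean(axis=0)
    centered = train_matrix - mean
    cov = np.cov(centered, rowvar=False)        # 4225 x 4225 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix, so eigh applies
    order = np.argsort(eigvals)[::-1]           # strongest components first
    return mean, eigvecs[:, order[:n_components]]

def project(images, mean, components):
    """Map vectorized images onto their principal-component coefficients."""
    return (images - mean) @ components
```

The expensive decomposition happens once in fit_pca; classifying a new image afterwards costs only one matrix-vector product, which matches the training/testing cost split noted above.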

ICA is similar to PCA in that the number of features is reduced. However, ICA seeks to find latent components in the feature data that are independent, rather than just uncorrelated as is done in PCA. ICA can only find components that have distributions that are not Gaussian [30]. Using ICA on the same images yielded a classification rate of 88.3%, which is much worse than the simpler PCA.

2.3 Large Margin Classifiers

A problem with many trained ATR classifiers is overfitting. Overfitting occurs when a classifier becomes too specific to the training data. An overfit classifier does a good job of identifying class membership of the training set, yet lacks enough generality to have similar classification accuracy on test data. Large margin classifiers attempt to partition the feature space such that different classes are maximally separated in some identifiable way. Support vector machines (SVMs) are a type of classifier that obtains good generality by maximizing the margin in feature space between two classes. SVMs are discussed in more detail in Section 4.3. For the 7 target classification system, 3-NN was more accurate than the SVM [28]. However, SVMs have been successfully applied in many other studies [31][32][33][34]. The performance of each approach varied because of differences in implementation, feature set, and assumptions. One moderately successful approach used all the pixel values of normalized 80x80 images as the input features for an SVM classifier [13]. The SVM was trained using three sets of images. Radial basis function kernels were used to transform the input space before applying the SVM. The achieved classification rate was

90.99%, which suggests that the simple MSE templates outperform specialized classifiers when all pixels are used as the feature set.

Another large margin classifier that has been used on the MSTAR database is adaptive boosting (AdaBoost) [18]. Adaptive boosting is a method for selecting an ensemble of training patterns that maximizes the margin between classes. This is accomplished by identifying which patterns in a training set are hard to classify. AdaBoost is applied at the output of a base learner. In the case of the experiment on the MSTAR data set, a Gaussian radial basis function neural network was used as the base learner. After the base learner is trained, the training set is examined using the base learner. AdaBoost then applies weights to the training patterns according to their score from the base learner. The training patterns that are harder to classify are assigned higher weights, so that the next training iteration of the base learner will be focused on the harder to classify patterns.
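The reweighting step can be sketched as generic boosting-style weight bookkeeping; this is not the exact update used with the RBF-network base learner in [18].

```python
import numpy as np

def adaboost_reweight(weights, predictions, labels):
    """One AdaBoost-style reweighting pass (a sketch of the generic scheme).

    weights: current per-pattern weights; predictions, labels: base-learner
    outputs and truth. Misclassified (hard) patterns receive larger weights,
    so the next training round concentrates on them.
    """
    miss = predictions != labels
    err = np.clip(weights[miss].sum() / weights.sum(), 1e-12, 1 - 1e-12)
    alpha = 0.5 * np.log((1.0 - err) / err)           # base-learner confidence
    weights = weights * np.exp(np.where(miss, alpha, -alpha))
    return weights / weights.sum(), alpha             # renormalize to a distribution
```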

The feature set for the AdaBoost experiment is one of the largest among all of the studies reviewed for this thesis. For feature extraction, the pose of each target was estimated. The image is then rotated to a standard pose, cropped to an 80x80 size, and the DFT calculated. Because of the nature of the targets, the pose estimate can be off by 180 degrees when the back of the target is mistaken for the front. When the pose estimate is off by 180 degrees, the image is rotated by the wrong angle and does not match other images which were rotated properly. This causes variation between images of the same class and can lead to poor classification performance. To mitigate the effects of bad pose estimation, the pixel intensities of the original image are also included as part of the feature set. The two different types of features are used to train two individual neural networks. At the end of each iteration, the network that performs better is used to determine the weights for the AdaBoost step. This trade-off causes a significant increase in the complexity of the training system. The AdaBoost experiment has the highest classification rate of any of the studies reviewed. For the three class set, it had a classification accuracy of 99.63%. The experiment also had the most computationally intensive implementation.

Chapter 3: Segmentation

Before being able to utilize the target outline description features considered in this study (elliptical Fourier descriptors), an important preparatory step is to isolate the target region from the clutter and shadow regions within the intensity images. This is achieved via segmentation of the image into target and non-target regions. Although it is very easy for a human observer to visually separate these regions, the automatic segmentation of the image is not as straightforward. The pixels in the MSTAR images represent the intensity of the radar return. The MSTAR database includes several types of vehicles, including trucks, tanks, transports, guns, and so-called confuser vehicles. The confuser vehicles are targets of a non-military nature, such as the bulldozer.

Figure 3-1: MSTAR Vehicle Types (ZIL131 truck, BTR-60 personnel transport, 2S1 gun, D7 bulldozer)

As mentioned earlier, the images can be separated into three distinct regions: the target region, the shadow region, and the clutter region. Figure 3-2 shows each of the three regions in a sample image. The target region is characterized by brighter pixels, called peaks, that represent the actual target. The intensity of the radar return depends on the material of the target and the angle of incidence of the radar pulse with respect to the target's surface. The shadow region has dark pixels which represent the void behind the target where the radar pulse does not reach because it is blocked by the target. The clutter region has pixels that represent the noisy radar reflections from the ground surrounding the target.

Figure 3-2: Shadow, Clutter, and Target Regions (original and segmented image)

3.1 Previously Implemented Methods

Many segmentation techniques have been proposed in previous studies [18][19][22][35]. All techniques proposed included two basic steps. One step is to apply a threshold to the image to create a binary image. The other step is to apply morphological operations to the binary image. In Synthetic Aperture Radar Automatic Target Recognition Using Adaptive Boosting [18], a segmentation method was proposed that involved histogram equalization, median filtering, and a two step threshold. In the first phase of this method, a predefined threshold value is chosen and subsequently used to threshold the image. Next, median filtering is used to reject salt-and-pepper noise from the thresholded image [36]. Then a global threshold is calculated using the distribution of pixel intensities in the filtered image. The threshold is then applied to the smoothed image. Following the image binarization procedure, a Sobel operator is applied to detect edges [36]. A morphological operation called dilation [36] is applied to the output of the Sobel operator to produce the final segmented region. The many steps involved highlight the difficulty in designing a robust algorithm that segments images satisfactorily.

A more straightforward approach to segmentation was proposed in [35]. The 200 brightest pixels are used as the first estimate of the target region. Then, a morphological close [36] operation is used to fill in the areas between the brightest pixels. The close operation is discussed in detail later in this document. A similar

process was also used to segment the shadow region. Thresholding followed by morphological operations is also used in other proposed methods of segmentation of SAR images [19][22].

3.2 Proposed Method of Segmentation

Although many different methods have been proposed, comparison of performance between the different methods can only be made subjectively, as there is no way to ascertain which pixels truly belong to the target region. As such, there is no way to determine which method of segmentation was best to use in this thesis. Additionally, the descriptions of the methods were not detailed enough to be recreated for the purpose of using the methods in this thesis. Therefore, a method of segmentation was used that concentrated on the two basic steps of thresholding and morphological operations, in order to keep the implementation simple and computationally efficient.

In this thesis, the proposed method of segmentation of the image into target and non-target regions is accomplished by estimating the set characteristic function of the target region from the input image. The set of pixel locations in an input image is defined as X. The set of pixel locations that belong to a segmented region R is a subset of X. Formally,

$$R \subseteq X \qquad (7)$$

The characteristic function of R is defined as

$$1_R : X \to \{0, 1\} \qquad (8)$$

That is, the characteristic function maps pixel locations from X to a value of one if the location exists in R and to zero otherwise. Segmenting the image is commonly accomplished by thresholding the image and then applying morphological operations. A threshold can be used to turn an intensity image, represented by the function $I(x, y)$, into a characteristic function $1_T(x, y)$ by the following equation:

$$1_T(x, y) = \begin{cases} 0, & I(x, y) < T \\ 1, & I(x, y) \ge T \end{cases} \qquad (9)$$

where T is the threshold. When considering the target region, the intensities of some of the pixels inside the region are not greater than the intensities of some of the pixels in the clutter region. Consequently, a simple threshold function will not produce a good result, as will be demonstrated shortly. In order to increase the accuracy, a priori information about the chip must be considered. To begin, all the images in the database are centered; therefore, pixels near the edge of the image need not be considered. Also, the peak pixels inside the target region are well above the clutter level. The peaks can be easily separated from the clutter using a global threshold. The image is then segmented again using a lower threshold that will capture the entire target region but includes some of the clutter. Figure 3.2-1 shows the original image of a BTR60 personnel transport. The peaks are shown in Figure 3.2-2 and were segmented using a threshold of:

$$T = 0.4 \max(I) \qquad (10)$$

The target region is shown in Figure 3.2-3 and was segmented using:

$$T = 0.15 \max(I) \qquad (11)$$

Figure 3.2-1: Original Image
Figure 3.2-2: Peaks
Figure 3.2-3: Target Region

It is evident from the target region image in Figure 3.2-3 that there are several pixel locations outside of the actual target region that are included in the initial target region extraction. Additionally, there are pixels inside the target region that have a zero value. Clearly, it seems desirable to obtain a segmented target region that is simply connected and has relatively smoother boundaries than the ones obtained.
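The two thresholding steps of equations (10) and (11) amount to a few array comparisons; a minimal sketch, assuming a magnitude image stored as a NumPy array:

```python
import numpy as np

def threshold_masks(image):
    """Peak and candidate target-region masks per equations (10) and (11)."""
    intensity = np.abs(image)
    peaks = intensity >= 0.4 * intensity.max()     # eq. (10): bright scatterer centers
    target = intensity >= 0.15 * intensity.max()   # eq. (11): target region plus clutter
    return peaks, target
```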

3.3 Morphological Operations

A common approach in the segmentation process is the use of morphological operations. There are two basic morphological operations utilized: dilation and erosion [36][37]. The morphological operations use a structuring element to distort a binary image. A structuring element is usually a disk or square shaped set of pixel locations; however, any arbitrary shape can be used for a structuring element. Consider S, a zero centered 3 by 3 structuring element:

$$S = \{(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)\} \qquad (12)$$

The translation of S to a position $x \in X$ is defined as:

$$S_x = \{c : c = x + s,\ s \in S\} \qquad (13)$$

The morphological dilation operation is then defined as:

$$R \oplus S = \{x \in X : S_x \cap R \ne \emptyset\} \qquad (14)$$

The dilation operation expands the boundaries of all non-zero regions in a binary image. The opposite of the dilation operation is the erode operation. Instead of expanding the boundaries, the erode operation shrinks the non-zero regions:

$$R \ominus S = \{x \in X : S_x \subseteq R\} \qquad (15)$$

The close operation, shown in equation (16), is a combination of the two morphological operations, dilation and erosion. In the close operation, the image is first dilated to close any holes or small gaps, and then the image is eroded to recover the edges of the original image:

$$R \bullet S = (R \oplus S) \ominus S \qquad (16)$$

The close operation tends to fill holes and small gaps in a binary image while preserving the boundary detail of an image. It also produces a continuous region from a cluster of closely packed but disconnected regions. The close operation was previously used in target segmentation in Measured and Predicted Synthetic Aperture Radar Target Comparison [35]. Figure 3.3-1 shows the target region image after it is closed with a 3 by 3 structuring element. The binary image after the close operation still contains many pixels outside the target region. A region is a set of connected pixels, and many of these pixels are not connected to the main target region.

Figure 3.3-1: Close Target Region

The next step is to select the regions that share pixels with the characteristic function of the peaks region. This will guarantee that only pixels from the target region remain. The i-th subset of $1_T$ is connected if it satisfies the following condition:

$$R_i = \{r : S_r \cap R_i \ne \emptyset\ \text{for all}\ r \in R_i\} \qquad (17)$$

where S is a 3 by 3 structuring element that does not include the center pixel location:

$$S = \{(-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1)\} \qquad (18)$$

Utilizing this structuring element employs 8-connectivity. The thresholded characteristic function can be used to generate a subset of X that is composed of the pixel locations that belong to the peaks, P, which can be written as:

$$P = \{x : 1_P(x) = 1,\ x \in X\} \qquad (19)$$

Since the peaks in the image are assumed to be part of the target region, the regions that belong to the target region are those regions that intersect with P. This can be stated as:

$$R_T = \{R_i : R_i \cap P \ne \emptyset\} \qquad (20)$$

The final image is shown in Figure 3.3-2. The final characteristic function contains all pixels belonging to the target region. This approach preserves edge information of the target region while rejecting noise from the clutter region.

Figure 3.3-2: Close Target Region
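The close operation and the peak-intersection rule of equations (16), (19), and (20) map naturally onto standard morphology routines; a sketch using scipy.ndimage (the 3x3 all-ones structure gives the 8-connectivity described above):

```python
import numpy as np
from scipy import ndimage

def select_target_regions(target_mask, peak_mask):
    """Close the candidate mask, then keep only the regions that contain peaks."""
    closed = ndimage.binary_closing(target_mask, structure=np.ones((3, 3)))
    labeled, n_regions = ndimage.label(closed, structure=np.ones((3, 3)))
    keep = np.zeros_like(closed, dtype=bool)
    for i in range(1, n_regions + 1):
        region = labeled == i
        if np.any(region & peak_mask):             # eq. (20): region intersects peaks
            keep |= region
    return keep
```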

3.4 Combining Disconnected Target Region

The goal of the segmentation is to produce a single region that represents the target region. However, it is possible that following the described methodology will produce more than one region. If there is more than one region after segmenting the target region, the image must be further manipulated by morphological operations so that a single region is identified, while having a minimum impact on the current shape of the images. Previously, the morphological close operation was utilized as a method for filling in small gaps in an image. Since the close operation has already been applied and failed to connect the regions of interest, a new, larger structuring element must be used for connecting. To connect any disconnected regions, a variable sized structuring element is used. S(i) is an i by i square structuring element; our original structuring element would be S(3). The following algorithm is proposed to connect the regions (a code sketch is given after equation (21) below):

1. Start with i = 4.
2. Compute $R_{T+} = R_T \bullet S(i)$.
3. If $R_{T+}$ is composed of one continuous region, stop.
4. Else, set $R_T = R_{T+}$.
5. Set i = i + 1.
6. Repeat from step 2.

Once the regions have been connected, the next step is to extract the subset of $R_T$ that describes the boundaries of the target. A pixel location lies on the boundary of the target region if it is adjacent to at least one pixel location outside of $R_T$. The boundary region $R_B$ can then be expressed as:

$$R_B = \{r_T \in R_T : S_{r_T} \not\subseteq R_T\} \qquad (21)$$
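The numbered procedure and the boundary rule of equation (21) can be sketched as follows, again using scipy.ndimage; closing with an i by i all-ones structure plays the role of S(i).

```python
import numpy as np
from scipy import ndimage

def connect_regions(mask):
    """Grow the structuring element until the mask is a single connected region."""
    i = 4                                              # step 1: start with S(4)
    while ndimage.label(mask, structure=np.ones((3, 3)))[1] > 1:
        mask = ndimage.binary_closing(mask, structure=np.ones((i, i)))  # step 2
        i += 1                                         # step 5: enlarge and repeat
    return mask

def boundary(mask):
    """Equation (21): region pixels adjacent to at least one outside pixel."""
    eroded = ndimage.binary_erosion(mask, structure=np.ones((3, 3)))
    return mask & ~eroded
```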

Figure 3.4-1 shows the original image with the target region outlined in white.

Figure 3.4-1: Original Image with Target Outlined in White

With a segmentation method for isolating the target region from the image, it is possible to determine the performance of the template matching systems in the absence of most of the clutter. For the classification methodology proposed in this study, measuring this latter performance is imperative to certify the quality of target region segmentation and to facilitate high classification rates. If template matching methods yield good results on the segmented images, this translates to high confidence in isolating the target region. In our research, this is important because the region's outline is eventually characterized by contour descriptors, which are used as classification features. This is similar to the approach of [35], where the clutter was removed from the image before comparing the image to templates for classification.

The use of segmentation before classification using template methods was also suggested as an area of possible future research [4]. However, no references were found that investigate statistically based templates after segmentation. For this thesis, the target region of the training images was segmented and all other pixels outside the identified region were ignored (set to zero). The quarter power model was then used to generate templates via equation (1). The test images were segmented in the same manner and classified using equation (3). The result of the classification for an image size of 48x48 is shown in Table 3.4-1.

Table 3.4-1: 48x48 Quarter Power Template with Target Region Segmentation
[Confusion-matrix cell counts were not recoverable from the source. Rows (actual class) and columns (estimated class): 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, ZSU23-4.]
Estimate accuracy per estimated class: 2S1 98.1%, BMP2 95.9%, BRDM2 93.2%, BTR60 92.4%, BTR70 86.8%, D7 99.3%, T62 95.0%, T72 91.6%, ZIL131 91.5%, ZSU23-4 97.0%. Total correct: 94.10%.

The results with window sizes of 96x96 and 128x128 are in Table 3.4-2 and Table 3.4-3, respectively. Note that since most of the target region lies within the inner 48x48 pixels, the performance of the classifier is nearly the same for all window sizes. Equally important, note that after comparing these tables to their counterparts from the earlier template-matching experiments, where the entire image is used, the classification rates are very close, which validates the segmentation procedure used in this thesis. This result asserts that target regions are successfully identified, and opens the way to describing their contours.

Table 3.4-2: 96x96 Quarter Power Template with Target Region Segmentation
[Confusion-matrix cell counts were not recoverable from the source; classes as in Table 3.4-1.]
Estimate accuracy per estimated class: 2S1 98.1%, BMP2 95.7%, BRDM2 93.2%, BTR60 92.4%, BTR70 86.8%, D7 99.3%, T62 95.4%, T72 91.6%, ZIL131 91.5%, ZSU23-4 97.0%. Total correct: 94.10%.

Table 3.4-3: 128x128 Quarter Power Template with Target Region Segmentation
[Confusion-matrix cell counts were not recoverable from the source; classes as in Table 3.4-1.]
Estimate accuracy per estimated class: 2S1 98.1%, BMP2 95.7%, BRDM2 93.2%, BTR60 92.4%, BTR70 86.8%, D7 99.3%, T62 95.4%, T72 91.6%, ZIL131 91.5%, ZSU23-4 97.0%. Total correct: 94.10%.
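For orientation, here is a hedged sketch of this template baseline. The thesis's exact equations (1) and (3) are defined in an earlier chapter; this sketch assumes the common quarter-power form, in which templates are per-class averages of fourth-root magnitude images and a test image is assigned the class of the minimum mean-squared-error template. "train_images" is a hypothetical mapping from class label to a list of segmented, equally sized magnitude images.

    import numpy as np

    def build_templates(train_images):
        # Assumed form of equation (1): per-class mean of quarter-power images.
        return {label: np.mean([img ** 0.25 for img in imgs], axis=0)
                for label, imgs in train_images.items()}

    def classify(test_image, templates):
        # Assumed form of equation (3): minimum mean-squared-error template.
        qp = test_image ** 0.25
        return min(templates, key=lambda label: np.mean((qp - templates[label]) ** 2))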

Chapter 4: Elliptical Fourier Descriptors

It is the intent of this thesis to determine to what degree target region contours can be used as a high-level feature in the classification of SAR images. Since the target region is a single, completely connected region, its boundary forms a simple (non-self-intersecting), closed contour. A well-established method for representing contours parametrically is the use of various Fourier descriptors [38][37]. The most common Fourier descriptors encountered in the literature are the complex Fourier descriptors. In order to compute them, the pixel coordinates of the target outline R_B are vectorized so that the t-th element of the vector is the element most adjacent to the (t-1)-th element in a clockwise direction. Considering the image space as a complex plane, a contour pixel with coordinates (x, y) is represented as a complex number R_B(n) = x + jy, and the complex Fourier descriptors of a contour R_B consisting of N pixels are computed as:

F(k) = Σ_{n=1}^{N} R_B(n) e^{-j2πkn/N}    (22)

where F(k) is, in general, complex-valued and represents the k-th discrete Fourier transform (DFT) [39] coefficient of the complex-valued sequence R_B. Evidently, the higher-order coefficients of the descriptors represent high-frequency content in the shape of the outline. By setting these coefficients to zero and reconstructing the contour from the remaining descriptors, the contour is essentially smoothed. This is analogous to low-pass filtering and is useful for removing high-frequency noise from the target region boundary stemming from the image acquisition phase or the segmentation process. In this thesis, it was determined that the complex Fourier descriptors require a significant number of higher-order terms to satisfactorily describe the contours of many of the images in the dataset. Figure 4-1 shows the outline of the HB MSTAR image.

Figure 4-1: Outline of MSTAR image

Figure 4-2 shows the outline of the image parameterized with 20 Fourier descriptors.

Figure 4-2: Outline represented by 20 Fourier Descriptors

It is evident that more descriptors are needed to represent the outline, as it still appears too rounded. Since the targets are vehicles and are largely rectangular, their contours contain high-frequency components at the corners. These hard edges require a much higher number of Fourier descriptors for a proper representation of the target boundary. For a good approximation of the boundary, an average of about 114 descriptors was needed. Figure 4-3 shows a reconstruction of the target boundary using 114 descriptors.

Figure 4-3: Outline represented by 114 Fourier Descriptors
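A minimal sketch of equation (22) and of the low-pass smoothing described above follows, using NumPy's FFT (whose sign and indexing conventions differ from the thesis's n = 1..N sum only by a fixed phase factor). "boundary_xy" is a hypothetical (N, 2) array of boundary coordinates traced in order.

    import numpy as np

    def complex_descriptors(boundary_xy):
        """Equation (22) via the FFT: ordered boundary pixels become x + jy."""
        z = boundary_xy[:, 0] + 1j * boundary_xy[:, 1]
        return np.fft.fft(z)

    def smooth_contour(boundary_xy, K):
        """Low-pass smoothing: keep the K lowest-order descriptors on each side
        of the spectrum (K >= 1) and reconstruct with the inverse DFT."""
        F = complex_descriptors(boundary_xy)
        keep = np.zeros(len(F), dtype=complex)
        keep[:K + 1] = 1.0   # DC term and positive low frequencies
        keep[-K:] = 1.0      # matching negative frequencies
        z = np.fft.ifft(F * keep)
        return np.stack([z.real, z.imag], axis=1)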

To get around this shortcoming, an alternative to the aforementioned type of Fourier descriptor was used, namely elliptical Fourier descriptors, which are examined in the next section.

4.1 Elliptical Fourier Descriptors

Elliptical Fourier descriptors are a parametric representation of closed contours based on harmonically related ellipses [38]. Each elliptical Fourier descriptor is composed of four coefficients: a, b, c, and d. Any closed contour with coordinates (x(t), y(t)), parametrized by t ∈ [0, 2π), can be constructed from an infinite set of elliptical Fourier descriptors using the synthesis equation (24) stated below:

(x(t), y(t))ᵀ = Σ_{k=0}^{∞} F_k (cos(kt), sin(kt))ᵀ    (24)

where F_k = [a_k  b_k; c_k  d_k] with, for k ≥ 1:

a_k = (1/π) ∫₀^{2π} x(t) cos(kt) dt,    b_k = (1/π) ∫₀^{2π} x(t) sin(kt) dt,
c_k = (1/π) ∫₀^{2π} y(t) cos(kt) dt,    d_k = (1/π) ∫₀^{2π} y(t) sin(kt) dt    (25)

and, for k = 0, the factor 1/π replaced by 1/(2π).
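Under the reconstruction of equation (25) above, a minimal numerical evaluation of the coefficient matrices might look as follows; it assumes the contour has already been resampled uniformly in t over [0, 2π), which is an assumption of this sketch rather than a step spelled out here.

    import numpy as np

    def efd_coefficients(x, y, n_harmonics):
        """Numerically evaluate equation (25) for harmonics 0..n_harmonics.
        x, y: contour coordinates sampled uniformly over t in [0, 2*pi)."""
        N = len(x)
        t = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
        dt = 2.0 * np.pi / N
        coeffs = []
        for k in range(n_harmonics + 1):
            scale = 1.0 / (2.0 * np.pi) if k == 0 else 1.0 / np.pi
            a = scale * np.sum(x * np.cos(k * t)) * dt
            b = scale * np.sum(x * np.sin(k * t)) * dt
            c = scale * np.sum(y * np.cos(k * t)) * dt
            d = scale * np.sum(y * np.sin(k * t)) * dt
            coeffs.append(np.array([[a, b], [c, d]]))  # F_k = [[a_k, b_k], [c_k, d_k]]
        return coeffs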

The zeroth term in the series expansion describes the center location of the contour; thus a_0 and c_0 are the x and y coordinates of the center, respectively. Note that b_0 and d_0 are always zero. Notice that each coefficient F_k is a matrix, which defines an ellipse [35]. The reconstruction of the contour can be viewed as the superposition of phasors and is accomplished by computing equation (24) over the interval t ∈ [0, 2π). Each phasor corresponds to a coefficient matrix and rotates at a rate proportional to its harmonic number k. While rotating, it traces the outline of an ellipse whose major and minor axes are determined by the a, b, c, and d values of the coefficient matrix. In other words, a contour can be described as a superposition of simultaneously rotating phasors, each one tracing an ellipse, at angular speeds that are multiples of a fundamental one. The advantage of these elliptical orbits is that they allow for a good estimate of the target's shape, even if only low-order terms of the elliptical Fourier descriptors are employed. Figure 4.1-1 shows the contour of HB reconstructed from four elliptical Fourier descriptors (0 through 3).

Figure 4.1-1: Contour with 4 Elliptical Fourier Descriptors

Figure 4.1-2 shows a graphical representation of the reconstruction of the contour from Figure 4.1-1. Additionally, Figure 4.1-2 shows the ellipses and phasors for three points on the reconstructed contour of the target region boundary. Each ellipse is represented by a different color. A snapshot of each phasor is represented as a straight line from the center of an ellipse to its corresponding edge.

[Figure 4.1-2 legend: k=1 ellipse, k=2 ellipse, k=3 ellipse, reconstructed contour; phasor snapshots at t = π/2, t = 3π/4, and t = π, centered at (a_0, c_0).]

Figure 4.1-2: Reconstruction of Contour
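A minimal sketch of the synthesis equation (24) follows; it superposes the rotating elliptical phasors from the coefficient matrices returned by the efd_coefficients sketch above.

    import numpy as np

    def reconstruct_contour(coeffs, n_points=256):
        """Synthesis equation (24): superpose the rotating elliptical phasors."""
        t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
        xy = np.zeros((2, n_points))
        for k, Fk in enumerate(coeffs):
            # Harmonic k rotates k times faster than the fundamental.
            xy += Fk @ np.vstack([np.cos(k * t), np.sin(k * t)])
        return xy[0], xy[1]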

The elliptical Fourier descriptors provide such a close estimate of the contour with so few descriptors that the total number of coefficients needed is actually less than would be required for an accurate parameterization using the common Fourier descriptors mentioned earlier. Figure 4.1-3 shows a side-by-side comparison of regular Fourier descriptors and elliptical Fourier descriptors using the same number of coefficients.

[Figure 4.1-3 panels: common Fourier descriptors (left) versus elliptical Fourier descriptors (right), including comparisons at 25 and 50 coefficients.]

Figure 4.1-3: Side by Side Comparison

It is evident from the figures above that the elliptical Fourier descriptors provide a more accurate description of the boundary of the target region. The first harmonic of an elliptical Fourier descriptor describes an ellipse whose properties closely resemble the geometric properties of the target region. Figure 4.1-4 shows this ellipse.

Figure 4.1-4: First Ellipse of EFD

Figure 4.1-5 gives a closer inspection of the ellipse and shows that the first elliptical descriptor contains information about the rough dimensional properties of the target region, as well as a rough estimate of pose. The length of the semi-major axis is roughly half the length of the target region, and the length of the semi-minor axis is roughly half its width. Also, the angle of the semi-major axis gives an indication of the pose of the target. Since the descriptors inherently contain an estimate of the pose, there is no need to estimate the pose in a separate step, as is done by other ATR systems [15][18]. All this information is contained within the first elliptical Fourier descriptor.

Figure 4.1-5: Axes of First Ellipse
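One way to read this size and pose information off the first harmonic, not the thesis's prescribed procedure but a standard linear-algebra reading, is sketched below: the k=1 ellipse is the image of the unit circle under F_1, so its semi-axis lengths are the singular values of F_1 and the major-axis direction is the first left singular vector.

    import numpy as np

    def first_harmonic_geometry(F1):
        """Semi-axis lengths and orientation of the k=1 ellipse from F_1."""
        U, s, _ = np.linalg.svd(F1)
        semi_major, semi_minor = s[0], s[1]
        pose_angle = np.arctan2(U[1, 0], U[0, 0])  # orientation of the major axis
        return semi_major, semi_minor, pose_angle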

In several applications, the length and angle of the semi-major axis can be used to manipulate the Fourier descriptors so that they are translationally, rotationally, and scale invariant. The descriptors can be made translationally invariant by setting the zeroth descriptor to zero. Furthermore, they can be made rotationally invariant by rotating the contour so that the semi-major axis is aligned with the vertical axis, which is achieved by diagonalizing F_1. They can also be made scale invariant by dividing all the descriptors by the length of the semi-major axis, or by a few other, more robust schemes involving matrix norms. For this study, the target regions are not made invariant to rotation, as the radar scatterers on the target vehicle are very sensitive to the angle of incidence of the radar pulse. As such, no manipulations are done to the Fourier descriptors to make them invariant to rotation. Additionally, all target images were taken from approximately 4.5 to 5 km away; therefore, the images are all approximately at the same scale. Considering this, there is no need to adjust the Fourier descriptors for scale.
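For completeness, a minimal sketch of the translation and scale normalizations described above follows; as noted, these are not applied in this thesis, and the rotation normalization is deliberately omitted.

    import numpy as np

    def normalize_descriptors(coeffs):
        """Translation invariance: zero the 0th coefficient matrix.
        Scale invariance: divide all matrices by the semi-major axis length,
        taken here as the largest singular value of F_1."""
        coeffs = [F.copy() for F in coeffs]
        coeffs[0][:] = 0.0
        semi_major = np.linalg.svd(coeffs[1], compute_uv=False)[0]
        return [F / semi_major for F in coeffs]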
