Support vector machines for classification of underwater targets in sidescan sonar imagery

Defence Research and Development Canada / Recherche et développement pour la défense Canada

Support vector machines for classification of underwater targets in sidescan sonar imagery

M. Couillard, Defence R&D Canada – CORA
J. A. Fawcett, Defence R&D Canada – Atlantic
V. L. Myers, Defence R&D Canada – Atlantic
M. Davison, University of Western Ontario

Defence R&D Canada – Atlantic
Technical Memorandum
DRDC Atlantic TM
November 2008



Principal Author
Original signed by M. Couillard
M. Couillard

Approved by
Original signed by D. Hopkin
D. Hopkin
Head/Maritime Asset Protection

Approved for release by
Original signed by Ron Kuwahara for Calvin Hyatt
Chair/Document Review Panel

© Her Majesty the Queen in Right of Canada as represented by the Minister of National Defence, 2008
© Sa Majesté la Reine (en droit du Canada), telle que représentée par le ministre de la Défense nationale, 2008

Abstract

High-frequency sidescan sonars have become a fundamental tool for modern mine hunting operations. They produce quality images of the seabed, and a high probability of detection can be achieved. A crucial classification phase is then needed to accurately identify these contacts as harmless or as potential underwater mines. To ensure the security of the follow-on traffic, all mines have to be identified accurately. At the same time, to avoid delays in the minefield clearing operations, clutter should not be misclassified as mines. For this contact identification task, multiple classification tools are available. This technical memorandum focuses on a powerful classification tool: support vector machines. A comprehensive introduction to support vector machines is provided, and their usefulness for the classification of underwater objects in sidescan sonar imagery is investigated. The database used in this study is made of real sidescan sonar images collected during the CITADEL sea trial held at the NATO Undersea Research Center in October 2005. It is shown that this classification tool yields excellent classification performance when shadow-based features are used. This performance increases significantly when highlight-based features are added. It is also shown that the Ridge regression approximation is faster than quadratic optimization for large datasets and yields comparable performance.

Résumé

Les sonars hautes fréquences sont aujourd'hui des outils essentiels pour les opérations modernes de chasse aux mines. Ils produisent des images de qualité du plancher océanique et permettent de réaliser des détections avec un haut degré de probabilité. Une phase de classification est alors nécessaire pour déterminer si les contacts obtenus sont inoffensifs ou s'ils sont susceptibles d'être des mines immergées. Afin que la sécurité des navires ultérieurs puisse être garantie, toutes les mines doivent être identifiées avec exactitude. D'autre part, pour éviter de retarder les opérations de déminage, on ne doit pas confondre des objets inoffensifs, comme des pierres par exemple, avec une mine. Plusieurs outils de classification sont disponibles pour cette tâche d'identification des contacts. Le présent document technique est consacré à un puissant outil de classification : les machines à vecteurs de support. Une introduction approfondie est présentée sur les machines à vecteurs de support ; le document examine également l'utilité de celles-ci pour la classification des objets immergés dans les images de sonar à balayage latéral. La base de données utilisée dans la présente étude est constituée d'authentiques images de sonar à balayage latéral recueillies lors de l'essai marin CITADEL au NATO Undersea Research Center, en octobre 2005. Il sera démontré que cet outil de classification présente une excellente capacité de classification lorsque sont utilisées les caractéristiques de classification axées sur l'ombre acoustique. Cette performance est considérablement accrue lorsque

les caractéristiques de l'écho acoustique sont également utilisées. Il sera également démontré que l'approximation par méthode de régression Ridge est plus rapide que l'optimisation quadratique lorsque les ensembles de données sont vastes, tandis que la performance de classification demeure comparable.

Executive summary

Support vector machines for classification of underwater targets in sidescan sonar imagery
M. Couillard, J. A. Fawcett, V. L. Myers, M. Davison; DRDC Atlantic TM; Defence R&D Canada – Atlantic; November 2008.

Background: High-frequency sidescan sonars have become a fundamental tool for modern mine hunting operations. They produce quality images of the seabed, and a high number of contacts can be detected. A crucial classification phase is then needed to accurately identify these contacts as harmless or as potential underwater mines. For this contact identification task, multiple classification tools are available. Among these classification tools, a powerful one, support vector machines, has recently received a lot of attention in various fields such as data mining, face recognition, medical detection of microcalcifications and speech recognition. One of the objectives of this technical memorandum is to provide a comprehensive introduction to support vector machines. Also, their usefulness for the classification of underwater objects is investigated.

Principal results: Support vector machines are used to discriminate mine shapes from clutter in sidescan sonar imagery. The database is made of real sidescan sonar images of mine shapes and various clutter collected during the 2005 CITADEL sea trial. These objects are first described only by shadow-based features, and the impact of adding highlight-based features on the classification performance is then quantified. It was found that this classification tool yields excellent classification performance. It was also found that the Ridge regression approximation is faster than quadratic optimization for large datasets and yields comparable performance. For our mines versus clutter classification problem, the best choices of support vector kernels are the Gaussian kernel with σ = 10 and the distance kernel with exponent 1/2. With a Gaussian kernel, the Ridge regression approximation and only the shadow-based features, a hit rate of 70.0% is reached at a false alarm rate of 29.4%. Adding the highlight-based features significantly increases the classification performance, yielding a hit rate of 70.0% for a very small false alarm rate of 5.2%. With the mine shapes taken individually, the complete shadow and highlight feature sets produced outstanding classification results. At a 90.0% hit rate, we obtain false alarm rates of 10.0%, 12.5% and 19.0% respectively for the Manta, cylinder and Rockan shapes.

Significance of results: This study provides a comprehensive introduction to support vector machines and shows that they are excellent classifiers that should be considered for computer-aided classification software designed for mine hunting. Support

vectors are an efficient approach to large training sets, as only a limited number of training points carrying all relevant information for the classification problem are used. The use of kernel functions also allows for non-linear decision boundaries in the original feature space by implicitly mapping the training data into a higher-dimensional feature space. This classification performance study is more detailed and complete than the shorter studies recently published, and shows that highlight-based features should be used in addition to the more traditionally accepted shadow-based features.

Future work: In the future we would like to investigate the selection of feature sets and the classifier parameters which make the classification process more robust to limited training sets and different seabed conditions. For example, one would hope that the classifier would be able to correctly classify a cylindrical object which has slightly different dimensions than the cylinders in the training set. We also hope to apply kernel-based classification methods to other image and signal detection and classification problems.

Sommaire

Support vector machines for classification of underwater targets in sidescan sonar imagery
M. Couillard, J. A. Fawcett, V. L. Myers, M. Davison ; DRDC Atlantic TM ; R & D pour la défense Canada – Atlantique ; novembre 2008.

Contexte : Les sonars hautes fréquences sont aujourd'hui des outils essentiels pour les opérations modernes de chasse aux mines. Ils produisent des images de qualité du plancher océanique et permettent de réaliser des détections avec un haut degré de probabilité. Une phase de classification est alors nécessaire pour déterminer si les contacts obtenus sont inoffensifs ou s'ils sont susceptibles d'être des mines immergées. Plusieurs outils de classification sont disponibles pour cette tâche d'identification des contacts. Les machines à vecteurs de support, qui comptent parmi les plus puissants de ces outils de classification, ont récemment reçu beaucoup d'attention dans différents domaines tels que l'exploration de données, la reconnaissance faciale, le dépistage médical de la microcalcification, ainsi que la reconnaissance de la parole. L'un des objectifs de ce document technique est de présenter une introduction approfondie sur les machines à vecteurs de support. De plus, le document examine leur utilité pour la classification des objets immergés.

Principaux résultats : Les machines à vecteurs de support sont utilisées pour établir une distinction entre les formes de mines et les objets inoffensifs (appelés clutters) dans les images de sonar à balayage latéral. La base de données est constituée d'authentiques images de sonar à balayage latéral représentant des formes de mines et différents exemples de clutters. Ces images ont été recueillies lors de l'essai marin CITADEL de 2005. Ces objets sont d'abord décrits uniquement en termes de caractéristiques axées sur l'ombre acoustique ; on examine ensuite l'incidence de l'inclusion des caractéristiques de l'écho acoustique sur la performance de la classification. Il a été constaté que cet outil offre une excellente performance de classification. Il a également été constaté que l'approximation par méthode de régression Ridge est plus rapide que l'optimisation quadratique lorsque les ensembles de données sont vastes, tandis que la performance de classification demeure comparable. En appliquant un noyau gaussien, l'approximation par méthode de régression Ridge et uniquement les caractéristiques axées sur l'ombre acoustique, on obtient un taux de classification de 70,0%, pour un taux de fausse alarme de 29,4%. Si l'on tient compte des caractéristiques de l'écho acoustique, la performance de la classification est considérablement accrue : 70,0%, pour un très faible taux de fausse alarme : 5,2%. Si l'on considère chacune des formes de mine individuellement, l'ensemble complet de caractéristiques d'ombre et d'écho donne d'excellents résultats de classification : pour

un taux de classification de 90,0%, les taux de fausse alarme sont respectivement de 10,0%, 12,5% et 19,0% pour les mines Manta, cylindriques et Rockan.

Importance des résultats : L'étude présente une introduction approfondie sur les machines à vecteurs de support et démontre que celles-ci constituent d'excellents classificateurs qui devraient être envisagés pour les logiciels de classification assistée par ordinateur destinés à la chasse aux mines. L'approche des vecteurs de support est efficace pour les vastes ensembles d'apprentissage : elle n'utilise que quelques points d'apprentissage porteurs de l'ensemble des renseignements pertinents pour la classification. L'usage de fonctions noyaux rend également possible des limites décisionnelles non linéaires dans l'espace des caractéristiques initiales grâce à une mise en correspondance implicite des données d'apprentissage dans un espace de caractéristiques comportant davantage de dimensions. La présente étude sur la performance de classification est plus détaillée et plus complète que les études plus brèves publiées récemment ; elle indique que les caractéristiques d'écho acoustique doivent être utilisées en plus des caractéristiques d'ombre acoustique, dont l'usage est traditionnellement plus répandu.

Perspectives : Nous souhaiterions étudier ultérieurement la méthode de sélection des ensembles de caractéristiques et les paramètres du classificateur qui rendent le processus de classification plus robuste en présence d'ensembles d'apprentissage plus limités et dans des conditions de planchers océaniques différents. Par exemple, nous pourrions tenter d'obtenir un classificateur qui est à même de classifier correctement un objet cylindrique dont les dimensions sont légèrement différentes de celles des cylindres de l'ensemble d'apprentissage. Nous espérons également appliquer les méthodes de classification fondées sur un noyau à d'autres problèmes de détection et de classification d'images et de signaux.

Table of contents

Abstract
Résumé
Executive summary
Sommaire
Table of contents
List of figures
1 Introduction
2 Classification with support vector machines
  2.1 Support vector machines
    2.1.1 Linearly separable problems
    2.1.2 Non-linearly separable problems
  2.2 Kernels
  2.3 An example in one dimension
  2.4 Linear regression approximations
    2.4.1 Ridge regression
    2.4.2 Kernel partial least squares
    2.4.3 Fisher discriminant
3 Classification of underwater targets
  3.1 Sonar images and feature extraction
  3.2 Quadratic optimization vs least square regression
  3.3 Performance of various kernels
  3.4 Classification performance for various underwater targets
4 Summary

References
Annex A: Additional figures

List of figures

Figure 1: Hard margin decision boundary
Figure 2: Soft margin decision boundary
Figure 3: Discriminant function with Gaussian kernel
Figure 4: Types of objects used for the Citadel trial
Figure 5: High frequency sidescan sonar image
Figure 6: Sonar images of cylinder, Manta and Rockan shapes, and a rock
Figure 7: Quadratic optimization results for various values of C, Gaussian kernel
Figure 8: Quadratic optimization vs Ridge regression, Gaussian kernel
Figure 9: Results from various least square regression techniques, Gaussian kernel
Figure 10: Gaussian kernel results for various values of σ
Figure 11: Gaussian kernel vs distance kernels
Figure 12: All mines vs clutter
Figure 13: Cylinder shapes vs clutter
Figure 14: Manta shapes vs clutter
Figure 15: Rockan shapes vs clutter
Figure A.1: Gaussian kernel results for various values of σ
Figure A.2: Gaussian kernel vs homogeneous polynomial kernels
Figure A.3: Gaussian kernel vs non homogeneous polynomial kernels
Figure A.4: Gaussian kernel vs cosine and Minkowski kernels
Figure A.5: Gaussian kernel vs other types of kernels


1 Introduction

High-frequency sidescan sonars have become a fundamental tool for modern mine hunting operations. They produce quality images of the seabed, and a high number of contacts can be detected. A crucial classification phase is then needed to accurately identify these contacts as harmless or as potential underwater mines. To ensure the security of the follow-on traffic, all mines have to be identified accurately. At the same time, to avoid delays in the minefield clearing operation, clutter should not be misclassified as mines. For this contact identification task, multiple classification tools are available [1]. This technical memorandum focuses on a powerful classification tool: support vector machines. This classification tool is already used for various tasks such as data mining [2], face recognition [3], medical detection of microcalcifications [4], and speech recognition [5]. One of the objectives of this technical memorandum is to provide a comprehensive introduction to support vector machine classifiers. Their usefulness for the classification of underwater objects in sidescan sonar imagery is also investigated. Other studies have used this tool for computer-aided classification of underwater objects [6], [7], [8]; this memorandum provides a more detailed and complete classification performance analysis.

Section 2 describes the fundamental principles of support vector machines and presents the main elements that make this classification tool so powerful. A 2-class problem is used to derive the support vector problem for the case of linearly separable classes and then for the more realistic case of non-linearly separable classes. The use of kernel functions for higher-dimensional mapping is also described. A simple one-dimensional example is used to illustrate the discriminant function obtained with support vectors. Section 2 concludes with a description of linear regression approximation techniques used to decrease the time needed to obtain the support vector weights for large problems.

In Section 3, the utility of support vector machines for the classification of underwater targets in sidescan sonar imagery is explored. The database is made of mine shapes and various clutter, first described only by shadow-based features. Results obtained with quadratic optimization and linear regression are compared, and the classification performance of various kernels is investigated. Finally, using support vector machines, the impact of adding highlight-based features on the classification performance is quantified.

2 Classification with support vector machines

In this section, we first derive the support vector problem for linearly separable classes and then for the more realistic case of non-linearly separable classes. The use of kernel functions for higher-dimensional mapping is also described. A simple one-dimensional example is used to illustrate the discriminant function obtained with support vectors. We conclude this section with a description of linear regression approximation techniques used to decrease the time needed to obtain the support vector weights for large problems.

2.1 Support vector machines

In this section, we use the following pattern classification terms. The feature space refers to the R^n space (n-dimensional, real-valued number space) where each object to be classified is represented as an n-dimensional point. The dimension n is determined by the number of features used to characterize each object. Features are specific attributes of an object that describe it or set it apart from similar objects; for example, features could include quantities such as length, width or area. A label is an integer associated with each known object, identifying the class to which the object belongs. A discriminant function is a function taking the feature values as input and combining them to yield a classification label. A training set is a subset of all the available objects; it is used to create the discriminant function. A testing set is a subset of all the available objects which do not belong to the training set; it is used to evaluate the classification performance of the discriminant function.

2.1.1 Linearly separable problems

When faced with a 2-class classification problem, a common approach [9] [10] [11] is to find a hyperplane that separates the two classes in the feature space. Let {x_1, ..., x_n} be our training set with associated labels y_i ∈ {-1, 1}, i = 1, ..., n. The equation of this hyperplane is given by:

    w^T x + b = 0,   w ∈ R^n,  b ∈ R    (1)

Among all hyperplanes separating the classes, the optimal hyperplane is the one maximizing the distance, or margin, between the two classes. Such a hyperplane for linearly separable classes is called a hard margin hyperplane. Under the assumption of linearly separable classes, there exists a hyperplane separating the two classes perfectly. To find the optimal values of the parameters w and b, we can rescale them such that the points closest to the optimal hyperplane satisfy w^T x_i + b = ±1. The optimal hyperplane is then the one maximizing the distance between the hyperplane w^T x + b = 1 and the hyperplane w^T x + b = -1. The distance between these two hyperplanes is given by 2/||w||. Figure 1 illustrates these principles.

Figure 1: Hard margin decision boundary.

Therefore, to maximize the distance between the two classes, one needs to minimize ||w||, or equivalently ||w||^2. This can be written as a quadratic optimization problem:

    Minimize   (1/2) ||w||^2    (2)

    Subject to   y_i (w^T x_i + b) ≥ 1   ∀i    (3)

The constraint can be written as:

    1 - y_i (w^T x_i + b) ≤ 0   ∀i    (4)
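The problem defined by Equations 2 and 3 is an ordinary constrained quadratic program. As a minimal sketch, the code below solves it directly for a tiny two-dimensional toy set; the toy data, the choice of SciPy's SLSQP solver and all variable names are our own illustrative assumptions, and a dedicated SVM package would normally be used instead of a generic solver.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set (illustrative values, not trial data).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(p):
    # p = [w1, w2, b]; Equation 2: minimize (1/2) ||w||^2.
    w = p[:2]
    return 0.5 * np.dot(w, w)

def margin_constraints(p):
    # Equation 3: y_i (w^T x_i + b) - 1 >= 0 for every training point.
    w, b = p[:2], p[2]
    return y * (X @ w + b) - 1.0

res = minimize(objective,
               x0=np.zeros(3),
               method="SLSQP",
               constraints=[{"type": "ineq", "fun": margin_constraints}])

w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)
# Points with y_i (w^T x_i + b) close to 1 lie exactly on the margin.
print("margins:", y * (X @ w + b))
```

The points whose constraint is active (margin value equal to 1) are exactly the support vectors discussed next.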

The solution vector w has an expansion w = Σ_{i=1}^{s} v_i x_i in terms of a subset of the training patterns, namely those falling on the hyperplanes w^T x_i + b = ±1 [11]. These training patterns carry all the information relevant to our classification problem and they are called the support vectors. This is one of the fundamental advantages of support vector machines: when faced with a large number of training points, the use of these support vectors can significantly reduce the complexity of the classification problem. Quadratic optimization problems such as the one defined by Equations 2 and 4 can easily be solved with commercially available solvers.

Using ||w||^2 = w^T w, the Lagrangian for this constrained optimization is:

    L(w, α, b) = (1/2) w^T w + Σ_{i=1}^{n} α_i ( 1 - y_i (w^T x_i + b) )    (5)

Differentiating with respect to the primal variables w and b and setting the derivatives to zero, we get:

    w = Σ_{i=1}^{n} α_i y_i x_i    (6)

and

    Σ_{i=1}^{n} α_i y_i = 0    (7)

By substituting Equation 6 into the Lagrangian 5, we obtain after simplification:

    L = -(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j y_i y_j x_i^T x_j + Σ_{i=1}^{n} α_i    (8)

Our new objective function is now in terms of the α_i only, and it is this form of the optimization problem that we solve in Section 3. This is known as the dual problem, and it can be shown using Karush-Kuhn-Tucker theory [1] that instead of minimizing, we must instead maximize:

    Maximize   W(α) = -(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j y_i y_j x_i^T x_j + Σ_{i=1}^{n} α_i    (9)

    Subject to   Σ_{i=1}^{n} α_i y_i = 0,   α_i ≥ 0    (10)
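The simplification from the Lagrangian of Equation 5 to Equation 8 is stated without intermediate steps; for completeness, the short algebra, using the stationarity conditions 6 and 7, is:

```latex
\begin{aligned}
L(\mathbf{w},\boldsymbol{\alpha},b)
  &= \tfrac{1}{2}\mathbf{w}^{T}\mathbf{w}
     - \sum_{i=1}^{n}\alpha_i y_i \mathbf{w}^{T}\mathbf{x}_i
     - b\sum_{i=1}^{n}\alpha_i y_i
     + \sum_{i=1}^{n}\alpha_i \\
  &= \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^{T}\mathbf{x}_j
     - \sum_{i,j}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^{T}\mathbf{x}_j
     + \sum_{i=1}^{n}\alpha_i
     \qquad \text{(substituting Eq.~6; the $b$ term vanishes by Eq.~7)} \\
  &= -\tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^{T}\mathbf{x}_j
     + \sum_{i=1}^{n}\alpha_i ,
\end{aligned}
```

which is exactly Equation 8.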

The vector w can be recovered with:

    w = Σ_{i=1}^{n} α_i y_i x_i    (11)

When testing a new data point z, the discriminant function f(z) is given by:

    Primal form:   f(z) = w^T z + b    (12)

    Dual form:   f(z) = Σ_{x_k ∈ S} α_k y_k x_k^T z + b    (13)

Here S is the set of support vectors. The bias parameter b is obtained, for any support vector x_i, from:

    b = y_i - Σ_{j=1}^{n} y_j α_j x_j^T x_i    (14)

Point z is classified as belonging to class one if f(z) ≥ 0 and to class two if f(z) < 0.

2.1.2 Non-linearly separable problems

In the previous section, we assumed our training points were linearly separable. For the majority of real-life classification problems, such a perfect separation of the two classes is not possible: the optimal hyperplane separating the classes has to allow for misclassified points. To allow for the possibility of having training points violating constraint 3, we introduce slack variables ε_i, i = 1, ..., n [10]. The hyperplane obtained with such an approach is called a soft margin hyperplane. Figure 2 illustrates these principles.

Figure 2: Soft margin decision boundary.

The slack variables ε_i enter the problem through the relaxed constraints:

    y_i ( w^T x_i + b ) ≥ 1 - ε_i,   ε_i ≥ 0   ∀i    (15)

We also introduce a trade-off parameter C > 0 between the error and the margin. The optimization problem in this case becomes:

    Minimize   (1/2) ||w||^2 + C Σ_{i=1}^{n} ε_i    (16)

    Subject to   y_i ( w^T x_i + b ) ≥ 1 - ε_i,   ε_i ≥ 0   ∀i    (17)

As in the previous section, the dual problem associated with this optimization is given by:

    Maximize   W(α) = -(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j y_i y_j x_i^T x_j + Σ_{i=1}^{n} α_i    (18)

The constraints are now:

    0 ≤ α_i ≤ C,   Σ_{i=1}^{n} α_i y_i = 0    (19)

As before, w can be recovered with Equation 11, and the discriminant function and the bias term are given by Equations 13 and 14. In Section 3, we use soft margin classifiers for our study of the classification of underwater targets. We also study the impact of the value of the trade-off parameter C on the classification performance.

2.2 Kernels

In the original feature space, separating the classes with a hyperplane might not yield the optimal classification performance, as the true optimal decision boundaries might be non-linear. In Section 2.1, we presented the first fundamental principle of support vector machines: not all training points are used to define the discriminant function. In this section, we explore a second fundamental concept of support vector machines. To account for the possibility of non-linear decision boundaries in the original feature space, we map the training data into a higher-dimensional feature space using a non-linear transformation:

    Φ : R^N → F,   x_i ↦ Φ(x_i)    (20)

We then find an optimal hyperplane in this higher-dimensional feature space [9] [12] [13]. In the optimization problem defined by Equations 18 and 19, the training points only appear through the inner products x_i^T x_j with i, j = 1, ..., n. These inner products are now replaced by Φ(x_i)^T Φ(x_j). Clearly, if the mapped feature space F is high-dimensional, the mapping and the computation of these inner products can be computationally expensive. To solve this problem, we introduce the notion of a kernel function. A kernel is defined as:

    κ(x, y) = Φ(x)^T Φ(y)    (21)

These kernel functions are quite useful as they allow us to compute the inner products of the mapped features in the original feature space. As long as we can do this, we do not need the higher-dimensional mapping explicitly.

Examples of kernels commonly used are given in Table 1.

Table 1: Types of kernels

Kernel type                      κ(x, y)
Gaussian                         exp( -||x - y||^2 / (2σ^2) )
Homogeneous polynomial           (x^T y)^d
Non-homogeneous polynomial       (1 + x^T y)^d
Distance                         -||x - y||^d
Homogeneous                      sign(x^T y) (x^T y)^d
Cosine                           x^T y / ( ||x|| ||y|| )
Minkowski                        -( Σ_i |x_i - y_i|^d )^(1/d)
All subsets                      Π_{i=1}^{n} (1 + x_i y_i)

Each of these kernel functions is associated with a specific mapping Φ. For example, with x = [x_1, x_2], the non-homogeneous polynomial kernel of degree two corresponds to the mapping Φ(x) = ( 1, √2 x_1, √2 x_2, √2 x_1 x_2, x_1^2, x_2^2 ):

    Φ(x)^T Φ(y) = 1 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2
                = (1 + x_1 y_1 + x_2 y_2)^2
                = (1 + x^T y)^2    (22)

We now need to update our optimal hyperplane problem to take into account the use of a kernel function. Replacing all inner products by κ(x_i, x_j), the optimization problem 18, 19 becomes:

    Maximize   W(α) = -(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j y_i y_j κ(x_i, x_j) + Σ_{i=1}^{n} α_i    (23)

    Subject to   0 ≤ α_i ≤ C,   Σ_{i=1}^{n} α_i y_i = 0    (24)
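As a concrete illustration of Table 1 and of the identity in Equation 22, the sketch below implements a few of the listed kernels and checks numerically that the explicit degree-2 mapping reproduces the non-homogeneous polynomial kernel. The function names and test vectors are our own assumptions; the parameterizations follow the forms given in Table 1.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=10.0):
    # exp(-||x - y||^2 / (2 sigma^2)), the kernel used for most results in Section 3.
    d = x - y
    return np.exp(-np.dot(d, d) / (2.0 * sigma**2))

def poly_kernel(x, y, degree=2):
    # Non-homogeneous polynomial kernel (1 + x^T y)^d.
    return (1.0 + np.dot(x, y)) ** degree

def distance_kernel(x, y, d=0.5):
    # Distance kernel -||x - y||^d; d = 1/2 is one of the best performers in Section 3.3.
    return -np.linalg.norm(x - y) ** d

def phi_poly2(x):
    # Explicit degree-2 mapping of Equation 22 for x = [x1, x2].
    x1, x2 = x
    return np.array([1.0, np.sqrt(2)*x1, np.sqrt(2)*x2, np.sqrt(2)*x1*x2, x1**2, x2**2])

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.5])

# The inner product of the mapped vectors equals the kernel evaluated directly
# in the original two-dimensional space (Equation 22).
print(phi_poly2(x) @ phi_poly2(y))   # explicit mapping
print(poly_kernel(x, y, degree=2))   # kernel trick, no mapping needed
```

The two printed values agree, which is precisely why the explicit mapping Φ never has to be computed.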

The equation to obtain w becomes:

    w = Σ_{i=1}^{n} α_i y_i Φ(x_i)    (25)

Our new discriminant function f(z) is given by:

    f(z) = w^T Φ(z) + b = Σ_{x_k ∈ S} α_k y_k κ(x_k, z) + b    (26)

Here S is the set of support vectors and the bias parameter b is obtained from:

    b = y_i - Σ_{j=1}^{n} y_j α_j κ(x_i, x_j)    (27)

The choice of the best kernel to use for classification depends on the nature of the problem. In Section 3, we investigate different types of kernels for the classification of underwater targets.

2.3 An example in one dimension

To illustrate the utility of support vector machines for classification, we construct a simple one-dimensional example. Constructing a hyperplane using a kernel function leads to a non-linear decision boundary in the original feature space. The first two columns of Table 2 contain the data points x_i and their associated labels y_i. We use a Gaussian kernel as defined in Table 1 with σ = 3. Quadratic optimization is used to solve the system defined by Equations 23 and 24, with C = 10. These parameter values for the optimization are determined by numerical experimentation; in general, one might wish to use a portion of the training set to determine the parameter values which minimize the classification error. The weights α_i obtained are displayed in the third column of Table 2. Note that out of the six data points, only four are used as support vectors. These four points are used in Equation 26 to obtain a discriminant function for new test points z, and the bias term is obtained with Equation 27. Values of this discriminant function with respect to z and the decision boundaries are shown in Figure 3. The decision boundaries lie where f(z) = 0. For f(z) ≥ 0, the data points are classified as having label y = 1, and for f(z) < 0 the associated label is y = -1. We see that all training points are correctly classified.
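The numerical values of Table 2 (which follows) are not reproduced in this copy, so the sketch below uses six illustrative one-dimensional points of its own, arranged as in Figure 3 with the second class sandwiched between two regions of the first class. It shows how such an example might be set up with an off-the-shelf soft-margin solver (scikit-learn is assumed), using a Gaussian kernel with σ = 3 and C = 10 as in the text, and how the discriminant function of Equation 26 is evaluated for new points z.

```python
import numpy as np
from sklearn.svm import SVC

# Six illustrative 1-D training points (not the values of Table 2), labels in {-1, +1}.
X = np.array([[-6.0], [-4.0], [-1.0], [0.5], [4.0], [6.0]])
y = np.array([1, 1, -1, -1, 1, 1])

sigma, C = 3.0, 10.0
# scikit-learn's RBF kernel is exp(-gamma ||x - y||^2), so gamma = 1 / (2 sigma^2)
# matches the Gaussian kernel of Table 1.
clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2), C=C).fit(X, y)

print("support vectors:", clf.support_vectors_.ravel())  # usually a subset of the six points

# decision_function(z) is exactly Equation 26: sum_k alpha_k y_k kappa(x_k, z) + b.
z = np.linspace(-8, 8, 9).reshape(-1, 1)
for zi, fz in zip(z.ravel(), clf.decision_function(z)):
    label = 1 if fz >= 0 else -1
    print(f"z = {zi:5.1f}  f(z) = {fz:6.3f}  predicted label = {label}")
```

Because the negative class sits between two positive regions, no single threshold on z separates the classes; the Gaussian kernel produces the required non-linear decision boundary in the original one-dimensional space.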

Table 2: An example (data points x_i, labels y_i and support vector weights α_i).

Figure 3: Discriminant function with Gaussian kernel.

2.4 Linear regression approximations

When using large training sets with many features, the quadratic optimization of the support vector weights can take an excessive amount of time. In this section, we briefly describe three alternative techniques to approximate the support vector weights α_i in the dual discriminant function:

    f(z) = Σ_{i=1}^{n} α_i κ(x_i, z) + b    (28)

For large problems, these regression techniques are faster than quadratic optimization. Their performance for the classification of underwater targets is discussed in Section 3.2.

2.4.1 Ridge regression

A first technique to approximate the support vector weights is based on a least squares approach and is called Ridge regression [9]. Using κ as our kernel matrix and y as the column vector containing the training labels, the weights α_i are obtained with [14]:

    α = (κ + λ I_n)^{-1} y    (29)

Here λ is a regularization parameter used to ensure that the kernel matrix is not singular. Typically, a value of λ = 0.01 is used.

2.4.2 Kernel partial least squares

A second technique to estimate the weights α_i in the discriminant function is the kernel partial least squares approach. A detailed description of partial least squares is outside the scope of this technical memorandum, but for more details the reader can refer to [9]. The fundamental principle of partial least squares regression is to use the covariance of the training features with the associated labels to reduce, or deflate, the feature matrix and then perform least squares regression in this reduced feature space. This reduction of the feature space can be quite useful when we use a kernel function to map the features into a higher-dimensional space where the new coordinates can be correlated. Instead of using the original feature matrix, we use the kernel matrix, and this matrix is deflated in the directions of maximum covariance. We first define κ_0 = κ and ŷ = y. Also, β_j are the projection directions, c_j are the output vectors and τ_j are the scaled output vectors. For a fixed number of iterations j = 1, ..., n, the deflation of the kernel matrix is obtained with the following algorithm taken from [9]:

    β_j = ŷ / ||ŷ||    (30)

    τ_j = κ_{j-1} β_j    (31)

    c_j = ŷ^T τ_j / ||τ_j||^2    (32)

    ŷ = ŷ - τ_j c_j^T    (33)

    κ_j = ( I - τ_j τ_j^T / ||τ_j||^2 ) κ_{j-1} ( I - τ_j τ_j^T / ||τ_j||^2 )    (34)

After completion of the n iterations, we can construct the matrices B and Z:

    B = [β_1, ..., β_n],   Z = [τ_1, ..., τ_n]    (35)

Finally, the weights α_i are computed as:

    α = B (Z^T κ B)^{-1} Z^T y    (36)

2.4.3 Fisher discriminant

Our third technique to estimate the support vector weights is the Fisher discriminant approach. In the primal form of the discriminant function 12, this approach finds the direction w of the optimal hyperplane by maximizing the ratio [9]:

    J(w) = (μ_w^+ - μ_w^-)^2 / ( (σ_w^+)^2 + (σ_w^-)^2 )    (37)

Here μ_w^+ is the mean of the projection onto the direction w of the training points with positive labels and μ_w^- is the mean of the projection of the training points with negative labels; σ_w^+ and σ_w^- are the associated standard deviations. The use of this ratio ensures that the chosen direction w maximizes the separation of the means of the two classes, scaled according to their variances in that direction. A more detailed description of the Fisher discriminant can be found in [1] and [9].

To find the weights α_i for the dual form of the discriminant function 28, we first define n as the total number of training points, n^+ as the number of training points with associated label y_i = 1 and n^- as the number of training points with associated label y_i = -1. Of course, n = n^+ + n^-. To find the α_i, we define a diagonal matrix D and two matrices C^+ and C^-:

    D_ii = 2n^-/n   if y_i = 1;    D_ii = 2n^+/n   if y_i = -1    (38)

    C^+_ij = 2n^-/(n n^+)   if y_i = y_j = 1,   and 0 otherwise    (39)

    C^-_ij = 2n^+/(n n^-)   if y_i = y_j = -1,   and 0 otherwise    (40)

We then create a matrix B, given by B = D - C^+ - C^-. Finally, the weights α_i can be obtained with:

    α = (Bκ + λI)^{-1} y    (41)

3 Classification of underwater targets

In this section, support vector machines are used for the classification of underwater targets in sidescan sonar imagery. Our database is made of bottom influence mine shapes and various harmless objects like rocks or seabed features (referred to as clutter), first described only by shadow-based features. We compare results obtained with quadratic optimization and linear regression, and we investigate the classification performance of various kernels. Finally, using support vector machines, we quantify the impact of adding highlight-based features on the classification performance.

3.1 Sonar images and feature extraction

Having introduced support vector machines in the previous section, we now explore their utility for the classification of underwater targets in sidescan sonar imagery. The data used in this study were gathered during the Citadel sea trial, which took place during October 2005 at the NATO Undersea Research Center in La Spezia, Italy. Several targets designed to mimic various bottom influence mine types, as well as rocks, were deployed, and sonar images of these objects were obtained. The main asset used during this trial was the Dorado semi-submersible vehicle, which towed a commercially available Klein 5500 sidescan sonar. This sonar operates at a centre frequency of 455 kHz with a 20 kHz bandwidth, giving an image with a nominal resolution of 0.10 m in the along-track by m in the across-track directions. Pictures of the deployed objects are shown in Figure 4. Figure 5 shows an example of a sonar image collected during the trial; the arrow in Figure 5 identifies a cylindrical mine shape.

Figure 4: Types of objects used for the Citadel trial (cylinder shape, Manta shape, Rockan shape, rock).

Figure 5: High frequency sidescan sonar image. The arrow identifies a cylindrical mine shape.

The objects are grouped into two classes. The first class is identified as clutter and includes 1080 images; it contains unidentified natural clutter found while conducting the Citadel trial as well as 93 images of the rocks deployed during the trial. The second class is identified as mine shapes and includes 310 images: 112 images of cylinder shapes, 99 images of Manta shapes and 99 images of Rockan shapes. Figure 6 shows examples of sonar images for the three mine shapes and also the rocks.

Figure 6: Sonar images of cylinder, Manta and Rockan shapes, and a rock.

For our classification task, we are not using the sonar images directly. From these images, we extract multiple features. The first step of the feature extraction process is called segmentation. Several methods exist for segmenting sonar images into highlight, shadow and background regions; most of these segmentation algorithms employ some type of Markov random field, for instance those described by Reed [15] and Mignotte [16]. Our segmentation method is based mostly on work by Myers [17]. Myers' algorithm employs a fuzzy decision function to determine whether a pixel belongs to the shadow or background region, where a pixel's class is flipped depending on its neighbours' classes; this is sometimes called a relaxation method. The algorithm requires an initial image thresholded into shadow and background with which to perform the iterative algorithm. For our classification task, the shadow threshold was chosen at the 10th percentile of the cumulative image histogram, while the highlight was thresholded at the 95th percentile.
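As an illustration of the initial thresholding step only (not of the full relaxation-based segmentation of [17]), the sketch below shows how a sonar image chip might be split into provisional shadow, highlight and background masks at the 10th and 95th percentiles of its pixel amplitudes. The array names, the synthetic speckle-like chip and the use of NumPy are our own assumptions.

```python
import numpy as np

def initial_threshold(chip, shadow_pct=10.0, highlight_pct=95.0):
    """Provisional shadow/highlight/background masks from amplitude percentiles.

    chip : 2-D array of sonar pixel amplitudes.
    The relaxation step described in the text would then refine the
    shadow/background labelling iteratively.
    """
    lo = np.percentile(chip, shadow_pct)     # 10th percentile -> shadow threshold
    hi = np.percentile(chip, highlight_pct)  # 95th percentile -> highlight threshold
    shadow = chip <= lo
    highlight = chip >= hi
    background = ~shadow & ~highlight
    return shadow, highlight, background

# Toy example with random amplitudes standing in for a real image chip.
rng = np.random.default_rng(0)
chip = rng.rayleigh(scale=1.0, size=(64, 128))   # speckle-like amplitudes
shadow, highlight, background = initial_threshold(chip)
print(shadow.mean(), highlight.mean(), background.mean())  # roughly 0.10, 0.05, 0.85
```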

Once the shadow and highlight regions are obtained, we compute multiple features to represent each image in a multi-dimensional feature space. The complete lists of shadow and highlight features used are given in Tables 3 and 4. In Sections 3.2 and 3.3, we only use shadow-based features; in Section 3.4, we include the highlight features to determine their impact on the classification performance.

Table 3: Shadow features

Area
Ratio area / best ellipse fit area
Ratio perimeter of shadow / (perimeter of convex hull + 1)
Sine of the orientation of the leading edge
Sine of the orientation of the trailing edge
Sine of the orientation of the best ellipse fit
Estimated length of object from shadow (ellipse filtered)
Estimated length of object from shadow
Height profile
Ratio standard deviation of height profile / mean height
Maximum value of height profile
Maximum value of height profile (ellipse filtered)
Ratio major axis of ellipse / (minor axis of ellipse + 1)
Speckle contrast (standard deviation of pixel amplitudes / mean of pixel amplitudes)

Table 4: Highlight features

Area
Ratio area / best ellipse fit area
Estimated length of object from highlight
Estimated length of object from highlight (ellipse filtered)
Estimated across-track extent of highlight (ellipse filtered)
Sine of the orientation of the best ellipse fit
Ratio major axis of ellipse / (minor axis of ellipse + 1)
Speckle contrast (standard deviation of pixel amplitudes / mean of pixel amplitudes)
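To make the feature definitions concrete, the sketch below computes a handful of quantities in the spirit of Table 3 (shadow area, ellipse-fit ratios, orientation and speckle contrast) from a binary shadow mask and the corresponding pixel amplitudes. The use of scikit-image region properties and the exact correspondence to the report's own feature extraction code are assumptions; the report's implementation is not described at this level of detail.

```python
import numpy as np
from skimage.measure import label, regionprops

def shadow_features(shadow_mask, amplitudes):
    """A few illustrative shadow features in the spirit of Table 3."""
    props = regionprops(label(shadow_mask.astype(int)))
    region = max(props, key=lambda p: p.area)      # keep the largest shadow region

    area = region.area
    # Area of the best-fit ellipse with the same second moments as the region.
    ellipse_area = np.pi * (region.major_axis_length / 2) * (region.minor_axis_length / 2)
    area_ratio = area / ellipse_area if ellipse_area > 0 else 0.0
    axis_ratio = region.major_axis_length / (region.minor_axis_length + 1.0)
    sin_orientation = np.sin(region.orientation)

    # Speckle contrast over the whole chip: std / mean of pixel amplitudes.
    speckle_contrast = amplitudes.std() / amplitudes.mean()

    return np.array([area, area_ratio, axis_ratio, sin_orientation, speckle_contrast])

# Toy usage: a rectangular "shadow" in a noisy chip.
rng = np.random.default_rng(1)
chip = rng.rayleigh(size=(64, 128))
mask = np.zeros_like(chip, dtype=bool)
mask[20:35, 40:90] = True
print(shadow_features(mask, chip))
```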

3.2 Quadratic optimization vs least square regression

Now that we have defined our feature sets, we can proceed to the classification task. The clutter objects are identified with labels y_i = -1 and the mine shapes are associated with labels y_i = 1. The complete set of objects is randomly partitioned into a training set and a testing set. To ensure that the training will not be biased towards one of the classes, the same number of samples from each class is used in the training set; this number is chosen as 50% of the minimum between the number of samples of the first class and of the second class. All objects not selected for the training set are used in the testing set. The training set is used to obtain the support vector weights in the discriminant function. This discriminant function is then used to obtain predicted labels for the objects in the testing set. Relating these predicted labels to the true labels, we obtain two measures of performance: the hit rate and the false alarm rate. The hit rate is the fraction of mines correctly classified as mines and the false alarm rate is the fraction of clutter misclassified as mines. These classification performance measures are displayed as a receiver operating characteristic (ROC) curve, where the hit rate is plotted as a function of the false alarm rate [1].

The standard threshold of the discriminant function is f(z) = T = 0. For f(z) ≥ T, the object is classified as belonging to class 1 and for f(z) < T, the object is classified as belonging to class -1. This T = 0 threshold is but one point on our ROC curves. To construct a complete ROC curve, we increase the threshold value from negative values to positive values; each point on the ROC curve is therefore associated with a given threshold value. To obtain smooth ROC curves, the random partitioning into training/testing sets has to be repeated about 100 times and the hit and false alarm rates are obtained over all the testing set realisations.

We first use the soft margin approach discussed in Section 2.1.2 and obtain the support vector weights with quadratic optimization. A Gaussian kernel with σ = 10 is used; this choice of kernel is justified in Section 3.3. The value of the trade-off parameter C in Equation 19 has an impact on the classification performance, and Figure 7 shows the results obtained for various values of C. The best performance is achieved around C = 100. With this value of C, the hit rate reaches 70.0% for a false alarm rate of 29.5%. This indicates that our support vector classifier is producing satisfactory results when differentiating mines from clutter.

One drawback of using quadratic optimization on a large dataset is the long time it takes to obtain the support vector weights. In addition, for our classification performance study, this optimization time is multiplied by the number of random training sets used. Section 2.4.1 presented an alternative: approximating the support vector weights by using Ridge regression. Ridge regression is faster, but does it decrease the classification performance? Figure 8 compares the classification performance of the quadratic optimization approach with the performance of the Ridge regression approach. We see that the Ridge regression classifier yields a classification performance

comparable to the quadratic optimization approach: a hit rate of 70.0% is obtained for a false alarm rate of 29.4%.

We now examine the relative performance of the two other linear regression techniques, presented in Sections 2.4.2 and 2.4.3. Figure 9 compares the performance of the Ridge regression with the performances of the kernel partial least squares and of the Fisher discriminant approaches. We see that the partial least squares approach produces results comparable to the Ridge regression, and therefore to the quadratic optimization approach. However, the Fisher discriminant performance is lower, reaching a hit rate of 70.0% at a false alarm rate of 37.0%. As the Ridge regression classifier is faster and produces satisfactory classification results, we will use this approach to obtain our ROC curves in Sections 3.3 and 3.4.

Figure 7: Quadratic optimization results for various values of C, Gaussian kernel.
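The evaluation pipeline described above can be assembled in a few lines; the minimal sketch below draws a balanced random training set, computes kernel Ridge regression weights from Equation 29 with a Gaussian kernel (σ = 10, λ = 0.01), and sweeps the threshold T to trace a ROC curve. The feature matrices, class sizes and helper names are illustrative assumptions rather than the report's actual code, scikit-learn's roc_curve is used only as a convenience, and the bias term b of Equation 28 is omitted for brevity.

```python
import numpy as np
from sklearn.metrics import roc_curve

def gaussian_gram(A, B, sigma=10.0):
    # Pairwise Gaussian kernel matrix between the rows of A and B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma**2))

def ridge_svm_scores(X_train, y_train, X_test, sigma=10.0, lam=0.01):
    # Equation 29: alpha = (K + lambda I)^-1 y, then f(z) = sum_i alpha_i kappa(x_i, z).
    K = gaussian_gram(X_train, X_train, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train.astype(float))
    return gaussian_gram(X_test, X_train, sigma) @ alpha

rng = np.random.default_rng(2)
# Illustrative feature matrices standing in for the 13 shadow features of Table 3.
X_clutter = rng.normal(0.0, 1.0, size=(1080, 13))
X_mine = rng.normal(0.7, 1.0, size=(310, 13))
X = np.vstack([X_clutter, X_mine])
y = np.concatenate([-np.ones(1080), np.ones(310)])

# Balanced training set: 50% of the smaller class drawn from each class.
n_per_class = int(0.5 * min(1080, 310))
train = np.concatenate([rng.choice(np.where(y == c)[0], n_per_class, replace=False)
                        for c in (-1, 1)])
test = np.setdiff1d(np.arange(len(y)), train)

scores = ridge_svm_scores(X[train], y[train], X[test])
false_alarm, hit, thresholds = roc_curve(y[test], scores)   # sweep of the threshold T
print("hit rate near 30% false alarms:", hit[np.searchsorted(false_alarm, 0.30)])
```

In the study itself this split-train-score loop is repeated on the order of 100 times and the hit and false alarm rates are accumulated over all testing set realisations before the ROC curve is drawn.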

Figure 8: Quadratic optimization vs Ridge regression, Gaussian kernel.

Figure 9: Results from various least square regression techniques, Gaussian kernel.

3.3 Performance of various kernels

Various types of kernels are defined in Table 1, but which ones are the best for the classification of underwater targets? Using these kernels and various values of their parameters, we obtained a series of ROC curves showing the classification performance for the case of mine shapes versus clutter. Figure 10 shows the performance of a Gaussian kernel for various values of the parameter σ; the optimal classification performance is obtained around σ = 10. Performances for additional values of σ are shown in the annex, Figure A.1. Figure 11 and annex Figures A.2 to A.5 show the results obtained for the other types of kernels. We see that for our mine classification problem, the best choices of kernels are the Gaussian kernel with σ = 10 and the distance kernel with exponent 1/2. Some kernels, like the non-homogeneous polynomial kernel of degree 3 and the all subsets kernel, yield very poor performances equivalent to random guessing.

Figure 10: Gaussian kernel results for various values of σ.

Figure 11: Gaussian kernel vs distance kernels.

3.4 Classification performance for various underwater targets

Having identified the Gaussian kernel with σ = 10 as one of the best kernels, and knowing that the Ridge regression classifier produces satisfactory results, we now extend our analysis to include the highlight-based features. Figure 12 compares the ROC curve obtained with only the shadow-based features to the ROC curve obtained when the highlight-based features are added to the shadow-based ones. We see that adding the highlight features produces a significant increase in classification performance. With only the shadow features, a hit rate of 70.0% is reached at a false alarm rate of 29.4%, but with the highlight features, a hit rate of 70.0% is reached at a very small false alarm rate of 5.2%. Such a classification performance is excellent.

In the previous analysis, the three mine shapes were grouped in one class. What kind of classification performance can we achieve for each type of mine shape taken individually? Figures 13 to 15 summarize the results for the three classification problems: cylinders versus clutter, Mantas versus clutter and Rockans versus clutter. Once again, we see that using highlight-based features in addition to the shadow-based features increases the classification performance significantly. Using all features, we obtain outstanding classification performances with our support vector approach. Looking this time at a 90.0% hit rate, we obtain false alarm rates of 10.0%, 12.5% and 19.0% respectively for the Manta, cylinder and Rockan cases.

Figure 12: All mines vs clutter.

Figure 13: Cylinder shapes vs clutter.

Figure 14: Manta shapes vs clutter.

Figure 15: Rockan shapes vs clutter.

4 Summary

This technical memorandum investigated the utility of support vector machines for the classification of underwater targets. Support vectors are an efficient approach to large training sets, as not all training points are used to define the discriminant function: only the training points falling on the hyperplanes bounding the margin are used, as they carry all relevant information for the classification problem. The use of kernel functions also allows for non-linear decision boundaries in the original feature space by mapping the training data into a higher-dimensional feature space where a hyperplane can be found. This mapping is not needed explicitly, as kernel functions compute the inner products of the mapped features in the original feature space.

Support vector machines were used to discriminate mine shapes from clutter in sidescan sonar imagery. This classification tool yielded excellent classification performance. It was found that the Ridge regression approximation is faster than quadratic optimization and yields a comparable performance. The partial least squares approach also produces results comparable to the Ridge regression and quadratic optimization; the performance of the Fisher discriminant was, however, lower. For our mine classification problem, the best choices of kernels are the Gaussian kernel with σ = 10 and the distance kernel with exponent 1/2. With a Gaussian kernel, the Ridge regression approximation and only the shadow-based features, a hit rate of 70.0% is reached at a false alarm rate of 29.4%. Adding the highlight-based features significantly increased the classification performance, yielding a hit rate of 70.0% for a very small false alarm rate of 5.2%. With the mine shapes taken individually, the complete shadow and highlight feature sets produced outstanding classification results: at a 90.0% hit rate, we obtain false alarm rates of 10.0%, 12.5% and 19.0% respectively for the Manta, cylinder and Rockan shapes.

Much work remains to be done in the field of computer-aided classification. Sidescan sonar images are currently widely used for mine countermeasures operations, but will certainly be replaced by the new synthetic aperture sonar images. These images have a much higher resolution and should therefore yield a higher classification performance. If data from synthetic aperture sonars were available, it would be interesting to evaluate the level of classification performance achieved and contrast it with the performance obtained in this memorandum. Furthermore, the usefulness of support vector machines is not limited to sidescan sonar classification problems; multiple other areas of interest could be investigated, such as other underwater acoustic detection and classification problems, studies of sphere properties and even tracking.


More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

Application of Support Vector Machine Algorithm in Spam Filtering

Application of Support Vector Machine Algorithm in  Spam Filtering Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification

More information

Kernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018

Kernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Kernels + K-Means Matt Gormley Lecture 29 April 25, 2018 1 Reminders Homework 8:

More information

Canada s Energy Future:

Canada s Energy Future: Page 1 of 9 1DWLRQDO (QHUJ\ %RDUG 2IILFH QDWLRQDO GH OҋpQHUJLH Canada s Energy Future: ENERGY SUPPLY AND DEMAND PROJECTIONS TO 2035 Appendices AN ENERGY MARKET ASSESSMENT NOVEMBER 2011 Page 2 of 9 Canada

More information

DM6 Support Vector Machines

DM6 Support Vector Machines DM6 Support Vector Machines Outline Large margin linear classifier Linear separable Nonlinear separable Creating nonlinear classifiers: kernel trick Discussion on SVM Conclusion SVM: LARGE MARGIN LINEAR

More information

Well Analysis: Program psvm_welllogs

Well Analysis: Program psvm_welllogs Proximal Support Vector Machine Classification on Well Logs Overview Support vector machine (SVM) is a recent supervised machine learning technique that is widely used in text detection, image recognition

More information

VLANs. Commutation LAN et Wireless Chapitre 3

VLANs. Commutation LAN et Wireless Chapitre 3 VLANs Commutation LAN et Wireless Chapitre 3 ITE I Chapter 6 2006 Cisco Systems, Inc. All rights reserved. Cisco Public 1 Objectifs Expliquer le rôle des VLANs dans un réseau convergent. Expliquer le rôle

More information

Support Vector Machines

Support Vector Machines Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining

More information

User s Manual of Interactive Software for Predicting CPF Bow-Flare Impulsive Loads

User s Manual of Interactive Software for Predicting CPF Bow-Flare Impulsive Loads Copy No. Defence Research and Development Canada Recherche et développement pour la défense Canada DEFENCE & DÉFENSE User s Manual of Interactive Software for Predicting CPF Bow-Flare Impulsive Loads J.M.

More information

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013 Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-4: Constrained optimization Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428 June

More information

Support Vector Machines

Support Vector Machines Support Vector Machines About the Name... A Support Vector A training sample used to define classification boundaries in SVMs located near class boundaries Support Vector Machines Binary classifiers whose

More information

Lecture 7: Support Vector Machine

Lecture 7: Support Vector Machine Lecture 7: Support Vector Machine Hien Van Nguyen University of Houston 9/28/2017 Separating hyperplane Red and green dots can be separated by a separating hyperplane Two classes are separable, i.e., each

More information

A Short SVM (Support Vector Machine) Tutorial

A Short SVM (Support Vector Machine) Tutorial A Short SVM (Support Vector Machine) Tutorial j.p.lewis CGIT Lab / IMSC U. Southern California version 0.zz dec 004 This tutorial assumes you are familiar with linear algebra and equality-constrained optimization/lagrange

More information

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018

Performance Estimation and Regularization. Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Performance Estimation and Regularization Kasthuri Kannan, PhD. Machine Learning, Spring 2018 Bias- Variance Tradeoff Fundamental to machine learning approaches Bias- Variance Tradeoff Error due to Bias:

More information

Support vector machines

Support vector machines Support vector machines When the data is linearly separable, which of the many possible solutions should we prefer? SVM criterion: maximize the margin, or distance between the hyperplane and the closest

More information

About Transferring License Rights for. PL7 V4.5 and Unity Pro V2.3 SP1 Software

About Transferring License Rights for. PL7 V4.5 and Unity Pro V2.3 SP1 Software Page 1 of 38 Click here to access the English Cliquez ici pour accéder au Français Klicken Sie hier, um zum Deutschen zu gelangen Premete qui per accedere all' Italiano Pulse acquì para acceder al Español

More information

Mardi 3 avril Epreuve écrite sur un document en anglais

Mardi 3 avril Epreuve écrite sur un document en anglais C O L L E CONCOURS INTERNE ET EXTERNE DE TECHNICIEN DE CLASSE NORMALE DES SYSTEMES D INFORMATION ET DE COMMUNICATION Ne pas cacher le cadre d identité. Cette opération sera réalisée par l administration

More information

Dealing with Categorical Data Types in a Designed Experiment

Dealing with Categorical Data Types in a Designed Experiment Dealing with Categorical Data Types in a Designed Experiment Part II: Sizing a Designed Experiment When Using a Binary Response Best Practice Authored by: Francisco Ortiz, PhD STAT T&E COE The goal of

More information

Machine Learning: Think Big and Parallel

Machine Learning: Think Big and Parallel Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least

More information

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation. Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way

More information

HW2 due on Thursday. Face Recognition: Dimensionality Reduction. Biometrics CSE 190 Lecture 11. Perceptron Revisited: Linear Separators

HW2 due on Thursday. Face Recognition: Dimensionality Reduction. Biometrics CSE 190 Lecture 11. Perceptron Revisited: Linear Separators HW due on Thursday Face Recognition: Dimensionality Reduction Biometrics CSE 190 Lecture 11 CSE190, Winter 010 CSE190, Winter 010 Perceptron Revisited: Linear Separators Binary classification can be viewed

More information

Practice EXAM: SPRING 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE

Practice EXAM: SPRING 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE Practice EXAM: SPRING 0 CS 6375 INSTRUCTOR: VIBHAV GOGATE The exam is closed book. You are allowed four pages of double sided cheat sheets. Answer the questions in the spaces provided on the question sheets.

More information

Fast checking of CMM geometry with a patented tool

Fast checking of CMM geometry with a patented tool 17 International Congress of Metrology, 13012 (2015) DOI: 10.1051/ metrolo gy/201513012 C Owned by the authors, published by EDP Sciences, 2015 Fast checking of CMM geometry with a patented tool Jean-François

More information

CSE 417T: Introduction to Machine Learning. Lecture 22: The Kernel Trick. Henry Chai 11/15/18

CSE 417T: Introduction to Machine Learning. Lecture 22: The Kernel Trick. Henry Chai 11/15/18 CSE 417T: Introduction to Machine Learning Lecture 22: The Kernel Trick Henry Chai 11/15/18 Linearly Inseparable Data What can we do if the data is not linearly separable? Accept some non-zero in-sample

More information

Multicollinearity and Validation CIVL 7012/8012

Multicollinearity and Validation CIVL 7012/8012 Multicollinearity and Validation CIVL 7012/8012 2 In Today s Class Recap Multicollinearity Model Validation MULTICOLLINEARITY 1. Perfect Multicollinearity 2. Consequences of Perfect Multicollinearity 3.

More information

A MODEL-BASED MULTI-VIEW IMAGE REGISTRATION METHOD FOR SAS IMAGES

A MODEL-BASED MULTI-VIEW IMAGE REGISTRATION METHOD FOR SAS IMAGES A MODEL-BASED MULTI-VIEW IMAGE REGISTRATION METHOD FOR SAS IMAGES Johannes Groen, David Williams, Warren Fox NATO Undersea Research Centre, Viale San Bartolomeo 400, 19126 La Spezia, Italy. Contact: J

More information

SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin. April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1

SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin. April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1 SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1 Overview The goals of analyzing cross-sectional data Standard methods used

More information

Multi-Source Data Fusion for the Estimation of Mean Shipping Densities

Multi-Source Data Fusion for the Estimation of Mean Shipping Densities Multi-Source Data Fusion for the Estimation of Mean Shipping Densities Yvan Gauthier Maritime Forces Pacific Operational Research Team Peter Minter Co-op student, University of Victoria DRDC CORA TM 2005

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Face detection and recognition. Detection Recognition Sally

Face detection and recognition. Detection Recognition Sally Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification

More information

ELEC Dr Reji Mathew Electrical Engineering UNSW

ELEC Dr Reji Mathew Electrical Engineering UNSW ELEC 4622 Dr Reji Mathew Electrical Engineering UNSW Review of Motion Modelling and Estimation Introduction to Motion Modelling & Estimation Forward Motion Backward Motion Block Motion Estimation Motion

More information

5 Learning hypothesis classes (16 points)

5 Learning hypothesis classes (16 points) 5 Learning hypothesis classes (16 points) Consider a classification problem with two real valued inputs. For each of the following algorithms, specify all of the separators below that it could have generated

More information

Machine Learning in Biology

Machine Learning in Biology Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant

More information

Lecture 9: Support Vector Machines

Lecture 9: Support Vector Machines Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and

More information

Statistical Methods in AI

Statistical Methods in AI Statistical Methods in AI Distance Based and Linear Classifiers Shrenik Lad, 200901097 INTRODUCTION : The aim of the project was to understand different types of classification algorithms by implementing

More information

Second Order SMO Improves SVM Online and Active Learning

Second Order SMO Improves SVM Online and Active Learning Second Order SMO Improves SVM Online and Active Learning Tobias Glasmachers and Christian Igel Institut für Neuroinformatik, Ruhr-Universität Bochum 4478 Bochum, Germany Abstract Iterative learning algorithms

More information

Divide and Conquer Kernel Ridge Regression

Divide and Conquer Kernel Ridge Regression Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Sea Chen Department of Biomedical Engineering Advisors: Dr. Charles A. Bouman and Dr. Mark J. Lowe S. Chen Final Exam October

More information

Classification by Support Vector Machines

Classification by Support Vector Machines Classification by Support Vector Machines Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Practical DNA Microarray Analysis 2003 1 Overview I II III

More information

AIS Indexer User Guide

AIS Indexer User Guide AIS Indexer User Guide Dan Radulescu Prepared by: OODA Technologies Inc. 4891 Av. Grosvenor, Montreal Qc, H3W 2M2 Project Manager: Anthony W. Isenor Contract Number: W7707-115137, Call Up 6, 4500959431

More information

Robust Shape Retrieval Using Maximum Likelihood Theory

Robust Shape Retrieval Using Maximum Likelihood Theory Robust Shape Retrieval Using Maximum Likelihood Theory Naif Alajlan 1, Paul Fieguth 2, and Mohamed Kamel 1 1 PAMI Lab, E & CE Dept., UW, Waterloo, ON, N2L 3G1, Canada. naif, mkamel@pami.uwaterloo.ca 2

More information

Classification by Support Vector Machines

Classification by Support Vector Machines Classification by Support Vector Machines Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Practical DNA Microarray Analysis 2003 1 Overview I II III

More information

Semi-supervised learning and active learning

Semi-supervised learning and active learning Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners

More information

Lab 2: Support Vector Machines

Lab 2: Support Vector Machines Articial neural networks, advanced course, 2D1433 Lab 2: Support Vector Machines March 13, 2007 1 Background Support vector machines, when used for classication, nd a hyperplane w, x + b = 0 that separates

More information

Categorical Data in a Designed Experiment Part 2: Sizing with a Binary Response

Categorical Data in a Designed Experiment Part 2: Sizing with a Binary Response Categorical Data in a Designed Experiment Part 2: Sizing with a Binary Response Authored by: Francisco Ortiz, PhD Version 2: 19 July 2018 Revised 18 October 2018 The goal of the STAT COE is to assist in

More information

6 Model selection and kernels

6 Model selection and kernels 6. Bias-Variance Dilemma Esercizio 6. While you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one (i.e., a Linear Model with quadratic features φ(x)

More information

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Mobile Human Detection Systems based on Sliding Windows Approach-A Review Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg

More information

Topics in Machine Learning

Topics in Machine Learning Topics in Machine Learning Gilad Lerman School of Mathematics University of Minnesota Text/slides stolen from G. James, D. Witten, T. Hastie, R. Tibshirani and A. Ng Machine Learning - Motivation Arthur

More information

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule. CS 188: Artificial Intelligence Fall 2008 Lecture 24: Perceptrons II 11/24/2008 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit

More information

Feature-Based Facial Expression Recognition: Experiments With a Multi-Layer Perceptron

Feature-Based Facial Expression Recognition: Experiments With a Multi-Layer Perceptron Feature-Based Facial Expression Recognition: Experiments With a Multi-Layer Perceptron Zhengyou Zhang To cite this version: Zhengyou Zhang. Feature-Based Facial Expression Recognition: Experiments With

More information

Transductive Learning: Motivation, Model, Algorithms

Transductive Learning: Motivation, Model, Algorithms Transductive Learning: Motivation, Model, Algorithms Olivier Bousquet Centre de Mathématiques Appliquées Ecole Polytechnique, FRANCE olivier.bousquet@m4x.org University of New Mexico, January 2002 Goal

More information

SUPPORT VECTOR MACHINES

SUPPORT VECTOR MACHINES SUPPORT VECTOR MACHINES Today Reading AIMA 18.9 Goals (Naïve Bayes classifiers) Support vector machines 1 Support Vector Machines (SVMs) SVMs are probably the most popular off-the-shelf classifier! Software

More information

Large synthetic data sets to compare different data mining methods

Large synthetic data sets to compare different data mining methods Large synthetic data sets to compare different data mining methods Victoria Ivanova, Yaroslav Nalivajko Superviser: David Pfander, IPVS ivanova.informatics@gmail.com yaroslav.nalivayko@gmail.com June 3,

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Kernel SVM. Course: Machine Learning MAHDI YAZDIAN-DEHKORDI FALL 2017

Kernel SVM. Course: Machine Learning MAHDI YAZDIAN-DEHKORDI FALL 2017 Kernel SVM Course: MAHDI YAZDIAN-DEHKORDI FALL 2017 1 Outlines SVM Lagrangian Primal & Dual Problem Non-linear SVM & Kernel SVM SVM Advantages Toolboxes 2 SVM Lagrangian Primal/DualProblem 3 SVM LagrangianPrimalProblem

More information

Model and Data Management Tool for the Air Force Structure Analysis Model - Final Report

Model and Data Management Tool for the Air Force Structure Analysis Model - Final Report Air Force Structure Analysis Model - Final Report D.G. Hunter DRDC CORA Prepared By: CAE Integrated Enterprise Solutions - Canada 1135 Innovation Drive Ottawa, ON, K2K 3G7 Canada Telephone: 613-247-0342

More information

Classification Part 4

Classification Part 4 Classification Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Model Evaluation Metrics for Performance Evaluation How to evaluate

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

IPv6 Protocol (RFC 2460 DS)

IPv6 Protocol (RFC 2460 DS) IPv6 Protocol (RFC 2460 DS) Copy Rights This slide set is the ownership of the 6DISS project via its partners The Powerpoint version of this material may be reused and modified only with written authorization

More information

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics

More information

Last time... Coryn Bailer-Jones. check and if appropriate remove outliers, errors etc. linear regression

Last time... Coryn Bailer-Jones. check and if appropriate remove outliers, errors etc. linear regression Machine learning, pattern recognition and statistical data modelling Lecture 3. Linear Methods (part 1) Coryn Bailer-Jones Last time... curse of dimensionality local methods quickly become nonlocal as

More information

Distortion-invariant Kernel Correlation Filters for General Object Recognition

Distortion-invariant Kernel Correlation Filters for General Object Recognition Distortion-invariant Kernel Correlation Filters for General Object Recognition Dissertation by Rohit Patnaik Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

More information