
一般社団法人電子情報通信学会 THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS
信学技報 IEICE Technical Report SANE2017-92 (2018-01)

Deep Learning for End-to-End Automatic Target Recognition from Synthetic Aperture Radar Imagery

Hidetoshi FURUKAWA
Toshiba Infrastructure Systems & Solutions Corporation, 1 Komukaitoshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa, Japan
E-mail: hidetoshi.furukawa@toshiba.co.jp
arXiv: v1 [cs.CV] 25 Jan 2018

Abstract The standard architecture of synthetic aperture radar (SAR) automatic target recognition (ATR) consists of three stages: detection, discrimination, and classification. In recent years, convolutional neural networks (CNNs) for SAR ATR have been proposed, but most of them classify target classes from a target chip extracted from SAR imagery, that is, they address only the third stage of SAR ATR. In this report, we propose a novel CNN for end-to-end ATR from SAR imagery. The CNN, named verification support network (VersNet), performs all three stages of SAR ATR end-to-end. VersNet takes as input a SAR image of arbitrary size containing multiple classes and multiple targets, and outputs a SAR ATR image representing the position, class, and pose of each detected target. This report describes the evaluation results of VersNet, which was trained on the moving and stationary target acquisition and recognition (MSTAR) public dataset to output per-pixel scores for all 12 classes: 10 target classes, a target front class, and a background class.

Key words Automatic target recognition (ATR), Multi-target detection, Multi-target classification, Pose estimation, Convolutional neural network (CNN), Synthetic aperture radar (SAR)

1. Introduction

Synthetic aperture radar (SAR) transmits microwaves and generates imagery from the microwaves reflected by objects, in all weather conditions, day and night. However, it is difficult for a human to recognize a target in SAR imagery, since there is no color information and the shape reflected from a target changes. Therefore, automatic target recognition (ATR) from SAR imagery (or images) has been studied for many years.

(a) Input. (b) VersNet. (c) Output.
Fig. 1 Illustration of input and output of proposed CNN. The CNN named VersNet performs automatic target recognition of multi-class / multi-target in variable size SAR image. In this case, the input is a single image with three classes and four targets (upper left and lower right targets are the same class). VersNet outputs the position, class, and pose (front side) of each detected target.

The standard architecture of SAR ATR consists of three stages: detection, discrimination, and classification.
Detection: the first stage of SAR ATR detects a region of interest (ROI) in a SAR image.
Discrimination: the second stage of SAR ATR discriminates whether an ROI is a target or non-target region, and outputs the discriminated ROI as a target chip.
Classification: the third stage of SAR ATR classifies target classes from a target chip.

In recent years, methods using convolutional neural networks (CNNs) [1]-[4] have been successful in image classification. Similarly, CNNs for SAR ATR have been proposed. On the moving and stationary target acquisition and recognition (MSTAR) public dataset [5], the target classification accuracy of these CNNs [6]-[9] exceeds that of conventional methods (support vector machine, etc.).

However, most CNNs for SAR ATR classify target classes from a single target chip extracted from a SAR image; they handle neither multiple targets nor a target chip (or SAR image) of arbitrary size. In addition, a CNN for target classification can output a score or probability for each class as its classification result, but it is difficult for a human to verify that result.

We propose a new CNN which takes as input a SAR image of variable size with multiple targets and outputs a SAR ATR image.

This article is a technical report without peer review, and its polished and/or extended version may be published elsewhere.
Copyright 2018 by IEICE


(a) Input data $X_n$. (b) VersNet. (c) Output data $Y_n$. (d) Supervised data $D_n$.
Fig. 2 Illustration of training for proposed CNN.

2. Related Work

In segmentation, the image recognition task of assigning a classification label to each pixel of an image, methods using CNNs [10]-[12] have shown good performance in recent years.

For SAR images, segmentation is performed into a target region, formed by reflections from a target, and a shadow region, where radar shadow prevents any reflection. The CNN [13], WD-CFAR [14], and other methods [15]-[17] have been proposed for segmentation of a SAR image. Reference [18] describes manually generating the segmentation of target and shadow regions as ground truth.

Generally, segmentation using a CNN relies on supervised learning with label images corresponding to the input images, but the difficulty of generating label images for SAR ATR is an obstacle to applying this method. In response to this problem, reference [13] describes a CNN trained to output a contour, using contour data of target and shadow regions generated by computer graphics as ground truth.

In contrast, our proposed CNN performs target detection, target classification, and pose estimation by segmentation.

3. Proposed Method

The proposed CNN, named verification support network (VersNet), takes as input a SAR image of arbitrary size with multiple classes and multiple targets, and outputs the position, class, and pose of each detected target as a SAR ATR image. Figure 1 shows the outline of VersNet for end-to-end SAR ATR.

VersNet is a CNN composed of an encoder and a decoder. The encoder of VersNet extracts features from an input SAR image. The decoder converts the features according to the conversion rule learned from the training data and outputs the result as a SAR ATR image.

Here, we define end-to-end SAR ATR as a supervised learning task. Let $\{(X_n, D_n), n = 1, \ldots, N\}$ be the training dataset, where $X_n = \{x_i^{(n)}, i = 1, \ldots, |X_n|\}$ is a SAR image used as input data, and $D_n = \{d_i^{(n)}, i = 1, \ldots, |D_n|, d_i^{(n)} \in \{1, \ldots, N_c\}\}$ is the label image for $X_n$, which is the supervised data for the VersNet output data $Y_n = f(X_n; \theta)$. The values of $|X_n|$ and $|D_n|$ represent the number of pixels (vertical × horizontal) of the SAR image and the label image, respectively. When $d_i^{(n)}$ is 1, it represents the background class, and when $d_i^{(n)}$ is 2 or more, it indicates the corresponding target class. Let $L(\theta)$ be a loss function; the network parameters $\theta$ are adjusted using the training data so that the value of the loss function becomes small.

Table 1 Dataset. The training and testing data contain respectively 2747 and 2420 target chips from the MSTAR.
Columns: Class, Training data, Testing data.
Rows: 2S1, BMP2 (9563), BRDM2, BTR60 (footnote 1), BTR70, D7, T62, T72 (132), ZIL131, ZSU234, Total.
Footnote 1: Five target chips were excluded. Appendix 1 shows the details.

4. Experiments

4.1 Dataset

For training and testing of VersNet, we used the ten classes of data shown in Table 1 from the MSTAR [5]. The dataset contains 2747 target chips with a depression angle of 17° for training and 2420 target chips with a depression angle of 15° for testing. As described later in Appendix 1, five target chips of target class BTR60 were excluded from the testing data.

The MSTAR dataset does not include label images for segmentation. Therefore, we created label images for VersNet. Figure 2(d) shows samples of the label images. The label images have all 12 classes: 10 target classes, a target front class, and a background class.
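The 12-class label image described above can be pictured as an integer map of the same size as the SAR chip. The following is only a minimal sketch of such an encoding, not the procedure used in the report, which does not state how its label images were produced. It assumes that a boolean target mask and a boolean front mask are already available; the function name `make_label_image`, both mask arguments, and the choice of index 12 for the front class are hypothetical. Only the convention that 1 denotes background and values of 2 or more denote target classes comes from the text.

```python
import numpy as np

BACKGROUND = 1           # from the text: d = 1 is the background class
NUM_TARGET_CLASSES = 10  # target classes occupy indices 2..11
FRONT = 12               # assumption: the front class takes the last index


def make_label_image(target_mask, front_mask, class_id):
    """Encode one chip as a label image with values in {1, ..., 12}.

    target_mask, front_mask: boolean arrays of identical shape
        (hypothetical inputs; the report does not describe their origin).
    class_id: integer in 2..11 identifying the target class.
    """
    if not 2 <= class_id <= NUM_TARGET_CLASSES + 1:
        raise ValueError("class_id must be in 2..11")
    if target_mask.shape != front_mask.shape:
        raise ValueError("masks must have the same shape")

    # Start with background everywhere, then paint the target region.
    label = np.full(target_mask.shape, BACKGROUND, dtype=np.int64)
    label[target_mask] = class_id
    # Pixels on the front side of the target override the class label.
    label[target_mask & front_mask] = FRONT
    return label
```

With such a map, position is given by where the non-background pixels lie, class by their value, and pose by where the front pixels sit relative to the rest of the target.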

Table 2 Classification accuracy of testing data. The overall accuracy is 99.55%.

Class      Accuracy (%)
2S1        100.00
BMP2       100.00
BRDM2      98.54
BTR60      98.42
BTR70      98.98
D7         100.00
T62        99.
Average    99.52
Overall    99.55

Table 3 Definitions of TP, FP, FN, and TN.

                                   True condition
                                   Condition positive     Condition negative
Predicted   Condition positive     True positive (TP)     False positive (FP)
condition   Condition negative     False negative (FN)    True negative (TN)

Fig. 3 Detail architecture of VersNet for experiments. The VersNet refers to the fully convolutional network called FCN-32s.

4.2 VersNet: Proposed CNN

Figure 3 shows the detailed architecture of VersNet used in the experiments. The encoder of the VersNet consists of four convolution blocks and two convolution layers. Each convolution block contains two convolution layers of kernel size 3 × 3 and a pooling layer, similarly to VGG [19]. All convolutions except the final one use the rectified linear unit (ReLU) [20] as the activation function. Dropout [21] is applied after the convolution of kernel size 6 × 6. Batch normalization [22] is not applied. The decoder of the VersNet consists of a transposed convolution [23] that performs 16 times upsampling.

As the loss function, we use the cross entropy expressed by

$L(\theta) = -\sum_{x} p(x) \log q(x)$.    (1)

For the optimization of the loss function, we use stochastic gradient descent (SGD) with momentum.

Since the VersNet is a CNN without fully connected layers, a so-called fully convolutional network (FCN) [10], it can process SAR images of arbitrary size even if training is done with small images.

4.3 Classification Accuracy

First, we show the results for classification accuracy. Table 2 shows the classification accuracy for the target chips of the testing data when the predicted class of a chip is simply taken to be the majority class among the per-pixel maximum-probability classes, restricted to the ten target classes. The average accuracy over the ten target classes is 99.52%, and the overall accuracy is 99.55% (2409/2420), which is almost the same as the state-of-the-art accuracy. Also, Table A·2 in the Appendix shows a confusion matrix for the target chips of the testing data.

4.4 Segmentation Performance

Next, we show the results for segmentation performance.
We use precision, recall, $F_1$, and intersection over union (IoU) as metrics of segmentation performance. Each metric is given by

$\mathrm{Precision} = \frac{TP}{TP + FP}$,    (2)

$\mathrm{Recall} = \frac{TP}{TP + FN}$,    (3)

$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$,    (4)

$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$,    (5)

where the definitions of TP, FP, FN, and TN are shown in Table 3.

Table 4 shows precision, recall, $F_1$, and IoU for all the pixels of the testing data. The average IoUs over all 12 classes and over the 10 target classes are [illegible] and 0.923, respectively. Also, Table A·3 in the Appendix shows a confusion matrix for all the pixels of the testing data.

Figure 4 shows a histogram of the IoU for each image with ten target classes. The mean and standard deviation of the IoU are [illegible] and 0.082, respectively.
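Equations (2) to (5) can be evaluated for every class at once from a pixel-level confusion matrix. The sketch below is illustrative only and is not taken from the report: the function name `per_class_metrics` is hypothetical, and it assumes the matrix is oriented with rows as predicted classes and columns as true classes, the convention the Appendix states for its confusion matrices.

```python
import numpy as np


def per_class_metrics(cm):
    """Per-class precision, recall, F1 and IoU from a confusion matrix.

    cm[p, t] counts pixels predicted as class p whose true class is t
    (rows = predicted, columns = true).
    Classes that never occur yield NaN instead of raising an error.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)              # predicted c and truly c
    fp = cm.sum(axis=1) - tp      # predicted c but truly another class
    fn = cm.sum(axis=0) - tp      # truly c but predicted as another class

    with np.errstate(divide="ignore", invalid="ignore"):
        precision = tp / (tp + fp)                           # Eq. (2)
        recall = tp / (tp + fn)                              # Eq. (3)
        f1 = 2 * precision * recall / (precision + recall)   # Eq. (4)
        iou = tp / (tp + fp + fn)                            # Eq. (5)
    return precision, recall, f1, iou
```

Averaging the returned `iou` vector over all 12 classes, or over the 10 target-class entries only, gives the two kinds of average IoU quoted in the text. TN does not appear in any of the four metrics, which is why the large background area cannot inflate them.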

Table 4 Segmentation performance for all pixels of testing. The average IoU of ten target classes is 0.923.
Columns: Class, Precision, Recall, $F_1$, IoU.
Rows: Background, 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, ZSU234, Front, Average of 12 (footnote 2), Average of 10 (footnote 3).
Footnote 2: The average of all 12 classes.
Footnote 3: The average of ten target classes.

Fig. 4 Histogram of IoU.

Fig. 5 Cumulative distribution of IoU.

Figure 5 shows the cumulative distribution of the IoU for each image with ten target classes. The values of the empirical cumulative distribution function, $P(\mathrm{IoU} \le 0.5)$ and $P(\mathrm{IoU} \le 0.9)$, are about 0.01 and 0.1, respectively.

4.5 Multi-Class and Multi-Target

Finally, we show the VersNet output for a multi-class and multi-target input. Figure 6 shows the input (SAR image of 10 target classes and 25 targets), the output (SAR ATR image), and the ground truth.

(a) Input (SAR image of 10 target classes and 25 targets). (b) Output (SAR ATR image). (c) Ground truth.
Fig. 6 Input (SAR image of multiple classes and multiple targets), output (SAR ATR image), and ground truth.

5. Conclusion

Applying CNNs to the third stage, classification, of the standard SAR ATR architecture has improved performance. To improve the overall performance of SAR ATR, however, it is important to improve not only the third stage (classification) but also the first stage (detection) and the second stage (discrimination).

In this report, we proposed a CNN based on a new architecture of SAR ATR that consists of a single stage, i.e. end-to-end, rather than the standard three-stage architecture. Unlike conventional CNNs for target classification, the CNN named VersNet takes as input a SAR image of arbitrary size with multiple classes and multiple targets, and outputs a SAR ATR image representing the position, class, and pose of each detected target.

We trained the VersNet on the MSTAR dataset to output per-pixel scores for classes that include the ten target classes, and evaluated its performance. The average IoU for all the pixels of the testing data (2420 target chips) is over 0.9. Also, the classification accuracy is about 99.5% if we select the majority class of maximum probability for each pixel as the predicted class.

References

[1] A. Krizhevsky, I. Sutskever, and G.E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems.
[2] M.D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," European Conference on Computer Vision.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1-9.
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[5] T. Ross, S. Worrell, V. Velten, J. Mossing, and M. Bryant, "Standard SAR ATR evaluation experiments using the MSTAR public release data set," Proc. SPIE, vol.3370.
[6] S. Chen, H. Wang, F. Xu, and Y.Q. Jin, "Target classification using the deep convolutional networks for SAR images," IEEE Transactions on Geoscience and Remote Sensing, vol.54, no.8.
[7] S. Wagner, "SAR ATR by a combination of convolutional neural network and support vector machines," IEEE Transactions on Aerospace and Electronic Systems, vol.52, no.6, pp.2861-2872, 2016.
[8] Y. Zhong and G. Ettinger, "Enlightening deep neural networks with knowledge of confounding factors," arXiv preprint arXiv:1607.02397, 2016.
[9] H. Furukawa, "Deep learning for target classification from SAR imagery: Data augmentation and translation invariance," IEICE Tech. Rep., vol.117, no.182, pp.13-17.
[10] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[11] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," International Conference on Medical Image Computing and Computer-Assisted Intervention.
[12] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," arXiv preprint.
[13] D. Malmgren-Hansen and M. Nobel-Jørgensen, "Convolutional neural networks for SAR image segmentation," 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT).
[14] S. Huang and T.Z. Wenzhun Huang, "A new SAR image segmentation algorithm for the detection of target and shadow regions," Scientific Reports 6, Article number: 38596.
[15] Y. Han, Y. Li, and W. Yu, "SAR target segmentation based on shape prior," 2014 IEEE International Geoscience and Remote Sensing Symposium (IGARSS).
[16] E. Aitnouri, S. Wang, and D. Ziou, "Segmentation of small vehicle targets in SAR images," Proc. SPIE, vol.4726, no.1, pp.35-45.
[17] R.A. Weisenseel, W.C. Karl, D.A. Castanon, G.J. Power, and P. Douville, "Markov random field segmentation methods for SAR target chips," Proc. SPIE, vol.3721.
[18] G.J. Power and R.A. Weisenseel, "ATR subsystem performance measures using manual segmentation of SAR target chips," Algorithms for Synthetic Aperture Radar Imagery VI, vol.3721.
[19] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint.
[20] V. Nair and G.E. Hinton, "Rectified linear units improve restricted Boltzmann machines," Proceedings of the 27th International Conference on Machine Learning (ICML-10).
[21] N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol.15, no.1.
[22] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," International Conference on Machine Learning.
[23] V. Dumoulin and F. Visin, "A guide to convolution arithmetic for deep learning," arXiv preprint.

Appendix

1. Excluded Target Chips from Testing Data

Table A·1 shows a list of target chips excluded from the testing data. Figure A·1 shows the inputs (target chips) and outputs (SAR ATR images) of the VersNet.

Table A·1 List of target chips excluded from testing data.
Columns: Class, Filename, Aspect angle (°).
Rows: five target chips, all of class BTR60, with filenames beginning with HB.

2. Confusion Matrix of Testing Data

Table A·2 shows a confusion matrix for the images (target chips) of the testing data, and Table A·3 shows a confusion matrix for all the pixels of the testing data. Each column in the confusion matrices represents the actual target class, and each row represents the target class predicted by the VersNet.
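The chip-level confusion matrix of Table A·2 depends on first reducing each per-pixel output to a single class. The sketch below shows one possible reading of the majority rule in Section 4.3, followed by a confusion matrix in the orientation stated above (columns are actual classes, rows are predicted classes). It is an illustration under assumptions, not the report's code: the names `majority_class`, `chip_confusion_matrix`, and `overall_accuracy` are hypothetical, as is the score layout of shape (classes, height, width) and the choice of letting every pixel vote.

```python
import numpy as np


def majority_class(scores, target_channels):
    """Reduce per-pixel scores of shape (C, H, W) to one chip-level class.

    target_channels: channel indices of the ten target classes; background
    and front channels are left out, so each pixel votes for a target class.
    """
    target_channels = np.asarray(target_channels)
    # Per pixel, pick the target class with the highest score.
    per_pixel = np.argmax(scores[target_channels], axis=0)
    # Count the votes and return the channel index of the winner.
    votes = np.bincount(per_pixel.ravel(), minlength=target_channels.size)
    return int(target_channels[np.argmax(votes)])


def chip_confusion_matrix(true_classes, pred_classes, num_classes):
    """cm[p, t]: number of chips of true class t predicted as class p."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(pred_classes), np.asarray(true_classes)), 1)
    return cm


def overall_accuracy(cm):
    """Fraction of chips on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()
```

As a check on the arithmetic reported in Section 4.3, 2409 correctly classified chips out of 2420 gives 2409 / 2420 ≈ 0.9955, i.e. the stated overall accuracy of 99.55%.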


(a) Inputs (target chips of BTR60 with target aspect angle from 292° to 313°). (b) Outputs (SAR ATR images).
Fig. A·1 Excluded target chips from testing data. For the five target chips in the second row of the inputs (a), the influence of the radar shadow of some other object appears strongly. The influence also appears in the outputs (b) of the VersNet.

Table A·2 Confusion matrix for images of testing.
Columns (true class): 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, ZSU234.
Rows (predicted class): 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, ZSU234, followed by Accuracy (%) with its Average and Overall values.

Table A·3 Confusion matrix for all pixels of testing.
Columns (true class): Background, 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, ZSU234, Front, followed by Precision.
Rows (predicted class): Background, 2S1, BMP2, BRDM2, BTR60, BTR70, D7, T62, T72, ZIL131, ZSU234, Front, followed by Recall, $F_1$, and IoU.
