Viewpoints combined classification method in image-based plant identification task

Gábor Szűcs 1, Dávid Papp 2, Dániel Lovas 2

1 Inter-University Centre for Telecommunications and Informatics, Kassai str. 26., H-4028, Debrecen, Hungary
2 Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Magyar Tudósok krt. 2., H-1117, Budapest, Hungary

szucs@tmit.bme.hu, pappdavid27@gmail.com, lovas.daniel@simonyi.bme.hu

Abstract. The image-based plant identification challenge focused on identifying tree, herb and fern species from different types of images. The aim of the task was to produce relevant species for each observation of a plant in the test dataset. We have elaborated a viewpoints combined classification method for this challenge. We have applied dense SIFT for feature detection and description, and a Gaussian Mixture Model based Fisher vector was calculated to represent an image with a high-level descriptor. The chosen classifier was the C-support vector classification algorithm with RBF (Radial Basis Function) kernel, and we have optimized two hyperparameters (C from C-SVC and γ from the RBF kernel) by a grid search on a two-dimensional grid. We have constructed a combined classifier using the weighted average of the reliability values of the classifiers at each viewpoint. The results show that our combined method exceeds the best of the classifiers constructed for the individual viewpoints.

Keywords: GMM based Fisher vector, C-support vector classification, viewpoint combination

1 Introduction

Accurate knowledge of the identity, statistics and uses of plants is essential in agricultural development. Identifying plant species is usually a very difficult task, even for professionals (such as farmers or wood exploiters) or for the botanists themselves. Botanists nowadays consider image retrieval technologies a promising direction for this problem, and a challenge addressing it was announced in the LifeCLEF campaign [3].
The image-based plant identification task [7] focused on identifying tree, herb and fern species from different types of images. The images cover 7 viewpoints: branch, leaf, scan (scan or scan-like pictures of a leaf, briefly LeafScan), flower, fruit, stem, and entire views. The number of species was about 500, which is an important step towards covering the entire flora of a given region. The aim of the task was to produce a list of relevant species for each observation of a plant in the test dataset, i.e. one or a set of several pictures related to the same event: one person photographing several detailed views of various organs of the same plant on the same day, with the same device and under the same lighting conditions. So the task was observation-centered (not image-centered).

The task was based on the Pl@ntView dataset, which focuses on plants of France (some plant observations came from neighbouring countries). It contains more than 60000 pictures, each belonging to one of the 7 types of view reported in the metadata, in an XML file (one per image) with explicit tags, like ObservationId, species names, date, etc.

The task was evaluated as a plant species retrieval task based on multi-image plant observation queries. The goal was to retrieve the correct plant species among the top results of a ranked list of species returned by the evaluated system. An observation may contain 1 to 5 images depicting the same individual plant, observed by the same person on the same day. Each image of a query observation is associated with a single view type (entire plant, branch, leaf, fruit, flower, stem or leaf scan) and with contextual metadata (date, location, and author). Each participating group was allowed to submit up to 4 runs built from different methods. User rating information (pictures with the average of the user ratings on image quality) was also available, but we have not used this additional information.

2 Image-based plant classification

2.1 Elaboration of image descriptors

The first part of the classification is to represent each image based on its visual content. This consists of three steps, which are the usual phases in computer vision: (i) feature detection, (ii) feature description, (iii) image description.

Feature detection: Many different feature types can be detected in an image, e.g. corners, edges and ridges, as interesting parts of an image. Furthermore, many feature extraction methods are available for images, but we have chosen the SIFT (Scale-Invariant Feature Transform) algorithm [11][12], because it is widely used in both practical and theoretical work and leaves room for further development.

Feature description: In our solution we have used a dense sampling method with SIFT (briefly dense SIFT). This sampling method can be considered as a two-dimensional grid laid over the image, where a SIFT descriptor is calculated at each grid point. After that we have used PCA (Principal Component Analysis) [1][9] to reduce the dimension of the descriptor vectors from 128 to 80. Such a descriptor vector belongs to only one interest point of an image, but an image possesses many feature descriptor vectors, which should be aggregated into an image descriptor.

Image description: The final step of creating the representation is to build a high-level representation of each image. We have applied the BoW (bag-of-words) model [6][10] for this purpose, where images are treated as documents. Accordingly, visual words (so-called codewords) in images need to be defined from the feature descriptors. The whole set of codewords gives the codebook (similar to a dictionary in text tasks). To determine the codebook we used a GMM (Gaussian Mixture Model) [15][17]. This is a parametric probability density function represented as a weighted sum of (in our case 256) Gaussian component densities. The GMM parameters were estimated on the training set by using the iterative EM (Expectation Maximization) algorithm [5], but an initial model was needed for EM. In our training procedure, k-means clustering [13] was performed over all the vectors with 256 clusters, which gave the initial model for EM.

As a result of the algorithms described above, a codebook with 256 codewords was available for further calculations, which can be considered a concise representation of the image set. Based on the codebook, the next step is to create a descriptor that specifies the distribution of the visual codewords in any image, called the high-level descriptor. To represent an image with a high-level descriptor, the GMM based Fisher vector [14][15] was calculated. These vectors were the final representation (image descriptor) of the images.
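To make the Fisher vector encoding above more concrete, the following is a minimal pure-Python sketch, not the C++ library used in our experiments. It assumes a GMM with diagonal covariances and uses the gradients with respect to both the means and the variances, which the text above does not specify. Under that assumption, 256 components and 80-dimensional descriptors would give a 2 · 256 · 80 = 40960-dimensional image descriptor. The function name and the toy numbers are hypothetical, and the power and L2 normalization steps often applied to Fisher vectors are left out.

```python
import math


def fisher_vector(descriptors, weights, means, variances):
    """Fisher vector of local descriptors under a diagonal-covariance GMM.

    descriptors: list of D-dimensional lists (e.g. PCA-reduced dense SIFT)
    weights, means, variances: GMM parameters of the K components
    Returns 2*K*D values: gradients w.r.t. the means, then the variances.
    """
    n, k_count, dim = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * dim for _ in range(k_count)]
    grad_var = [[0.0] * dim for _ in range(k_count)]
    for x in descriptors:
        # Log of the weighted Gaussian density of x for every component.
        log_p = []
        for k in range(k_count):
            s = math.log(weights[k])
            for d in range(dim):
                s -= 0.5 * (math.log(2 * math.pi * variances[k][d])
                            + (x[d] - means[k][d]) ** 2 / variances[k][d])
            log_p.append(s)
        # Posterior (soft assignment) of each component, computed stably.
        m = max(log_p)
        norm = sum(math.exp(v - m) for v in log_p)
        for k in range(k_count):
            gamma = math.exp(log_p[k] - m) / norm
            for d in range(dim):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                grad_mu[k][d] += gamma * z
                grad_var[k][d] += gamma * (z * z - 1.0)
    fv = []
    for k in range(k_count):
        fv.extend(g / (n * math.sqrt(weights[k])) for g in grad_mu[k])
    for k in range(k_count):
        fv.extend(g / (n * math.sqrt(2.0 * weights[k])) for g in grad_var[k])
    return fv


# Toy example (hypothetical numbers): one Gaussian, two 2-D descriptors.
fv = fisher_vector([[1.0, 0.0], [-1.0, 0.0]],
                   weights=[1.0], means=[[0.0, 0.0]], variances=[[1.0, 1.0]])
print(len(fv))  # 2 * K * D = 4
```

In the toy example the two descriptors are symmetric around the component mean, so the mean gradients cancel, while the second coordinate has less spread than the model variance and therefore gets a negative variance gradient.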
The code used to train GMM vocabularies and compute the Fisher vectors is a standalone C++ library, developed by Jorge Sánchez, to support the research of the Visual Geometry Group of Oxford University [8].

2.2 Training the classifier

For the classification task we have divided the labelled image set into three subsets: training, validation and test sets (the last one is used for preliminary testing). The validation image set was used to calibrate the trained model during the validation phase of the training procedure. To train the classifier (classification model) on the training image set, a variation of SVM (Support Vector Machine) was used: C-SVC (C-support vector classification) [2][4] with RBF (Radial Basis Function) kernel. The SVM is basically a binary linear classifier, thus in order to extend it to multiple categories, the one-against-all technique was used. With this technique a binary classifier was created for each category in the training set.
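The two building blocks named above, the RBF kernel and the one-against-all scheme, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, species labels and numbers are hypothetical, and the training of the binary C-SVC models themselves is left out.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_matrix(rows, cols, gamma):
    """Kernel values between every vector in rows and every vector in cols."""
    return [[rbf_kernel(r, c, gamma) for c in cols] for r in rows]


def one_against_all_targets(labels):
    """For each class: +1 for its own images, -1 for all other images."""
    return {c: [1 if label == c else -1 for label in labels]
            for c in sorted(set(labels))}


def predict(decision_values):
    """Pick the class whose binary classifier reports the highest value."""
    return max(decision_values, key=decision_values.get)


# Toy example (hypothetical species labels and decision values).
targets = one_against_all_targets(["oak", "fern", "oak"])
K = kernel_matrix([[0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]], gamma=0.5)
winner = predict({"oak": 0.3, "fern": -1.2})
```

Each entry of `targets` is the label vector for one binary classifier, so a training set with S species leads to S binary problems that share the same kernel matrix.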

The two hyperparameters (C from C-SVC and γ from the RBF kernel) were optimized by a grid search on a two-dimensional grid. The algorithm was trained on the training image set and then validated on the validation set, with different hyperparameter values in each iteration. The parameter pair that gave the best result was selected to train the final classification model (for each category) on the whole image set.

2.3 Preliminary testing

After the training, the codebook was already available and only the Fisher vector of each image had to be computed. For the preliminary testing we selected only 50 species (classes) for both training and testing. An RBF based kernel matrix was built from the Fisher vectors of the test and training images. Each C-SVC classifier was parameterized with this matrix, and the hyperparameters were the same as in the final classification models. Since the classifiers are assigned to species, the model generated for a classifier is responsible for separating the designated class from the other ones. Thus a classifier is able to provide a confidence value showing the certainty of the class in a given image. We have trained 7 classifiers, one for each viewpoint, and evaluated them in the preliminary testing by precision and computer run time. The results of the preliminary testing can be seen in Table 1.

Table 1. Results of the preliminary testing (columns: viewpoint, precision, testing time per image [sec])
Branch 0.34.82
Leaf 0.583.59
LeafScan 0.965 0.95
Stem 0.492.39
Flower 0.52.6
Entire 0.34.44
Fruit 0.482.56

2.4 Viewpoints combination for observation classification

The decision about an observation could be based on majority voting over the image decisions, but we have used continuous information instead of discrete. The C-SVC classifier calculates a continuous reliability value for each class for each image, and we have constructed a combined classifier using the weighted average of these reliability values. Our combined classifier applies the formula in Equation 1 for the aggregated reliability value that the observed plant belongs to class c (species c).

$$R(c) = \frac{\sum_{i=1}^{N_{VP}} w_i R_i(c)}{\sum_{i=1}^{N_{VP}} w_i} = \frac{\sum_{i=1}^{7} w_i \frac{1}{N_{i,p}} \sum_{n=1}^{N_{i,p}} r_{i,n}(c)}{\sum_{i=1}^{7} w_i} \qquad (1)$$

N_{VP} is the number of viewpoints, which equals seven in this challenge
w_i is the weight parameter of viewpoint i
r_{i,n}(c) is the reliability value for class c coming from the C-SVC classifier for the n-th image of viewpoint i
N_{i,p} is the number of images in viewpoint i taken from the p-th plant observed
R_i(c) is the mean of the reliability values r_{i,n}(c) over the images of viewpoint i

Based on the R(c) values, the final decision is always the species with the largest R(c) value. In the challenge a ranked list of predicted species had to be submitted, and we constructed this ranking from the R(c) values as well. When estimating the weight parameters we took the goodness of the different viewpoint classifiers into consideration. As can be seen in the results of the preliminary testing (Table 1), LeafScan has the best precision. So LeafScan got the largest weight parameter, and we have chosen the following weight parameters empirically: LeafScan: 7.5, Leaf: 2.5, Flower: 1.5, Fruit: 1.5, Stem: 1.5, Branch: 1.5, Entire: 1.5.

3 Evaluation

3.1 Evaluation metrics

In the official evaluation, instead of precision (as used in our preliminary testing), a new evaluation metric was defined to measure the goodness of the observation classification. This metric (S score) is defined as follows.

$$S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} S_{u,p} \qquad (2)$$

U: number of users (who have at least one image in the test data)
P_u: number of individual plants observed by the u-th user
N_{u,p}: number of pictures taken from the p-th plant observed by the u-th user
S_{u,p}: score between 1 and 0, equal to the inverse of the rank of the correct species (for the p-th plant observed by the u-th user)
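The chain from per-image reliability values to the S score can be illustrated with a short Python sketch. It follows Equation 1 and Equation 2 under assumptions that are not stated in the text: viewpoints with no image in an observation are skipped (this only rescales R(c) and does not change the ranking), and a correct species missing from the ranked list contributes a score of 0. The data structures, species names "A" and "B", and reliability values are hypothetical; the two weights are the LeafScan and Leaf weights from Section 2.4.

```python
def combine_viewpoints(reliabilities, weights):
    """Aggregated reliability R(c) of one observation (Equation 1).

    reliabilities: {viewpoint: [{species: r}, ...]}, one dict per image
    weights: {viewpoint: w_i}
    Assumption: viewpoints without images in this observation are skipped.
    """
    totals, weight_sum = {}, 0.0
    for viewpoint, images in reliabilities.items():
        if not images:
            continue
        w = weights[viewpoint]
        weight_sum += w
        for image in images:
            for species, r in image.items():
                # w_i * (1 / N_ip) * r_in(c)
                totals[species] = totals.get(species, 0.0) + w * r / len(images)
    return {species: t / weight_sum for species, t in totals.items()}


def rank_species(scores):
    """Species ordered by decreasing R(c)."""
    return sorted(scores, key=scores.get, reverse=True)


def s_score(results):
    """S score (Equation 2).

    results: {user: [(ranked_species, correct_species), ...]}, one pair per plant
    """
    total = 0.0
    for plants in results.values():
        user_sum = 0.0
        for ranking, correct in plants:
            if correct in ranking:
                user_sum += 1.0 / (ranking.index(correct) + 1)  # inverse rank
        total += user_sum / len(plants)
    return total / len(results)


# Toy observation: one Leaf image and two LeafScan images.
weights = {"LeafScan": 7.5, "Leaf": 2.5}
observation = {
    "Leaf": [{"A": 0.2, "B": 0.8}],
    "LeafScan": [{"A": 0.9, "B": 0.1}, {"A": 0.7, "B": 0.3}],
}
R = combine_viewpoints(observation, weights)
ranking = rank_species(R)

# Toy evaluation: two users with two and one observed plants.
S = s_score({
    "user1": [(["A", "B"], "A"), (["A", "B"], "B")],
    "user2": [(["B", "A"], "A")],
})
```

In the toy observation the single Leaf image favours species B, but the more heavily weighted LeafScan images favour A, so A is ranked first. This is the behaviour the weighting is meant to produce when the viewpoint classifiers disagree.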

Although the goal was to classify observations containing several images, an additional metric was defined for image classification, as can be seen in Equation 3.

$$S_{image} = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} \sum_{n=1}^{N_{u,p}} S_{u,p,n} \qquad (3)$$

U: number of users (who have at least one image in the test data)
P_u: number of individual plants observed by the u-th user
N_{u,p}: number of pictures taken from the p-th plant observed by the u-th user
S_{u,p,n}: score between 1 and 0, equal to the inverse of the rank of the correct species (for the n-th picture taken from the p-th plant observed by the u-th user)

3.2 Final official results

The S image score can be calculated for each viewpoint, and these scores can be compared. Our final official results for each viewpoint and for the observation can be seen in Table 2; the S score of the observation exceeds the best S score of all viewpoints.

Table 2. Our final official results (columns: viewpoint or observation, S score)
Branch 0.052
Leaf 0.09
LeafScan 0.9
Stem 0.072
Flower 0.5
Entire 0.06
Fruit 0.07
Observation 0.255

Our final official observation results (BME TMIT) compared with other participants can be seen in Fig. 1.

Fig. 1. Final official observation results of participants

4 Conclusion

We have elaborated a viewpoints combined classification method for the image-based plant identification task. We have applied dense SIFT for feature detection and description, and a Gaussian Mixture Model based Fisher vector was calculated to represent an image with a high-level descriptor. The chosen classifier was the C-support vector classification algorithm with RBF (Radial Basis Function) kernel, and we have optimized two hyperparameters (C from C-SVC and γ from the RBF kernel) by a grid search on a two-dimensional grid. We have constructed a combined classifier using the weighted average of the reliability values of the classifiers at each viewpoint. The weight parameters of the combined classifier were based on our preliminary testing results. Our observation result of the combined method exceeds our best score of all viewpoints. At the official evaluation our solution has reached a score value of 0.255.

Acknowledgement

The publication was supported by the TÁMOP-4.2.2.C-11/1/KONV-2012-0001 project. The project has been supported by the European Union, co-financed by the European Social Fund.

References

1. Abdi H., Williams L. J.: Principal Component Analysis, Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 2, No. 4, pp. 433-459 (2010)
2. Boser, B., Guyon, I., Vapnik, V.: A Training Algorithm for Optimal Margin Classifier, Proc. of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144-152 (1992)
3. Joly, A., Müller, H., Goëau, H., Glotin, H., Spampinato, C., Rauber, A., Bonnet, P., Vellinga, W.P., Fisher, B.: LifeCLEF 2014: multimedia life species identification challenges. In: Proceedings of CLEF 2014 (2014)
4. Cortes, C., Vapnik, V.: Support-vector networks, Machine Learning, Vol. 20, No. 3, pp. 273-297 (1995)
5. Dempster A., Laird N., Rubin D.: Maximum likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Vol. 39, No. 1, pp. 1-38 (1977)
6. Fei-Fei, L., Fergus, R., Torralba, A.: Recognizing and Learning Object Categories, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2007)
7. Goëau, H., Joly, A., Bonnet, P., Selmi, S., Molino, J.F., Barthélémy, D., Boujemaa, N.: LifeCLEF plant identification task 2014. In: CLEF working notes 2014 (2014)
8. Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods, British Machine Vision Conference, pp. 76.1-76.12 (2011)
9. Ke, Y., Sukthankar, R.: PCA-SIFT: A more distinctive representation for local image descriptors, In Computer Vision and Pattern Recognition, CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, Vol. 2, pp. II-506 (2004)
10. Lazebnik, S., Schmid, C., Ponce, J.: Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, Vol. 2, pp. 2169-2178 (2006)
11. Lowe, D. G.: Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110 (2004)
12. Lowe, D. G.: Object Recognition from local scale-invariant features, In International Conference on Computer Vision, Corfu, Greece, pp. 1150-1157 (1999)
13. MacQueen, J.: Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 281-297 (1967)
14. Perronnin, F., Dance, C.: Fisher kernel on visual vocabularies for image categorization, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2007)
15. Reynolds D. A.: Gaussian Mixture Models, Encyclopedia of Biometric Recognition, Springer, February, pp. 659-663 (2009)
16. Sánchez, J., Perronnin, F., Mensink, T.: Improved Fisher Vector for Large Scale Image Classification, In Proc. of the 11th European Conference on Computer Vision (ECCV): Part IV, September 05-11, pp. 143-156 (2010)
17. Tomasi C.: Estimating gaussian mixture densities with EM: A tutorial, (Tech. rep., Duke University); Chinese Journal of Electron Devices, pp. 5-8 (2004)