Evaluation of Second-order Visual Features for Land-Use Classification

Similar documents
Web scale image retrieval using compact tensor aggregation of visual descriptors

Detection and Recognition of Alert Traffic Signs

Journal of World s Electrical Engineering and Technology J. World. Elect. Eng. Tech. 1(1): 12-16, 2012

Compact Vectors of Locally Aggregated Tensors for 3D shape retrieval

Controlled Information Maximization for SOM Knowledge Induced Learning

An Unsupervised Segmentation Framework For Texture Image Queries

IP Network Design by Modified Branch Exchange Method

AUTOMATED LOCATION OF ICE REGIONS IN RADARSAT SAR IMAGERY

HISTOGRAMS are an important statistic reflecting the

Spiral Recognition Methodology and Its Application for Recognition of Chinese Bank Checks

An Extension to the Local Binary Patterns for Image Retrieval

Segmentation of Casting Defects in X-Ray Images Based on Fractal Dimension

Illumination methods for optical wear detection

A Two-stage and Parameter-free Binarization Method for Degraded Document Images

Optical Flow for Large Motion Using Gradient Technique

Free Viewpoint Action Recognition using Motion History Volumes

Image Enhancement in the Spatial Domain. Spatial Domain

Color Correction Using 3D Multiview Geometry

Clustering Interval-valued Data Using an Overlapped Interval Divergence

MULTI-TEMPORAL AND MULTI-SENSOR IMAGE MATCHING BASED ON LOCAL FREQUENCY INFORMATION

Image Registration among UAV Image Sequence and Google Satellite Image Under Quality Mismatch

Module 6 STILL IMAGE COMPRESSION STANDARDS

Computational and Theoretical Analysis of Null Space and Orthogonal Linear Discriminant Analysis

Lecture # 04. Image Enhancement in Spatial Domain

Cardiac C-Arm CT. SNR Enhancement by Combining Multiple Retrospectively Motion Corrected FDK-Like Reconstructions

Topic -3 Image Enhancement

= dv 3V (r + a 1) 3 r 3 f(r) = 1. = ( (r + r 2

A Shape-preserving Affine Takagi-Sugeno Model Based on a Piecewise Constant Nonuniform Fuzzification Transform

AXON 2 A visual object recognition system for non-rigid objects

RANDOM IRREGULAR BLOCK-HIERARCHICAL NETWORKS: ALGORITHMS FOR COMPUTATION OF MAIN PROPERTIES

Point-Biserial Correlation Analysis of Fuzzy Attributes

SYSTEM LEVEL REUSE METRICS FOR OBJECT ORIENTED SOFTWARE : AN ALTERNATIVE APPROACH

A Minutiae-based Fingerprint Matching Algorithm Using Phase Correlation

Effective Data Co-Reduction for Multimedia Similarity Search

Linear Ensembles of Word Embedding Models

A Recommender System for Online Personalization in the WUM Applications

A Novel Automatic White Balance Method For Digital Still Cameras

3D Hand Trajectory Segmentation by Curvatures and Hand Orientation for Classification through a Probabilistic Approach

A modal estimation based multitype sensor placement method

Communication vs Distributed Computation: an alternative trade-off curve

Full-Polarimetric Analysis of MERIC Air Targets Data

Ego-Motion Estimation on Range Images using High-Order Polynomial Expansion

Color Interpolation for Single CCD Color Camera

DAG-BASED VISUAL INTERFACES FOR NAVIGATION IN INDEXED VIDEO CONTENT 1

Optimal Adaptive Learning for Image Retrieval

Improvement of First-order Takagi-Sugeno Models Using Local Uniform B-splines 1

Hierarchical Region Mean-Based Image Segmentation

Detection and tracking of ships using a stereo vision system

Machine Learning for Automatic Classification of Web Service Interface Descriptions

On Error Estimation in Runge-Kutta Methods

Assessment of Track Sequence Optimization based on Recorded Field Operations

And Ph.D. Candidate of Computer Science, University of Putra Malaysia 2 Faculty of Computer Science and Information Technology,

Transmission Lines Modeling Based on Vector Fitting Algorithm and RLC Active/Passive Filter Design

MapReduce Optimizations and Algorithms 2015 Professor Sasu Tarkoma

INDEXATION OF WEB PAGES BASED ON THEIR VISUAL RENDERING

Effective Missing Data Prediction for Collaborative Filtering

Towards Adaptive Information Merging Using Selected XML Fragments

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Local Descriptors for Epigraphs Recognition

An Optimised Density Based Clustering Algorithm

A VECTOR PERTURBATION APPROACH TO THE GENERALIZED AIRCRAFT SPARE PARTS GROUPING PROBLEM

COLOR EDGE DETECTION IN RGB USING JOINTLY EUCLIDEAN DISTANCE AND VECTOR ANGLE

Multi-azimuth Prestack Time Migration for General Anisotropic, Weakly Heterogeneous Media - Field Data Examples

Satellite Image Analysis

ANALYTIC PERFORMANCE MODELS FOR SINGLE CLASS AND MULTIPLE CLASS MULTITHREADED SOFTWARE SERVERS

Multidimensional Testing

BUPT at TREC 2006: Spam Track

Prof. Feng Liu. Fall /17/2016

Frequency Domain Approach for Face Recognition Using Optical Vanderlugt Filters

Robust Object Detection at Regions of Interest with an Application in Ball Recognition

Multiview plus depth video coding with temporal prediction view synthesis

All lengths in meters. E = = 7800 kg/m 3

Effects of Model Complexity on Generalization Performance of Convolutional Neural Networks

The EigenRumor Algorithm for Ranking Blogs

Statistics of Images. Ioannis Rekleitis

Modelling, simulation, and performance analysis of a CAN FD system with SAE benchmark based message set

Real-Time Speech-Driven Face Animation. Pengyu Hong, Zhen Wen, Tom Huang. Beckman Institute for Advanced Science and Technology

A New Finite Word-length Optimization Method Design for LDPC Decoder

Any modern computer system will incorporate (at least) two levels of storage:

Robust Object Detection at Regions of Interest with an Application in Ball Recognition

Tissue Classification Based on 3D Local Intensity Structures for Volume Rendering

3D inspection system for manufactured machine parts

Title. Author(s)NOMURA, K.; MOROOKA, S. Issue Date Doc URL. Type. Note. File Information

COMPARISON OF CHIRP SCALING AND WAVENUMBER DOMAIN ALGORITHMS FOR AIRBORNE LOW FREQUENCY SAR DATA PROCESSING

LOSSLESS audio coding is used in such applications as

INFORMATION DISSEMINATION DELAY IN VEHICLE-TO-VEHICLE COMMUNICATION NETWORKS IN A TRAFFIC STREAM

A MULTIRESOLUTION AND OPTIMIZATION-BASED IMAGE MATCHING APPROACH: AN APPLICATION TO SURFACE RECONSTRUCTION FROM SPOT5-HRS STEREO IMAGERY

An Assessment of the Efficiency of Close-Range Photogrammetry for Developing a Photo-Based Scanning Systeminthe Shams Tabrizi Minaret in Khoy City

Supervised Coupled Dictionary Learning with Group Structures for Multi-Modal Retrieval

Concomitants of Upper Record Statistics for Bivariate Pseudo Weibull Distribution

DUe to the recent developments of gigantic social networks

(a, b) x y r. For this problem, is a point in the - coordinate plane and is a positive number.

A Neural Network Model for Storing and Retrieving 2D Images of Rotated 3D Object Using Principal Components

A ROI Focusing Mechanism for Digital Cameras

Topological Characteristic of Wireless Network

SAR: A Sentiment-Aspect-Region Model for User Preference Analysis in Geo-tagged Reviews

Kalman filter correction with rational non-linear functions: Application to Visual-SLAM

Forward-Decoding Kernel-Based Phone Sequence Recognition

Several algorithms exist to extract edges from point. system. the line is computed using a least squares method.

A Comparative Impact Study of Attribute Selection Techniques on Naïve Bayes Spam Filters

Transcription:

Evaluation of Second-ode Visual Featues fo Land-Use Classification Romain Negel, David Picad and Philippe-Heni Gosselin ETIS/ENSEA - UCP - CNRS, F95014 Cegy, Fance Email: omain.negel,picad,gosselin@ensea.f Texmex team, Inia, F35042 Rennes, Fance Email: philippe.gosselin@inia.f Abstact This pape investigates the use of ecent visual featues based on second-ode statistics, as well as new pocessing techniques to impove the quality of featues. Moe specifically, we pesent and evaluate Fishe Vectos (FV), Vectos of Locally Aggegated Desciptos (VLAD), and Vectos of Locally Aggegated Tensos (VLAT). These techniques ae combined with seveal nomalization techniques, such as powe law nomalization and othogonalisation/whitening of descipto spaces. Results on the UC Meced land use dataset shows the elevance of these new methods fo land-use classification, as well as a significant impovement ove Bag-of-Wods. I. INTRODUCTION In land-use classification of high-esolution ovehead imagey, the most popula pipeline is composed of two pats: the extaction of image desciptos; and the use of statistical leaning tools. To extact the desciptos, two types of methods ae used: global desciptos; aggegated desciptos. The global desciptos commonly used ae colo desciptos [1] (e.g. histogam of RGB, HSV, CIE Lab) and homogeneous textue desciptos [1] (Gabo filte esponses). The aggegated image desciptos ae computed in two steps: the extaction of a set of local desciptos; and thei aggegation in a single descipto. The most polula local desciptos is the well known SIFT descipto [2]. Many aggegation methods have been poposed and evaluated [1], [3 7]. Yang et al. [1], [3] evaluated numeous methods used in compute vision: the bag-of-visualwods [8] and its spacial extentions (Spatial Pyamid Match Kenel [9] and Spatial Co-occuence Kenel [10]). Risojevic et al. [4] popose to use the coss-ceelations between Gabo wavelet coefficient and quatenion famewok fo the epesentation of colo images to compute the image descipto. Recently, Cheiyadat et al. [5] popose to use a coding/pooling method [11] to aggegate the local desciptos into a single descipto. In this pape, we popose to evaluate ecent methods of aggegated desciptos fo visual epesentation of land-use classification in high-esolution ovehead imagey [1]. This includes impoved Fishe Vectos (FV) [12], Vectos of Locally Aggegated Desciptos (VLAD) [13], Vectos of Locally Aggegated Tensos (VLAT) [14], and many impovements that wee ecently poposed [15]. These methods ae the extension of visual dictionaies appoaches intoduced by Bag-of-Wods (BoW) [8], thanks to seveal key ideas. A fist key idea is the intoduction of the deviation appoach, which was fist motivated by statistical models fom Fishe Kenels [12]. The idea is to conside the deviation between the local image model and a global model, athe than only the local image model. As a esult, specific popeties of an image ae bette emphasised. A second key idea is the intoduction of secondode statistics, fo instance using covaiance data in addition to mean data. While the featue size is then significantly inceased, ecent techniques fo high dimensionality eduction have solved this poblem [13], [15]. Anothe key idea is the intoduction of nomalisation pocessing at the diffeent levels of the tool-chain, like the powe law [12] o cluste-wise component analysis [15]. A last key idea is to only conside featues vectos compaed with a linea similaity (e.g. dot poduct). These constaints allow the use of vey efficient etieval and leaning techniques, like Stochastic Gadient Descent SVM [16]. Futhemoe, in most cases this leads to methods with computational and memoy complexity linea with the size of datasets, athe than quadatic complexity with non-linea featues. In this scope, we fist popose a detailed pesentation of these methods in Section II. Then, we pesent in Section III evaluation esults on UC Meced land use dataset [1]. II. IMAGE FEATURES Most image featues ae obtained by a two steps scheme. The fist step is to extact a set of local visual desciptos fom the images. The most commonly used visual desciptos ae highly disciminant local desciptos (HOG, SIFT, SURF,...). Regions of inteest can be selected by unifom sampling, o by automatic point of inteest detection. The set extacted fom an image is called a bag. We denote by B i = {b i } the set of desciptos b i R D in image i. The second step is to map the desciptos of the bag B i into a single vecto x i R W, known as the image featue. A. Statistical Appoaches The fist methods to map the desciptos in a featue ae based on the statistical study of the distibution of desciptos in the bag. These appoaches have been inspied by text etieval methods. To study the distibution of desciptos, we use a visual codebook composed by C visual wods. The visual codebook is geneally computed by a clusteing algoithm (e.g., k-means) on a lage sample of desciptos. A bag can then be descibed by a statistical analysis of occuences of visual wods. The fist method of this kind, named Bag of Wods (BoW) [8] counts the numbe of desciptos belonging to

each cluste. The dimension of the featue is then C. Many extension of BoW has been poposed [11], fo example in the classification of uban scenes in geo-efeenced images [17]. These appoaches obtain good esults in similaity seach and in images classification. Howeve, to obtain good esults with these methods it is necessay to use visual codebooks with vey lage dictionaies (about 100k visual wod), and the use of a non-linea metic. B. Model Deviation Appoaches Peonnin et al. [12] poposed a successful method called Fishe Vectos. This method uses a pobability density function denoted by u λ of paametes λ as model of the desciptos space. To descibe the image, they compute the deivative of the log-likelihood of image desciptos to the model: λ = 1 T λ log u λ (B i ). (1) The authos popose to use a Gaussian Mixtue Model (GMM) of paametes µ c and σ c. Elements of the Fishe Vecto fo each Gaussian c can be witten as: µ,c = σ,c = 1 T ( bi µ γ c (b i ) c ω c σ c 1 T [ (bi µ γ c (b i ) c ) 2 ω c σ 2 c ), (2) ] 1. (3) Whee (ω c, µ c, σ c ) ae the weight, mean and standad deviation of Gaussian c, and γ c (b i ) the nomalized likelihood of b i to Gaussian c. The final featue is obtained by concatenation of Gµ,c Bi and Gσ,c Bi fo all Gaussians. Fishe Vectos achieve vey good esults [12]. Howeve, Fishe Vectos ae limited to the simple model of mixtues of Gaussians with diagonal covaiance matices. Moeove, the GMM algoithm is computationally vey intensive. Jegou et al. [13] poposed a simplified vesion of Fishe Vecto by aggegating local desciptos, called Vectos of Locally Aggegated Desciptos (VLAD). They poposed to model the desciptos space by a small codebook obtained by clusteing a lage set of desciptos. The model is simply the sum of all centeed desciptos B ci = {b ci } B i fom image i and cluste c: ν ci = b ci µ c (4) b ci B ci with µ c the cente of cluste c. The final featue is obtained by a concatenation of ν ci fo all c. The featue size is D C. Picad et al. [14] poposed an extension of VLAD by aggegating tenso poducts of local desciptos, called Vecto of Locally Aggegated Tensos (VLAT). They poposed to use the covaiance matix of the desciptos of each cluste. Let us denote by µ c the mean of cluste c and T c the covaiance matix of cluste c with b ci desciptos belonging to cluste c: µ c = 1 b ci (5) c T c = 1 c i (b ci µ c )(b ci µ c ), (6) i with c being the total numbe of desciptos in cluste c. Fo each cluste c, the featue of image i is the sum of centeed tensos of centeed desciptos belonging to cluste c: T ic = (b ci µ c )(b ci µ c ) T c. (7) Each T ic is flattened into a vecto v ic. The VLAT featue v i fo image i consists of the concatenation of v ic fo all clustes: v i = (v i1... v ic ). (8) As the T ic matices ae symmetic, only the diagonal and the uppe pat ae kept while flattening T ic into a vecto v ic. The size of the featue is then C D (D+1) 2. C. Nomalization Seveal nomalization pocessing ae poposed in the liteatue to enhance the quality of visual featues. The most popula one is the powe law nomalization, which fist aises each value to a powe α, and then l 2 -nomalize the esulting vecto: x i = v i v i, j, v i[j] = sign ( v i [j] ) vi [j] α, (9) with α typically set to 0,5. This nomalization was fist intoduced fo the final visual featues [12], and moe ecently fo the nomalization of low-level desciptos, such as Root- SIFT [18]. Anothe common impovement is the othogonalization and/o whitening of vecto spaces, using a Pincipal Component Analysis. This can be pefomed at diffeent levels. Fo instance, this is a equied pe-pocessing on low-level desciptos fo Fishe Vectos [12]. It can also be used to nomalize each cluste of the dictionay, as it is done fo VLAD [19] and fo VLAT [15]. In the case of VLAT, the pocessing is the following one. Fist, we compute the eigendecomposition of the covaiance matix of each cluste c: T c = V c D c V c, (10) whee D c is a eal non-negative diagonal matix (eigenvalues), and V c is unitay (eigenvectos). Then we poject the centeed desciptos belonging to c on the eigenvectos: b ci = V c (b ci µ c ). (11) Combining eq.(11) and eq.(7), we get: ( ) T ic = Vc (b ci µ c )(b ci µ c ) T c V c = V ( c (bci µ c )(b ci µ c ) ) V c D c = ( V c (b ci µ c ) ) ( Vc (b ci µ c ) ) Dc. The cluste-wise nomalized VLAT featue of image i in cluste c is the sum of tensos of pojected desciptos b ci belonging to cluste c, centeed by D c : T ic = b cib ci D c. (12)

Agicultual Aiplane Base Ball D. Beach Foest Feeway Golf Couse Habo Intesection Medium Resi.Mobile H. P. Rive Runway Spase Resid. Stoage Ta. Tennis Cout Ove Pass Paking Lot Fig. 1. Buildings Chapaal Dense Resid. UC Meced land use dataset. III. E XPERIMENTS In this section, we pesent the esult using FV, VLAD and VLAT featues on UC Meced land use dataset [1]. TABLE I. C LASSIFICATION ACCURACY (%) FOR VARIOUS EXTRACTION SCALES OF HOG DESCRIPTORS AND VARIOUS VISUAL CODEBOOKS SIZE WITH VLAD FEATURE. D 4 6 8 10 4+6+8+10 A. Dataset This dataset is composed of 256 256 pixels RGB images, with pixel esolution of one foot. They ae manually classified into 21 classes, coesponding to vaious land cove and land use types: agicultual, aiplane, baseball diamond, beach, buildings, chapaal, dense esidential, foest, feeway, golf couse, habo, intesection, medium density esidential, mobile home pak, ovepass, paking lot, ive, unway, spase esidential, stoage tanks, and tennis couts. Each class contains 100 images. Examples ae pesented in Fig. 1. We evaluate methods using the same potocol as in [1]. This is a five-fold multi-classification potocol. We andomly split the dataset into five subsets, tain using fou subsets and test using the emaining one. The esults pesented in the following sections ae then aveage classification accuacy ove the five uns. We follow a one-vesus-all classification stategy: fo each class, we tain a linea SVM classifie, and label each test image with the classifie that etuns the highest classification scoe. Fo all the following expeiments, we `2 nomalize lowlevel desciptos (no powe law at this stage), and nomalize final visual featues using a powe law with α = 0.5. B. Featues compaison We fist evaluated the scale paamete of low-level desciptos. Fo this pupose, we used HOG desciptos at diffeent scales: cells of 4 pixels, 6 pixels, 8 pixels and 10 pixels. Then, we caied out expeiment with VLAD featues fo diffeent dictionay sizes D, fom 32 to 256 keywods. As pesented in Table I, individual scales pesents simila pefomance. Futhemoe, this behaviou is the same with diffeent dictionay 32 84.5 84.8 83.2 83.8 85.2 64 86.7 86.0 86.7 86.5 128 88.5 87.6 87.7 87.8 89.6 256 89.0 89.0 88.6 88.7 90.9 TABLE II. C LASSIFICATION ACCURACY (%) FOR HOG AND RGB DESCRIPTORS AND FOR VARIOUS VISUAL CODEBOOKS SIZE WITH FV, VLAD, VLAT FEATURES. D HOG RGB FV VLAD VLAT FV VLAD VLAT 32 88.3 85.2 91.5 84.3 80.0 64 90.0 91.7 85.6 82.5 88.1 128 91.2 89.6 91.8 87.2 83.7 88.1 256 91.8 90.9 92.3 88.2 85.4 88.9 size: in all cases, pefomance is impoved by the size of the dictionay, but is simila fom one scale to anothe. Howeve, when combining all scales, the pefomance is impoved. As a esult, we always conside the combination of these fou scales in the following expeiments. The second set of expeiments compaes FV, VLAD and VLAT using HOG desciptos and RGB desciptos [12], fo diffeent dictionay sizes D. Results ae pesented in Table II. If we compae HOG and RGB desciptos, HOG ae moe effective. Let note that we did not combine HOG and RGB desciptos because no nomalization is pefomed on descipto spaces in these expeiments. Focusing on the dictionay size, impovement can be obseved with VLAD, howeve, this is less significant fo FV and VLAT. Finally, the best featue in these expeiment is the VLAT featue.

TABLE IV. COMPARISON TO STATE-OF-THE-ART RESULTS. Method BOVW[1] SCK[1] BOVW+SCK[1] ColoRGB[1] ColoHLS[1] Textue[1] Ou (VLAT) Accuacy (%) 76.81 72.52 77.71 76.71 81.19 76.91 94.3 Method SPCK[3] SPCK+[3] SPCK++[3] Colo O.F.[4] Colo O.D.[4] Quate. O.D.[4] Ou (FV) Accuacy (%) 73.14 76.05 77.38 81.10 84.86 85.48 93.8 TABLE III. CLASSIFICATION ACCURACY (%) FOR HOG AND RGB DESCRIPTORS AND FOR VARIOUS COMPRESSION RATIOS OF DESCRIPTORS WITH FV, VLAD, VLAT FEATURES (D = 64). HOG RGB HOG+RGB C. Descipto spaces nomalization d 128 96 64 32 FV 92.0 91.5 91.2 90.1 VLAD 89.1 88.8 88.9 87.4 VLAT 92.5 92.7 92.7 91.8 FV - 89.3 89.4 87.2 VLAD - 86.5 86.0 84.6 VLAT - 90.6 90.5 89.4 FV - 93.8 93.4 92.8 VLAD - 92.5 92.5 91.6 VLAT - 94.1 94.3 93.8 In this section, we analyse the nomalization of descipto spaces using methods based on Pincipal Components Analysis (PCA). The motivation is two-fold: impove the quality of featues, but also combine seveal featues fom diffeent desciptos. Fo FV, we consideed diffeent numbe d of component fo the nomalization and eduction of low-level desciptos. Fo VLAD and VLAT, this pocessing is pefomed fo each cluste of the visual dictionay. Results ae pesented in Table III, wee d is the numbe of pincipal component fo the global PCA (FV) o fo cluste-wise PCAs (VLAD and VLAT). Note that cases with 128 components ae not pesented fo RGB desciptos, since the oiginal descipto only have 96 dimensions. Futhemoe, the size of the dictionay is always set of 64. Oveall, if we compae to the pevious set of expeiments with D = 64, pefomance is inceased in all cases, even if we educe the size of desciptos to d = 32 dimensions. If we compae HOG and RGB desciptos, HOG ae also moe effective. Thanks to the nomalization, we can ceate a elevant combination of HOG and RGB desciptos, which pesent the highest esults, especially with VLAT featues. We pesent in Fig. 2 these esults accoding to the size of featues, using HOG and RGB desciptos, and dictionay sizes D = 32 and D = 64. If we focus on featues with 1,000 dimensions, we can see that pefomance ove 92% can be obtained with FV and VLAD methods. This can be compaed to the esults using BoW [1], which ae fom 77% to 82% with the same numbe of dimensions. Finally, we compae ou esults to the state of the at in Table IV. As we can see, model deviation appoaches clealy outpefom thei methods by a fai magin of at least 9%. IV. CONCLUSION In this pape, we pesented ecent methods fo computing visual featues, as well as elated nomalization techniques. We caied out expeiments to analyse the benefit of each new component on the UC Meced land use dataset [1]. Oveall, all consideed method (FV, VLAD and VLAT) showed high classification accuacies. Futhemoe, a high accuacy can be Classification accuacy (%) 94.5 94 93.5 93 92.5 92 91.5 91 90.5 90 89.5 10 4 10 5 Signatue dimension VLAT D=64 VLAT D=32 FV D=64 FV D=32 VLAD D=64 VLAD D=32 Fig. 2. Classification accuacy (%) as function of dimensionality eduction of desciptos (i.e., d {96, 64, 32}) fo vaious visual codebooks size and featues. obtain with small visual dictionaies (64 wods), especially fo FV and VLAT featues. Descipto space nomalization using PCA-based methods impove the esults, but also allows effective combination of diffeent desciptos types (HOG and RGB in ou expeiments). Finally, when compaed to BoW methods, these methods lead to lage pefomance impovement, as it is seen fo genealist image categoization. A new component that will be woth investigating in the futue is the compession of visual featues fo lage-scale indexing, and thei evaluation on vey lage land-use datasets. REFERENCES [1] Y. Yang and S. Newsam, Bag-of-visual-wods and spatial extensions fo land-use classification, in ACM SIGSPATIAL GIS, 2010. [2] D. G. Lowe, Object ecognition fom local scale-invaiant featues, in Compute vision, 1999. The poceedings of the seventh IEEE intenational confeence on, vol. 2. IEEE, 1999, pp. 1150 1157. [3] Y. Yang and S. Newsam, Spatial pyamid co-occuence fo image classification, in ICCV. IEEE, 2011, pp. 1465 1472. [4] V. Risojevic and Z. Babic, Oientation diffeence descipto fo aeial image classification, in IWSSIP. IEEE, 2012, pp. 150 153. [5] A. M. Cheiyadat, Unsupevised featue leaning fo aeial scene classification, pp. 439 451, 2014. [6] Y. Zhang, X. Sun, H. Wang, and K. Fu, High-esolution emote-sensing image classification via an appoximate eath move s distance-based bag-of-featues model, pp. 1055 1059, 2013. [7] E. Aptoula, Remote sensing image etieval with global mophological textue desciptos, IEEE Tansactions on Geoscience and Remote Sensing, vol. 52, no. 5, pp. 3023 3034, May 2014. [8] J. Sivic and A. Zisseman, Video google: A text etieval appoach to object matching in videos, in ICCV, vol. 2, 2003, pp. 1470 1477. [9] S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of featues: Spatial pyamid matching fo ecognizing natual scene categoies, in Compute Vision and Patten Recognition, 2006 IEEE Compute Society Confeence on, vol. 2. IEEE, 2006, pp. 2169 2178.

[10] R. M. Haalick, K. Shanmugam, and I. H. Dinstein, Textual featues fo image classification, Systems, Man and Cybenetics, IEEE Tansactions on, no. 6, pp. 610 621, 1973. [11] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, Localityconstained linea coding fo image classification, in CVPR, 2010, pp. 3360 3367. [12] F. Peonnin, J. Sánchez, and T. Mensink, Impoving the fishe kenel fo lage-scale image classification, in ECCV, 2010, pp. 143 156. [13] H. Jegou, F. Peonnin, M. Douze, J. Sanchez, P. Peez, and C. Schmid, Aggegating local image desciptos into compact codes, IEEE Tans. on PAMI, vol. 1, pp. 3304 3311, 2012. [Online]. Available: http://hal.inia.f/inia-00633013/en/ [14] D. Picad and P. Gosselin, Efficient image signatues and similaities using tenso poducts of local desciptos, CVIU, vol. 117, p. 680687, 2013. [15] R. Negel, D. Picad, and P. Gosselin, Web scale image etieval using compact tenso aggegation of visual desciptos, IEEE Multimedia, vol. 20, no. 3, pp. 24 33, 2013. [16] L. Bottou, Lage-scale machine leaning with stochastic gadient descent, in COMPSTAT, Y. Lechevallie and G. Sapota, Eds. Pais, Fance: Spinge, August 2010, pp. 177 187. [Online]. Available: http://leon.bottou.og/papes/bottou-2010 [17] C. Iovan, D. Picad, N. Thome, and M. Cod, Classification of Uban Scenes fom Geo-efeenced Images in Uban Steet-View Context, in ICMLA, vol. 2, United States, 2012, pp. 339 344. [Online]. Available: http://hal.achives-ouvetes.f/hal-00794980 [18] R. Aandjelovic and A. Zisseman, Thee things eveyone should know to impove object etieval, in CVPR. IEEE, 2012, pp. 2911 2918. [19] J. Delhumeau, P.-H. Gosselin, H. Jegou, and P. Peez, Revisiting the vlad image epesentation, in ACM Multimedia, Bacelona, Spain, Octobe 2013.