JOURNAL OF TCSVT

Deep Deformable Patch Metric Learning for Person Re-identification

Sławomir Bak and Peter Carr
Disney Research, Pittsburgh, PA, USA
{slawomir.bak,peter.carr}@disneyresearch.com

Abstract—The methodology for finding the same individual in a network of cameras must deal with significant changes in appearance caused by variations in illumination, viewing angle and a person's pose. Re-identification requires solving two fundamental problems: (1) determining a distance measure between features extracted from different cameras that copes with illumination changes (metric learning); and (2) ensuring that matched features refer to the same body part (correspondence). Most metric learning approaches focus on finding a robust distance measure between bounding box images, neglecting the alignment aspects. In this paper, we propose to learn appearance measures for patches that are combined using deformable models. Learning metrics for patches avoids strong dimensionality reduction, thus keeping more information. Additionally, we allow patches to change their locations, directly addressing the correspondence problem. As patches from different locations may share the same metric, our method effectively multiplies the amount of training data and allows metrics to be learned from smaller amounts of labeled images. Different metric learning approaches (KISSME, XQDA, LSSL) together with different deformable models (spring constraints, one-to-one matching constraints) are investigated and compared. For describing patches, we propose to learn a deep feature representation with Convolutional Neural Networks (CNNs), thus obtaining highly effective features for re-identification. We demonstrate that our approach significantly outperforms state-of-the-art methods on multiple datasets.

Index Terms—person re-identification, metric learning, deformable models.

1 INTRODUCTION

PERSON RE-IDENTIFICATION is the problem of recognizing the same individual across a network of cameras. In a real-world scenario, the transition time between cameras may significantly decrease the search space, but temporal information alone is not usually sufficient to solve the problem. As a result, visual appearance models have received a lot of attention in computer vision research [2], [26], [29], [41], [47]. The underlying challenge for visual appearance is that models must work under significant appearance changes caused by variations in illumination, viewing angle and a person's pose.

Metric learning approaches often achieve the best performance in re-identification. These methods learn a distance function between features from different cameras such that relevant dimensions are emphasized while irrelevant ones are ignored. Many metric learning approaches [19], [24] divide a bounding box pedestrian image into a fixed grid of regions and extract descriptors which are then concatenated into a high-dimensional feature vector. Afterwards, dimensionality reduction is applied, and then metric learning is performed on the reduced subspace of differences between feature vectors. To avoid overfitting, the dimensionality must be significantly reduced. In practice, the subspace dimensionality is about three orders of magnitude smaller than the original. Such strong dimensionality reduction might result in a loss of discriminative information. Additionally, features extracted on a fixed grid (see Fig. 1) may not correspond,

Copyright © 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

[Fig. 1 graphic: an input bounding box matched under full bounding box metric learning (ML) and under DPML, with DPML yielding the higher similarity.] Fig. 1: Full bounding box metric learning (ML) vs.
deformable patch metric learning (DPML). The corresponding patches in the grid (highlighted in red) do not correspond to the same body part because of a pose change. Information from such misaligned features might be lost during the metric learning step. Instead, our DPML deforms to maximize similarity using metrics learned on a patch level.

even though it is the same person (e.g. due to a pose change). Metric learning is unable to recover this lost information.

In this paper, instead of learning a metric for concatenated features extracted from full bounding boxes from different cameras, we propose to learn metrics for 2D patches. Learning metrics for patches is less prone to overfitting (because of the lower dimensionality) and requires less compression. As a result, it keeps more information. Furthermore, we do not assume that patches must be located on

a fixed grid. Our model allows patches to perturb their locations when computing the similarity between two images (see Fig. 1). This model is inspired by part-based object detection [12], [43], which decomposes an appearance model into local templates with geometric constraints (conceptualized as springs).

Our main contributions are:
- We propose to learn metrics locally, on feature vectors extracted from patches. These metrics can be combined into a unified distance measure.
- We introduce two deformable patch-based models for accommodating pose changes and occlusions: (1) an unsupervised deformable model that introduces a global one-to-one matching constraint solved by a linear assignment problem, and (2) a supervised deformable model that combines an appearance term with a deformation cost that controls the relative placement of patches.
- For describing patches, we propose to learn a deep feature representation with Convolutional Neural Networks (CNNs). The CNN is learned through a challenging multi-class identification task. We force the CNN to recognize not only the person identity from which a patch has been extracted but also the patch location. This results in a highly effective representation, significantly improving re-identification accuracy.

Our experiments illustrate the merits of patch-based techniques and achieve new state-of-the-art performance on multiple datasets, outperforming existing approaches by large margins.

2 RELATED WORK

Person re-identification approaches can be divided into two groups: feature modeling [4], [11], which designs descriptors (usually handcrafted) that are robust to changes in imaging conditions, and metric learning [1], [19], [23], [24], [42], [50], which searches for effective distance functions to compare features from different cameras.

Robust features can be modeled by adopting perceptual principles of symmetry and asymmetry of the human body [11]. The correspondence problem can be approached by locating body parts [4], [8] and extracting local descriptors (color histograms [8], color invariants [21], covariances [4], CNN features [29]). However, to find a proper descriptor, we need to look for a trade-off between its discriminative power and its invariance between cameras. This task can be considered a metric learning problem that maximizes inter-class variation while minimizing intra-class variation. Many different machine learning algorithms have been considered for learning a robust similarity function: Gray et al. employed AdaBoost for feature selection and weighting [14], and Prosser et al. defined person re-identification as a ranking problem and used an ensemble of RankSVMs [32]. Recently, features learned from deep convolutional neural networks have been investigated [1], [7], [23], [35], [37], [40], [48]. However, the most common choice for learning a metric remains the family of Mahalanobis distance functions. These include Large Margin Nearest Neighbor Learning (LMNN) [39], Information Theoretic Metric Learning (ITML) [9] and Logistic Discriminant Metric Learning (LDML) [15]. These methods usually aim at improving k-NN classification by iteratively adapting the metric. In contrast to these iterative methods, Köstinger et al. [19] proposed the KISS metric, which uses a statistical inference based on a likelihood-ratio test of two Gaussian distributions modeling positive and negative pairwise differences between features. Owing to its effectiveness and efficiency, the KISS metric is a popular baseline that has been extended with linear [25] and nonlinear [29], [41] subspace embeddings. Most of these approaches learn a Mahalanobis distance function for feature vectors extracted from full bounding box images.
The integration of feature learning directly with a metric learning approach has been proposed in [38]: a Mahalanobis-like function together with the feature representation is learned within a novel end-to-end framework through a triplet embedding. Recently, a trend of learning similarity measures for patches [2], [3], [33], [34], [51] has emerged. Operating on patches allows one to directly address person pose variations and camera viewpoint changes. Shen et al. [33] learn a correspondence structure that captures spatial correspondence patterns across camera viewpoints; the correspondence structure is represented by patch-wise matching probabilities learned using a boosting-like approach. Patch-wise correspondence is also introduced in [34]: first the body is divided into upper and lower body parts, and then trees are independently constructed to find the correspondence. Zheng et al. [51] show that introducing a patch-level matching model based on a sparse representation can help in handling inaccurate person detectors as well as a large amount of occlusion.

This paper is based on our previous work [2], where we proposed to learn dissimilarity functions for patches in bounding boxes, and then combine their scores into a robust distance measure. We have shown that our approach has clear advantages over existing algorithms. In this paper we continue our analysis by evaluating additional parameters (e.g. the size of patches and their layouts) and by employing novel metric learning approaches. We also additionally investigate an unsupervised deformable model based on a one-to-one matching constraint. Finally, we propose to learn patch features directly from data through a challenging multi-class identification task employing a CNN model [40]. This results in a highly effective representation that brings a significant improvement in performance. Compared to state-of-the-art methods, our approach yields significantly higher recognition accuracy.

3 METHOD

Often the dissimilarity Ψ(i, j) between two bounding box images i and j taken from different cameras is defined as a Mahalanobis metric. The Mahalanobis metric measures the squared distance between feature vectors extracted from these bounding box images, x_i and x_j:

Ψ(i, j) = d²(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j),   (1)

where M is a matrix encoding the basis for comparison. M is usually learned in two stages: dimensionality reduction is first applied to x_i and x_j (e.g. principal component analysis, PCA), and then metric learning (e.g. the KISS metric [19]) is performed on the reduced subspace. To avoid overfitting, the dimensionality must be significantly reduced to keep the number of free parameters low [16], [2]. In practice, x_i and x_j are high-dimensional feature vectors and their reduced dimensionality is usually about three orders of magnitude smaller than the original [19], [25], [29]. Such strong dimensionality reduction might lose discriminative information, especially in the case of misaligned features in x_i and x_j (e.g. the highlighted patches in Fig. 1).
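To make the baseline concrete, here is a minimal sketch of Eq. (1) for the full bounding box pipeline: PCA on the concatenated descriptors, followed by a squared Mahalanobis distance under a learned M. The variable names and toy data are ours, not the paper's code; with M set to the identity, the distance degenerates to plain Euclidean.

```python
import numpy as np

def mahalanobis_sq(x_i, x_j, M):
    """Eq. (1): squared Mahalanobis distance (x_i - x_j)^T M (x_i - x_j)."""
    d = x_i - x_j
    return float(d @ M @ d)

# Toy data standing in for concatenated bounding box descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))          # 200 boxes, 1000-d features

# Stage 1: dimensionality reduction with PCA (62 components, as in Sec. 4.2).
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:62].T
Z = (X - mu) @ W

# Stage 2: any learned metric M operates in the reduced space.
M = np.eye(62)                            # identity -> Euclidean distance
print(mahalanobis_sq(Z[0], Z[1], M))
```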

We propose to learn a metric for matching patches within the bounding box. We perform dimensionality reduction on the features extracted from each patch. The reduced patch dimensionality is usually only one order of magnitude smaller than the original one, thus keeping more information (see Section 4). In Section 3.1 we offer a patch-based metric learning formulation. Section 3.2 introduces three state-of-the-art Mahalanobis-like metric learning approaches: KISSME [19], XQDA [25] and LSSL [42]. In Section 3.3 we propose two methods for integrating patch metrics into a single similarity measure, and in Section 3.4 we show how to learn a very effective patch representation with Convolutional Neural Networks.

3.1 Patch-based Metric Learning

We divide the bounding box image i into a dense grid of overlapping rectangular patches. From each patch location k, we extract a feature vector p_i^k. We represent bounding box image i as an ordered set of patch features X_i = {p_i^1, p_i^2, ..., p_i^K}, where K is the number of patches. Usually in standard metric learning approaches [19], [26], [29], these patch descriptors are further concatenated into a single high-dimensional feature vector (e.g. x_i = [p_i^1 p_i^2 ... p_i^K]) and metric learning together with dimensionality reduction is then performed. Instead, we learn a dissimilarity function Φ for feature vectors extracted from patches. Patch dissimilarities are further combined into a unified dissimilarity by an integration function Z (see Section 3.3). We define the dissimilarity between two images i and j as

Ψ(i, j) = Z_{k,l ∈ 1...K}( Φ(p_i^k, p_j^l; θ(k)) ),   (2)

where p_i^k and p_j^l are feature vectors extracted from patches at locations k and l, respectively, in bounding box images i and j. Images i and j are assumed to come from different cameras. The set of parameters θ determines the function Φ and is learned using a given metric learning approach (see Section 3.2). Notice that Ψ(i, j) is defined as an asymmetric dissimilarity measure due to the k dependency. If symmetry is a concern, one can redefine the final dissimilarity as a function of both Ψ(i, j) and Ψ(j, i), e.g. Ψ'(i, j) = min(Ψ(i, j), Ψ(j, i)).

Although it is possible to learn one metric for each patch location k, this might be too many degrees of freedom. In practice, multiple locations might share a common metric, and in the extreme case a single θ could be learned for all locations. We investigated the re-identification performance of different numbers of metrics (see Section 4.2.1) and found that in some cases multiple metrics might perform better than a single one. Regions with statistically different amounts of background noise should have different metrics (e.g. patches close to the head contain more background noise than patches close to the torso). However, we also found that recognition performance is a function of the available training data (see Section 4.2.1), which limits the number of metrics that can be learned efficiently. In the standard approach, a pair of bounding boxes corresponds to a single training example. Breaking a bounding box into a set of patches increases the amount of training data if a reduced number of metrics is learned (i.e. some locations k share the same metric/parameters θ). When a single θ is learned, the amount of training data increases by combining patches for all K locations into a single set (K times more positive examples for learning a metric compared to the standard approach). In our experiments we show that this can significantly boost performance when the training dataset is small (e.g. the i-LIDS dataset).
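The patch pipeline itself is simple bookkeeping; the following sketch (grid parameters, names and the Euclidean Φ are illustrative stand-ins, not the paper's configuration) extracts overlapping patches on a dense grid and combines the per-patch dissimilarities with an integration function Z as in Eq. (2).

```python
import numpy as np

def patch_grid(img, ph, pw, sy, sx):
    """Extract overlapping patches of size (ph, pw) with stride (sy, sx).
    Returns a list of (location, flattened feature) pairs."""
    H, W = img.shape[:2]
    return [((y, x), img[y:y + ph, x:x + pw].ravel())
            for y in range(0, H - ph + 1, sy)
            for x in range(0, W - pw + 1, sx)]

def dissimilarity(patches_i, patches_j, phi, integrate=sum):
    """Eq. (2): integration function Z over per-patch dissimilarities Phi
    evaluated at matched locations k."""
    return integrate(phi(p_i, p_j, k)
                     for k, ((_, p_i), (_, p_j))
                     in enumerate(zip(patches_i, patches_j)))

# Toy usage: Euclidean Phi shared by all locations (a single theta, m = 1).
rng = np.random.default_rng(1)
img_a, img_b = rng.random((128, 48)), rng.random((128, 48))
phi = lambda p, q, k: float(np.sum((p - q) ** 2))
print(dissimilarity(patch_grid(img_a, 32, 24, 16, 12),
                    patch_grid(img_b, 32, 24, 16, 12), phi))
```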
3.2 Metric Learning (Φ)

Given pairs of sample bounding boxes (i, j), we introduce the space of pairwise differences p_ij^k = p_i^k − p_j^k and partition the training data into p_ij^{k+} when i and j are bounding boxes containing the same person, and p_ij^{k−} otherwise. Note that for learning we use differences on patches from the same location k.

3.2.1 KISS metric learning

Köstinger et al. [19] proposed an effective and efficient way of learning a Mahalanobis metric by assuming a Gaussian structure of the difference space (i.e. of p_ij^k). When employing KISSME, our patch dissimilarity measure becomes

Φ(p_i^k, p_j^k; θ(k)) = (p_i^k − p_j^k)^T M^(k) (p_i^k − p_j^k),   (3)

thus θ(k) = {M^(k)}. To learn M^(k) we follow Köstinger et al. [19]: we assume a zero-mean Gaussian structure on the difference space and employ a log-likelihood ratio test. This results in

M^(k) = Σ_{k+}^{−1} − Σ_{k−}^{−1},   (4)

where Σ_{k+} and Σ_{k−} are the covariance matrices of p_ij^{k+} and p_ij^{k−}, respectively:

Σ_{k+} = Σ (p_ij^{k+})(p_ij^{k+})^T,   (5)
Σ_{k−} = Σ (p_ij^{k−})(p_ij^{k−})^T,   (6)

where the sums run over the positive and negative training pairs. Computing Eq. (4) requires inverting the two covariance matrices. In practice, as the p_i^k's are still relatively high dimensional (see Section 4), Σ_{k+} is often singular, thus Σ_{k+}^{−1} cannot be computed. As a result, dimensionality reduction on p_i^k is usually applied (e.g. PCA), which allows us to invert Σ_{k+}. Keeping the dimensionality low also avoids overfitting. One can find the optimal number of principal components using cross-validation techniques. Liao et al. [25] proposed an alternative solution that simultaneously learns the metric M and a low-dimensional subspace W, referred to as XQDA.
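The KISS estimator boils down to two covariance estimates over the difference space. A minimal sketch under the paper's assumptions (zero-mean Gaussian differences, PCA already applied so the covariances are invertible; the small ridge term is our own numerical safeguard, not part of the method):

```python
import numpy as np

def kiss_metric(pos_diffs, neg_diffs, eps=1e-6):
    """Eq. (4): M = inv(Sigma_k+) - inv(Sigma_k-), with the covariances of
    positive/negative pairwise differences from Eqs. (5)-(6)."""
    d = pos_diffs.shape[1]
    sigma_pos = pos_diffs.T @ pos_diffs / len(pos_diffs) + eps * np.eye(d)
    sigma_neg = neg_diffs.T @ neg_diffs / len(neg_diffs) + eps * np.eye(d)
    return np.linalg.inv(sigma_pos) - np.linalg.inv(sigma_neg)

# Toy usage: positive differences are tighter than negative ones,
# so Eq. (3) scores positive pairs lower than negative pairs.
rng = np.random.default_rng(2)
pos = rng.normal(scale=0.3, size=(500, 34))
neg = rng.normal(scale=1.0, size=(500, 34))
M = kiss_metric(pos, neg)
print(pos[0] @ M @ pos[0], neg[0] @ M @ neg[0])
```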

3.2.2 XQDA metric learning

Using XQDA [25], the patch dissimilarity can be written as

Φ(p_i^k, p_j^k; θ(k)) = (p_i^k − p_j^k)^T W^(k) M^(k) (W^(k))^T (p_i^k − p_j^k),   (7)

where

M^(k) = ((W^(k))^T Σ_{k+} W^(k))^{−1} − ((W^(k))^T Σ_{k−} W^(k))^{−1}.   (8)

The original feature dimension d of p_i^k is reduced by the subspace W^(k) ∈ R^{d×r}. W^(k) is learned using a Generalized Rayleigh Quotient objective, solved as a generalized eigenvalue decomposition problem similar to LDA [25]. Notice that Σ_{k+} is computed in the original d-dimensional space, thus its singularity remains a problem. Liao et al. propose to add a small regularizer to the diagonal elements of Σ_{k+}, which is a common trick in LDA-like problems. This makes the estimation of Σ_{k+} smoother and more robust. As a result, the learning parameters become θ(k) = {W^(k), M^(k)}.

3.2.3 LSSL metric learning

Yang et al. [42] introduced large scale similarity learning (LSSL), which combines the feature difference (p_ij^k = p_i^k − p_j^k) and commonness (q_ij^k = p_i^k + p_j^k), thus producing a more discriminative measure. The main idea comes from insights found in a 2-dimensional Euclidean space. Consider an l2-normalized 2-dimensional feature space. Notice that for similar vectors (i and j containing the same person) p_ij^k is expected to be small but q_ij^k should be very large; on the contrary, for dissimilar vectors (i and j containing different people) p_ij^k is expected to be large and q_ij^k should be relatively small. Therefore, by combining difference and commonness we can expect a more discriminative metric compared to metric learning methods that only employ the differences p_ij^k. The patch dissimilarity measure then becomes

Φ(p_i^k, p_j^k; θ(k)) = (p_ij^k)^T M_p^(k) p_ij^k − λ (q_ij^k)^T M_q^(k) q_ij^k,   (9)

where both M_p^(k) and M_q^(k) can be inferred analogically to KISS metric learning [19]. Yang et al. [42] show further that, based on a pair-constrained Gaussian assumption, the covariances for pairs containing different people can be directly deduced from image pairs containing the same person (for details see [42]). The parameter λ is used to balance between the difference and commonness of feature vectors. Similarly to [42], we keep λ fixed in all experiments. As a result, the learning parameters become θ(k) = {M_p^(k), M_q^(k)}, and PCA is applied on p_i^k to avoid the covariance singularity problem.
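The LSSL score of Eq. (9) only adds a second quadratic form on the commonness vector. A minimal sketch — the identity metrics and λ = 1.5 below are assumptions for illustration; in practice M_p and M_q come from KISS-style estimators as above:

```python
import numpy as np

def lssl_dissimilarity(p_i, p_j, M_p, M_q, lam=1.5):
    """Eq. (9): difference term minus lam times the commonness term.
    lam = 1.5 is an assumed setting, not taken from the paper."""
    diff = p_i - p_j     # p_ij: small for a matching pair
    comm = p_i + p_j     # q_ij: large for a matching pair
    return float(diff @ M_p @ diff - lam * (comm @ M_q @ comm))

# Toy usage on l2-normalized vectors: the matching pair scores
# far lower (more negative) than the dissimilar one.
a = np.array([1.0, 0.0])
b = np.array([0.98, 0.199])   # nearly the same direction as a
c = np.array([0.0, 1.0])      # orthogonal to a
I = np.eye(2)
print(lssl_dissimilarity(a, b, I, I))   # ~ -5.9 (similar pair)
print(lssl_dissimilarity(a, c, I, I))   # ~ -1.0 (dissimilar pair)
```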
3.3 Integrated dissimilarities for images (Z)

To compute the total dissimilarity between two bounding box images i and j, we propose several strategies for aggregating the metrics learned for patches. First, we introduce a rigid model (PML) to illustrate that learning metrics for patches keeps more information by avoiding strong dimensionality reduction (Section 3.3.1). Additionally, learning metrics on the patch level might effectively multiply the amount of training data, yielding a significant boost in recognition performance for smaller datasets.

Pose changes and different camera viewpoints make re-identification more difficult, as features extracted on a fixed grid may not correspond even though it is the same person. Breaking a bounding box image into patches allows us to introduce deformable models that can effectively cope with pose changes, enabling patches in one bounding box to perturb their locations (deform) when matching another bounding box. Independently of the metric learning, our task is to find a strategy that can perturb patch locations to simulate pose changes. We investigate two deformable models: (1) an unsupervised deformable model based on a one-to-one matching constraint (HPML) that does not require any additional training apart from metric learning (Section 3.3.2), and (2) a supervised deformable model with geometric constraints (conceptualized as springs) (DPML) that we train by introducing an optimization problem as a relative distance comparison of triplets (Section 3.3.3).

3.3.1 Rigid model (PML)

We combine patch dissimilarity scores by summing over all patches:

Z_PML = Σ_{k=1}^{K} Φ(p_i^k, p_j^k; θ(k)).   (10)

[Fig. 2 graphic.] Fig. 2: Deformable models: the K × K cost matrix, which is used as input to the Hungarian algorithm for finding the optimal one-to-one patch correspondence. The dissimilarity between two patches becomes ∞ if the distance η(·, ·) between their spatial locations is greater than an assumed threshold δ.

Compared to the standard approach (e.g. in the case of the KISS metric), this is equivalent to learning a block-diagonal matrix M:

Z_PML = [ (p_ij^1)^T, (p_ij^2)^T, ..., (p_ij^K)^T ] diag(M^(1), M^(2), ..., M^(K)) [ p_ij^1; p_ij^2; ...; p_ij^K ],   (11)

where all M^(k) are learned independently. We refer to this formulation as PML.

3.3.2 Unsupervised Deformable Model (HPML)

Patch-based methods [2], [34] often allow patches to adjust their locations when comparing two bounding box images. Sheng et al. [34] assumed the correspondence structure to be fixed and learned it using a boosting-like approach. Instead, we define the correspondence task as a linear assignment problem. Given K patches from bounding box image i and K patches from bounding box image j, we create a K × K cost matrix that contains patch similarity scores within a fixed neighborhood (see Fig. 2). To avoid patches freely changing their location, we introduce a global one-to-one matching constraint and solve the linear assignment problem

Ω*_ij = argmin_{Ω_ij} Σ_{k=1}^{K} [ Φ(p_i^{Ω_ij(k)}, p_j^k; θ(k)) + Δ(Ω_ij(k), k) ],

with Δ(Ω_ij(k), k) = { ∞, if η(Ω_ij(k), k) > δ; 0, otherwise },   (12)

where Ω_ij is a permutation vector mapping patches p_i^{Ω_ij(k)} to patches p_j^k, with Ω_ij(k) and k determining the patch locations; Δ(·, ·) is a spatial regularization term that constrains the search neighborhood, where η corresponds to the distance between two patch locations and the threshold δ determines the allowed displacement (different δ's are evaluated in Fig. 15(a)). We find the optimal assignment Ω*_ij (the patch correspondence) using the Kuhn-Munkres (Hungarian) algorithm [20]. This yields the total dissimilarity

Z_HPML = Σ_{k=1}^{K} Φ(p_i^{Ω*_ij(k)}, p_j^k; θ(k)).   (13)

We refer to this formulation as HPML.
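Eq. (12) is a standard linear assignment problem, so SciPy's Hungarian solver applies directly. A sketch (the grid, features and Φ are placeholders; forbidden displacements get a large finite cost so that a feasible permutation always exists):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hpml_score(feats_i, feats_j, locs, phi, delta, big=1e9):
    """Eqs. (12)-(13): one-to-one patch assignment with the displacement
    capped at delta by the spatial regularization term."""
    K = len(locs)
    C = np.full((K, K), big)
    for a in range(K):                 # patch a of image i ...
        for b in range(K):             # ... matched to location b of image j
            if np.linalg.norm(np.subtract(locs[a], locs[b])) <= delta:
                C[a, b] = phi(feats_i[a], feats_j[b], b)
    rows, cols = linear_sum_assignment(C)   # optimal permutation Omega*
    return float(C[rows, cols].sum())

# Toy usage on a 3x2 grid of 8-d patch features; delta=1 allows
# each patch to shift to a directly adjacent grid location.
rng = np.random.default_rng(3)
locs = [(y, x) for y in range(3) for x in range(2)]
fi, fj = rng.random((6, 8)), rng.random((6, 8))
phi = lambda p, q, k: float(np.sum((p - q) ** 2))
print(hpml_score(fi, fj, locs, phi, delta=1.0))
```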

3.3.3 Supervised Deformable Model (DPML)

We employ a model which approximates continuous non-affine warps by translating 2D templates [12], [43] (see Fig. 1). We use a spring model to limit the displacement of patches. The deformable dissimilarity score for matching the patch at location k in bounding box i with bounding box j is defined as

ψ(p_i^k, j) = min_l [ Φ(p_i^k, p_j^l; θ(k)) + α_k d(k, l) ],   (14)

where the feature p_j^l is extracted from bounding box j at location l; the appearance term Φ(p_i^k, p_j^l; θ(k)) computes the feature dissimilarity between patches, and the deformation cost α_k d(k, l) refers to a spring model that controls the relative placement of patch locations k and l. Here d(k, l) is the squared distance between the locations, and α_k encodes the rigidity of the spring: α_k = ∞ corresponds to a rigid model, while α_k = 0 allows a patch to change its location freely. Notice the difference with HPML, whose definition allows us to perform discrete optimization (Ω stands for the optimal global one-to-one assignment). For DPML we define a continuous function: we first optimize the alignment locally (ψ(p_i^k, j)) and then combine these deformable dissimilarity scores into a unified dissimilarity measure

Z_DPML = Σ_{k=1}^{K} w_k ψ(p_i^k, j) = ⟨w, ψ_ij⟩,   (15)

where w is a vector of weights and ψ_ij corresponds to a vector of patch dissimilarity scores.

Learning α_k and w: Similarly to [29], we define the optimization problem as a relative distance comparison of triplets {i, j, z} such that ⟨w, ψ_iz⟩ > ⟨w, ψ_ij⟩ for all i, j, z, where i and j correspond to bounding boxes extracted from different cameras containing the same person, and i and z are bounding boxes from different cameras containing different people. Unfortunately, Eq. 14 is non-convex and we cannot guarantee avoiding local minima. In practice, we use a limited number of unique spring constants α_k and apply a two-step optimization. First, we optimize α_k with w = 1 by performing an exhaustive grid search (see Section 4.3) while maximizing the Rank-1 recognition rate. Second, we fix α_k and determine the best w using structural SVMs [18]. This approach is referred to as DPML.
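A sketch of the supervised deformable score of Eqs. (14)-(15): each patch searches the grid for the best trade-off between appearance and the spring penalty, and the per-patch scores are combined with weights w (uniform here; the paper learns them with structural SVMs, and the α values below are illustrative only).

```python
import numpy as np

def dpml_score(feats_i, feats_j, locs, phi, alpha, w=None):
    """Eq. (14): psi_k = min_l [ phi(p_i^k, p_j^l) + alpha_k * d(k, l) ],
    with d the squared location distance; Eq. (15): Z = <w, psi>."""
    K = len(locs)
    w = np.ones(K) if w is None else w
    psi = np.array([
        min(phi(feats_i[k], feats_j[l], k)
            + alpha[k] * float(np.sum(np.subtract(locs[k], locs[l]) ** 2))
            for l in range(K))
        for k in range(K)])
    return float(w @ psi)

# Toy usage: rigid springs (large alpha) on the bottom row of a 3x2 grid,
# loose springs elsewhere.
rng = np.random.default_rng(4)
locs = [(y, x) for y in range(3) for x in range(2)]
fi, fj = rng.random((6, 8)), rng.random((6, 8))
phi = lambda p, q, k: float(np.sum((p - q) ** 2))
alpha = np.where(np.array([y for y, _ in locs]) >= 2, 10.0, 0.5)
print(dpml_score(fi, fj, locs, phi, alpha))
```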
3.4 Deep patches

It is common practice in person re-identification to combine handcrafted color and texture descriptors for describing image regions, and then let metric learning discover the relevant features and discard irrelevant ones. Often, color histograms in different color spaces together with SIFT-like features are concatenated into high-dimensional feature vectors [25]. Xiao et al. [40] showed that CNN models can also be applied effectively to person re-identification despite insufficient data. They proposed to train a CNN jointly with data from multiple datasets and then fine-tune the model to a given camera pair using a domain-guided dropout strategy. In this work we adopt the CNN model from [40], but instead of training it for whole images, we train it for patches to obtain a highly robust patch feature representation. This model learns a set of high-level feature representations through challenging multi-class identification tasks, i.e., classifying a training image into one of C identities. As the generalization capabilities of learned features increase with the number of classes predicted during training [36], we need C to be relatively large (e.g. several thousand).

While training the CNN for patches, we modify the training strategy. First, each image is divided into a set of 8 non-overlapping patches of size height/4 × width/2, and then each patch (although it comes from the same image, but from a different location) gets assigned a new identity. As a result, the CNN model is forced to determine not only the person identity from which a patch has been extracted but also the patch location. Given a dataset of images with M identities, the task becomes to classify patches into C = 8M identities. When the CNN is trained to classify a large number of identities and configured to keep the dimension of the last hidden layer relatively low (e.g., setting the dimension of fc7 to 256 [40]), it forms compact and highly robust feature representations for re-identification. We found that the learned deep feature representation is very effective and, combined with metric learning approaches, it significantly outperforms state-of-the-art techniques. Figure 3 illustrates the training of our deep features.

[Fig. 3 graphic: each training image is split into 8 patches whose labels encode both identity and location; a CNN with an fc7 deep feature layer and a softmax layer classifies them.] Fig. 3: Deep feature learning with a CNN: each image is divided into a set of 8 non-overlapping patches. The identity of each patch is extended by its location. As a result, the CNN is forced to recognize not only the person identity but also the patch location.
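The patch-identity relabeling behind C = 8M is a bookkeeping step; a sketch with hypothetical dataset variables:

```python
import numpy as np

def relabel_patches(images, person_ids, rows=4, cols=2):
    """Split each image into rows*cols = 8 non-overlapping patches of size
    (height/4, width/2) and give every patch location its own class:
    new_id = person_id * 8 + location, so M identities become C = 8M."""
    patches, labels = [], []
    for img, pid in zip(images, person_ids):
        ph, pw = img.shape[0] // rows, img.shape[1] // cols
        for k, (r, c) in enumerate([(r, c) for r in range(rows)
                                    for c in range(cols)]):
            patches.append(img[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw])
            labels.append(pid * rows * cols + k)
    return patches, labels

# Toy usage: 2 identities expand to 16 patch classes (0..15).
imgs = [np.zeros((128, 48)), np.ones((128, 48))]
_, labels = relabel_patches(imgs, person_ids=[0, 1])
print(sorted(set(labels)))
```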

4 EXPERIMENTS

We carry out experiments on four challenging datasets: VIPeR [13], i-LIDS [49], CUHK01 [22] and CUHK03 [23]. The results are analyzed in terms of recognition rate, using the cumulative matching characteristic (CMC) [13] curve with its rank-1 accuracy. The CMC curve represents the expectation of finding the correct match in the top r matches. The curve can be characterized by a scalar value computed by normalizing the area under the curve, referred to as the nAUC value. Section 4.1 describes the benchmark datasets used in the experiments. We explore our rigid patch metric model (PML) together with its parameters, including different metric learning approaches (θ), in Section 4.2. The deformable models (HPML and DPML) are discussed in Section 4.3. Finally, in Section 4.4, we compare our performance with other state-of-the-art methods.

4.1 Datasets

VIPeR [13] is one of the most popular person re-identification datasets. It contains 632 image pairs of pedestrians captured by two outdoor cameras. VIPeR images contain large variations in lighting conditions, background, viewpoint, and image quality (see Fig. 4). Each bounding box is cropped and scaled to a fixed size. We follow the common evaluation protocol for this database: randomly dividing the 632 image pairs into 316 image pairs for training and 316 image pairs for testing. We repeat this procedure multiple times and compute average CMC curves to obtain reliable statistics.

[Fig. 4 images.] Fig. 4: Sample images from the VIPeR dataset. Top and bottom lines correspond to images from different cameras. Columns illustrate the same person.

i-LIDS [49] consists of 119 individuals with 476 images. This dataset is very challenging since there are many occlusions. Often only the top part of a person is visible, and usually there is a significant scale or viewpoint change as well (see Fig. 5). We follow the evaluation protocol of [29]: the dataset is randomly divided into 60 image pairs used for training, and the remaining 59 image pairs are used for testing. This procedure is repeated multiple times to obtain averaged CMC curves.

[Fig. 5 images.] Fig. 5: Sample images from the i-LIDS dataset. Top and bottom lines correspond to images from different cameras. Columns illustrate the same person.

CUHK01 [22] contains 971 persons captured with two cameras. For each person, 2 images per camera are provided. The images in this dataset are of better quality and higher resolution than in the two previous datasets. Each bounding box is scaled to a fixed size. The first camera captures the side view of pedestrians and the second camera captures the frontal or back view (see Fig. 6). We follow the common evaluation setting: the persons are split into 485 for training and 486 for testing. We repeat this procedure multiple times to compute averaged CMC curves.

[Fig. 6 images.] Fig. 6: Sample images from the CUHK01 dataset. Top and bottom lines correspond to images from different cameras. Columns illustrate the same person.

CUHK03 [23] is one of the largest published person re-identification datasets. It contains 1467 persons, where each person has 4.8 images on average. The dataset provides both manually cropped bounding box images and automatically detected bounding box images from a pedestrian detector [12]. For evaluation we follow the testing protocol of [23]: the identities are randomly divided into non-overlapping training and test sets. The training set consists of 1367 persons and the test set consists of 100 persons. For testing we only use the automatically detected pedestrians, while training is performed employing both manually cropped and automatically detected images. We follow the single-shot setting.

Training deep features: To learn our deep patch representation we used two datasets: CUHK03 [23] and PRID2011 [17]. From CUHK03 we used the 1367 identities that were randomly selected for training [40]. PRID2011 contains 200 individuals appearing in two cameras; additionally it contains 185 identities that appear in the first camera but do not reappear in the second one, and 549 identities that appear only in the second camera — in total 934 identities. Merging both datasets, we have M = 2301 identities, and by further dividing the images into a set of 8 non-overlapping patches, the CNN is forced to perform multi-class identification with C = 8 × 2301 = 18408 identities. The dimensionality of the last hidden layer is kept low (256), which stands for our deep patch feature representation. The architecture and training parameters are kept the same as in [40]. Unlike [40], we do not perform any fine-tuning of the deep feature representation on the test datasets. Instead, we propose to perform Mahalanobis metric learning to adjust the metric to particular camera-pair variations.

4.2 Rigid Patch Metric Learning (PML)

In this section, we first compare our rigid patch model (PML) with the standard full bounding box approach (BBOX). BBOX is equivalent to the method presented in [19].
Each bounding box of size w × h is divided into a grid of K overlapping patches of size w/4 × w/2 with stride w/8 × w/4 (different layouts are discussed in Section 4.2.2). In this experiment, patches are represented by concatenated histograms in Lab and HSV color space together with color SIFT (see details on different representations in Section 4.2.3). For the full bounding box case, we concatenate the extracted patch feature vectors into a high-dimensional feature vector. PCA is applied to obtain a 62-dimensional feature space (where the optimal dimensionality is found by cross-validation). Then, the KISS metric [19] is learned in the 62-dimensional PCA subspace. For PML, instead of learning a metric for the concatenated feature vector, we learn metrics for the patch features. In this way, we avoid undesirable compression: the dimensionality of a patch feature vector is reduced by PCA (also found by cross-validation) and metrics are learned independently for each patch location. Fig. 7 illustrates the comparison on three datasets. It is apparent that PML significantly improves re-identification performance by keeping a higher number of degrees of freedom when learning the dissimilarity function.

[Fig. 7 plots: CMC curves on VIPeR, CUHK01 and i-LIDS.] Fig. 7: Performance comparison of Patch-based Metric Learning (PML) vs. full bounding box metric learning (BBOX). Rank-1 identification rates as well as nAUC values (in brackets) are shown in the legend next to the method name.

[Fig. 8 plot: CMC curves for different numbers of metrics m.] Fig. 8: Performance comparison w.r.t. the number of metrics m. Using different metrics for different image regions yields better performance.

[Fig. 9 graphics: (a) per-location nAUC map; (b) cluster maps for m = 2, 6, 13 and more clusters.] Fig. 9: Dividing image regions into several metrics. (a) nAUC values w.r.t. the location of a learned metric; (b) clustering results for different numbers of clusters m.

4.2.1 Number of Patch Metrics

As mentioned earlier, our formulation allows θ to be learned per patch location. In practice, there may be insufficient training data for this many degrees of freedom. We evaluate two extremes: learning m = K independent KISS metrics (one per location) and learning a single KISS metric for all patches (m = 1); see Fig. 8. The results indicate that multiple metrics lead to significantly better recognition accuracy.

To understand the variability in the learned metrics, we set up the following experiment: learn a metric for a particular location k, and then apply this metric to compute dissimilarity scores for all other locations. We plot the nAUC values w.r.t. the location of the learned metric in Fig. 9(a). It is apparent that metrics learned at different locations yield different performances. Surprisingly, higher performance is obtained by metrics learned on patches at lower locations in the bounding box (corresponding to leg regions). We believe that this is due to a significant number of images in the VIPeR dataset having dark and cluttered backgrounds in the upper regions (see the last 3 top images in Fig. 4). Lower parts of the bounding boxes usually have a more coherent background from sidewalks. Additionally, we cluster the locations spatially using hierarchical (bottom-up) clustering, where the similarity between regions is computed using nAUC values; a code sketch of this grouping step follows below. Fig. 9(b) illustrates the results w.r.t. the number of clusters. Next, we learn metrics for each cluster of locations. These metrics are then used for computing similarity in the corresponding image regions.

Recall from Fig. 8 that the best performance was achieved with one metric per location. In this circumstance, there appears to be sufficient data to train an independent metric for each location. We test this hypothesis by reducing the amount of training data and evaluating the optimal number of metrics when fewer training examples are available. Fig. 10 illustrates that the patch-based approach achieves high performance much faster than full bounding box metric learning. Interestingly, for a small number of positive pairs, a reduced number of metrics gives better performance. When a common metric is learned for multiple locations, the amount of training data is effectively increased because features from multiple patches can be used as examples for learning the same metric (Section 3.1).
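The grouping of patch locations can be reproduced with standard agglomerative clustering; a sketch using SciPy (the nAUC-based affinities below are synthetic stand-ins — the paper derives them from cross-location metric transfer):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_locations(nauc, m):
    """Bottom-up clustering of K patch locations into m groups, using
    1 - nAUC similarity as the pairwise distance."""
    D = 1.0 - nauc
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=m, criterion="maxclust")   # cluster id per location

# Toy usage: two blocks of mutually similar locations.
K = 6
nauc = np.full((K, K), 0.80)
nauc[:3, :3] = nauc[3:, 3:] = 0.95
print(cluster_locations(nauc, m=2))   # e.g. [1 1 1 2 2 2]
```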

[Fig. 10 plot: rank-1 rate vs. number of positive training pairs for BBOX and PML with m = 1, 2, 6, 13 and one metric per location.] Fig. 10: Rank-1 recognition rate for a varying size of the training dataset.

4.2.2 Patch layout

Our model consists of a set of rectangular patches extracted on a grid layout. The patch size and the grid density (determined by the stride) define the total number of patches K and the level of overlap. To investigate the impact of these parameters on re-identification performance, we evaluate PML with the KISS metric for different patch layouts. Fig. 11(a) shows the results for different values of K defined by different patch sizes and strides (e.g. the densest layout is the result of patch size w/4 × w/2 with stride w/8 × w/4, while for K = 8 the stride dimensions are equal to the patch size, which corresponds to a configuration of non-overlapping patches). From Fig. 11(a) we can notice that having small patches (e.g. the w/4 × w/4 configuration) might slightly decrease performance, and in general keeping patches larger (see K = 39 and the denser layouts) yields better recognition accuracy. The results also indicate that overlapping patches (when the stride is smaller than the patch size) perform significantly better than non-overlapping patches. Similarly, using overlapping deep patch features yields better performance compared to non-overlapping patches (see Fig. 11(b)). As a result, in further performance evaluations we select layouts that consist of overlapping patches for both handcrafted and deep features. For handcrafted features we select the best performing configuration in Fig. 11(a). For deep patches we also select a configuration of overlapping patches, but with K = 39 to match the patch size used during deep training (see Section 3.4). Notice that we train the deep feature using non-overlapping patches, which we found to perform slightly better.

4.2.3 Patch representation

It is common practice in person re-identification to combine color and texture descriptors for describing an image. We evaluated the performance of different combinations of patch representations, including Lab, RGB and HSV histograms with a fixed number of bins per channel. Texture information was captured by color SIFT, i.e. the SIFT descriptor extracted for each Lab channel and then concatenated. In our previous work [2], we selected the combination of Lab, HSV and color SIFT as the best descriptor; the concatenated HSV, Lab and color SIFT descriptor is high-dimensional. In this work, instead of handcrafting the representation, we propose to learn patch features directly from data with CNNs through a multi-class identification task (Sec. 3.4). As a result, each deep patch is represented by a 256-dimensional feature vector (fc7), and we reduce its dimensionality by PCA before running KISS metric learning. Fig. 12(a) illustrates the averaged CMC curves for the VIPeR dataset. It is clear that the proposed deep patch representation (CNN) outperforms all handcrafted representations by a large margin. When learning the representation, we propose to force the CNN to recognize not only the person identity but also the patch location. To evaluate the effectiveness of this approach, we also trained a CNN for patches while neglecting patch locations (CNN (-k)). Fig. 12(b) illustrates that including information on patch locations allows us to learn more effective features: the CNNs learned with locations perform significantly better for both L2 and KISS metric learning.

[Table 1: CMC rank-1 accuracies on VIPeR, CUHK01 and i-LIDS for PML with KISS, XQDA and LSSL, each using handcrafted features (H-C) and deep patches (CNN).] TABLE 1: Performance comparison of different metric learning approaches using handcrafted features (HSV+Lab+ColorSIFT), denoted by H-C, and deep patches CNN. CMC rank-1 accuracies are reported.

4.2.4 Patch metric learning

In this section we evaluate our PML model while employing the previously discussed metric learning approaches: KISS metric learning [19], XQDA metric learning [25] and LSSL metric learning [42]. We investigate the performance while employing both handcrafted features (HSV+Lab+ColorSIFT) and deep patch features.
As the i-LIDS dataset contains a relatively small number of training samples (only 60 subjects available for training) and as indicated in our previous analysis (Section 4.2.1), we learn a single θ for all patches, thus increasing the amount of training examples (m = 1). From Fig. 13 it is apparent that deep patch features significantly improve recognition accuracy on all datasets. It is also clear that LSSL metric learning consistently achieves the best performance among all metric learning approaches. Surprisingly, XQDA often performs worse than standard KISS metric learning, especially when using deep patches. This discrepancy might be due to the fact that deep patches are already highly discriminative, and applying additional discriminative objectives (the Generalized Rayleigh Quotient) may decrease performance. Table 1 summarizes the results.

Additionally, we evaluated the performance of the CNN model learned on whole images (equivalent to the JSTL model [40]) combined with metric learning approaches. Recall that we only use the CUHK03 and PRID2011 datasets for training deep features. Each image is represented by a 256-dimensional feature vector, and we reduce its dimensionality by PCA before running KISS and LSSL metric learning (the dimension found by cross-validation). From Fig. 14 it is apparent that metric learning improves the recognition accuracy of global deep features (compare with JSTL, L2). However, it is also clear that the best performance of JSTL combined with metric learning is far behind the proposed patch-based learning combined with our deep patch representation.

4.3 Deformable Patch Metric Learning

4.3.1 Unsupervised Deformable Model (HPML)

Fig. 15(a) illustrates the impact of our unsupervised deformable model on recognition accuracy. We also compare the effectiveness of different neighborhoods on the overall accuracy.

[Fig. 11 plots: CMC curves on VIPeR for layouts ranging from K = 8 non-overlapping patches to dense overlapping grids with strides down to h/16 × w/4.] Fig. 11: Performance comparison on the VIPeR dataset w.r.t. different patch layouts; (a) using handcrafted features HSV+Lab+ColorSIFT; (b) using deep patches CNN. Overlapping patches yield better performance.

[Fig. 12 plots: CMC curves on VIPeR for CNN, HSV+Lab+ColorSIFT, HSV+Lab, HSV, Lab, RGB and ColorSIFT descriptors, and for CNN vs. CNN (-k) under L2 and KISS.] Fig. 12: Performance comparison of different patch descriptors on the VIPeR dataset; (a) the best performance is achieved by our deep patch representation (CNN); (b) forcing the CNN to determine patch locations along with person identities increases the effectiveness of deep features: CNN (-k) corresponds to the CNN trained without patch locations, while CNN was learned with locations.

In Eq. (12), we constrain the displacement of patches with δ_horizontal and δ_vertical pixels. Interestingly, allowing patches to move vertically (δ_vertical > 0) generally decreases performance. We believe that this is due to the fact that images in all these datasets were annotated manually, and the vertical alignment (from head to feet) of people in these images is usually correct. Allowing patches to move horizontally consistently improves performance for all datasets. The highest gain in accuracy is obtained on the i-LIDS dataset (3%), which contains inaccurate detections and a large amount of occlusion. This indicates that our linear assignment approach provides a reliable solution for pose changes.

4.3.2 Supervised Deformable Model (DPML)

We simplify Eq. 14 by restricting the number of unique spring constants. Two parameters α1, α2 are assigned to the locations obtained by hierarchical clustering with m = 2 clusters (see Fig. 15(b)). α_k encodes the rigidity of patches at particular locations. We perform an exhaustive grid search iterating through α1 and α2 while maximizing the Rank-1 recognition rate (a sketch of this search follows below). Fig. 15(b) illustrates the recognition rate map as a function of both coefficients. Interestingly, rigidity (high spring constants) is useful for lower patches (the dark red region in the bottom-left corner of the map) but not so for patches in the upper locations of the bounding box. This might be related to the fact that metrics learned on lower locations have higher performance (compare the nAUC values in Fig. 9). Fig. 16 illustrates the performance comparison of different integration functions Z. We employ LSSL metric learning together with deep patch features. The results clearly show that introducing deformable models consistently improves recognition accuracy on all datasets, and that the best performance is obtained by the supervised spring model DPML.
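The first step of the two-step optimization in Section 3.3.3 is this exhaustive search over the two spring constants with w = 1. A sketch with a synthetic stand-in for the rank-1 evaluation (in practice the callback would re-score a validation split with DPML):

```python
import itertools
import numpy as np

def grid_search_alphas(rank1_fn, grid):
    """Exhaustive search over (alpha_1, alpha_2), keeping the pair that
    maximizes rank-1 accuracy; the two alphas are the spring constants
    of the two location clusters (m = 2)."""
    best = max(itertools.product(grid, grid), key=lambda ab: rank1_fn(*ab))
    return best, rank1_fn(*best)

# Toy usage: a synthetic objective peaking at loose upper springs
# and rigid lower springs.
rank1 = lambda a1, a2: -((a1 - 0.5) ** 2) - ((a2 - 10.0) ** 2)
grid = np.logspace(-2, 2, 9)       # candidate spring constants
print(grid_search_alphas(rank1, grid))
```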

[Fig. 13 plots: CMC curves on VIPeR, CUHK01 and i-LIDS for PML with L2, KISS, XQDA and LSSL, using (a)-(c) deep patches and (d)-(f) handcrafted features.] Fig. 13: Performance comparison of different metric learning approaches using deep patches CNN (top row) and handcrafted features HSV+Lab+ColorSIFT (bottom row). LSSL metric learning performs best among all metric learning techniques.

[Fig. 14 plot: CMC curves on VIPeR for PML with LSSL vs. JSTL with LSSL, KISS, XQDA and L2.] Fig. 14: Performance comparison of global deep features (JSTL model) combined with metric learning approaches vs. Patch-based Metric Learning (PML) based on the proposed deep patch features. Deep patch features significantly outperform global deep features.

Computational complexity: Although the rigid model (PML) does not perform as well as the deformable models, it is less computationally expensive. It requires only K patch similarities to be computed to compare two images. Although HPML requires solving the Hungarian algorithm (Eq. (12)), in practice the K × K cost matrix (see Fig. 2) can be relatively sparse (compare the performance of different neighborhoods in Fig. 15(a)). Given τ non-infinite entries in this matrix, we employed the QuickMatch algorithm [28], which runs in linear time O(τ). As a result, deep patch feature extraction is the slowest part and it depends on the GPU architecture (e.g. on a Tesla K-series GPU, most of the runtime is spent on deep feature extraction). DPML is the slowest model; the same experiment takes on the order of minutes.

4.4 Comparison with Other Methods

Table 2 reports the performance comparison of our patch-based methods with state-of-the-art approaches across 4 datasets: VIPeR, CUHK01, i-LIDS and CUHK03-detected. Our methods outperform all state-of-the-art techniques on all datasets. The maximum improvement is achieved on the i-LIDS dataset: we improve the state-of-the-art rank-1 accuracy (64.6%) by almost 18% (82.2%). This dataset contains a relatively small number of training samples (we use only 60 subjects for training). Driven by our previous analysis (Section 4.2.1), we learn a single θ for all patches, thus increasing the training set. As a result, PML, HPML and DPML significantly outperform state-of-the-art approaches. There are three aspects that make our approach more effective on i-LIDS: (1) we are able to generate a significantly larger training set using m = 1; (2) occlusions in images pollute only a few patch scores in our similarity measure, while in the case of full-image based metric learning they might have a global impact on the final dissimilarity measure; (3) misaligned features can be corrected by our deformable models. Notice that our simplest aggregation technique (PML) already achieves very competitive results. This highlights the effectiveness of combining patch-driven LSSL metric learning with the deep patch representation.

[Fig. 15 plots: (a) gain in rank-1 accuracy for different allowable neighborhoods on VIPeR, CUHK01 and i-LIDS; (b) rank-1 map over (α1, α2).] Fig. 15: Deformable models: (a) HPML comparison of different allowable neighborhoods (horizontal × vertical) when applying the Hungarian algorithm for matching patches; (b) DPML exhaustive grid search over the α1 and α2 coefficients, which correspond to patch location clusters. The grid search map illustrates the Rank-1 recognition rate as a function of (α1, α2). The white dot highlights the optimal operating point.

[Fig. 16 plots: CMC curves on VIPeR, CUHK01 and i-LIDS for PML, HPML and DPML with LSSL.] Fig. 16: Performance comparison of Patch-based Metric Learning (PML) with our deformable models: unsupervised HPML and supervised DPML. Rank-1 identification rates as well as nAUC values (in brackets) are shown in the legend next to the method name.

[Table 2: rank-1 accuracies on VIPeR, CUHK01, i-LIDS and CUHK03 for DPML-CNN, HPML-CNN, PML-CNN, DPML [2], PML [2], eSDC [46], SDALF [11], TL [31], Dropout [40], KISSME [19], LOMO+XQDA [25], Mirror [6], Ensembles [29], MidLevel [47], kLFDA [41], DeepNN [1], Null Space [44], Null Space (fusion) [44], Triplet Loss [7], Gaussian+XQDA [27], Joint CNN [37] and Sample-Specific SVM [45].] TABLE 2: Performance comparison on VIPeR, CUHK01, i-LIDS and CUHK03-detected; CMC rank-1 accuracies are reported. The best scores are shown in red; the second best scores are highlighted in blue. Our approach significantly outperforms the best state-of-the-art approaches.

5 SUMMARY

Re-identification must deal with appearance differences arising from changes in illumination, viewpoint and a person's pose. Traditional metric learning approaches do not address registration errors and instead only focus on feature vectors extracted from bounding boxes. In contrast, we propose a patch-based approach. Operating on patches has several advantages:
- Extracted feature vectors have lower dimensionality and do not have to be subject to the same levels of compression as feature vectors extracted for the entire bounding box.
- Multiple patch locations can share the same metric, which effectively increases the amount of training data.
- We allow patches to adjust their locations when comparing two bounding boxes. The idea is similar to part-based models used in object detection. As a result, we directly address registration errors while simultaneously evaluating appearance consistency.
- Learning deep features directly from data and forcing the CNN to also determine the patch location results in a highly effective patch representation.

Our experiments illustrate how these advantages lead to new state-of-the-art performance on well established, challenging re-identification datasets.

REFERENCES

[1] E. Ahmed, M. Jones, and T. K. Marks. An improved deep learning architecture for person re-identification. In CVPR, 2015.
[2] S. Bak and P. Carr. Person re-identification using deformable patch metric learning. In WACV, 2016.
[3] S. Bak and P. Carr. One-shot metric learning for person re-identification. In CVPR, 2017.
[4] S. Bak, E. Corvee, F. Bremond, and M. Thonnat. Person re-identification using spatial covariance regions of human body parts. In AVSS, 2010.
[5] D. Chen, Z. Yuan, G. Hua, N. Zheng, and J. Wang. Similarity learning on an explicit polynomial kernel feature map for person re-identification. In CVPR, 2015.
[6] Y.-C. Chen, W.-S. Zheng, and J. Lai. Mirror representation for modeling view-specific transform in person re-identification. In IJCAI, 2015.
[7] D. Cheng, Y. Gong, S. Zhou, J. Wang, and N. Zheng. Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In CVPR, 2016.
[8] D. S. Cheng, M. Cristani, M. Stoppa, L. Bazzani, and V. Murino. Custom pictorial structures for re-identification. In BMVC, 2011.
[9] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In ICML, 2007.
[10] M. Dikmen, E. Akbas, T. S. Huang, and N. Ahuja. Pedestrian recognition with a learned metric. In ACCV, 2010.
[11] M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani. Person re-identification by symmetry-driven accumulation of local features. In CVPR, 2010.
[12] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
[13] D. Gray, S. Brennan, and H. Tao. Evaluating appearance models for recognition, reacquisition, and tracking. In PETS, 2007.
[14] D. Gray and H. Tao. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In ECCV, 2008.
[15] M. Guillaumin, J. Verbeek, and C. Schmid. Is that you? Metric learning approaches for face identification. In ICCV, 2009.
[16] M. Guillaumin, J. Verbeek, and C. Schmid. Multiple instance metric learning from automatically labeled bags of faces. In ECCV, 2010.
[17] M. Hirzer, C. Beleznai, P. M. Roth, and H. Bischof. Person re-identification by descriptive and discriminative classification. In SCIA, 2011.
[18] T. Joachims, T. Finley, and C.-N. Yu. Cutting-plane training of structural SVMs. Machine Learning, 2009.
[19] M. Koestinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof. Large scale metric learning from equivalence constraints. In CVPR, 2012.
[20] H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2), 1955.
[21] I. Kviatkovsky, A. Adam, and E. Rivlin. Color invariants for person re-identification. TPAMI, 2013.
[22] W. Li, R. Zhao, and X. Wang. Human reidentification with transferred metric learning. In ACCV, 2012.
[23] W. Li, R. Zhao, T. Xiao, and X. Wang. DeepReID: Deep filter pairing neural network for person re-identification. In CVPR, 2014.
[24] Z. Li, S. Chang, F. Liang, T. Huang, L. Cao, and J. Smith. Learning locally-adaptive decision functions for person verification. In CVPR, 2013.
[25] S. Liao, Y. Hu, X. Zhu, and S. Z. Li. Person re-identification by local maximal occurrence representation and metric learning. In CVPR, 2015.
[26] N. Martinel, C. Micheloni, and G. Foresti. Saliency weighted features for person re-identification. In ECCV Workshops, 2014.
[27] T. Matsukawa, T. Okabe, E. Suzuki, and Y. Sato. Hierarchical Gaussian descriptor for person re-identification. In CVPR, 2016.
[28] J. B. Orlin and Y. Lee. QuickMatch: A very fast algorithm for the assignment problem. Technical Report, Massachusetts Institute of Technology, 1993.
[29] S. Paisitkriangkrai, C. Shen, and A. van den Hengel. Learning to rank in person re-identification with metric ensembles. In CVPR, 2015.
[30] S. Pedagadi, J. Orwell, S. A. Velastin, and B. A. Boghossian. Local Fisher discriminant analysis for pedestrian re-identification. In CVPR, 2013.
[31] P. Peng, T. Xiang, Y. Wang, M. Pontil, S. Gong, T. Huang, and Y. Tian. Unsupervised cross-dataset transfer learning for person re-identification. In CVPR, 2016.
[32] B. Prosser, W.-S. Zheng, S. Gong, and T. Xiang. Person re-identification by support vector ranking. In BMVC, 2010.
[33] Y. Shen, W. Lin, J. Yan, M. Xu, J. Wu, and J. Wang. Person re-identification with correspondence structure learning. In ICCV, 2015.
[34] H. Sheng, Y. Huang, Y. Zheng, J. Chen, and Z. Xiong. Person re-identification via learning visual similarity on corresponding patch pairs. In KSEM, 2015.
[35] H. Shi, Y. Yang, X. Zhu, S. Liao, Z. Lei, W. Zheng, and S. Z. Li. Embedding deep metric for person re-identification: A study against large variations. In ECCV, 2016.
[36] Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In CVPR, 2014.
[37] F. Wang, W. Zuo, L. Lin, D. Zhang, and L. Zhang. Joint learning of single-image and cross-image representations for person re-identification. In CVPR, 2016.
[38] G. Wang, L. Lin, S. Ding, Y. Li, and Q. Wang. DARI: Distance metric and representation integration for person verification. In AAAI, 2016.
[39] K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2006.
[40] T. Xiao, H. Li, W. Ouyang, and X. Wang. Learning deep feature representations with domain guided dropout for person re-identification. In CVPR, 2016.
[41] F. Xiong, M. Gou, O. Camps, and M. Sznaier. Person re-identification using kernel-based metric learning methods. In ECCV, 2014.
[42] Y. Yang, S. Liao, Z. Lei, and S. Z. Li. Large scale similarity learning using similar pairs for person verification. In AAAI, 2016.
[43] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. TPAMI, 2013.
[44] L. Zhang, T. Xiang, and S. Gong. Learning a discriminative null space for person re-identification. In CVPR, 2016.
[45] Y. Zhang, B. Li, H. Lu, A. Irie, and X. Ruan. Sample-specific SVM learning for person re-identification. In CVPR, 2016.
[46] R. Zhao, W. Ouyang, and X. Wang. Unsupervised salience learning for person re-identification. In CVPR, 2013.
[47] R. Zhao, W. Ouyang, and X. Wang. Learning mid-level filters for person re-identification. In CVPR, 2014.
[48] L. Zheng, Z. Bie, Y. Sun, J. Wang, C. Su, S. Wang, and Q. Tian. MARS: A video benchmark for large-scale person re-identification. In ECCV, 2016.
[49] W.-S. Zheng, S. Gong, and T. Xiang. Associating groups of people. In BMVC, 2009.
[50] W.-S. Zheng, S. Gong, and T. Xiang. Person re-identification by probabilistic relative distance comparison. In CVPR, 2011.
[51] W.-S. Zheng, X. Li, T. Xiang, S. Liao, J. Lai, and S. Gong. Partial person re-identification. In ICCV, 2015.

Sławomir Bak is an Associate Research Scientist at Disney Research Pittsburgh. His research focuses on recognition (person re-identification), visual tracking, Riemannian manifolds and metric learning.
Before joining Disney Research, he spent several years as a Postdoctoral Fellow and PhD student at INRIA Sophia Antipolis. He received his PhD from INRIA, University of Nice, in 2012 for a thesis on person re-identification. Sławomir obtained his Master's Degree in Computer Science (Intelligent Decision Support Systems) from Poznan University of Technology in 2008.

Peter Carr is a Research Scientist at Disney Research, Pittsburgh. His research interests lie at the intersection of computer vision, machine learning, and robotics. Peter joined Disney Research as a Postdoctoral Researcher. Prior to Disney, Peter received his PhD from the Australian National University under the supervision of Prof. Richard Hartley. Peter received a Master's Degree in Physics from the Centre for Vision Research at York University in Toronto, Canada, and a Bachelor of Applied Science (Engineering Physics) from Queen's University in Kingston, Canada.