Hierarchy-based Image Embeddings for Semantic Image Retrieval


Hierarchy-based Image Embeddings for Semantic Image Retrieval – Supplemental Material

1. $d_G$ applied to trees is a metric

Theorem 1. Let $G = (V, E)$ be a directed acyclic graph whose edges $E \subseteq V \times V$ define a hyponymy relation between the semantic concepts in $V$. Furthermore, let $G$ have exactly one unique root node $\mathrm{root}(G)$ with indegree $\deg^-(\mathrm{root}(G)) = 0$. The lowest common subsumer $\mathrm{lcs}(u, v)$ of two concepts $u, v \in V$ hence always exists. Moreover, let $\mathrm{height}(u)$ denote the maximum length of a path from $u \in V$ to a leaf node, and $C \subseteq V$ be a set of classes of interest. Then, the semantic dissimilarity $d_G : C \times C \to \mathbb{R}$ between classes given by

  $d_G(u, v) = \dfrac{\mathrm{height}(\mathrm{lcs}(u, v))}{\max_{w \in V} \mathrm{height}(w)}$

is a proper metric if

(a) $G$ is a tree, i.e., all nodes $u \in V \setminus \{\mathrm{root}(G)\}$ have indegree $\deg^-(u) = 1$, and
(b) all classes of interest are leaf nodes of the hierarchy, i.e., all $u \in C$ have outdegree $\deg^+(u) = 0$.

Proof. For being a proper metric, $d_G$ must possess the following properties:

(i) Non-negativity: $d_G(u, v) \geq 0$.
(ii) Symmetry: $d_G(u, v) = d_G(v, u)$.
(iii) Identity of indiscernibles: $d_G(u, v) = 0 \Leftrightarrow u = v$.
(iv) Triangle inequality: $d_G(u, w) \leq d_G(u, v) + d_G(v, w)$.

Conditions (i) and (ii) are always satisfied, since $\mathrm{height} : V \to \mathbb{R}$ is defined as the length of a path, which cannot be negative, and the lowest common subsumer (LCS) of two nodes is independent of the order of its arguments. The proof with respect to the remaining properties (iii) and (iv) can be conducted as follows:

(b) ⇒ (iii): Let $u, v \in C$ be two classes with $d_G(u, v) = 0$. This means that their LCS has height 0 and hence must be a leaf node. Because leaf nodes have, by definition, no further children, $u = \mathrm{lcs}(u, v) = v$. On the other hand, for any class $w \in C$, $d_G(w, w) = 0$ because $\mathrm{lcs}(w, w) = w$ and $w$ is a leaf node according to (b).

(a) ⇒ (iv): Let $u, v, w \in C$ be three classes. Due to (a), there exists exactly one unique path from the root of the hierarchy to any node. Hence, $\mathrm{lcs}(u, v)$ and $\mathrm{lcs}(v, w)$ both lie on the path from $\mathrm{root}(G)$ to $v$ and they are, thus, either identical or one is an ancestor of the other. Without loss of generality, we assume that $\mathrm{lcs}(u, v)$ is an ancestor of $\mathrm{lcs}(v, w)$ and thus lies on the root-paths to $u$, $v$, and $w$. In particular, $\mathrm{lcs}(u, v)$ is a subsumer of $u$ and $w$ and, therefore, $\mathrm{height}(\mathrm{lcs}(u, w)) \leq \mathrm{height}(\mathrm{lcs}(u, v))$. In general, it follows that

  $d_G(u, w) \leq \max\{d_G(u, v), d_G(v, w)\} \leq d_G(u, v) + d_G(v, w)$,   (1)

given the non-negativity of $d_G$.

Remark regarding the inversion. If $d_G$ is a metric, all classes $u \in C$ of interest must necessarily be leaf nodes, since $d_G(u, u) = 0 \Leftrightarrow \mathrm{height}(\mathrm{lcs}(u, u)) = \mathrm{height}(u) = 0$. However, (iv) ⇒ (a) does not hold in general, since $d_G$ may even be a metric for graphs $G$ that are not trees. An example is given in Fig. 1a. Nevertheless, most such graphs violate the triangle inequality, like the example shown in Fig. 1b.

Figure 1: Examples of non-tree hierarchies. (a) A non-tree hierarchy where $d_G$ is a metric. (b) A non-tree hierarchy where $d_G$ violates the triangle inequality.
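For illustration, the following is a minimal sketch of how $d_G$ from Theorem 1 can be evaluated on a small tree-shaped hierarchy. It is not the implementation used in our experiments; the data structures (a child-to-parent mapping and a parent-to-children mapping) and all names are chosen for this example only.

# Sketch of d_G(u, v) = height(lcs(u, v)) / max_w height(w) on a tree hierarchy.

def ancestors(node, parent):
    """Path from `node` up to the root, including `node` itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lcs(u, v, parent):
    """Lowest common subsumer of u and v in a tree hierarchy."""
    anc_u = ancestors(u, parent)
    anc_v = set(ancestors(v, parent))
    return next(a for a in anc_u if a in anc_v)

def height(node, children):
    """Maximum length of a path from `node` down to a leaf."""
    if not children.get(node):
        return 0
    return 1 + max(height(c, children) for c in children[node])

def d_G(u, v, parent, children, max_height):
    return height(lcs(u, v, parent), children) / max_height

# Tiny example tree: root -> {animal -> {cat, dog}, plant -> {tree}}
parent = {"animal": "root", "plant": "root", "cat": "animal", "dog": "animal", "tree": "plant"}
children = {"root": ["animal", "plant"], "animal": ["cat", "dog"], "plant": ["tree"]}
H = height("root", children)                    # maximum height in the hierarchy (= 2)
print(d_G("cat", "dog", parent, children, H))   # 0.5: LCS "animal" has height 1
print(d_G("cat", "tree", parent, children, H))  # 1.0: LCS is the root
print(d_G("cat", "cat", parent, children, H))   # 0.0: leaf classes, property (iii)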

2. Further Quantitative Results

Figure 2: Hierarchical precision on NAB and ILSVRC 2012.

                                    CIFAR-100                             NAB
Method                              Plain-11  ResNet-110w  PyramidNet     from scratch  fine-tuned   ILSVRC
Classification-based                0.2078    0.4870       0.3643         0.0283        0.2141       0.2184
Classification-based + L2 Norm      0.2666    0.5305       0.4621         0.0363        0.2568       0.2900
DeViSE                              0.2879    0.5016       0.4131         –             –            –
Center Loss                         0.4180    0.4153       0.3029         0.1591        0.1840       0.1285
Label Embedding                     0.2747    0.6202       0.5920         0.1271        0.2004       0.2683
(L_CORR) [ours]                     0.5660    0.5900       0.6642         0.4249        0.5241       0.3037
(L_CORR+CLS) [ours]                 0.5886    0.6107       0.6808         0.4316        0.5764       0.4508

Table 1: Classical mean average precision (mAP) on all datasets. The best value per column is set in bold font. Obviously, optimizing for a classification criterion only leads to sub-optimal features for image retrieval.
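For reference, the classical mAP reported in Table 1 follows the standard definition of mean average precision for retrieval. The snippet below is a generic sketch of this metric only, not our evaluation code, and it assumes that a retrieved image counts as relevant if it shares the ground-truth class of the query; the function names are illustrative.

import numpy as np

def average_precision(relevant):
    """AP for one ranked result list; `relevant` is a boolean sequence
    ordered by decreasing retrieval score."""
    relevant = np.asarray(relevant, dtype=bool)
    if relevant.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(relevant) / (np.arange(len(relevant)) + 1)
    return float(precision_at_k[relevant].mean())

def mean_average_precision(rankings):
    """mAP over all queries; `rankings` is a list of boolean relevance lists."""
    return float(np.mean([average_precision(r) for r in rankings]))

# Example: two queries whose relevance is judged by class agreement.
print(mean_average_precision([[True, False, True], [False, True, True]]))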

3. Qualitative Results on ILSVRC 2012

Figure 3: Comparison of a subset of the top 100 retrieval results using L2-normalized classification-based and our hierarchy-based semantic features for 3 exemplary queries on ILSVRC 2012. Image captions specify the ground-truth classes of the images and the border color encodes the semantic similarity of that class to the class of the query image, with dark green being most similar and dark red being most dissimilar.

Top-5 predictions shown in Figure 4 (the example images themselves are omitted here):

Example 1
  Classification-based: 1. Giant Panda (1.00), 2. American Black Bear (0.63), 3. Ice Bear (0.63), 4. Gorilla (0.58), 5. Sloth Bear (0.63)
  Ours:                 1. Giant Panda (1.00), 2. Lesser Panda (0.89), 3. Colobus (0.58), 4. American Black Bear (0.63), 5. Guenon (0.58)

Example 2
  Classification-based: 1. Great Grey Owl (1.00), 2. Sweatshirt (0.16), 3. Bonnet (0.16), 4. Guenon (0.42), 5. African Grey (0.63)
  Ours:                 1. Great Grey Owl (1.00), 2. Kite (0.79), 3. Bald Eagle (0.79), 4. Vulture (0.79), 5. Ruffed Grouse (0.63)

Example 3
  Classification-based: 1. Monarch (1.00), 2. Earthstar (0.26), 3. Coral Fungus (0.26), 4. Stinkhorn (0.26), 5. Admiral (0.84)
  Ours:                 1. Monarch (1.00), 2. Cabbage Butterfly (0.84), 3. Admiral (0.84), 4. Sulphur Butterfly (0.84), 5. Lycaenid (0.84)

Example 4
  Classification-based: 1. Ice Bear (1.00), 2. Arctic Fox (0.63), 3. White Wolf (0.63), 4. Samoyed (0.63), 5. Great Pyrenees (0.63)
  Ours:                 1. Ice Bear (1.00), 2. Brown Bear (0.95), 3. Sloth Bear (0.95), 4. Arctic Fox (0.63), 5. American Black Bear (0.95)

Example 5
  Classification-based: 1. Ice Cream (1.00), 2. Meat Loaf (0.63), 3. Bakery (0.05), 4. Strawberry (0.32), 5. Fig (0.32)
  Ours:                 1. Ice Cream (1.00), 2. Ice Lolly (0.84), 3. Trifle (0.89), 4. Chocolate Sauce (0.58), 5. Plate (0.79)

Example 6
  Classification-based: 1. Cocker Spaniel (1.00), 2. Irish Setter (0.84), 3. Sussex Spaniel (0.89), 4. Australian Terrier (0.79), 5. Clumber (0.89)
  Ours:                 1. Cocker Spaniel (1.00), 2. Sussex Spaniel (0.89), 3. Irish Setter (0.84), 4. Welsh Springer Spaniel (0.89), 5. Golden Retriever (0.84)

Figure 4: Top 5 classes predicted for several example images by a ResNet-50 trained purely for classification and by our network trained with L_CORR+CLS incorporating semantic information. The correct label for each image is underlined and the numbers in parentheses specify the semantic similarity of the predicted class and the correct class. It can be seen that class predictions made based on our hierarchy-based semantic embeddings are much more relevant and consistent.

4. Low-dimensional Embeddings

Figure 5: Hierarchical precision of our method for learning image representations based on class embeddings with varying dimensionality, compared with the usual baselines.

As can be seen from the description of our algorithm for computing class embeddings in Section 3.2 of the paper, an embedding space with n dimensions is required in general to find an embedding for n classes that reproduces their semantic similarities exactly. This can become problematic in settings with a high number of classes. For such scenarios, we have proposed a method in Section 3.3 of the paper for computing low-dimensional embeddings of arbitrary dimensionality that approximate the actual relationships among the classes. We experimented with this possibility on the NAB dataset, learned from scratch, to see how reducing the number of features affects our algorithm for learning image representations and the semantic retrieval performance.

The results in Fig. 5 show that obtaining low-dimensional class embeddings through eigendecomposition is a viable option for settings with a high number of classes. Though the performance is worse than with the full number of required features, our method still performs better than the competitors with as few as 16 features. Our approach hence also allows obtaining very compact image descriptors, which is important when dealing with huge datasets. Interestingly, the 256-dimensional approximation even gives slightly better results than the full embedding after the first 50 retrieved images. We attribute this to the fact that fewer features leave less room for overfitting, so that slightly lower-dimensional embeddings can generalize better in this scenario.
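To make the idea concrete, the following is a minimal sketch, assuming the class embeddings are chosen such that their dot products reproduce the pairwise semantic similarities derived from the hierarchy; a low-dimensional approximation then keeps only the largest eigenvalues of the similarity matrix. This is an illustration under that assumption, not the code used for our experiments, and all names are illustrative.

import numpy as np

def low_dim_class_embeddings(S, dim):
    """Approximate class embeddings of dimensionality `dim` from an n x n
    symmetric matrix S of pairwise semantic similarities, such that the
    dot products of the embeddings approximately reproduce S."""
    eigval, eigvec = np.linalg.eigh(S)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:dim]        # keep the `dim` largest eigenvalues
    eigval = np.clip(eigval[order], 0.0, None)    # guard against small negative values
    return eigvec[:, order] * np.sqrt(eigval)     # rows are the class embeddings

# Toy example with a 3-class similarity matrix (1 on the diagonal).
S = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
emb = low_dim_class_embeddings(S, dim=2)
print(emb @ emb.T)  # close to S, up to the rank-2 approximation error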

5. Taxonomy used for CIFAR-100