IPL at ImageCLEF 2017 Concept Detection Task

Leonidas Valavanis and Spyridon Stathopoulos

Information Processing Laboratory, Department of Informatics,
Athens University of Economics and Business,
76 Patission Str, 10434, Athens, Greece
valavanisleonidas@gmail.com, spstathop@aueb.com
http://ipl.cs.aueb.gr/index_eng.html

Abstract. In this paper we present the methods and techniques applied by the IPL Group to the concept detection task of ImageCLEF 2017. A probabilistic k-nearest neighbor approach was used to automatically detect multiple concepts in medical images. The visual representation of the images was based on the well-known bag-of-visual-words and bag-of-colors models. Detection performance was further improved by applying late fusion to the results obtained with different image representations. Our best run was ranked 2nd among runs produced under the same conditions.

Keywords: probabilistic k-nearest neighbors, image annotation, concept detection, quad-tree bag of colors, bag of visual words

1 Introduction

Automatic image annotation is an important and challenging task within the field of computer vision, with applications in several domains. In the medical domain it plays an important role in supporting image search, browsing and organization for clinical diagnosis and treatment. Image retrieval based on semantic information has many advantages and is more robust than retrieval based only on low-level visual features. When semantic information is absent, a typical way to bridge the gap between low-level visual features and high-level semantics is automatic image annotation: machine learning techniques are applied to learn a mapping from visual features to textual words, and the learned model is then used to assign semantic concepts to new, unseen images.

The ImageCLEFcaption 2017 task [3], part of ImageCLEF 2017 [5], consists of two subtasks: concept detection and caption prediction. Our group participated in the concept detection subtask, in which participating groups were asked to develop systems that identify the presence of relevant biomedical concepts in medical images. Details of this task can be found in the overview paper [3] and on the web page of the contest (http://www.imageclef.org/2017/caption). Our approach to concept detection is based on a probabilistic k-nearest neighbor (PKNN) algorithm that merges two well-known models for image representation, the Bag of Visual Words (BoVW) [6] and an improved version of the bag of colors (QBoC) [10]. When combined with late fusion the results improve further, ranking 2nd among the best performing runs that do not rely on external data sources. The following sections present the image representation methods and our algorithm for concept detection; finally, we report our results and conclude with possible avenues for further research.

2 Visual representation of images

Three different visual representation models were used in our experiments:

1. Localized compact features
2. Bag of Visual Words (BoVW)
3. Bag of Colors (BoC)

2.1 Localized compact features

Compact visual descriptors have been used extensively in recent years to represent the images of a dataset efficiently. For this task, two kinds of visual features were extracted:

1. Color and Edge Directivity Descriptor (CEDD) [1]
2. Fuzzy Color and Texture Histogram (FCTH) [2]

In their original form these descriptors are extracted globally from an image. In order to include a degree of spatial information, the features are extracted over a spatial 4x4 grid: the image is first resized to 256 x 256 pixels and split into a 4x4 grid of non-overlapping blocks, the visual features are extracted for each block, and the corresponding vectors are concatenated into a single feature vector. The final vector size is 4 x 4 x 144 = 2,304 for CEDD and 4 x 4 x 192 = 3,072 for FCTH. (A minimal sketch of this grid-based extraction is given at the end of Section 2.2.)

2.2 Dense SIFT features and BoVW

Inspired by text retrieval, the Bag-of-Visual-Words (BoVW) approach has shown promising results in image retrieval and classification. Here the BoVW model was implemented using the Dense SIFT visual descriptor. The Dense SIFT algorithm [7] is a variant of SIFT [8] that is equivalent to extracting SIFT descriptors from a dense grid of locations at a fixed scale and orientation. The SIFT feature is invariant to many common image deformations, including position, scale, illumination, rotation, and affine transformation. The number of features extracted with the Dense SIFT descriptor varies from image to image. In order to obtain a fixed number of feature dimensions, a visual codebook is created by clustering the extracted local interest points of a number of sample images with the k-means clustering algorithm; after experimentation, the number of clusters was set to 4,096. Each cluster (visual word) represents a distinct local pattern shared by similar interest points. The histogram of an image is then created by vector quantization, which assigns each key-point to its closest cluster (visual word).
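
The grid-based extraction of Section 2.1 can be sketched roughly as follows. This is only a sketch under stated assumptions: extract_global_descriptor is a placeholder for a compact descriptor such as CEDD or FCTH (the actual extractors are not reproduced here), and the use of OpenCV for resizing is an implementation choice; the grid and image sizes follow the values quoted above.

```python
import numpy as np
import cv2  # used only for resizing; any image library would do


def extract_global_descriptor(block):
    """Placeholder for a compact global descriptor such as CEDD (144-D)
    or FCTH (192-D); the real extractors are not reproduced here."""
    raise NotImplementedError


def grid_descriptor(image, grid=4, size=256):
    """Resize to size x size, split into grid x grid non-overlapping blocks
    and concatenate the per-block descriptors into one feature vector."""
    image = cv2.resize(image, (size, size))
    step = size // grid
    parts = []
    for row in range(grid):
        for col in range(grid):
            block = image[row * step:(row + 1) * step, col * step:(col + 1) * step]
            parts.append(extract_global_descriptor(block))
    return np.concatenate(parts)  # e.g. 16 x 144 = 2,304 dimensions for CEDD
```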

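For the BoVW pipeline of Section 2.2, a minimal sketch of codebook construction and histogram quantization is given below. It assumes the dense SIFT descriptors (128-D rows) have already been extracted for each image; the choice of scikit-learn's MiniBatchKMeans is an implementation assumption, not something specified in the paper.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def build_codebook(sample_descriptors, k=4096, seed=0):
    """Cluster a pool of dense-SIFT descriptors (n x 128) into k visual words."""
    codebook = MiniBatchKMeans(n_clusters=k, random_state=seed)
    codebook.fit(sample_descriptors)
    return codebook


def bovw_histogram(descriptors, codebook):
    """Assign each local descriptor of one image to its nearest visual word
    and accumulate a k-bin histogram."""
    words = codebook.predict(descriptors)
    return np.bincount(words, minlength=codebook.n_clusters).astype(float)
```
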
2.3 Quad-Tree Bag-of-Colors Model (QBoC)

The QBoC representation was used successfully for representing images in previous work [9, 10]. In the BoC model [12] a color vocabulary is learned from a subset of the image collection, and this vocabulary is used to extract a color histogram for each image. The BoC model was applied to the classification of biomedical images in [4], where it was shown to combine well with the BoW-SIFT model in a late fusion manner. As with the BoW model, the main drawback of BoC is the lack of spatial information. Quad-tree decomposition addresses this by sub-dividing the image into regions of homogeneous color: at each step a region is split into four equal-size squares, and the process continues until a sub-region of size 1 x 1 pixel is reached (see Figure 1b).

In both models the Term Frequency-Inverse Document Frequency (TF-IDF) weights of the visual words were calculated and the image vectors were normalized with the L1 norm.

Fig. 1. Representation of image 1743-7075-7-33-5: (a) original image; (b) QBoC image.
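
A rough sketch of the quad-tree decomposition described above, assuming a square color image with a power-of-two side: a region is split into four quadrants until it is color-homogeneous or reaches 1 x 1 pixel. The homogeneity test (per-channel standard deviation below a tolerance) and the mapping of leaf colors onto the learned color vocabulary afterwards are simplifying assumptions, not details taken from the paper.

```python
import numpy as np


def quadtree_leaves(img, x=0, y=0, size=None, tol=0.0):
    """Recursively split a square color image (side, side, channels) into
    homogeneous regions. Returns (x, y, size, mean_color) for every leaf;
    each leaf color would then be quantized against the color vocabulary."""
    if size is None:
        size = img.shape[0]  # assumes a square image with a power-of-two side
    patch = img[y:y + size, x:x + size].reshape(-1, img.shape[-1])
    if size == 1 or patch.std(axis=0).max() <= tol:
        return [(x, y, size, patch.mean(axis=0))]
    half = size // 2
    leaves = []
    for dx in (0, half):
        for dy in (0, half):
            leaves += quadtree_leaves(img, x + dx, y + dy, half, tol)
    return leaves
```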

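The TF-IDF weighting and L1 normalization applied to both histogram types can be written compactly as below, assuming a matrix of raw visual-word counts with one row per image. The exact TF-IDF variant used by the authors is not specified, so the plain count x log(N/df) form here is an assumption.

```python
import numpy as np


def tfidf_l1(counts):
    """counts: (n_images x vocabulary_size) raw visual-word counts.
    Apply TF-IDF weighting and L1-normalize each image vector."""
    n_images = counts.shape[0]
    df = (counts > 0).sum(axis=0)                 # document frequency per visual word
    idf = np.log(n_images / np.maximum(df, 1))    # guard against unused words
    weighted = counts * idf
    norms = np.maximum(np.abs(weighted).sum(axis=1, keepdims=True), 1e-12)
    return weighted / norms
```
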
3 Probabilistic k-NN concept detection (PKNN)

In this section we briefly present our baseline algorithm for automatic concept detection in medical images. The algorithm is divided into two main phases: the visual retrieval step and the annotation step.

In the visual retrieval phase, for a given test image, the k most visually similar images are retrieved from the training dataset. Experiments on the validation set were used to determine the optimum value of k (k = 100).

In the annotation step, the concepts associated with the k retrieved images form the candidate concepts for the test image. The final assigned concepts are determined by a probability score based on the occurrence of the concepts in the retrieved sample. For every distinct concept w present in the retrieved training subset we calculate

ConceptScore(w) = \sum_{j=1}^{k} P(j) \, P(w \mid j)    (1)

where j is an image from the top-k results and P(w \mid j) is given by

P(w \mid j) = count(w, j) / |W_j|    (2)

where count(w, j) is the number of times concept w occurs in image j and |W_j| is the total number of concepts in image j. P(j) is considered uniform over all images and is therefore ignored. The top 6 concepts with the highest ConceptScore are assigned to the test image; this number was determined from the average number of concepts per image in the training set (5.58). (Minimal sketches of this scoring step and of the alternative scoring of Section 3.1 are given after Section 3.2.)

3.1 Concept scoring with Random Walks with Restart (RWR)

As an alternative concept scoring method, a Random Walk with Restart (RWR) algorithm [11] was tested. First, from the set of the top-k retrieved images an adjacency matrix A of size c x c is constructed, where c is the number of distinct concepts in the retrieved images. Matrix A defines a graph whose nodes correspond to concepts and whose edges connect concepts assigned to the same training image. Next, the RWR algorithm is applied to A, resulting in a vector r of size c x 1 representing the most probable concepts for the test image. As before, the top 6 concepts with the highest r(w) value are selected as the concepts of the test image.

3.2 Late fusion

In order to improve the results, late fusion was applied to the ranked concept lists of each test image: the ranked concept score lists obtained with different visual descriptors are combined, and a new score is calculated with the combSUM function

combsum(w) = \sum_{d=1}^{D} ConceptScore_d(w)    (3)

where D is the number of descriptors being combined.
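
A minimal sketch of the PKNN scoring of Eqs. (1)-(2), assuming the visual retrieval step has already produced the ids of the k = 100 nearest training images and that the training annotations are available as a dictionary from image id to concept list; both names are illustrative.

```python
from collections import Counter, defaultdict


def pknn_concepts(neighbour_ids, train_concepts, n_out=6):
    """Score every candidate concept with Eqs. (1)-(2) over the retrieved
    neighbours and return the n_out highest-scoring concepts."""
    scores = defaultdict(float)
    for j in neighbour_ids:
        concepts = train_concepts[j]          # concepts of training image j
        if not concepts:
            continue
        total = len(concepts)                 # |W_j|
        for w, c in Counter(concepts).items():
            scores[w] += c / total            # P(w|j) = count(w, j) / |W_j|
        # P(j) is uniform over the k neighbours, so it is omitted
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_out]
```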

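The RWR alternative of Section 3.1 can be sketched as follows: build the concept co-occurrence matrix over the retrieved images, normalize it column-wise, and iterate the walk with restart. The restart probability, the uniform restart vector and the fixed number of iterations are assumptions; the paper does not report these parameters.

```python
import numpy as np


def rwr_concepts(neighbour_ids, train_concepts, restart_prob=0.15, iters=100, n_out=6):
    """Score candidate concepts with a random walk with restart over the
    concept co-occurrence graph of the top-k retrieved training images."""
    vocab = sorted({w for j in neighbour_ids for w in train_concepts[j]})
    index = {w: i for i, w in enumerate(vocab)}
    c = len(vocab)
    A = np.zeros((c, c))
    for j in neighbour_ids:                   # concepts assigned to the same image are linked
        ws = [index[w] for w in set(train_concepts[j])]
        for a in ws:
            for b in ws:
                if a != b:
                    A[a, b] += 1.0
    P = A / np.maximum(A.sum(axis=0, keepdims=True), 1e-12)  # column-stochastic transitions
    restart = np.full(c, 1.0 / c)             # uniform restart vector (assumption)
    r = restart.copy()
    for _ in range(iters):
        r = (1 - restart_prob) * P @ r + restart_prob * restart
    return [vocab[i] for i in np.argsort(-r)[:n_out]]
```
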
4 Experiments and Submitted Runs

To determine the optimal parameters and the best visual features of our system we experimented with the validation set provided by the organizers. Trying different visual descriptors and concept detection algorithms helped us decide on the submitted runs. Table 1 presents some of the best results obtained in these experiments on the validation set. The PKNN prefix in a run id denotes experiments using the probabilistic concept detection algorithm and RWR the random-walks-with-restarts algorithm; the LFS prefix denotes runs using the late fusion method described in Section 3.2. Table 2 presents the runs submitted to CLEF and their corresponding results; the same prefixes are used to describe the individual runs.

Run ID                                   Accuracy
PKNN CEDD                                0.113
PKNN CEDD 4x4                            0.114
PKNN FCTH                                0.112
PKNN FCTH 4x4                            0.120
PKNN GBOC                                0.123
PKNN DSIFT                               0.137
RWR GBOC                                 0.132
RWR DSIFT                                0.140
LFS PKNN (FCTH 4x4 DSIFT GBOC)           0.147
LFS PKNN (CEDD 4x4 DSIFT GBOC)           0.145

Table 1. Results on the validation set

Run ID                                   Accuracy
LFS PKNN DSIFT GBOC                      0.1436
LFS PKNN CEDD4x4 DSIFT GBOC              0.1418
LFS RWR DSIFT GBOC                       0.1417
LFS PKNN FCTH4x4 DSIFT GBOC              0.1415
LFS RWR CEDD4x4 DSIFT GBOC               0.1414
DET LFS RWR FCTH4x4 DSIFT GBOC           0.1394
RWR DSift Top100 L2 SqrtNorm L1Norm      0.1365
PKNN DSift Top100 L2 SqrtNorm L1Norm     0.1364
RWR GBOC Top100 L2 SqrtNorm L1Norm       0.1212
PKNN GBOC Top100 L2 SqrtNorm L1Norm      0.1208

Table 2. Results of the submitted runs on the test set

5 Concluding Remarks

In this report we presented the image concept detection methods used by the IPL Group for the medical concept detection subtask at ImageCLEF 2017. For our runs we used a simple probabilistic k-nearest neighbor approach. The experiments show that applying late fusion to BoVW and QBoC performs best and that the image representation plays an important role in performance. The Random Walks with Restarts algorithm performed slightly worse; a more systematic investigation of this method is currently underway. Our best run was ranked 2nd among the top performing runs that do not rely on any external data sources. These results are encouraging and motivate further research on improving the concept detection algorithm with additional textual metadata.

References

1. Chatzichristofis, S.A., Boutalis, Y.S.: CEDD: Color and edge directivity descriptor - a compact descriptor for image indexing and retrieval. In: ICVS, pp. 312-322 (2008)
2. Chatzichristofis, S.A., Boutalis, Y.S.: FCTH: Fuzzy color and texture histogram - a low level feature for accurate image retrieval. In: WIAMIS, pp. 191-196 (2008)
3. Eickhoff, C., Schwall, I., García Seco de Herrera, A., Müller, H.: Overview of ImageCLEFcaption 2017 - image caption prediction and concept detection for biomedical images. In: CLEF 2017 Labs Working Notes. CEUR Workshop Proceedings, CEUR-WS.org <http://ceur-ws.org>, Dublin, Ireland (September 11-14, 2017)
4. García Seco de Herrera, A., Markonis, D., Müller, H.: Bag of colors for biomedical document image classification. In: Medical Content-Based Retrieval for Clinical Decision Support, pp. 110-121. Springer (2013)
5. Ionescu, B., Müller, H., Villegas, M., Arenas, H., Boato, G., Dang-Nguyen, D.T., Dicente Cid, Y., Eickhoff, C., García Seco de Herrera, A., Gurrin, C., Islam, B., Kovalev, V., Liauchuk, V., Mothe, J., Piras, L., Riegler, M., Schwall, I.: Overview of ImageCLEF 2017: Information extraction from images. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 8th International Conference of the CLEF Association, CLEF 2017. Lecture Notes in Computer Science, vol. 10456. Springer, Dublin, Ireland (September 11-14, 2017)
6. Li, F.F., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: CVPR (2), pp. 524-531 (2005)
7. Liu, C., Yuen, J., Torralba, A.: SIFT flow: Dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978-994 (May 2011), http://dx.doi.org/10.1109/tpami.2010.147
8. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the International Conference on Computer Vision - Volume 2, pp. 1150-1157. ICCV '99, IEEE Computer Society, Washington, DC, USA (1999), http://dl.acm.org/citation.cfm?id=850924.851523
9. Valavanis, L., Stathopoulos, S., Kalamboukis, T.: IPL at CLEF 2016 medical task. In: Working Notes of CLEF 2016 - Conference and Labs of the Evaluation Forum, Évora, Portugal, 5-8 September 2016, pp. 413-420 (2016), http://ceur-ws.org/Vol-1609/16090413.pdf
10. Valavanis, L., Stathopoulos, S., Kalamboukis, T.: Fusion of bag-of-words models for image classification in the medical domain. In: Advances in Information Retrieval - 39th European Conference on IR Research, ECIR 2017, Aberdeen, UK, April 8-13, 2017, Proceedings, pp. 134-145 (2017), https://doi.org/10.1007/978-3-319-56608-5_11

11. Wang, C., Jing, F., Zhang, L., Zhang, H.J.: Image annotation refinement using random walk with restarts. In: Proceedings of the 14th ACM International Conference on Multimedia, pp. 647-650. MM '06, ACM, New York, NY, USA (2006), http://doi.acm.org/10.1145/1180639.1180774
12. Wengert, C., Douze, M., Jégou, H.: Bag-of-colors for improved image search. In: Proceedings of the 19th ACM International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28 - December 1, 2011, pp. 1437-1440 (2011), http://doi.acm.org/10.1145/2072298.2072034