Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach

Abstract

Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in content-based image retrieval. So far, three categories of technology have been pursued to realize content-based image retrieval systems. In this paper the authors introduce a 2-D MHMM (two-dimensional multiresolution hidden Markov model) approach to this problem. Experiments have demonstrated the good accuracy of the implementation of this approach and its high potential for linguistic indexing of photographic images.

Automatic linguistic indexing of pictures is a challenging problem for content-based image retrieval systems. Why? Let's first look at what content-based image retrieval is.

1. Content-based image retrieval definition:

Content-based image retrieval (CBIR) is aimed at efficient retrieval of relevant images from large image databases based on automatically derived imagery features. So far there are three categories of CBIR technology.

1.1. Categories of CBIR technology:

High-level semantic description: This kind of CBIR system is typically divided into two components. One part processes the semantic information of every image before it is stored in the image database; the other part processes user queries. The semantic information is described using an ontology, so the whole system is effectively an image knowledge base. Projects include The Helsinki University

Museum, MINDSWAP, etc. Big problem: such a system does not process the physical features of images, so users cannot query images by their physical features, and all of the semantic information has to be edited manually by the builders of the image database, which is hard work.

Low-level feature classification: This kind of CBIR system analyzes an image's physical characteristics and returns the images with the highest similarity to the user. In some of these systems semantic feedback is introduced, but the focus is on physical characteristics; semantic information is only a tool to help classify images more accurately. Projects include QBIC, PicSeek, MARS, Netra, etc. Big problem: these systems cannot satisfy users' semantic queries, for example "Give me an image of this dog when it was a puppy", because they do not know what a dog or a puppy is. Hence the third category of technology was developed.

Linking high-level semantic descriptions to low-level feature information: This means automatically assigning comprehensive textual descriptions to pictures, which imitates the way human beings think when they see an image. The technology introduced in this article belongs to this category.

1.2. Problems in automatic annotation of images:

Automatic mapping between low-level features and knowledge: Before an image is stored in the image database, the system should be able to summarize the image's physical characteristics and assign the knowledge corresponding to those characteristics to the image (e.g., who is the person in the picture? what is he doing?).

How to model the semantic content: What kind of domain knowledge can the computer use to describe this kind of image, and how can the computer acquire and store this knowledge?

This article solves part of the first problem. The best way of summarizing an image's physical features is a statistical method: using statistical models, the computer can classify images according to their statistical regularities. Now we will see what this article's approach is.

2. Approach: Using a 2-D MHMM (two-dimensional multiresolution hidden Markov model)

Why a 2-D MHMM? An HMM is suitable for block-based image classification. Block-based image classification means that, when classifying an image, the image is first divided into blocks. The block size is then the critical choice: if the block size is large, each block includes more objects and is therefore harder to classify; if the block size is small, there is dependence between blocks. An HMM can be used to model this dependence information. For the HMM, the image's category determines the states, and the feature vectors are the observation symbols emitted by the states.

Compared with a 1-D HMM, a 2-D HMM solves the problem of over-localization. In a 2-D HMM there is a set of superstates, and within each superstate there is a set of simple Markovian states. The superstates form the rows of the two-dimensional grid, and the simple states form the columns within one superstate. This structure is reflected in Figure 1: the state transition probability of A2 depends on A1 and A3. In particular applications, this model works better than a 1-D HMM.

Figure 1
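To make the dependence structure concrete, here is a minimal Python sketch (not the authors' code) of a 2-D Markov model over block states, where the state of block (i, j) depends on the states of its upper and left neighbors through a transition tensor a[m, n, l]. The number of states, the uniform initial distribution, and the boundary handling are illustrative assumptions.

import numpy as np

M = 4                                          # number of states (illustrative)
rng = np.random.default_rng(0)
a = rng.dirichlet(np.ones(M), size=(M, M))     # a[m, n, :] sums to 1 over the next state l
pi = np.full(M, 1.0 / M)                       # initial state distribution (assumed uniform)

def log_prob_states(states):
    """Log-probability of a grid of block states under the 2-D Markov assumption."""
    rows, cols = states.shape
    logp = np.log(pi[states[0, 0]])
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            m = states[i - 1, j] if i > 0 else 0   # simplified boundary handling
            n = states[i, j - 1] if j > 0 else 0
            logp += np.log(a[m, n, states[i, j]])
    return logp

states = rng.integers(0, M, size=(6, 6))        # a toy 6 x 6 grid of block states
print(log_prob_states(states))

In the full model each state also emits the block's feature vector through a Gaussian distribution, which is what ties the hidden states to the observed image.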

In a 2-D HMM, global information can be used efficiently. But from the viewpoint of computational complexity, it is desirable to increase the size of a block while preventing it from including too many objects. For this purpose the author introduces multiple resolutions: lower-resolution images include fewer states than higher-resolution images do.

Figure 2
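The resolutions are tied together through parent and child blocks: a block at a coarse resolution covers several child blocks at the next finer resolution, and the children's state transitions depend on the parent's state. A minimal sketch of this indexing, assuming the standard quadtree arrangement in which each coarse block has four children (the exact convention is an illustrative assumption):

def children(i, j):
    """Child block indices at the next (finer) resolution for block (i, j)."""
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]

def parent(i, j):
    """Parent block index at the next coarser resolution."""
    return (i // 2, j // 2)

print(children(1, 2))   # [(2, 4), (2, 5), (3, 4), (3, 5)]
print(parent(3, 5))     # (1, 2)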

2.1. Application architecture

The following figure shows the application architecture of this technology.

Figure 3

2.1.1. Select one category of images to train for one concept:

A concept corresponds to a particular category of images. (A concept does not necessarily correspond to one word; a cluster of words can be considered as one concept.) These images do not have to be visually similar.

2.1.2. Extract features from this category of images:

Every picture is 384 x 256 pixels. An image is partitioned into 4 x 4 blocks, and for each block the system extracts a six-dimensional feature vector using a wavelet transform.

2.1.3. Statistical modeling:

To build a 2-D MHMM, there are several assumptions:

1. Let s_{i,j} denote the state of block (i, j) and u_{i,j} its feature vector. The state of a block depends only on the states of its upper and left neighbors, through the transition probabilities P(s_{i,j} = l | s_{i-1,j} = m, s_{i,j-1} = n) = a_{m,n,l}, where m = s_{i-1,j}, n = s_{i,j-1}, and l = s_{i,j}.

2. The second assumption is that, given its state, each feature vector follows a Gaussian distribution.

3. For the MHMM, denote the set of resolutions by {1, ..., R}, with r = R the finest resolution. Let the collection of block indices at resolution r be N^(r).

4. In particular, given the states at the parent resolution, the states at the current resolution are conditionally independent of the other preceding resolutions, so that P(s^(r) | s^(1), ..., s^(r-1)) = P(s^(r) | s^(r-1)).

5. In addition, given its state, a feature vector at any resolution is conditionally independent of any other states and feature vectors.

6. To summarize the independence assumptions: a feature vector is conditionally independent of the information in other blocks once the state of its own block is known, and the states at one resolution are conditionally independent of the other preceding resolutions given the states at the parent resolution.

7. According to the above assumptions, the joint probability of a particular set of states and feature vectors at one resolution is P({s_{i,j}}, {u_{i,j}}) = P({s_{i,j}}) · ∏_{(i,j)} P(u_{i,j} | s_{i,j}).

8. It is also assumed that child blocks descended from different parent blocks are conditionally independent (the states of a block's children are independent of the states of their "uncle" blocks), but the state transition probabilities do depend on the state of the parent block. The transition probabilities across resolutions therefore factor as P(s^(r) | s^(r-1)) = ∏_{(k,l) in N^(r-1)} P({s^(r)_{i,j} : (i,j) a child of (k,l)} | s^(r-1)_{k,l}).

9. The joint probability of the states and feature vectors at all the resolutions is then derived as P({s^(r)}, {u^(r)} : r = 1, ..., R) = P(s^(1)) · ∏_{r=2}^{R} P(s^(r) | s^(r-1)) · ∏_{r} ∏_{(i,j) in N^(r)} P(u^(r)_{i,j} | s^(r)_{i,j}).

To summarize, a 2-D MHMM captures both inter-scale and intra-scale statistical dependence: the inter-scale dependence is modeled by the Markov chain over resolutions, and the intra-scale dependence is modeled by the HMM within each resolution.

10. The model is trained using the EM algorithm.

2.1.4. Automatic linguistic indexing of pictures

After obtaining these models, automatic linguistic indexing of pictures can begin. The model of every concept is used to compute the log likelihood that the query image's feature vectors were generated by that concept, and the log values are sorted to find the k top-ranked categories. (The selection of k is somewhat arbitrary. An adaptive way to decide k would be to use the categories whose likelihoods exceed a threshold; however, the range of likelihoods computed from a query image varies greatly depending on the category the image belongs to, so a fixed threshold is not useful. When there are a large number of categories in the database, choosing a fixed number of top-ranked categories is observed to yield relatively robust annotation.)

After obtaining the k candidate concepts, the authors do not use all of these concepts to annotate the image, since k may be too large for a short description of one image. Instead they introduce a statistical trick to select a subset of words from the k concepts: suppose a word appears in j of the k top-ranked categories; the probability of this happening purely by chance can be computed, and a small probability indicates that the word is unlikely to have appeared simply by chance, i.e. a high level of significance for this word.
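This summary does not reproduce the formula, but one natural reading (an assumption here, not stated above) is a hypergeometric tail probability: the chance that a word contained in m of the n database categories would show up in at least j of k randomly chosen categories. A minimal Python sketch with illustrative numbers:

from scipy.stats import hypergeom

n_total = 600       # total categories in the database
m_with_word = 30    # categories whose manual description contains the word (illustrative)
k = 10              # top-ranked categories retained for the query image (illustrative)
j = 4               # of those k, how many actually contain the word (illustrative)

# hypergeom(M, n, N): population of size M with n marked items, N draws;
# sf(j - 1) gives P(X >= j), the probability of the observed count arising by chance.
p_chance = hypergeom(n_total, m_with_word, k).sf(j - 1)
print(p_chance)     # a small value means the word is significant and is kept in the annotation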

The advantage: the proposed scheme of choosing words favors rare words. It tends to provide relatively specific or interesting information about the query and avoids using words that fit a large number of image categories.

2.2. Experiment

The authors conducted experiments on the COREL dataset, which includes 600 categories with 100 images in every category. They trained 600 concept models, using 40 training images for each concept, and tested on 4,630 images outside the training set. Descriptive words were manually assigned to every image category.

Complexity of training for each of the 600 categories of images: the training process takes 15-40 minutes per category on an 800 MHz Pentium III PC.

2.2.1. Accuracy

Table 2 reports the accuracy, i.e. the match percentage over the 4,630 test images, where a match means that the test image is actually included in the category selected for it by the system.

2.3. Conclusion and future work

2.3.1. Conclusion

My opinion: this article proposes an approach that tackles part of the first problem in automatic annotation of pictures. It can also be seen as a classification technology for pictures. Its advantage over other low-level feature classification technologies is that it links concepts and features in order to establish a concept index, which makes keyword queries a little more intelligent. But because it does not address the second problem, it is still not intelligent enough for content-based image retrieval. For the other query method, in which the user inputs an image, the paper does not give a comparison with other technologies.

The following conclusions are from the article itself; you can take a look at them in the paper. The article proposes a 2-D MHMM modeling approach to solve the problem of automatic linguistic indexing of pictures, where the index is the model of one category of pictures.

Advantages of this approach:
1. Models for different concepts can be independently trained and retrained, so the system has good scalability.
2. Spatial relations among image pixels, within and across resolutions, are taken into consideration, with probabilistic likelihood as a universal measure.

Limitations:
1. The concept dictionary is trained using only 2-D images, without a sense of object

size.
2. Forty training images are insufficient for the computer program to build a reliable model for a complex concept.

2.3.2. Future work

Improve the indexing speed of the system by using approximation in the likelihood computation. A rule-based system may be used to post-process the automatically assigned words in order to eliminate conflicting semantics.