Semantic Indexing Of Images Using A Web Ontology Language. Gowri Allampalli-Nagaraj


Semantic Indexing Of Images Using A Web Ontology Language

Gowri Allampalli-Nagaraj

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science

University of Washington

2007

Program Authorized to Offer Degree: Institute of Technology - Tacoma

University of Washington Graduate School

This is to certify that I have examined this copy of a master's thesis by Gowri Allampalli-Nagaraj and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.

Committee Members:

Isabelle Bichindaritz
George Mobus

Date:

In presenting this thesis in partial fulfillment of the requirements for a master's degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this thesis is allowable only for scholarly purposes, consistent with fair use as prescribed in the U.S. Copyright Law. Any other reproduction for any purposes or by any means shall not be allowed without my written permission.

Signature

Date

University Of Washington

Abstract

Semantic Indexing Of Images Using A Web Ontology Language

Gowri Allampalli-Nagaraj

Chair of the Supervisory Committee: Professor Isabelle Bichindaritz, Computing and Software Systems

This paper presents a system implemented to evaluate the retrieval efficiency of images when they are semantically indexed using a combination of a Web Ontology Language and the low-level features of the image. Finding a similarity measure algorithm to retrieve images based on semantic metadata can be very challenging due to diverse image content and inadequate domain-specific ontologies describing that content. Existing methods for indexing images are primarily based on text. While widely used for its simplicity, this method is not very efficient: it requires a domain expert, and textual interpretations of image content vary from person to person. In our approach, we leverage sophisticated image processing techniques to extract image content information and associate it with existing domain ontologies developed by experts, thereby bridging the gap between low-level features and high-level semantics. The work described in this paper shows that a high retrieval accuracy rate is obtained when all the image descriptors are combined with an ontology while building the semantic metadata for indexing images.

TABLE OF CONTENTS

List Of Figures
List Of Tables
Chapter 1 Introduction
Chapter 2 Motivation
Chapter 3 Problem Statement
Chapter 4 Background
    4.1 Ontology
    4.2 Image Databases
    4.3 Image Semantic Representation Languages
    4.4 Image Interpretation Software
    4.5 MPEG-7
    4.6 Distance Measure
Chapter 5 Datasets
    5.1 Visible Human Image Data Set
    5.2 University Of Washington Digital Anatomist Reference Ontology
Chapter 6 Preprocessing Tools
    6.1 MySQL
    6.2 Adobe Photoshop
    6.3 M-OntoMat-Annotizer
Chapter 7 Preprocessing Methods
    7.1 Selection Of Images From Visible Human
    7.2 Extraction Of UWDA Ontological Terms From The UMLS Database
    7.3 Creation Of UWDA Reference Ontology In DAML (DARPA Agent Markup Language)
    7.4 Loading Domain Ontology In M-OntoMat-Annotizer
    7.5 Conversion Of Image Format To JPEG
    7.6 Extracting Image Content And Linking To Domain Ontology
Chapter 8 Methods
    8.1 Training And Test Set
    8.2 Extracting Image Content From XML Files
    8.3 Calculating Distance Measure
    8.4 Calculating Combined Distance Measure
    8.5 Creating Distance Matrix
    8.6 Calculating Retrieval Accuracy Rate
    8.7 Improving Retrieval Accuracy Rates
Chapter 9 Results, Discussion And Analysis
    9.1 Initial Results
    9.2 Increased Training To Test Ratio
    9.3 Combined Descriptors
    9.4 Ensemble Classification
    9.5 Ten Fold Cross Validation
    9.6 Excluding Descriptors
    9.7 Empirical Weight Optimization
Chapter 10 Related Work
    10.1 Knowledge Assisted Video Analysis And Object Detection
    10.2 Retrieval Of Multimedia Objects By Combining Semantic Information From Visual And Textual Descriptors
Chapter 11 Educational Statement
Chapter 12 Conclusion
Bibliography
Appendix A Presentation Slides
Appendix B Installation & User Manual
Appendix C System Output
Appendix D Image Descriptor Files
Appendix E DAML Ontology File
Appendix F Image Annotation Files

LIST OF FIGURES

Figure 1: Image of Abdomen from Visible Human Data Set
Figure 2: Image of Thigh from Visible Human Data Set
Figure 3: Screenshot of SQL query used to extract UWDA terms from UMLS
Figure 4: Screenshot of VDE tool in M-OntoMat-Annotizer showing the image feature extraction and annotation process

LIST OF TABLES

Table 1: Accuracy rate for training set
Table 2: Accuracy rate for test set
Table 3: Accuracy rate for 75% images in training set and 25% images in test set
Table 4: Accuracy rate for 50% images in training set and 50% images in test set
Table 5: Combined accuracy rate for training set = 50% and test set = 50%
Table 6: Combined accuracy rate for training set = 75% and test set = 25%
Table 7: Accuracy rate for Ensemble Classification for 50% training and 50% test
Table 8: Accuracy rate for Ensemble Classification for 75% training and 25% test
Table 9: Accuracy rate for Ten Fold Cross Validation for 75% training and 25% test
Table 10: Accuracy rate for Ten Fold Cross Validation for 50% training and 50% test
Table 11: Accuracy rate excluding Contour Shape and Texture Browsing
Table 12: Accuracy rate excluding Contour Shape descriptor
Table 13: Accuracy rates for Empirical Weight Optimization

ACKNOWLEDGEMENTS

Special thanks to Professor Isabelle Bichindaritz for all her assistance, guidance and feedback during the course of this thesis. Her involvement was essential in its completion. I am also very thankful to Professor George Mobus for all his help and valuable feedback. Thanks to the members of the committee for all their valuable input.

DEDICATION

To my husband, family and friends.

Chapter 1

INTRODUCTION

With the advances in medical technology over the years, we have accumulated a large number of digital images such as Magnetic Resonance Images (MRI), X-rays, and anatomical and pathological images. Medical research has also led to the development of valuable knowledge bases consisting of formal domain ontologies, electronic patient records, statistical medical data and the results of various medical studies. Analysis of these images is of utmost importance for studying the different aspects of a problem, and the doctors and scientists concerned should be able to access the image information easily and effectively [15]. Until recently, medical databases mostly used textual information to store and retrieve images, making little use of the rich content present in the digital images themselves.

Handling large collections of images is a growing challenge, and there has been a great deal of research on image retrieval systems that store and retrieve image collections efficiently. The main goal of this thesis is to aid the ongoing research on semantic indexing of images by evaluating the retrieval effectiveness of image collections when image content information is combined with a formalized ontology to automatically index images by content. Research in this area has raised the question of whether it is possible to develop a semantic indexing system with an efficient rate of image retrieval [34]. The challenge is to develop a similarity matching algorithm that analyzes the extracted image content and produces a match. In the system presented here, we use medical anatomical images from the Visible Human [24] data set and the Digital Anatomist [22] formal medical ontology developed for human anatomical terms. In our approach, we extract various image features like color,

shape, texture, etc., in the MPEG-7 [35] standard image feature description format and associate them with the related anatomical terms, thus building the semantic metadata. An important feature of this system is the similarity matching algorithm developed to calculate the matching between images, thereby determining the retrieval accuracy rate for the system. Various experiments based on different approaches for improving the accuracy rates were performed to evaluate the retrieval efficiency of the system.

Chapter 2 describes the motivation behind this research. A detailed description of the problem being solved and the background information required to understand this research area are given in Chapters 3 and 4. Chapters 5 and 6 describe the dataset and the preprocessing tools and resources used to process the data for further analysis. Chapter 7 describes the methods used in preprocessing the data. The architecture of the system and the methodology used to solve the research issue are described in Chapter 8. The experimental results, analysis and discussion are presented in Chapter 9. Chapter 10 describes other related work in this area. Chapter 11 contains the Educational Statement. Finally, Chapter 12 contains the conclusions derived from this implementation.

Chapter 2

MOTIVATION

With the number of digital images increasing rapidly, there is a great need to manage digital image repositories: images need to be stored and retrieved just like text documents. Advances in medical technology have encouraged hospitals and medical research centers to use machines such as X-ray machines, Magnetic Resonance Imaging (MRI) and other scanners. The use of such machines has produced valuable data in the form of digital images of different diseases, physical structures, various organisms, etc. Analysis of these images is of utmost importance for studying the different aspects of a problem, and the doctors and scientists concerned should be able to access the image information easily and effectively.

By indexing images based on semantic descriptors of low-level features, doctors can submit a query like "find images with round calcifications" [3]. In such a query, "calcification" is the textual description representing the semantics of the region of interest, and the shape "round" is the textual annotation representing the low-level shape feature. Executing such a query would avoid retrieving images that merely contain a round shape or merely carry the associated text "calcification". Another example query is "find all the images having a blue sky". Such a query would yield images whose semantic annotation includes "sky" and whose corresponding feature representation is the color blue. This kind of semantic annotation greatly improves image classification and query mechanisms. There is a growing need for research in the area of attaching semantics to low-level features to improve image retrieval and storage methods [25]. In our implementation, images are indexed based on their semantic content, in order to address the growing need for representing images with meaningful annotations and to improve their retrieval efficiency.

Chapter 3

PROBLEM STATEMENT

The number of digital images is growing rapidly, driving the need for efficient tools to browse, retrieve and navigate through large image collections. Because the information contained in images is complex, spanning different colors, shapes, textures and subjects, indexing methods designed for storing and retrieving textual content will not work effectively. There is a need to explicitly capture a sufficient amount of content information, as well as application-specific semantics, by means of a variety of metadata such as multimedia indexes, attribute-based annotations and intentional descriptions, to allow appropriate selection, browsing and retrieval of images from large collections [1].

Potentially, images have many types of attributes that could be used for storage and retrieval: the presence of a particular combination of color, texture or shape features; the presence of a specific type of object; the depiction of a particular event; the presence of individuals or locations; the presence of specific emotions; or metadata such as who created the image, where and when. Images can be indexed on a single attribute or on a combination of attributes to improve the efficiency of the image retrieval system.

Traditionally, images are indexed based on textual annotations: every image is examined individually, and a textual annotation describing its various characteristics is stored along with the image for indexing. Given the large number of images being produced, manual annotation is very time consuming and prone to error. Querying images with textual annotations is also not very effective, as images contain far more content than plain text can describe [15, 34].

Another approach to indexing images is to extract content such as color, shape and texture and to store the feature representation of that content along with the images for indexing purposes. With this approach, images can only be queried on their color, shape and texture, not on their actual subject matter. This approach is not useful for querying images containing a particular subject matter and is said to have many limitations when applied to image databases with broad content [15].

The most recent approach to indexing images is to use the low-level features of the image as semantic descriptors of the image, thus bridging the gap between the two approaches above. Digital images are composed of pixels arranged in an endless variety of patterns, and in general it is difficult to predict the particular pattern that would match an information need. Deciding which aspects of the image are appropriate for indexing is very challenging, and interpreting the semantic content is itself a challenging task, as every interpretation can differ. Such indexing, however, would greatly improve the querying capability of images, as they could be queried for both low-level features and high-level semantics. The feature representations and semantic descriptors thus obtained are mapped onto domain ontologies in order to classify the images for retrieval purposes. Determining the association between semantic descriptors and ontologies is a difficult task. A system that indexes images based on such semantic metadata would make it possible to retrieve large collections of images more effectively and efficiently, leveraging and combining the research efforts in domain ontologies and image processing.

Chapter 4

BACKGROUND

4.1 Ontology

An ontology is a formal, explicit specification of a shared conceptualization. A conceptualization refers to an abstract model of some phenomenon in the world, identifying the relevant concepts of that phenomenon. "Explicit" means that the types of concepts are explicitly defined, and "formal" refers to the fact that the ontology can be expressed mathematically; as a result, it is machine readable and understandable. In image retrieval applications, an ontology allows the description of semantics, establishes a common and shared understanding of a domain, and facilitates the implementation of a user-oriented vocabulary of terms and their relationships with objects in images [12].

4.2 Image Databases

Image data such as satellite images, medical images and digital pictures are generated in large numbers every day, and the World Wide Web itself is a huge repository of images. Given this volume of image data, multimedia databases are essential. Multimedia databases store and retrieve images, text, videos, sounds and data stored on any medium. The analysis of such images is very useful for archival and retrieval purposes in fields like medicine, environmental studies and the military. Multimedia databases support querying images based on their content: images can be queried based on the shapes of the objects present in the image, the colors of the objects, textures, volume, spatial relationships, motion, etc.

4.3 Image Semantic Representation Languages

Searching for images by content implies a first step of extracting features from the images so that those features can be searched. Image mining deals with the extraction of this semantic content from large collections of images. Associating semantic content with images is called annotation: different objects in the image are attached to textual and spatial information and stored in a database using a standard representation. Images can then be queried effectively by indexing them along with their semantic content.

Metadata is the most important part of a data archive; it provides descriptive data about every stored object, including indexing information that can be described using a standardized framework to represent an image along with its semantic content. Resource Description Framework (RDF) [20] is used to represent information and exchange knowledge on the Web. Web Ontology Language (OWL) [20] is used to publish and share sets of terms called ontologies, supporting advanced Web search, software agents and knowledge management. The DARPA Agent Markup Language (DAML) [20] is an extension of XML that provides a rich set of constructs to create ontologies and to mark up information so that it is machine readable and understandable. DAML, RDF and OWL are some of the languages developed to represent the semantic content of images. MPEG-7 [35] offers a comprehensive set of audiovisual description tools for creating the metadata descriptions that form the basis for applications enabling effective and efficient access to multimedia content.
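To make this kind of representation concrete, the sketch below renders an ontology term and its definition as a DAML-style class element. The element layout (a daml:Class with rdfs:label and rdfs:comment children) follows common DAML+OIL usage and is an illustration only, not the layout of the thesis's actual ontology file:

```python
from xml.sax.saxutils import escape

# Illustrative DAML-style class element for one ontology term.
# The exact element layout of the thesis's ontology file may differ.
TEMPLATE = """<daml:Class rdf:ID="{term}">
  <rdfs:label>{term}</rdfs:label>
  <rdfs:comment>{definition}</rdfs:comment>
</daml:Class>"""

def daml_entry(term, definition):
    """Render one term/definition pair, escaping XML special characters."""
    return TEMPLATE.format(term=escape(term), definition=escape(definition))
```

Generating one such element per term is enough to build a small reference ontology file of the kind described later in Chapter 7.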

4.4 Image Interpretation Software

Image analysis software provides the tools for segmentation, feature extraction and statistical analysis of image content. Segmentation deals with the identification of objects of interest within an image. Feature extraction extracts information from images by measuring the number, size, shape or color of objects.

4.5 MPEG-7

MPEG-7 [35] is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group). MPEG-7, formally named "Multimedia Content Description Interface", is a standard for describing multimedia content data that supports some degree of interpretation of the information's meaning, which can be passed on to, or accessed by, a device or computer code. MPEG-7 is not aimed at any one application in particular; rather, the elements it standardizes support as broad a range of applications as possible. The MPEG-7 Visual Description Tools included in the standard consist of basic structures and descriptors that cover the following basic visual features: color, texture, shape, motion, localization, and face recognition. Each category consists of elementary and sophisticated descriptors. In this implementation, we use only the Color, Texture and Shape descriptors. The following section provides a brief description of the image descriptors used.

Dominant Color. This color descriptor is most suitable for representing local (object or image region) features where a small number of colors is enough to characterize the color information in the region of interest. Whole images are also applicable: for example, flag images or color trademark images. Color quantization is used to extract a small number of representative colors in each region/image. The percentage of each

quantized color in the region is calculated correspondingly. A spatial coherency over the entire descriptor is also defined and is used in similarity retrieval.

Scalable Color. The Scalable Color Descriptor is a color histogram in HSV color space, encoded by a Haar transform. Its binary representation is scalable in terms of bin numbers and bit representation accuracy over a broad range of data rates. The Scalable Color Descriptor is useful for image-to-image matching and retrieval based on color features; retrieval accuracy increases with the number of bits used in the representation.

Color Layout. This descriptor represents the spatial distribution of the color of visual signals in a very compact form. This compactness allows visual signal matching with high retrieval efficiency at very small computational cost. It provides image-to-image matching as well as ultra high-speed sequence-to-sequence matching, which requires many repetitions of similarity calculations.

Color Structure. The Color Structure descriptor is a color feature descriptor that captures both color content (similar to a color histogram) and information about the structure of that content. Its main functionality is image-to-image matching, and its intended use is still-image retrieval, where an image may consist of either a single rectangular frame or arbitrarily shaped, possibly disconnected, regions. The extraction method embeds color structure information into the descriptor by taking into account all colors in a structuring element of 8x8 pixels that slides over the image, instead of considering each pixel separately.

Texture Browsing. The Texture Browsing Descriptor is useful for representing homogeneous texture for browsing-type applications and requires only 12 bits at most. It provides a perceptual characterization of texture, similar to a human characterization, in terms of regularity, coarseness and directionality. The computation of

this descriptor proceeds similarly to that of the Homogeneous Texture Descriptor. First, the image is filtered with a bank of orientation- and scale-tuned filters (modeled using Gabor functions); from the filtered outputs, two dominant texture orientations are identified. Three bits are used to represent each of the dominant orientations. This is followed by analyzing the filtered image projections along the dominant orientations to determine the regularity (quantified to 2 bits) and coarseness (2 bits x 2). The second dominant orientation and second scale feature are optional.

Edge Histogram. The edge histogram descriptor represents the spatial distribution of five types of edges, namely four directional edges and one non-directional edge. Since edges play an important role in image perception, it can retrieve images with similar semantic meaning. Thus, it primarily targets image-to-image matching (by example or by sketch), especially for natural images with non-uniform edge distributions. In this context, image retrieval performance can be significantly improved if the edge histogram descriptor is combined with other descriptors such as the color histogram descriptor.

Region Shape. The shape of an object may consist of a single region or a set of regions, as well as holes in the object. Since the Region Shape descriptor makes use of all the pixels constituting the shape within a frame, it can describe any shape, i.e. not only a simple shape with a single connected region but also a complex shape consisting of holes or several disjoint regions. The Region Shape descriptor not only describes such diverse shapes efficiently in a single descriptor, but is also robust to minor deformations along the boundary of the object.

Contour Shape. The Contour Shape descriptor captures the characteristic shape features of an object or region based on its contour.
It uses the so-called Curvature Scale-Space representation, which captures perceptually meaningful features of the shape.
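Matching for histogram-type descriptors such as the Edge Histogram is typically done with an L1 distance (sum of absolute bin differences); in MPEG-7 the edge histogram has 80 local bins (16 sub-images x 5 edge types). A minimal sketch, ignoring the global and semi-global bin groupings the standard also defines:

```python
def l1_distance(h1, h2):
    """L1 distance between two histograms: sum of absolute bin differences."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

The same function applies to any equal-length bin vectors, which is why L1 matching is a common default for histogram descriptors.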

4.6 Distance Measure

A distance is a numerical description of how far apart objects are. In physics or everyday discussion, distance may refer to a physical length, a period of time, etc. In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" distance between two points that one would measure with a ruler, and it can be derived by repeated application of the Pythagorean theorem.
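In this implementation the "points" are feature vectors rather than points on a ruler, but the definition carries over unchanged; a minimal sketch:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two equal-length feature vectors,
    obtained by repeated application of the Pythagorean theorem."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```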

Chapter 5

DATASETS

This chapter describes the image data set and the reference ontology used for this implementation.

5.1 Visible Human Image Data Set

Images from the Visible Human [24] Data Set were used. The Visible Human dataset contains anatomically detailed, three-dimensional representations of normal male and female human bodies; it contains complete human male and female cadavers in MRI, CT and anatomical modes. The images were obtained via academic licensing through the National Library of Medicine.

Figure 1: Image of Abdomen from Visible Human Data Set.

Figure 2: Image of Thigh from Visible Human Data Set.

5.2 University Of Washington Digital Anatomist Reference Ontology

The University of Washington Digital Anatomist (UWDA) [22] reference ontology from the medical domain was chosen. UWDA is an abridged version of the Foundational Model of Anatomy [27] ontology and is incorporated into the UMLS (Unified Medical Language System) Metathesaurus. UWDA is a domain ontology representing knowledge of the human body: it contains classes and relationships that provide a symbolic model of the structure of the human body. The ontology is computer-based, was designed for bioinformatics, and was developed by the Structural Informatics Group at the University of Washington. UMLS was obtained through academic licensing in order to access the UWDA ontology.

Chapter 6

PREPROCESSING TOOLS

This chapter describes the tools used to process the image data set and create the reference ontology.

6.1 MySQL

MySQL is an open source SQL database management system. MySQL was used in this implementation to house the UMLS database containing the University of Washington Digital Anatomist reference ontology. The ontological terms contained in the UWDA ontology were retrieved with SQL queries from the MySQL instance of UMLS.

6.2 Adobe Photoshop

Adobe Photoshop is a graphics editor developed by Adobe Systems for image manipulation. Images obtained from the Visible Human data set are in raw format; Adobe Photoshop was used to convert these images to JPEG format in order to access the information contained in them.

6.3 M-OntoMat-Annotizer

M-OntoMat-Annotizer (M stands for Multimedia) [26] is a user-friendly tool developed within the aceMedia project. It is an extension of the CREAM (CREAting Metadata for the Semantic Web) framework and its reference implementation, OntoMat-Annotizer. The M-OntoMat-Annotizer Visual Descriptor Extraction tool, developed as a plug-in to OntoMat-Annotizer, presents a graphical interface for loading and

processing visual content (images and videos), extracting visual features and associating them with domain ontology concepts. M-OntoMat-Annotizer is a Java-based application and is distributed under the GNU Lesser General Public License [R1].

Chapter 7

PREPROCESSING METHODS

This chapter describes the steps involved in preparing the image data set and the reference ontology for this implementation, using the tools and data sets described in the previous chapters.

7.1 Selection Of Images From Visible Human

A subset of 90 images from the Visible Human Data Set was chosen, consisting of both male and female images spanning the human body from head to toe. 15 categories based on different regions of the human body were chosen: Head, Abdomen, Thigh, Adductor Magnus, Kidney, Eyes, Brain, Gluteal Muscles, Hamstring, Biceps, Pectoralis Major, Colon, Pelvis, Thorax and Lungs. The categories were chosen so that the images vary in content, i.e. they have different colors, shapes and textures. The 90 images were selected by picking 6 images from each of the 15 categories to serve as test and training images for our experiments.

7.2 Extraction Of UWDA Ontological Terms From The UMLS Database

A subset of 15 UWDA ontological terms corresponding to the 15 categories of images described above was extracted from the UMLS database for our experiment. MySQL was used to install the UMLS database, and SQL queries were designed to extract the UWDA ontological terms from it. The UMLS database contains various tables holding information such as concepts, definitions and terms. The following SQL query was used to extract the UWDA ontological terms and their definitions from the UMLS tables.
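The exact query used here is reproduced only as a screenshot (Figure 3). As an illustration of the kind of join involved, the sketch below uses the standard UMLS table names (MRCONSO for concept strings, with the SAB column identifying the source vocabulary, and MRDEF for definitions) over an in-memory SQLite database; the thesis ran against a full MySQL installation of UMLS, so the exact columns and filters may differ, and the sample rows below are fabricated for illustration:

```python
import sqlite3

# Miniature stand-in for the UMLS tables: MRCONSO holds concept strings
# and their source vocabulary (SAB); MRDEF holds concept definitions.
# CUIs and definitions here are placeholder values, not real UMLS data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MRCONSO (CUI TEXT, SAB TEXT, STR TEXT);
    CREATE TABLE MRDEF   (CUI TEXT, SAB TEXT, DEF TEXT);
    INSERT INTO MRCONSO VALUES ('C0000001', 'UWDA', 'Head');
    INSERT INTO MRCONSO VALUES ('C0000002', 'UWDA', 'Abdomen');
    INSERT INTO MRDEF   VALUES ('C0000001', 'UWDA', 'Upper division of the body');
""")

# Join terms to their definitions, restricted to the UWDA source vocabulary.
rows = conn.execute("""
    SELECT c.STR, d.DEF
    FROM MRCONSO c
    JOIN MRDEF d ON c.CUI = d.CUI
    WHERE c.SAB = 'UWDA' AND d.SAB = 'UWDA'
""").fetchall()
```

Only terms that have a definition in the chosen source vocabulary survive the join, which matches the goal of extracting term/definition pairs for the 15 categories.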

Figure 3: Screenshot of SQL query used to extract UWDA terms from UMLS.

7.3 Creation Of UWDA Reference Ontology In DAML (DARPA Agent Markup Language)

An empty ontology file was created in the DAML format. The 15 extracted ontology terms and their definitions were then added to the file in DAML format, following the DAML references and guidelines. This file containing the 15 UWDA ontological terms was used in M-OntoMat-Annotizer as the reference ontology file.

7.4 Loading Domain Ontology In M-OntoMat-Annotizer

The reference ontology DAML file is loaded into M-OntoMat-Annotizer using the Ontology Explorer. The Ontology Explorer displays all the ontological terms contained in

the domain ontology file created above. The Ontology Explorer provides a way to create prototype instances of ontology terms to be linked to image feature content.

7.5 Conversion Of Image Format To JPEG

The subset of images chosen for the implementation from the Visible Human Data Set is in raw format. These images need to be converted to bitmap or JPEG format to access the image content information. The raw images were opened with Adobe Photoshop after specifying the width, size and resolution per the guidelines set by the National Library of Medicine for this data set, and were then saved as JPEG files. The JPEG image files were then used for image segmentation and feature extraction as described in the next section.

7.6 Extracting Image Content And Linking To Domain Ontology

The Visual Descriptor Extraction (VDE) tool in M-OntoMat-Annotizer was used for loading the JPEG image files and selecting and extracting image content information. An ontology term was selected from the 15 terms in the Ontology Explorer, and a new prototype instance of the term was created in order to link the image content features for the new image. An image from the same category as the ontology term was then uploaded to the VDE tool, and an electronic pen or mouse was used to select the region on the image corresponding to the ontological term. For example, if the chosen ontological term is Head, then an image from the category Head is chosen and uploaded to the VDE tool, and the Head region is selected on the image for content extraction. VDE provides the functionality to extract the following content from the images: Texture Browsing, Region Shape, Dominant Color, Scalable Color, Contour Shape, Edge Histogram, Color Structure and Color Layout. Once the region of interest was selected on the image, all of the above image features were extracted one by one using the VDE tool.
The features are extracted into XML files and the association with the prototype instance is also

stored in the XML file for each image feature by the VDE tool. This procedure was followed for all 90 images in the data set. Each image thus has 8 XML files containing the image content, 1 RDF file containing the domain ontology and references to the XML files, and 1 DAML file containing the domain ontology terms. These files form the core data set and were used to build the semantic retrieval system described in the next chapter.

Figure 4: Screenshot of VDE tool in M-OntoMat-Annotizer showing the image feature extraction and annotation process.
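Section 8.2 later describes parsing these descriptor files with per-descriptor XPath expressions at query time. As an illustration, the sketch below pulls coefficient values out of a simplified, hypothetical descriptor file; the real VDE output follows the MPEG-7 schema, so both the element names and the XPath expression differ for each descriptor type:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified descriptor file layout; real VDE output
# follows the MPEG-7 schema and uses different element names.
DOC = """<Mpeg7>
  <Descriptor type="ScalableColor">
    <Coeff>1 7 3 5</Coeff>
  </Descriptor>
</Mpeg7>"""

def read_coefficients(xml_text, descriptor_type):
    """Extract the whitespace-separated coefficient list for one descriptor."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//Descriptor[@type='{descriptor_type}']/Coeff")
    return [int(v) for v in node.text.split()]
```

Parsing on demand like this, rather than preloading all 8 files per image, matches the run-time extraction strategy the thesis describes.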

Chapter 8

METHODS

This chapter describes the methodology used to develop the system that semantically indexes images and calculates retrieval efficiency. The first step in the implementation was selecting the test and training images. Once the test and training sets were obtained, every test image was compared to each training image by extracting all the feature descriptors for each image and calculating the distance measure for each feature type. Distance matrices were built containing the distance measures of test versus training images for every feature. The test images were then classified using similarity matching algorithms and the ensemble classification approach, and the accuracy rate was determined for every approach. The following sections describe the methods used to develop the system.

8.1 Training And Test Set

The chosen subset of 90 images was divided into a training set and a test set. Three approaches were followed for populating the two sets. In the first approach, 15 representative images, one from each category, were used as the training set and the remaining images formed the test set. Many studies show that accuracy rates improve with a larger training set; hence, in the second approach, the training set contained 50% of the images and the test set the remaining 50%, and an algorithm was developed to populate both sets randomly. In the third approach, the test and training images were again randomly populated, but the training set contained 75% of the images and the test set the remaining 25%.
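The random 50/50 and 75/25 splits can be sketched as below. Splitting within each category, so every category is represented in both sets, is an assumption here; the thesis does not state whether its random split was stratified:

```python
import random

def split_by_category(images_by_category, train_fraction, seed=0):
    """Randomly assign train_fraction of each category's images to the
    training set and the remainder to the test set."""
    rng = random.Random(seed)
    train, test = [], []
    for images in images_by_category.values():
        shuffled = list(images)
        rng.shuffle(shuffled)       # shuffle a copy, not the caller's list
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test
```

With 15 categories of 6 images each, a 0.5 fraction yields the 45/45 split used in the second approach.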

For every image in the test set, the distance between the test image and every training image was calculated for each feature descriptor and stored in a distance matrix for that descriptor. Likewise, for every training image, the distance to every other training image was calculated per descriptor and stored in a separate distance matrix for that descriptor.

8.2 Extracting Image Content From XML Files

Image content information for a particular image is extracted from the descriptor XML files. Every visual descriptor file has a different format, so a different XPath expression method was developed for parsing each type of file. Image content is extracted from the XML files at run time while calculating the similarity measure for each image.

8.3 Calculating Distance Measure

A distance measure calculation requires the image content information for the two images being compared, extracted as described above. Every feature descriptor has its own distance formula, as the attributes of each descriptor are unique to it. The distance measure is thus calculated with one of the following formulae, depending on the feature descriptor.

Dominant Color. The distance between two dominant color descriptors, F1 and F2, is calculated by the following distance function [28]:

D^2(F1, F2) = sum_k p1,k^2 + sum_l p2,l^2 - sum_k sum_l 2 a_k,l p1,k p2,l

where F is the dominant color, p is the corresponding percentage value, N is the total number of dominant colors, and a_k,l is the similarity coefficient between two colors. The formula for a_k,l is:

a_k,l = 1 - d_k,l / d_max   if d_k,l <= T_d,   and 0 otherwise.

d_k,l, T_d and d_max are defined as follows:

d_k,l = || c_k - c_l ||,   d_max = alpha * T_d,

where alpha is the dominant color coefficient, between 1 and 1.5 [28], and c_k and c_l are colors.

Color Layout. The distance between two color layout descriptor values [Y, Cb, Cr] and [Y', Cb', Cr'] can be calculated as follows [28]:

D = sqrt( sum_i w_y,i (Y_i - Y'_i)^2 ) + sqrt( sum_i w_b,i (Cb_i - Cb'_i)^2 ) + sqrt( sum_i w_r,i (Cr_i - Cr'_i)^2 )

where w_y, w_b and w_r denote weighting values for each coefficient. Y, Cb and Cr are the color layout descriptor components, also known as YCoeff, CbCoeff and CrCoeff.

Color Structure. The color structure distance between two descriptors is given by the following formula [28]:

D(A, B) = sum_i | h_A(i) - h_B(i) |

where h_A and h_B are the color structure descriptor vectors of images A and B and i runs over the descriptor bins.

Texture Browsing. The texture browsing descriptor captures the regularity (v1), direction (v2 and v4) and scale (v3 and v5) of the texture pattern, TBC = [v1, v2, v3, v4, v5]. The distance between two TBC vectors is computed as the sum of the distances between corresponding coefficients [28].

Edge Histogram. The edge histogram distance E is measured as the distance between two sets of inversely quantized edge histograms A and B [28]:

E(A, B) = sum_i | h_A(i) - h_B(i) |

where h_A and h_B are the edge histogram descriptor vectors and i runs over the histogram bins.

Contour Shape. The contour shape distance measure M is computed as a weighted sum of the distance measure between the global curve parameters and the distance measure between the Curvature Scale Space (CSS) peaks associated with the object and the semantic entity [28].

where E and C are the absolute values of Eccentricity and Circularity (the global curve parameters). M_css is the distance between the matching CSS peaks, compared through their xpeak and ypeak coordinate values, with an additional penalty for each unmatched peak equivalent to the missing peak height [28].

Region Shape. The distance between two region shape descriptors is obtained from the following formula [28]:

D(p, q) = sum_i | p_i - q_i |

where p and q are the region shape attribute vectors and i runs over the attributes.

Scalable Color. The distance between two scalable color descriptors is obtained from the following formula [28]:

D(p, q) = sum_i | p_i - q_i |

where p and q are the scalable color attribute vectors and i runs over the attributes.
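Several of the formulas above (Color Structure, Edge Histogram, Region Shape, Scalable Color) share the same sum-of-absolute-differences form, which can be captured in one small helper:

```python
def l1_distance(p, q):
    """Sum of absolute coefficient differences between two descriptor
    vectors, the form shared by the Color Structure, Edge Histogram,
    Region Shape and Scalable Color distance measures."""
    if len(p) != len(q):
        raise ValueError("descriptor vectors must be the same length")
    return sum(abs(a - b) for a, b in zip(p, q))
```

For example, l1_distance([1, 2, 3], [1, 0, 6]) evaluates to 5.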

8.4 Calculating Combined Distance Measure

The combined distance measure is calculated by summing the weighted distances obtained for all the image descriptors as described above. Different weights were used when combining the distances; the weight determination process is explained in the Results and Analysis section.

8.5 Creating Distance Matrix

A distance matrix is created for every feature descriptor. Its elements are the distance measures calculated using the methods stated above, and its dimensions are Test x Training or Training x Training. In total, 17 distance matrices are generated for the image retrieval calculations: eight matrices, one per feature descriptor, of dimension Test x Training; another eight, one per feature descriptor, of dimension Training x Training; and a final matrix whose elements are the combined distances of all the image descriptors. These distance matrices are used in the image retrieval algorithms to calculate the retrieval accuracy rate as described in the following sections.

8.6 Calculating Retrieval Accuracy Rate

Two algorithms based on different classification approaches were developed to calculate the retrieval accuracy rate. The first uses a simple classification technique based on smallest-distance matching; the second follows the Ensemble classification technique.

Smallest Distance Classification. The algorithm for smallest distance classification operates on the distance matrices. To explain the algorithm, consider the Test x Training distance matrix for Scalable Color. The first row of the matrix

containing the distance measures between the first test image and all the training images is scanned, and the smallest distance is found using fundamental sorting techniques. The row is then scanned again to find all the training images that share this smallest distance. A count of the matches and the matching training image IDs are stored for calculating the retrieval accuracy. The ontology term for the test image is retrieved by XPath parsing of the ontology RDF files, as are the ontology terms for all the matching training images. If any one of the training ontology terms matches the test ontology term, the algorithm has classified the image to the right category, and each such positive match is reflected in the accuracy count. The algorithm is repeated for all the rows in the distance matrix, after which the overall accuracy is obtained as the percentage of test images classified correctly over the total number of test images. Using the smallest distance matching algorithm, two retrieval efficiencies were calculated for all the test and training images: the independent retrieval efficiency for every feature descriptor, and the retrieval efficiency when combining all the feature descriptors.

Ensemble Classification. The Ensemble technique is a popular and efficient classification technique derived from the concept of voting: every image descriptor votes for a particular category, and the test image is classified to the category with the maximum number of votes. An algorithm was developed to reflect this method, using the distance matrices produced for all the image descriptors. The algorithm considers the distance matrices belonging to a particular image descriptor.
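This per-row smallest-distance scan, which both classifiers share, can be sketched as follows; the names and data layout are assumptions for illustration, not the thesis code.

```python
def nearest_labels(row, training_labels):
    """Labels of all training images tied at the smallest distance
    for one test image (one row of a Test x Training matrix)."""
    smallest = min(row)
    return [training_labels[j] for j, d in enumerate(row) if d == smallest]

def retrieval_accuracy(matrix, training_labels, test_labels):
    """Percentage of test images whose ontology term matches any of
    the tied nearest training images' terms."""
    correct = sum(
        1 for i, row in enumerate(matrix)
        if test_labels[i] in nearest_labels(row, training_labels)
    )
    return 100.0 * correct / len(test_labels)
```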
The first row of the matrix, containing the distance measures between the first test image and all the training images, is scanned and the smallest distance is found using fundamental sorting techniques. Once the smallest distance is obtained, the row is scanned again to find all the training images that share it. A count of the matches and the matching training image IDs are stored for calculating the retrieval

accuracy. The ontology term for the test image is retrieved using XPath parsing of the ontology RDF files, and the ontology terms for all the matching training images are retrieved the same way and stored in an array. These steps are repeated for the first row of every distance matrix, across all the image descriptors. At the end, the array contains the matched training image ontology terms; each set of terms added to this list by a feature descriptor is analogous to a vote. The frequency of each ontology term is counted and the term with the highest frequency, or vote count, is obtained. This term is then compared to the ontology term of the test image; the result is classified as positive if they match, and the count of positive matches is tracked for the retrieval accuracy rate calculation. The procedure is repeated for all the rows of the distance matrices, i.e., for all the test images, and the overall retrieval accuracy rate is calculated as described earlier.

8.7 Improving Retrieval Accuracy Rates

Ten Fold Cross Validation and Empirical Weight Optimization techniques were used to improve the retrieval accuracy rates produced by the system.

Ten Fold Cross Validation. In the Ten Fold Cross Validation approach, all the calculations performed in the system are repeated ten times and the results are averaged after the last iteration. This approach aims to average out the variation caused by random operations such as populating the test and training sets. The whole program runs in a loop of ten iterations; in each iteration the training and test sets are populated and the distance matrices and accuracy rates are calculated. The results are summed at the end of each iteration and averaged at the end of all the iterations.
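As described, this is ten repetitions of the whole pipeline over fresh random splits, with the per-descriptor accuracies averaged at the end. A minimal sketch (function names assumed for illustration):

```python
def ten_fold_average(run_once, folds=10):
    """Run the whole pipeline (populate sets, build distance matrices,
    compute accuracies) `folds` times and average the per-descriptor
    accuracy rates. `run_once` returns a {descriptor: accuracy} dict."""
    totals = {}
    for _ in range(folds):
        for descriptor, rate in run_once().items():
            totals[descriptor] = totals.get(descriptor, 0.0) + rate
    return {d: total / folds for d, total in totals.items()}
```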

Empirical Weight Optimization. The empirical weight optimization technique was used to determine the weights for the weighted combined distance measure, which is calculated as a weighted sum of all the descriptor distances. To start, all the descriptors are assigned equal weights. One descriptor is then chosen and its weight is varied from +1 to -1 in increments of 0.1. For every weight value, the difference between the maximum weight and the chosen weight is calculated and distributed equally among all the other descriptors, and the combined accuracy rate is calculated for every variation. This technique is then applied to all the other descriptors in turn.
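The sweep just described can be sketched as follows, assuming starting weights of 1 and reading "the maximum weight" as that starting value; the helper takes a callback that computes the combined accuracy for a given weight assignment. All names here are illustrative, not the thesis code.

```python
def sweep_weights(descriptors, combined_accuracy, base=1.0):
    """Vary each descriptor's weight from +1 down to -1 in 0.1 steps,
    distributing the difference from the base weight equally among the
    other descriptors, and return the best assignment found."""
    best_weights, best_acc = None, float("-inf")
    n = len(descriptors)
    for target in descriptors:
        for k in range(10, -11, -1):       # w = 1.0, 0.9, ..., -1.0
            w = k / 10.0
            share = (base - w) / (n - 1)   # keeps the total weight constant
            weights = {d: (w if d == target else base + share)
                       for d in descriptors}
            acc = combined_accuracy(weights)
            if acc > best_acc:
                best_weights, best_acc = weights, acc
    return best_weights, best_acc
```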

Chapter 9 RESULTS, DISCUSSION AND ANALYSIS

This chapter illustrates the results obtained from the implementation approach described above. An analysis of the results and of the various methods used to improve them is described in detail.

9.1 Initial Results

The initial implementation used 15 images in the training set and 75 images in the test set. The tables below show the results for test vs. training and training vs. training.

Table 1: Accuracy rate for training set. Training Set = 15 Images vs. Training Set = 15 Images

  Image Descriptor    Accuracy Rate
  Color Layout        100%
  Color Structure     100%
  Contour Shape       100%
  Dominant Color      100%
  Edge Histogram      100%
  Region Shape        100%
  Scalable Color      100%
  Texture Browsing    100%

From the training vs. training results table we can see that the retrieval accuracy rate for all training images is 100%. The retrieval rate for training images is calculated to verify that the algorithm is able to correctly classify images in the training set.

Table 2: Accuracy rate for test set. Training Set = 15 Images, Test Set = 75 Images

  Image Descriptor    Accuracy Rate
  Color Layout        %
  Color Structure     %
  Contour Shape       %
  Dominant Color      %
  Edge Histogram      %
  Region Shape        68%
  Scalable Color      %
  Texture Browsing    %

From the test vs. training results table we see that the highest accuracy rate is obtained by indexing images only on the Region Shape descriptor, with Scalable Color and Color Structure providing the second-best retrieval rates. This is definitely better than the random classifier accuracy rate of 6.66%, obtained as the percentage probability of a test image being classified as one of the 15 training images.

9.2 Increased Training To Test Ratio

Data mining best practices indicate that the training-to-test ratio should be high for improved retrieval accuracy rates. In our experiments we selected two splits for the training and test sets: 75% training with 25% test, and 50% training with 50% test. The training and test sets were populated randomly, following another data mining best practice. The following tables show the results obtained with the two splits.

Table 3: Accuracy rate for 75% images in training set and 25% images in test set. Training Set = 75%, Test Set = 25%

  Image Descriptor    Accuracy Rate
  Color Layout        48%
  Color Structure     87%
  Contour Shape       26.07%
  Dominant Color      47.83%
  Edge Histogram      65.22%
  Region Shape        65.22%
  Scalable Color      78.26%
  Texture Browsing    52.17%

With 75% of the images in the training set and 25% in the test set, the best retrieval accuracy rate is obtained for the Color Structure descriptor; Scalable Color also gives good results.

Table 4: Accuracy rate for 50% images in training set and 50% images in test set. Training Set = 50%, Test Set = 50%

  Image Descriptor    Accuracy Rate
  Color Layout        44.44%
  Color Structure     57.77%
  Contour Shape       17.77%
  Dominant Color      37.77%
  Edge Histogram      44.44%
  Region Shape        68.88%
  Scalable Color      64.44%
  Texture Browsing    62.22%

With a 50/50 split between the training and test sets, the best retrieval accuracy rates are obtained for the Region Shape descriptor, followed by Scalable Color. From these results we can see that the retrieval accuracy rates improve significantly with a larger training set: increasing the number of training images raised the best single-descriptor retrieval accuracy rate from 68% to 87%.

9.3 Combined Descriptors

To further improve the accuracy rate, we combined the distance measures for all the descriptors and calculated the accuracy rate on the combined value. The splits mentioned above were used for the test and training sets, which were again populated randomly. With a 50/50 split, the combined accuracy rate is shown below.

Table 5: Combined accuracy rate for training set = 50% and test set = 50%. Training Set = 50%, Test Set = 50%

  Image Descriptors                      Accuracy Rate
  Combined Descriptors (Equal Weights)   73.33%

With 75% training and 25% test images, the combined accuracy rate is shown below.

Table 6: Combined accuracy rate for training set = 75% and test set = 25%. Training Set = 75%, Test Set = 25%

  Image Descriptors                      Accuracy Rate
  Combined Descriptors (Equal Weights)   86.95%

The retrieval accuracy rate obtained by combining all the descriptors is almost equivalent to the highest rate obtained for a single descriptor in the previous experiment (Color Structure). Because the combined accuracy rates were not significantly higher than the single-descriptor rates, we experimented with further methods to improve them, as described in the following sections.

9.4 Ensemble Classification

The next approach used to improve the retrieval accuracy rate was Ensemble classification. With a 50/50 split, the Ensemble accuracy rate is shown below.

Table 7: Accuracy rate for Ensemble Classification for 50% training and 50% test. Training Set = 50%, Test Set = 50%

  Image Descriptors    Accuracy Rate
  Ensemble             37.77%

With 75% training and 25% test images, the Ensemble accuracy rate is shown below.

Table 8: Accuracy rate for Ensemble Classification for 75% training and 25% test. Training Set = 75%, Test Set = 25%

  Image Descriptors    Accuracy Rate
  Ensemble             43.47%

Good results were not obtained with the Ensemble classification approach because the votes were distorted for certain descriptors. Due to the nature of the image descriptors, more than one training image often had the same smallest distance to a particular test image: the images from the Visible Human Data Set are very similar in their dominant colors and textures. Many training images sharing the smallest distance meant that a test image was voted into different training classes, skewing the voting calculations for the Ensemble classification method. For example, for test image 1, training images 3, 8 and 9 had the same smallest distance; training images 3 and 9 voted the test image into the Head class whereas training image 8 voted for Eyes. When predicting the class of a test image with the Ensemble classification technique, we considered all its votes across all the descriptors' distance matrices, found the vote with the maximum occurrence, and assigned the test image to the class with the maximum

votes. In the above example, the test image would be assigned to the Head class, when it actually belongs to the Eyes class. Such incorrect classifications reduce the retrieval accuracy rate.

9.5 Ten Fold Cross Validation

We used the Ten Fold Cross Validation method to further improve the accuracy rates for single and combined descriptors; it averages out any variation caused by the random selection of training and test images. From the table below, for 75% training and 25% test images, the best results are obtained when all the descriptors are combined. The Ensemble accuracy rate also improves, but not to the level of the combined rate. Scalable Color, Edge Histogram and Color Structure provide good results as well.

Table 9: Accuracy rate for Ten Fold Cross Validation for 75% training and 25% test. Training Set = 75%, Test Set = 25%, Ten Fold Cross Validation

  Image Descriptor                       Accuracy Rate
  Color Layout                           55.65%
  Color Structure                        %
  Contour Shape                          30.86%
  Dominant Color                         52.60%
  Edge Histogram                         71.73%
  Region Shape                           66.95%
  Scalable Color                         75.65%
  Texture Browsing                       65.65%
  Combined Descriptors (Equal Weights)   84.34%
  Ensemble                               64.78%

From the table below, for a 50/50 split, the best results are again obtained when all the descriptors are combined. The Ensemble accuracy rate also improves, but not to the level of the combined rate. Scalable Color and Region Shape provide good results as well.

Table 10: Accuracy rate for Ten Fold Cross Validation for 50% training and 50% test. Training Set = 50%, Test Set = 50%, Ten Fold Cross Validation

  Image Descriptor                       Accuracy Rate
  Color Layout                           52%
  Color Structure                        64.22%
  Contour Shape                          26.22%
  Dominant Color                         46.44%
  Edge Histogram                         63.55%
  Region Shape                           68.44%
  Scalable Color                         70.22%
  Texture Browsing                       55.11%
  Combined Descriptors (Equal Weights)   81.33%
  Ensemble                               62.22%

Across all the above experiments, Scalable Color, Color Structure, Region Shape and Edge Histogram provided consistently good results, whereas Contour Shape consistently had the lowest accuracy rates, followed by Texture Browsing and Dominant Color. Color Layout lay in between, averaging around 50% accuracy across all experiments. The next section describes experiments that exclude descriptors with low individual retrieval accuracy rates from the overall combined accuracy rate calculation.

9.6 Excluding Descriptors

First, the Texture Browsing and Contour Shape descriptors were both excluded from the combined accuracy rate calculations, with the results shown below. There is an increase in the combined accuracy rate (87.39%) compared to the previous experiment results (~84%). However, although Contour Shape consistently gave low accuracy rates, Texture Browsing gave average results in some of the experiments described above; hence, removing both descriptors together did not significantly improve the accuracy rates.

Table 11: Accuracy rate excluding Contour Shape and Texture Browsing.

  Training/Test Split                   Accuracy Rate (Combined Descriptors, Equal Weights,
                                        No Contour Shape and Texture Browsing)
  Training Set = 50%, Test Set = 50%    84.88%
  Training Set = 75%, Test Set = 25%    87.39%

The combined accuracy rate improved significantly when only the Contour Shape descriptor was excluded from the combined accuracy rate calculations. A high accuracy rate of % was obtained with 75% of the images in the training set and 25% in the test set.

Table 12: Accuracy rate excluding the Contour Shape descriptor.

  Training/Test Split                   Accuracy Rate (Combined Descriptors, Equal Weights,
                                        No Contour Shape)
  Training Set = 50%, Test Set = 50%    84.44%
  Training Set = 75%, Test Set = 25%    %

Accuracy rates for Contour Shape were consistently lower across all experiments, so excluding it from the combined descriptor calculations significantly improved the retrieval accuracy rates.

9.7 Empirical Weight Optimization

Using the empirical weight optimization technique, we were able to further improve the retrieval accuracy rates by combining weighted descriptors without excluding any descriptor from the semantic metadata. The highest retrieval accuracy rate obtained with this approach was 93.48%, with the descriptor weights shown in the table. These results also show that maximizing the weight for Region Shape significantly improves the accuracy rate when combining all the descriptors.

Table 13: Accuracy rates for Empirical Weight Optimization. Training Set = 75%, Test Set = 25%

  Image Descriptor Weights                    Accuracy Rate
  Region Shape = 1.9, Other descriptors =     93.48%

Chapter 10 RELATED WORK

10.1 Knowledge Assisted Video Analysis And Object Detection

Gabriel Tsechpenakis, Giorgos Akrivas, Giorgos Andreou, Giorgos Stamou and Stefanos Kollias presented a method for object recognition in video sequences [28]. The goal of their system is to extract semantics automatically by detecting and tracking moving objects in video sequences and then using the low-level features of each semantic entity to associate moving objects with them. The proposed algorithm consists of two main steps: the detection and localization of regions of interest in a sequence, and the estimation of the main mobile object contours. Visual descriptors, used to model the visual content associated with semantic entities, are categorized according to the MPEG-7 framework. The extracted visual descriptors were mapped to conceptual terms to build the semantic indexing metadata, and similarity matching algorithms were used to match the extracted moving regions. The simulation of this system was able to identify moving regions based on the extracted semantics. A similar approach was used in our implementation, which focuses on the content of images rather than videos. The main difference is that in our implementation the semantics are extracted manually by selecting the region of interest, and a formalized domain ontology is used to map the extracted content to meaningful terms. Also, the above system used similar videos to build the training and test sets, whereas our implementation used images diverse in their content.

10.2 Retrieval Of Multimedia Objects By Combining Semantic Information From Visual And Textual Descriptors

Mats Sjöberg, Jorma Laaksonen, Matti Pöllä and Timo Honkela proposed a method of content-based multimedia retrieval of objects with visual, aural and textual properties [33]. In their method, training examples of objects belonging to a specific semantic class are associated with their low-level visual descriptors (such as MPEG-7) and textual features such as frequencies of significant keywords extracted from audio tracks. A fuzzy mapping from a semantic class in the training set to a class of similar objects in the test set was created using Self-Organizing Maps (SOMs) trained on the visual and textual descriptors. Query by example (QBE) is the main operating principle: the user provides the system with a set of example objects of what he or she is looking for, taken from the existing database. Their experiments showed a promising increase in retrieval performance, and the results also showed that performance increased with the use of textual features. This approach is less similar to ours: we classified images using a similarity matching algorithm based on smallest distances and Ensemble classification, which differs from the SOM approach described above. Also, in our approach all the training images in a particular class share the same textual descriptor, whereas this implementation uses a range of words and their frequencies.

Chapter 11 EDUCATIONAL STATEMENT

This research work benefited from the knowledge gained in many classes taken as part of the Graduate curriculum at the Institute of Technology, UW Tacoma. Strong foundations from the TCSS 543 Advanced Algorithms class helped with the mathematical aspects of this research, and were also useful in selecting and implementing the right data structures for the implementation. Image processing foundations from the TCSS Digital Media class were very useful for extracting image features, a significant part of this implementation. Database design basics learnt in the TCSS 545 class were extremely helpful during the data pre-processing phase. The basics of scientific research gained in the TCSS 598 Master's Seminar were extremely helpful when researching this area, and the exposure to formal technical writing in that class was also very helpful while writing this paper. Concepts from Bioinformatics such as data mining and domain ontologies helped me greatly in understanding the concepts related to the medical domain, and the TCSS 588 Bioinformatics class was very useful in determining areas for future research that would benefit the medical domain. Apart from these classes, programming knowledge gained from many other classes was very useful in the design and implementation stages. Exposure to image processing tools and similarity matching algorithms and techniques proved very valuable, as it can be applied to solve indexing problems in various domains. Many indexing algorithms were researched during the course of this work; this knowledge will be very useful for building information retrieval applications in the future. This research was also very beneficial for learning the languages of the Semantic Web such as RDF and DAML. Working on this thesis has given me the opportunity to research and learn about various areas of computer science like imaging,

multimedia databases, knowledge representation languages, etc. I thoroughly enjoyed the learning experience and the exposure to various technologies during the course of this research.

Chapter 12 CONCLUSION

The implementation described in this paper has shown that a high retrieval accuracy rate is obtained by semantically indexing images using a web ontology language together with the visual descriptors of the image. The biggest challenge in this implementation was developing a similarity matching algorithm that retrieves matching images by combining all the visual descriptors and the ontology terms. A retrieval accuracy rate of % was obtained using the algorithm developed. The approach proposed in this paper can benefit the medical community to a large extent, as large collections of medical images can be indexed and retrieved semantically. Future improvements to this implementation include automating the image segmentation and feature extraction phases and using learning techniques to improve the similarity matching algorithm.

BIBLIOGRAPHY

[1] Boll, S., Klas, W., Sheth, A. (1998). Overview on Using Metadata to Manage Multimedia Data. In Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media (1-24).

[2] Chavez-Aragon, A., Starostenko, O. (2004). Ontological Shape Description, a New Method for Visual Information Retrieval. Proceedings of the 14th IEEE International Conference on Electronics, Communications and Computers. Retrieved Nov 27, 2004.

[3] Comaniciu, D., Foran, D., Meer, P. (1998). Shape Based Image Indexing and Retrieval for Diagnostic Pathology. Proceedings of the 14th IEEE International Conference on Pattern Recognition, 1. Retrieved Nov 27, 2004.

[4] Fayyad, U.M. (1996). Automating the Analysis and Cataloging of Sky Surveys. In Advances in Knowledge Discovery and Data Mining.

[5] Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., et al. (1995). Query by Image and Video Content. IEEE Computer, 28(9), (23-31). Retrieved Nov 1, 2004.

[6] GIS Images. Retrieved Nov 10, 2004.

[7] Golbeck, J., Alford, A., Hendler, J. Organization and Structure of Information Using Semantic Web Technologies. Maryland Information and Network Dynamics Laboratory, University of Maryland. Retrieved Nov 1, 2004.

[8] Hand, D., Mannila, H., Smyth, P. (2001). Retrieval by Content. In Principles of Data Mining. England: The MIT Press.

[9] Hu, B., Dasmahapatra, S., Lewis, P., Shadbolt, N. (2003). Ontology Based Medical Image Annotation with Description Logics. Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence. Retrieved Nov 1, 2004.

[10] ImageJ. Retrieved Nov 11, 2004.

[11] Jorgensen, C. Image Indexing: An Analysis of Selected Classification Systems in Relation to Image Attributes Named by Naïve Users. Retrieved Nov 8, 2004, from qid=8078

[12] Knublauch, H., Olivier, D., Musen, M. Weaving the Biomedical Semantic Web with the Protégé OWL Plug-in. Stanford Medical Informatics, Stanford University: Stanford. Retrieved Nov 18, 2004.

[13] Maybury, M.T. (Ed.). (1997). Intelligent Multimedia Information Retrieval. Menlo Park, CA: AAAI Press.

[14] Mejino, J., Rosse, C. Conceptualization of Anatomical Spatial Entities in the Digital Anatomist Foundation Model. Structured Informatics Group, Department of Biological Structure, University of Washington School of Medicine. Retrieved Nov 4, 2004.

[15] Mojsilovic, A., Gomes, J. (2002). Semantic Based Categorization, Browsing and Retrieval in Medical Image Databases. IEEE International Conference on Image Processing, III. Retrieved Nov 1, 2004.

[16] Ontology Web Language. Retrieved Nov 21, 2004.

[17] Pentland, A., Picard, R.W., Sclaroff, S. (1994). Photobook: Tools for Content-Based Manipulation of Image Databases. International Journal of Computer Vision, 18.

[18] Protégé. Retrieved Nov 3, 2004.

[19] Rui, Y., Huang, T.S., Ortega, M., Mehrotra, S. (1997). Relevance Feedback: A Power Tool in Interactive Content-Based Image Retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 8(5). Retrieved Nov 1, 2004.

[20] Semantic Web. Retrieved Oct 17, 2004.

[21] Smith, J.R., Chang, S. (1997). Querying by Color Regions Using the VisualSEEk Content-Based Visual Query System. In Maybury, M.T. (Ed.), Intelligent Multimedia Information Retrieval (23-41). Menlo Park, CA: AAAI Press.

[22] The Digital Anatomist. Retrieved Oct 17, 2004.

[23] UMLS. Retrieved Oct 17, 2004.

[24] Visible Human. Retrieved Oct 17, 2004.

[25] Visser, P., Bench-Capon, T. (1996). On the Reusability of Ontologies in Knowledge-System Design. Conference Proceedings of the Seventh International Workshop on Database and Expert Systems Applications.

[26] M-Ontomat Annotizer. Retrieved Jan 30, 2006.

[27] Foundational Model of Anatomy. Retrieved Nov 11, 2005.

[28] Tsechpenakis, G., Akrivas, G., Andreou, G., Stamou, G., Kollias, S. Knowledge Assisted Video Analysis and Object Detection. Image, Video and Multimedia Laboratory, Department of Electrical and Computer Engineering, National Technical University of Athens. Retrieved Oct 30, 2006.

[29] Christopoulos, C., Berg, D., Skodras, A. The Colour in the Upcoming MPEG-7 Standard. Retrieved Jan 5, 2007.

[30] Eidenberger, E. Evaluation and Analysis of Similarity Measures for Content Based Visual Information Retrieval. Interactive Media Systems Group, Institute of Software Technology and Interactive Systems, Vienna University of Technology. Retrieved Dec 15, 2006.

57 47 [31] Geradts, Z., Hardy, H., Poortman, A. Bijhold, J. Evaluation of contents based image retrieval methods for a database of logos on drug tablets. Netherlands Forensic Institute. Retrieved Nov 21, 2006 from, cumentszszarticleszszspie2001zszdrugs.pdf/geradts01evaluation.pdf [32] Papadopoulos, S., Mezaris, V., Kompatsiaris, I., Strintzis, M.G. A Region Based Approach to Conceptual Image Based Classification. Information Processing Laboratory, Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki. Retrieved Jan 5th, 2006 from, [33] Sj oberg, M., Laaksonen, J., P oll a, M., Honkela, T. Retrieval of Multimedia Objects by Combining Semantic Information from Visual and Textual Descriptors. Laboratory of Computer and Information Science, Helsinki University of Technology. Retrieved Feb 15, 2007 from, [34] Eakins, J., Graham, M. Content Based Image Retrieval. University of Northumbria at Newcastle. Retrieved Dec 15 th, 2006 from [35] MPEG - 7. Retrieved Nov 11, 2005 from,

APPENDIX A

PRESENTATION SLIDES

This appendix contains the PowerPoint slides prepared for the thesis presentation.

APPENDIX B

INSTALLATION & USER MANUAL

Installation:

1. Download images from the Visible Human Project, available from the National Library of Medicine FTP site (vhnet.nlm.nih.gov).
2. Install the Adobe Photoshop graphics software, available on compact disc after academic purchase.
3. Install the MySQL database management system from the MySQL download site.
4. Download the UMLS database from the National Library of Medicine site.
5. Install UMLS as a MySQL database on the MySQL server.
6. Download the M-Ontomat Annotizer from the Acemedia site.
7. Install the Microsoft Visual Studio integrated development environment (IDE), or any other IDE such as Eclipse, available through academic purchase.
8. Install the Microsoft .NET Framework, available through academic purchase.

User Manual:

Preprocessing Steps:

1. Convert images from the .raw format to JPEG files using Adobe Photoshop, specifying the appropriate header, width, height, channels, and interlaced values for each image type (Anatomy, CT, or MRI).
2. Store the JPEG image files in individual folders (one for each image).
3. Extract the University of Washington Digital Anatomist (UWDA) ontology terms from UMLS using the SQL query below:

   SELECT * FROM MRCONSO WHERE SAB = 'UWDA';

   MRCONSO is the table containing the ontological concepts, and SAB is the column identifying the source of each term in the UMLS database.
4. Create an empty text file in DAML format using the standard XML schema for DAML, and store the extracted terms in this file to form the DAML ontology file.

5. Run the M-Ontomat Annotizer, open the Ontology Explorer, and load the DAML ontology file created in the previous step.
6. Open the Visual Description Extraction (VDE) tool in the M-Ontomat Annotizer and load an image for segmentation and feature extraction.
7. Select an ontology term in the Ontology Explorer and create a prototype instance for it.
8. For the selected ontology term, select the corresponding region of interest on the image and extract all the image descriptors for that region using the VDE tool.
9. Store the image descriptor files and annotation files generated by M-Ontomat after the prototype instance creation and descriptor extraction.
10. Repeat steps 6-9 for all the images.

Execution Steps for New Images:

1. Preprocess the images as described in the previous section.
2. Open Visual Studio and load the Semantic Indexing.csproj file stored on the compact disc submitted with this thesis.
3. In the MainProgram.cs file in the project, specify the size of the image dataset and the locations of the image files and the M-Ontomat output files. Also specify the location where the results files should be written.
4. Build the project and execute it to start the semantic indexing system.

All results are stored in text files.

Execution Steps for Existing Images:

1. Copy the contents of the "Project" folder from the compact disc submitted.
2. Navigate to the "Semantic Indexing" folder under the top-level "Project" folder and execute the file Semantic Indexing.exe.
3. The results will appear in the "Results" folder under the top-level "Project" folder.
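Preprocessing steps 3 and 4 above (extracting the UWDA terms from UMLS and storing them in a DAML ontology file) can be sketched in a small script. This is only an illustrative sketch, not the thesis implementation: it uses an in-memory SQLite database as a stand-in for the MySQL UMLS installation, models only the STR and SAB columns of MRCONSO, and assumes a minimal one-class-per-term DAML layout; the real MRCONSO table and DAML schema contain considerably more detail.

```python
# Illustrative sketch of preprocessing steps 3 and 4. An in-memory SQLite
# database stands in for the MySQL UMLS installation, and only the STR
# (term string) and SAB (source) columns of MRCONSO are modeled; the
# daml:Class-per-term layout below is an assumed, minimal DAML schema.
import sqlite3
from xml.sax.saxutils import escape


def extract_uwda_terms(conn):
    """Step 3: pull the UWDA concept strings out of the MRCONSO table."""
    cur = conn.execute("SELECT STR FROM MRCONSO WHERE SAB = 'UWDA'")
    return [row[0] for row in cur.fetchall()]


def write_daml(terms, path):
    """Step 4: store the extracted terms as daml:Class elements."""
    header = (
        '<?xml version="1.0"?>\n'
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
        '         xmlns:daml="http://www.daml.org/2001/03/daml+oil#">\n'
    )
    # XML IDs may not contain spaces, so anatomical terms such as
    # "Left lung" are written with underscores instead.
    body = "".join(
        '  <daml:Class rdf:ID="%s"/>\n' % escape(t.replace(" ", "_"))
        for t in terms
    )
    with open(path, "w") as f:
        f.write(header + body + "</rdf:RDF>\n")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MRCONSO (STR TEXT, SAB TEXT)")
    conn.executemany(
        "INSERT INTO MRCONSO VALUES (?, ?)",
        [("Left lung", "UWDA"), ("Aorta", "UWDA"), ("Fever", "MSH")],
    )
    terms = extract_uwda_terms(conn)  # only the rows with SAB = 'UWDA'
    write_daml(terms, "uwda.daml")
```

The resulting file plays the role of the DAML ontology file that step 5 of the user manual loads into the M-Ontomat Annotizer's Ontology Explorer.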

APPENDIX C

SYSTEM OUTPUT

This appendix contains a screenshot of the sample output produced by the implementation.


More information

Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features

Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features 1 Kum Sharanamma, 2 Krishnapriya Sharma 1,2 SIR MVIT Abstract- To describe the image features the Local binary pattern (LBP)

More information

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: [82] [Thakur, 4(2): February, 2015] ISSN:

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: [82] [Thakur, 4(2): February, 2015] ISSN: IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A PSEUDO RELEVANCE BASED IMAGE RETRIEVAL MODEL Kamini Thakur*, Ms. Preetika Saxena M.Tech, Computer Science &Engineering, Acropolis

More information

Modelling Structures in Data Mining Techniques

Modelling Structures in Data Mining Techniques Modelling Structures in Data Mining Techniques Ananth Y N 1, Narahari.N.S 2 Associate Professor, Dept of Computer Science, School of Graduate Studies- JainUniversity- J.C.Road, Bangalore, INDIA 1 Professor

More information

A Texture Extraction Technique for. Cloth Pattern Identification

A Texture Extraction Technique for. Cloth Pattern Identification Contemporary Engineering Sciences, Vol. 8, 2015, no. 3, 103-108 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2015.412261 A Texture Extraction Technique for Cloth Pattern Identification Reshmi

More information

Rough Feature Selection for CBIR. Outline

Rough Feature Selection for CBIR. Outline Rough Feature Selection for CBIR Instructor:Dr. Wojciech Ziarko presenter :Aifen Ye 19th Nov., 2008 Outline Motivation Rough Feature Selection Image Retrieval Image Retrieval with Rough Feature Selection

More information

Trademark Matching and Retrieval in Sport Video Databases

Trademark Matching and Retrieval in Sport Video Databases Trademark Matching and Retrieval in Sport Video Databases Andrew D. Bagdanov, Lamberto Ballan, Marco Bertini and Alberto Del Bimbo {bagdanov, ballan, bertini, delbimbo}@dsi.unifi.it 9th ACM SIGMM International

More information

Reducing Consumer Uncertainty

Reducing Consumer Uncertainty Spatial Analytics Reducing Consumer Uncertainty Towards an Ontology for Geospatial User-centric Metadata Introduction Cooperative Research Centre for Spatial Information (CRCSI) in Australia Communicate

More information

AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES

AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES 1 RIMA TRI WAHYUNINGRUM, 2 INDAH AGUSTIEN SIRADJUDDIN 1, 2 Department of Informatics Engineering, University of Trunojoyo Madura,

More information

Content Based Image Retrieval Using Combined Color & Texture Features

Content Based Image Retrieval Using Combined Color & Texture Features IOSR Journal of Electrical and Electronics Engineering (IOSR-JEEE) e-issn: 2278-1676,p-ISSN: 2320-3331, Volume 11, Issue 6 Ver. III (Nov. Dec. 2016), PP 01-05 www.iosrjournals.org Content Based Image Retrieval

More information

automatic digitization. In the context of ever increasing population worldwide and thereby

automatic digitization. In the context of ever increasing population worldwide and thereby Chapter 1 Introduction In the recent time, many researchers had thrust upon developing various improvised methods of automatic digitization. In the context of ever increasing population worldwide and thereby

More information

CSI 4107 Image Information Retrieval

CSI 4107 Image Information Retrieval CSI 4107 Image Information Retrieval This slides are inspired by a tutorial on Medical Image Retrieval by Henning Müller and Thomas Deselaers, 2005-2006 1 Outline Introduction Content-based image retrieval

More information

Knowledge Representations. How else can we represent knowledge in addition to formal logic?

Knowledge Representations. How else can we represent knowledge in addition to formal logic? Knowledge Representations How else can we represent knowledge in addition to formal logic? 1 Common Knowledge Representations Formal Logic Production Rules Semantic Nets Schemata and Frames 2 Production

More information

Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment

Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment Shigeo Sugimoto Research Center for Knowledge Communities Graduate School of Library, Information

More information

MULTIMEDIA RETRIEVAL

MULTIMEDIA RETRIEVAL MULTIMEDIA RETRIEVAL Peter L. Stanchev *&**, Krassimira Ivanova ** * Kettering University, Flint, MI, USA 48504, pstanche@kettering.edu ** Institute of Mathematics and Informatics, BAS, Sofia, Bulgaria,

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

MRI Brain Image Segmentation Using an AM-FM Model

MRI Brain Image Segmentation Using an AM-FM Model MRI Brain Image Segmentation Using an AM-FM Model Marios S. Pattichis', Helen Petropoulos2, and William M. Brooks2 1 Department of Electrical and Computer Engineering, The University of New Mexico, Albuquerque,

More information