Using an Image-Text Parallel Corpus and the Web for Query Expansion in Cross-Language Image Retrieval


Yih-Chen Chang and Hsin-Hsi Chen *

Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan
ycchang@nlg.csie.ntu.edu.tw, hhchen@csie.ntu.edu.tw

Abstract. The ImageCLEF2007 photo task differs from those of previous years in two respects. The caption field in the image annotations and the narrative field in the text queries are removed, and the example images in the visual queries are also removed from the image collection. Under the new definition, less information can be employed than before, so matching query words against annotations directly is not feasible. This paper explores the web to expand queries and documents. The experiments show that query expansion improves performance by 16.11%; document expansion, however, brings in too much noise, and performance decreases by 28.24%. The media mapping method based on an image-text parallel corpus is regarded as query expansion. The results of the formal runs show that this method performs the best. Compared with the models without expansion, MAP improves by about 86.69%~143.12%. Integrating the external and the internal resources yields no benefit in further experiments.

1 Introduction

Image retrieval has become more important now that large-scale image data are available on the web. In the ImageCLEFphoto task, each topic, which is composed of a text query and a visual query, simulates the information need of users. In previous years [1] [2], a text query included topic and narrative fields in several different languages, and a visual query included two or three example images from the image data set. Each image in the image collection was annotated with title, location, date, notes and a detailed caption. The task definitions of ImageCLEFphoto2007 [3] are changed in two aspects.
First, the caption field in the image annotations and the narrative field in the queries are removed. These changes aim to reflect real-world information needs, i.e., image annotations and queries are usually short and rough. Second, the example images of the visual queries do not belong to the image collection. This change reflects that users may use their own photos as examples rather than images in the collection.

With the caption field removed from the image annotations, matching query words and image annotations becomes more challenging than before. We explore an external resource, e.g., the web, to expand queries and documents. Through a web search engine such as Google, we can retrieve relevant web pages and use them for expansion. Compared with query expansion using pseudo relevance feedback in the corpus, the outside resource may bring in information that the target corpus does not have. However, the information retrieved from the web may also contain noise, and how to filter out that noise is an important issue. In this paper, we restrict the search space to certain kinds of web sites, e.g., Wikipedia, and investigate whether this helps retrieval.

In addition to external resources, we employ internal resources such as an image-text parallel corpus, i.e., the target collection itself. For such a trans-media corpus, two approaches, media mapping [4] and a trans-media dictionary [5], were proposed previously. The media mapping approach, which can be regarded as a kind of pseudo relevance feedback across different media, performs better than the trans-media dictionary approach. We apply media mapping to ImageCLEFphoto 2007, examine its performance under the new definitions, and analyze whether integrating external and internal resources is helpful.

This paper is organized as follows. Section 2 introduces the three methods we explore. Section 3 specifies the official and unofficial runs we design. Section 4 shows and discusses the experimental results. Finally, Section 5 concludes.

* Corresponding author.
C. Peters et al. (Eds.): CLEF 2007, LNCS 5152, pp , Springer-Verlag Berlin Heidelberg 2008

2 Methods

Three methods, namely query expansion using the web, document expansion using the web, and query expansion with media mapping via a cross-media corpus, are presented in the following subsections.

2.1 Query Expansion Using the Web

Queries and image annotations are both short in ImageCLEFphoto 2007. In this situation, we plan to expand the given queries to obtain more information. Several previous experiments have shown that query expansion using pseudo relevance feedback is very useful in this task.
In this paper, we employ an outside resource, the web, for query expansion and analyze whether it can bring in useful information. The most convenient way to access the web is through a web search engine such as Google. We submit a text query to retrieve relevant web pages. Because the language of a text query may differ from the language of the image annotations, a translation mechanism is required. There are two alternative ways to deal with this problem. The first is to submit the text query to the web search engine directly and then translate the retrieved web pages into the target language. The second is to translate the text query into the target language and then submit the translated query to the web search engine. The drawback of the first approach (i.e., translation after retrieval) is the cost of translating all the retrieved web pages. In contrast, the translation cost of the second approach (i.e., translation before retrieval) is relatively low. However, when queries contain named entities, the second approach may produce a wrong translation, so the retrieved web pages may be unrelated to the original query. In the experiments, we adopt translation before retrieval.

Next, we have to select words from the retrieved results to expand the given query. A selection mechanism can filter out noisy information, but it may also lose some useful information. Here, we adopt the simplest way, i.e., we employ the top ranked snippets to expand the query. To deal with noise, we limit the web sites we access to encyclopedia-based ones, e.g., Wikipedia, by adding a web site name as an extra query term when submitting a query to a web search engine.

2.2 Document Expansion Using the Web

Direct keyword matching may still fail after query expansion if the relevant documents do not mention the words in the expanded query. There are two alternative ways to deal with this problem. First, we can expand a query with the words appearing in a document. Query expansion using relevance feedback belongs to this type. Second, we may expand the documents. In this paper, we explore document expansion using the web. This method is similar to the one used in query expansion. We treat the title field of an image annotation as a query and submit it to the web search engine to get the top ranked snippets, which are used to expand the document. Because documents are in the target language, no translation is necessary during document expansion. That is the major difference from query expansion.

Although document expansion avoids translation errors, expanding with too many words may introduce noise. In document expansion, we therefore restrict the selection scope as follows. Only words near the words in the title field of the image annotation are considered as candidates. We set a window size (e.g., 5) in the experiments.

Besides the noise issue, document expansion has a logical problem. Assume the word "animal" is in the title field of an image annotation. When we expand with hyponyms such as "tiger", "cat", "dog", etc., we do not know which animals actually appear in that image. If the image shows a rabbit, the expansion introduces erroneous terms.
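The two web-based expansion steps of Sections 2.1 and 2.2 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the system used in the experiments: the snippets are assumed to have been fetched already from a search engine and are passed in as plain strings, and the helper names (tokenize, restrict_query, expand_query, expand_document) as well as the simple tokenizer are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def restrict_query(query, site_name=None):
    """Add a web site name (e.g. 'Wikipedia') as an extra query term."""
    return f"{query} {site_name}" if site_name else query


def expand_query(query, snippets, top_k=1):
    """Append every word of the top-k ranked snippets to the query terms."""
    terms = tokenize(query)
    for snippet in snippets[:top_k]:
        terms.extend(tokenize(snippet))
    return terms


def expand_document(title, snippets, window=5, top_k=1):
    """Expand a title with snippet words that lie within `window` tokens
    of an occurrence of a title word; all other snippet words are ignored."""
    title_terms = set(tokenize(title))
    expanded = tokenize(title)
    for snippet in snippets[:top_k]:
        tokens = tokenize(snippet)
        keep = set()
        for i, tok in enumerate(tokens):
            if tok in title_terms:
                lo = max(0, i - window)
                hi = min(len(tokens), i + window + 1)
                keep.update(range(lo, hi))
        # Keep snippet order; do not repeat the title words themselves.
        expanded.extend(tokens[i] for i in sorted(keep)
                        if tokens[i] not in title_terms)
    return expanded
```

With a window of 2, for example, only the words immediately around "tiger" in a snippet would be added to a document titled "tiger"; a larger window admits more context and, as discussed above, more potential noise.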
2.3 Media Mapping with an Image-Text Parallel Corpus

The media mapping method [4][6] regards the target collection as an image-text parallel corpus and employs it as an intermedia to translate a visual query into a text one, and vice versa. In this paper, the intermedia links two kinds of media, i.e., text and image. Media mapping can be seen as relevance feedback across different media and can be used for query expansion. In ImageCLEFphoto2006, we created a new query using media mapping and merged the results of the new query and the original visual query. In ImageCLEFphoto 2007, we regard media mapping as query expansion in the following way. First, we submit a visual query to a content-based information retrieval (CBIR) system. Because images and their text annotations are aligned in the collection, we then rerank the top n returned images by using the text query. Finally, the text annotations of the top m images are added to the original text query. We submit the expanded query to a text retrieval system and get the final result.

This allows us to compare query expansion using the web with media mapping via the intermedia. In addition, we can examine the feasibility of the media mapping method under the new task definitions. There are two new challenges. First, retrieving related images in the intermedia via a visual query becomes harder. In the past, the visual queries were images in the image collection, so they always appeared at the top of the returned images. Second, the caption field in the image annotations has been removed beforehand, so the text information we can get from the image counterparts is less than in the tasks of previous years. We are interested in whether the media mapping method is still workable in the new environment.

3 Experiments

We submitted 27 official runs, including 18 cross-lingual runs for eight different languages, 8 mono-lingual runs for three different languages, and 1 run for visual query only. All queries in the different source languages were translated into the target language (e.g., English) using the SYSTRAN system. We adopted Okapi with the BM25 formula for text retrieval. The experiments consider the following issues. First, we want to check whether the retrieved web pages can bring in new information for query expansion. We examine the expanded words when recall is improved. Second, we compare the results of query expansion runs that limit or do not limit the web sites. Third, we want to check the effects of document expansion. The runs using both query expansion and document expansion are also checked. Fourth, we examine the performance of the media mapping method. Then, we employ both media mapping and the web for query expansion, and check whether the web can bring in new information that media mapping cannot. Some of the above issues are verified in the official runs, while others are examined in the unofficial runs. Our official runs are described as follows.
A run is named in the format SourceLanguage-TargetLanguage-Automatic-FeedBack-Media, using the abbreviations DE (German), ES (Spanish), EN (English), FR (French), JA (Japanese), RU (Russian), ZHT (Traditional Chinese), ZHS (Simplified Chinese), AUTO (Automatic), NOFB (No Feedback), TE (Document Expansion), FBQE (Feedback and Query Expansion), TXT (Text), IMG (Image), and TXTIMG (Text and Image).

1. 8 cross-lingual runs that use text query only, and do not consider query expansion: ES-EN-AUTO-NOFB-TXT, FR-EN-AUTO-NOFB-TXT, RU-EN-AUTO-NOFB-TXT, PT-EN-AUTO-NOFB-TXT, JA-EN-AUTO-NOFB-TXT, IT-EN-AUTO-NOFB-TXT, ZHT-EN-AUTO-NOFB-TXT, ZHS-EN-AUTO-NOFB-TXT
2. 3 mono-lingual runs that use text query only, and do not consider query expansion: EN-EN-AUTO-NOFB-TXT, ES-ES-AUTO-NOFB-TXT, DE-DE-AUTO-NOFB-TXT
3. 3 mono-lingual runs that adopt the media mapping method for query expansion: ES-ES-AUTO-FBQE-TXTIMG, EN-EN-AUTO-FBQE-TXTIMG, DE-DE-AUTO-FBQE-TXTIMG
4. 8 cross-lingual runs that use the media mapping method for query expansion: PT-EN-AUTO-FBQE-TXTIMG, ES-EN-AUTO-FBQE-TXTIMG, RU-EN-AUTO-FBQE-TXTIMG, IT-EN-AUTO-FBQE-TXTIMG, ZHT-EN-AUTO-FBQE-TXTIMG, ZHS-EN-AUTO-FBQE-TXTIMG, JA-EN-AUTO-FBQE-TXTIMG, FR-EN-AUTO-FBQE-TXTIMG
5. 2 runs that expand the query with the web, but do not consider document expansion: EN-EN-AUTO-QE-TXT-TOPIC, ZHT-EN-AUTO-QE-TXT-TOPIC
6. 2 runs that expand the documents with the web, but do not consider query expansion: EN-EN-AUTO-TE-TXT-CAPTION, ZHT-EN-AUTO-TE-TXT-CAPTION

7. 1 run that uses the visual query and the media mapping method only: IMG-EN-AUTO-FB-TXTIMG

Some unofficial runs are described as follows.

1. 2 runs that consider both query expansion and document expansion: EN-EN-AUTO-TE-QE-TXT, ZHT-EN-AUTO-TE-QE-TXT
2. 2 runs that expand the query with the web and limit the search space: EN-EN-AUTO-QE-WIKI-TXT, ZHT-EN-AUTO-QE-WIKI-TXT
3. 2 runs that use both the web and media mapping for query expansion: EN-EN-AUTO-QE-FBQE-TXTIMG, ZHT-EN-AUTO-QE-FBQE-TXTIMG

4 Results and Discussions

In the first set of experiments, we use the top one snippet returned by Google to expand the text queries. Table 1 shows the results of runs EN-EN-AUTO-QE-TXT-TOPIC, ZHT-EN-AUTO-QE-TXT-TOPIC, EN-EN-AUTO-NOFB-TXT, and ZHT-EN-AUTO-NOFB-TXT. In both the cross-lingual and the mono-lingual cases, the systems with query expansion perform better than those without. After expansion, both recall and precision are improved. We had originally expected precision to decrease, since we do not apply any strategy to filter noise.

Table 1. Results of models with/without query expansion

Query Language-               Evaluation   Query Expansion   Query Without
Document Language             Metric       Using the Web     Expansion
Traditional Chinese-English   MAP          ( %)
                              Recall       ( %)
English-English               MAP          (+7.57%)
                              Recall       (+14.84%)

In the second set of experiments, we compare the results of query expansion with and without limiting the search space. Table 2 shows that performance does not change much after restricting the web sites for cross-lingual retrieval. Performance even decreases slightly in the mono-lingual runs. We find that restricted access may retrieve unrelated pages in some cases.

The third set of experiments aims to evaluate the effects of document expansion. Table 3 summarizes the results. Document expansion has no positive effect.
In both cross-lingual and mono-lingual runs, recall and MAP decrease when document expansion is introduced, whether or not query expansion is employed. The major reason may be that the strategy brings in too much noise.

The fourth set of experiments examines the performance of the media mapping method under the new definitions. The results are shown in Table 4. A total of 8 cross-lingual runs and 3 mono-lingual runs are tested. Media mapping achieves very good performance. Compared with the models without expansion, MAP improves by about 86.69%~143.12%. Last year, media mapping improved performance by about 71%~119%. This result shows that the media mapping method is robust under different task definitions.
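The media mapping expansion evaluated above (Section 2.3) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the system used in the runs: the CBIR ranking is assumed to be given as an ordered list of image identifiers, the function and parameter names are hypothetical, and a plain term-overlap count stands in for the Okapi BM25 scoring used in the experiments.

```python
from collections import Counter


def text_score(query_terms, annotation_terms):
    """Stand-in relevance score: occurrences of query terms in the annotation.
    (The experiments use Okapi BM25; this overlap count is only illustrative.)"""
    counts = Counter(annotation_terms)
    return sum(counts[t] for t in query_terms)


def media_mapping_expand(text_query, visual_ranking, annotations, n=10, m=2):
    """Expand a text query through the image-text parallel corpus.

    text_query     : list of query terms
    visual_ranking : image ids ordered by a CBIR system for the visual query
    annotations    : dict mapping image id -> list of annotation terms
    """
    # Step 1: take the top n images returned for the visual query.
    top_n = visual_ranking[:n]
    # Step 2: rerank them with the text query (stable sort keeps CBIR order on ties).
    reranked = sorted(top_n,
                      key=lambda img: text_score(text_query, annotations[img]),
                      reverse=True)
    # Step 3: add the annotations of the top m reranked images to the query.
    expanded = list(text_query)
    for img in reranked[:m]:
        expanded.extend(annotations[img])
    return expanded
```

The expanded term list would then be submitted to the text retrieval system. The values of n and m here are placeholders, not the settings used in the runs.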

Table 2. Results of models with and without limiting the search space

Run Name (cross-lingual/mono-lingual)       Limitation   Recall   MAP
ZHT-EN-AUTO-QE-TXT-TOPIC (cross-lingual)    No
ZHT-EN-AUTO-QE-WIKI-TXT (cross-lingual)     Yes
EN-EN-AUTO-QE-TXT-TOPIC (mono-lingual)      No
EN-EN-AUTO-QE-WIKI-TXT (mono-lingual)       Yes

Table 3. Results of models using or not using document expansion

Run Name (cross-lingual/mono-lingual)   Document    Query       Recall   MAP
                                        Expansion   Expansion
ZHT-EN-AUTO-NOFB-TXT (cross)            No          No
ZHT-EN-AUTO-TE-TXT-CAPTION (cross)      Yes         No
ZHT-EN-AUTO-QE-TXT-TOPIC (cross)        No          Yes
ZHT-EN-AUTO-TE-QE-TXT (cross)           Yes         Yes
EN-EN-AUTO-NOFB-TXT (mono)              No          No
EN-EN-AUTO-TE-TXT-CAPTION (mono)        Yes         No
EN-EN-AUTO-QE-TXT-TOPIC (mono)          No          Yes
EN-EN-AUTO-TE-QE-TXT (mono)             Yes         Yes

Table 4. Results of using the media mapping as query expansion

Query Language-               Metric   Query Expansion       Without
Document Language                      using Media Mapping   Expansion
Traditional Chinese-English   MAP      ( %)
                              Recall   ( %)
Simplified Chinese-English    MAP      ( %)
                              Recall   ( %)
Portuguese-English            MAP      ( %)
                              Recall   ( %)
Spanish-English               MAP      ( %)
                              Recall   ( %)
Russian-English               MAP      ( %)
                              Recall   ( %)
Italian-English               MAP      ( %)
                              Recall   ( %)
French-English                MAP      ( %)
                              Recall   ( %)
Japanese-English              MAP      ( %)
                              Recall   ( %)
English-English               MAP      ( %)
                              Recall   ( %)
Spanish-Spanish               MAP      ( %)
                              Recall   ( %)
German-German                 MAP      ( %)
                              Recall   ( %)

Tables 1 and 4 show that media mapping with an image-text parallel corpus and query expansion using the web are both very useful, and that the former is better than the latter.

The last set of experiments checks whether integrating the internal and the external resources gives better performance. Table 5 shows that such an integration has no positive effect. MAP decreases when the web is used. This may be because the external resource (i.e., the web) contains more noise than the internal resource (i.e., the image-text parallel corpus).

Table 5. Results of the models using both media mapping and the web

Run                           Media Mapping   the Web   Recall   MAP
ZHT-EN-AUTO-FBQE-TXTIMG       Yes             No
ZHT-EN-AUTO-QE-FBQE-TXTIMG    Yes             Yes
EN-EN-AUTO-FBQE-TXTIMG        Yes             No
EN-EN-AUTO-QE-FBQE-TXTIMG     Yes             Yes

In the above sets of experiments, we compare the performance of different kinds of approaches. Media mapping with an image-text parallel corpus is the best of all. Table 6 summarizes the ranks of our official runs with the media mapping approach in ImageCLEFphoto2007. Each row lists the language pair, the total number of submitted runs, and the rank of our run. Compared with the runs of other participants, the media mapping approach performs quite well across language pairs. Except for the English mono-lingual and the Simplified Chinese-English cross-lingual runs, our system ranks first in all the official runs we submitted. That shows the robustness of our media mapping approach in integrating text and visual information.

Table 6. Ranks of official runs using media mapping approach in ImageCLEFphoto2007

Mono-Lingual Run/Cross-Lingual Run   Total Submitted Runs   Rank
English-English
German-German                        30                     1
Spanish-Spanish                      15                     1
Simplified Chinese-English           23                     2
Traditional Chinese-English          1                      1
French-English                       21                     1
Italian-English                      10                     1
Japanese-English                     6                      1
Portuguese-English                   9                      1
Russian-English                      6                      1
Spanish-English

5 Conclusion

This paper explores the use of the web for query and document expansion.
The experiments show that the named entities expanded from the web are useful. Limiting the searched web sites to Wikipedia does not seem to improve performance and may filter out some related web pages. Document expansion brings in too much noise, so that performance decreases by 28.24%. Regarding media mapping as query expansion improves retrieval performance considerably. This shows the robustness of the media mapping method even though the new task definition is more challenging than before. Integrating both the web and an image-text parallel corpus for query expansion does not improve performance.

Acknowledgments. Research of this paper was partially supported by National Science Council, Taiwan, and Excellent Research Projects of National Taiwan University, under the contracts E and 96R0062-AE

References

1. Clough, P., Sanderson, M., Müller, H.: The CLEF 2004 Cross-Language Image Retrieval Track. In: Peters, C., Clough, P., Gonzalo, J., Jones, G.J.F., Kluck, M., Magnini, B. (eds.) CLEF LNCS, vol. 3491, pp Springer, Heidelberg (2005)
2. Clough, P., Müller, H., Deselaers, T., Grubinger, M., Lehmann, T.M., Jensen, J., Hersh, W.: The CLEF 2005 Cross-Language Image Retrieval Track. In: Peters, C., Clough, P., Gonzalo, J., Jones, G., Kluck, M., Magnini, B. (eds.) CLEF LNCS, vol. 4022, pp Springer, Heidelberg (2006)
3. Grubinger, M., Clough, P., Hanbury, A., Müller, H.: Overview of the ImageCLEFphoto 2007 Photographic Retrieval Task. In: Nardi, A., Peters, C. (eds.) Working Notes of the 2007 CLEF Workshop (2007)
4. Chang, Y.C., Chen, H.H.: Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval. In: Peters, C., Clough, P., Gey, F.C., Karlgren, J., Magnini, B., Oard, D.W., de Rijke, M., Stempfhuber, M. (eds.) CLEF LNCS, vol. 4730, pp Springer, Heidelberg (2007)
5. Lin, W.C., Chang, Y.C., Chen, H.H.: Integrating Textual and Visual Information for Cross-Language Image Retrieval: A Trans-Media Dictionary Approach. Information Processing and Management 43, (2007)
6. Chen, H.H., Chang, Y.C.: Language Translation and Media Transformation in Cross-Language Image Retrieval. In: Sugimoto, S., Hunter, J., Rauber, A., Morishima, A. (eds.) ICADL LNCS, vol. 4312, pp Springer, Heidelberg (2006)
