Cross-Media Meta Search: Query Relaxation and Information Integration for Heterogeneous-Media Search Engines

Size: px

Start display at page:

Download "Cross-Media Meta Search: Query Relaxation and Information Integration for Heterogeneous-Media Search Engines"

Bonnie Hall
5 years ago
Views:

1 Master Thesis Cross-Media Meta Search: Query Relaxation and Information Integration for Heterogeneous-Media Search Engines Supervisor Professor Katsumi TANAKA Department of Social Informatics Graduate School of Informatics Kyoto University Akihiro KUWABARA February 9, 2004

2 Cross-Media Meta Search: Query Relaxation and Information Integration for Heterogeneous-Media Search Engines Akihiro KUWABARA Abstract In recent years, the quantity of multimedia contents on WWW has been increasing with improvement in the speed of transmission speed, and the spread of digital cameras. Since the contents are distributed on a lot of websites and expressed by various media, it is important to construct a function which searches the contents effectively. Conventional meta search engines retrieve the results from several search engines and automatically classify them. The meta search engines dispatch queries to same type search engines, and retrieve mono-type contents. Moreover, only the link to a Web page is shown in ranking form as a unified reference result. However, in such conventional meta search engines, the information on various media is simultaneously uncollectible. If a user follows the link of a reference result and peruses two or more Web pages, he can acquire effective information at last. In this paper, in order to solve this problem, we propose a system which dispatch queries to adequate search engines based on the media type. Moreover, we propose the system which extracts only the information in relation to the reference keyword group from the Web page obtained as a reference result, and unifies these automatically. The information over various media relevant to the reference keywords are searched and unified. It can be called output like an encyclopedia using Web page. Users can only input reference keywords and can peruse easily various information currently distributed on Web about the keyword. Users can effort of perusing the Web page of a link place sequentially from the result of higher rank and discovering effective information from Web pages. And also users can peruse the information over reference keywords now at a glance effectively so that it may say that this language is such meanings. In this system, the case where are the reference question by two or more i

3 ii keywords is inputted by the user is considered. The search engine used here corresponds to various media. That is, it is the search engine with which the types of a text search, a image search, a music search, etc. We focus on a text search and a image search. we propose the method of transforming a reference question and collecting Web pages efficiently. A text search has many hits of a solution, but it is difficult to acquire effective information. A image search has high precision and can obtain the picture of a reference keyword simply easily, it has very few hits of a solution. In order to solve this, we use query relaxation. The query relaxation method assigns whether it is used for text search, or it is used for image search for every word of a reference question. And each search engine performs separately and the common Web page of a reference result becomes the answer of reference. By using this query relaxation, it becomes possible to utilize effectively taking advantage of the strong point of the search engine of two different media. That is, it becomes possible to raise recall, maintaining precision. Next, only the portion in relation to the reference keyword is extracted from the inside of a Web page set of the solution collected using query relaxation method. A related portion is extracted in consideration of frequency of the word which appears in the Web page group of a search result, and position relation between the picture of image search and a paragraph. Thus, it enables a user to peruse effectively by unifying only the important portion of a Web page instead of the Web page itself. By cross-media meta search, we think that users become possible to peruse the suitable information about the inputted reference keywords simply and to search multimedia contents easily.

4 iii WWW Web WWW Web Web Web. Web Web Web Web

5 iv And Web.... Web... Web Web. Web Web.

6 Cross-Media Meta Search: Query Relaxation and Information Integration for Heterogeneous-Media Search Engines Contents Chapter 1 Introduction 1 Chapter 2 Cross-Media Meta Search 5 Chapter 3 Query Relaxation Basic concept Query Relaxation method Answers by Query Relaxation Example of Query Relaxation The degree of Query Relaxation Merit of Query Relaxation Chapter 4 Experiments and Evaluation The outline of experiment Experiment results Consideration Chapter 5 Improvement of Query relaxation Method of improvement Execution order of subset Breadth-first search and Depth-first search The statistical technique The linguistic technique The extended proposal of the query relaxation Chapter 6 Information Integration Extract information Copyright problem Searching based on Web page Informational arrangement... 35

7 Chapter 7 Prototype System Implement Implemeting display Chapter 8 Related work 41 Chapter 9 Conclusion 45 Acknowledgments 47 References 48

8 Chapter 1 Introduction Since the Internet environment has spread increasingly, the number of Web pages is increasing steadily. The quantity of multimedia contents on WWW has been increasing by the spread of a broadband, digital cameras, etc. Thus, since Web space is flooded with various information, it is becoming very difficult for users to collect only useful information. It is the search engine which a user uses for looking for a Web page needed from a huge Web page. The conventional search engine shows the link to the Web page of a reference result by inputting a reference question. However, there is a limit in the amount of information which each search engine has, and since it is decided like the text search engine and the image search engine what media are searched for every search engine, there is a limitation also in the kind of information which can be searched. It is the meta search which is compensated with the ability not to do with one search engine. A meta search increases the amount of information by using two or more search engines collectively. Moreover meta search engine is used as a means to search information effectively and to unify it. In the conventional meta search engine, if a user inputs a reference keyword group and performs reference, user input keywords are passed to each search engine, each search engine carries out retrieval separately, and collects Web pages. A meta search engine, to the Web page which each search engine collected, removes duplication, operates a classification automatically, and outputs a reference result. But, meta search engines have three problems. 1. The meta search engines uses same type search engines which search Web pages based on text style. In the existing meta search, only the text search engine is used mostly. Thereby, since only the text document in a Web page is taken into consideration, it is thought on the Web page which has the present various media that sufficient reference cannot be performed. 1

9 2. the same reference question is performed to every search engine which the meta search engine uses. When there were many reference keywords, or when unrelated for every reference keyword, a reference result which a user desires does not come out. In order to solve this problem, the given keyword is not used as it is, but it is considered that it is necessary to make it change into a certain form. 3. Finally, as unified reference results show the links to Web pages, users have to browse one by one. In almost all search engines, the link to a Web page is displayed as a reference result. Therefore, when a user peruses the Web page of a reference result, a user is very troublesome, in order to have to repeat operation of perusing the Web page of each of reference results until it discovers the Web page which has started the contents which can be judged to be useful information. Moreover, since various contents are described in one Web page, a user cannot collect only useful information efficiently. Thus, it is not easy for users to acquire useful information in the conventional search engines. It is important that users can acquire various information, such as text, and pictures, to having inputted the reference keyword into one search engine, and users can easily find useful information. Now, the search engine is highly efficient that it is improved rapidly and is easy to use rapidly. I think that meta search of using them well collectively is a very effective means. If a meta search which solves a problem which was described above exists, a user will be considered that it can collect information very efficiently. Then, we propose the Cross-media meta search. We define the Cross-media meta search as being collecting reference results using a search engine of a different kind. By using a search engine of a different kind simultaneously, the problem that only the text search used mostly is solvable. Moreover, I think that the Cross-media meta search needs to transform a reference question in order to use a search engine of a different kind. I think that informational processing 2

10 and informational integration are required, in order to show the reference result of each media efficiently. Information integration Search result Input keywords The system Return answer Change keywords Search engine Web space Collect Web pages Figure 1: Conventional Meta Search 3

11 Fig.1 shows the whole system image. The system has the following functions. the system uses search engines of a different kind. the system changes user input keywords. the system integrations information which each search engine collects The remainder of this paper is organized as follows: Section 2 explains Cross-Media Meta search, Section 3 explains Query Relaxation, Section 4 discusses our experiments, Section 5 discusses improvement of query relaxation, Section 6 explains Information Integration, Section 7 explains the prototype system, Section 8 discusses related works, and we conclude in Section 9. 4

12 Chapter 2 Cross-Media Meta Search We think there is much information on Web space. If we can use Web well, we can receive a lot of benefit. Especially, it is very effective that users get only the information which users want. Then, we think that Web can be expressed like an encyclopedia. When we can only look up a word in an encyclopedia, we can know the meaning, a photograph, a related word of the word. We put this in for Web retrieval. Thus, users only inputted the reference keyword and get various information. We use a meta search in order to realize this. Each of conventional search engines cannot cover the huge information on Web space. And, since each of search engines has a field, such as text search and image search, users cannot get the information on various forms in one search engine. By using meta search engines, we can increase the amount of information and can also receive several kinds of media. In this paper, we propose Cross-media meta search. The Cross-media meta search engine intuitively collects several types of information from the Web based on user input keyword queries. The differences between conventional Web meta-search engines and Cross-media meta search engines are shown in Figs.2 and 3 are summarized as follows: Conventional meta search Conventional Web meta search engines send the user input keyword query Q (possibly with minor changes) directly to several search engines. Modifications are not made to a user-input keyword queries. To retrieve the results conventional meta-search engines return a list of pertinent URLs with duplicates removed and with ranking scores. Cross-Media meta search The Cross-media meta search engine is designed to collect not only Web pages, but also mixed types of multimedia contents (images, sounds etc.) by sending, query Q to several search engines, each of which is dedicated to a specific type of media content. On sending query Q, the search engine 5

13 may modify and/or relax the term into Q 1,...,Q n, according to the characteristics of each media type. The output of the Cross-media meta search engine is not simply a list of URLs, but a mixture of texts, images, and sounds, edited like an encyclopedia. Search engine E1 Text search result Q Search engine E2 Text search result Intersection Search engine Em Text search result List of URL Figure 2: Conventional Meta Search Q1 Search engine E1 Text search Web pages I Q Q2.. Qn Search engine E2 Image search. Search engine Em Music, etc search Web pages I Web pages Intersection Web pages of result Information Integration Figure 3: Cross-Media Meta Search 6

14 There are two points where Cross-media meta search is characterized. The first is that Cross-media meta search uses the search engine of the type of various media in order to collect various information at once. We think it important that a system changes a user s question into the form which was adapted for each search engine without using as it is, in order to use the search engine of different media efficiently. Then, we propose the method of query relaxation. The second is that Cross-media meta search extracts one portion out of a Web page, integrates those information and displays on users. Because it is bad efficiency that a system displays URLs of Web pages collected with each search engine. It is thought that the information which a user needs is a part in the Web page of the collected result. Then, we propose the method of information integration of Web pages collected of each search engines. 7

15 Chapter 3 Query Relaxation In this chapter, we describe Query Relaxation of Cross-media meta search. In the conventional meta search, the user s input query is used for each search engine as it is. However, in Cross-Media meta search, each search engine is used by using the Query Relaxation method. So, in Cross-Media meta search, each engine can be used efficiently. The method of Query Relaxation is as follow. 1. User input query is divided into a subset. 2. Each subset is used for each search engine. 3. The reference results of each search engine are collected. 4. The common Web page of each reference result is result of Cross-Media meta search. The method of Query Relaxation is described in detail below. 3.1 Basic concept We assume that users wish to have multimedia content (texts, images, sounds, etc.) that is related to their keywords K 1,K 2,...,K n (n 2). Conventionally, if a user wants the text about these keywords, he will use a text search and if a user wants the picture about these keywords, he will use a image search. These actions are usual. However, since it is a thing relevant to these keywords, if a text and a picture can be searched simultaneously and are able to peruse a reference result simultaneously, we think that a user can peruse information very efficiently. Then, our Cross-media meta search engine uses various search engines. (namely, E 1,E 2,...,E m (m 2)). For example, E 1 is Google[5], E 2 is AltaVista[6], E 3 is Google image search[7], and so on. Thus, the key feature of our system is that it can use a mixture of search engines with different media types. We describe the search engines which Cross-media meta search uses. A text search engine is typified by Google and AltaVista. By inputting a keyword, these search engines outputs a reference result Web pages which contains all keywords that user inputs from the Web page 8

16 in the database which crawler collected. Many search engines correspond to this and are called so-called robot type search engine. This robot type search engine is putting many pages into the database, and attaches the index. Since it will search out of a lot of pages if it searches, there is very much hit number of cases. Moreover, by using algorithm called an original page rank by each search engine, the optimal solution comes to a higher rank. However, since there is much hit number of cases, there is a fault that there will be many noises or a page which is unrelated and which does not have validity will hit mostly. An image search engine is typified by Google image search. When keywords with which a user is related to a picture needed is inputted, a system judges contents, by analyzing the text which adjoins a picture, the caption of a picture, and many other factors. Moreover, duplication is eliminated using advanced algorithm and the picture of the highest quality is displayed first. Thus, an image search engine obtains a picture in inputting a text. It is very efficient for collecting pictures so as to extract a picture, Since the image search engine is looking at only a picture and its very near, its relevance for every keyword is high, and its relevance of a keyword and a picture is high. However, the hit number of cases may decrease very much, and a reference result may become zero. an animation and music search engine is typified by Naver. This is the search engine which specialized in the animation file on the Internet. The animation file of various form, such as rm, asf, mpg, mov, and swf, can be searched. an animation and music search engine obtains these files in inputting a text and displays these file in link form there is a problem that we must visit each link and check files. Fig.4, 5, 6 shows the search results of each search engine. In this paper, We focus on two search engines, the text search engine and image search engine considered to be effective for acquiring information most. 9

Title and Link A picture of image search A part of text of Web page Link of Web page Text search engine Figure 4: text search Image search engine Figure 5: image search Link

realized using these two search engines. 3.2 Query Relaxation method Most users do not know which keywords are most appropriate for each search engine.

Unfortunately, when we use an image search engine, if the number of keywords exceeds 3, it is difficult to obtain any images for the query.

17 Title and Link A picture of image search A part of text of Web page Link of Web page Text search engine Figure 4: text search Image search engine Figure 5: image search Link of movie or music file Link of Web page Type of file and file size Movie and music search engine Figure 6: each search engine We describe how Cross-media meta search is realized using these two search engines. 3.2 Query Relaxation method Most users do not know which keywords are most appropriate for each search engine. Therefore, we allow users to input their query in the form of a conjunctive query Q = K 1 K 2... K n. Unfortunately, when we use an image search engine, if the number of keywords exceeds 3, it is difficult to obtain any images for the query. but we cannot get efficient reference result in most case. That is query which consists of three or more words is severe as conditions for reference. To solve the problem, we propose relaxing conditions method of queries to receive adequate results by sharing the keywords for some kinds of search engines. 10

18 For example, when three keywords, Mt. Fuji, snow and sunset, are inputted Google image search engine, a referent result is nothing. But, inside of a referent result when two keywords, Mt. Fuji and snow, are inputted Google image search engine, four Web pages contain the text, sunset. In this case, we can receive the pages which are related to three keywords by relaxing conditions. Let Ans (Q) be the set of answers for a user-input conjunctive keyword query Q. In our query relaxation approach, the conjunctive query Q is divided into a set of tuples bound by sub-query: K 1,...,K n K 1, K 2,...,K n K 2, K 1,K 3...,K n. K 1,K 2, K 3,...,K n K 1,K 3, K 2,K 4...,K n. K 1,...,K n 1, K n K 1,...,K n In this case, the query is divided into two separate sub-queries, because two kinds of search engines are required for the query. Generally, the number of sub-queries is dependent on the number of search engines. The sub-query for each set of tuples is translated for the image search engine. For example, the first tuple (i.e. [, K 1,...,K n ]) shows that there are no keywords for the text search engine, and K 1,K 2,...,K n are the input keywords for the image search engine. In general, the larger the number of keywords, the more difficult the image search, as shown by the latter sub-query. Therefore, more images can be obtained by the latter half of the tuple, because there are fewer keywords in the latter sub-query. Fig.7 shows two separate sub-queries. 11

19 ,{ K1, K 2,..., K n} { K1},{ K 2,..., K n} { Kn},{ K1,..., Kn 1} { K 1, K 2 },{ K 3, K 4,..., K n} { K n 1, Kn },{ K1, K 2,..., K n 2} { K1 2 n, K,..., K }, The former element (blue) is used for text search. The latter element (pink) is used for image search. Figure 7: sub-queries 3.3 Answers by Query Relaxation Answers for query Q are retrieved as unions of all the sub-query tuples: Ans(Q) = Ans(K 1... K n,e 2 ) (Ans(K 1,E 1 ) Ans(K 2... K n,e 2 )) (Ans(K 2,E 1 )... Ans(K 1 K 3... K n,e 2 )) (Ans(K 1 K 2,E 1 ) Ans(K 3... K n,e 2 )) (Ans(K 1 K 3,E 1 )... Ans(K 2 K 4... K n,e 2 )) Ans(K 1... K n,e 1 ) 12

20 where Ans(Q, E i ) means the answers of the query Q by the search engine i. In this case, E 1 is a text search engine and E 2 is an image search engine. The engines return URLs which match the queries. To compute (Ans(K 1 K 2,E 1 ) Ans(K 3... K n,e 2 )), Ans(K 1 K 2,E 1 ) and Ans(K 3... K n,e 2 ) are processed separately. Then the intersection of the answers is calculated. The answers are retrieved in the same way for all of the tuples. Finally, Ans(Q) can be retrieved as unions of all the answers. { K 1, K 2},{ K 3,..., K n} K1 K 2 input text search engine output K3... K n input Image search engine output Common web pages Web pages of results of a text search Web pages contained picture of results of an image search Web pages of results by Query relaxation Figure 8: Answers by Query Relaxation 13

3.4 Example of Query Relaxation We would now like to demonstrate the results obtained using the query relaxation approach with the following query: Q = q 1 q 2, where q 1 is Mt. Fuji and q 2 is snow.

The right diagram shows the results using the query relaxation approach.

21 3.4 Example of Query Relaxation We would now like to demonstrate the results obtained using the query relaxation approach with the following query: Q = q 1 q 2, where q 1 is Mt. Fuji and q 2 is snow. Fig. 9 shows venn diagrams of the results of query Q. The left diagram shows the results from a conventional meta-search engine. The right diagram shows the results using the query relaxation approach. In the left diagram, the hatched areas show the results for Ans(q1 q2, SE text ), and the right hatched areas show the results for Ans(q1 q2, SE img ), where SE text is the text search engine and SE img is an image search engine. The top hatched areas on the right diagram shows the results for (Ans(q 1,SE text ) Ans(q 2,SE img )). The bottom hatched areas shows the results for (Ans(q 2,SE text ) Ans(q 1,SE img )). Therefore, using proposed query relaxation, the possibility for obtaining pertinent results is increased. Conventional results Query relaxation results Figure 9: Venn diagram of results of results of query Q Thus, in addition to results of the conventional meta search, the common Web pages of text search and image search become results by using this technique. That is, the Web pages of Fig.10 are also collected. 14

Text 䇸snow䇹 Image 䇸Mt. Fuji䇹 Figure 10: the sample pages which the system collects 3.

Why does it call it the degree of Query Relaxation? Because, text search engine has more hit number of cases than image search engine. This shows that the way of image search engine is severe.

That is, when three of the keywords are used for an image search engine, the degree of relaxation is considered to be zero.

22 Text 䇸snow䇹 Image 䇸Mt. Fuji䇹 Figure 10: the sample pages which the system collects 3.5 The degree of Query Relaxation We deﬁne the degree of relaxation as the number of keywords in a user input query Q that are actually used by a Web text search engine. Why does it call it the degree of Query Relaxation? Because, text search engine has more hit number of cases than image search engine. This shows that the way of image search engine is severe. Since using text search engine has relaxed search, it makes it the degree of relaxation for how many keywords to have used in text search engine. That is, when three of the keywords are used for an image search engine, the degree of relaxation is considered to be zero. When two keywords are used for an image search engine and the third keyword is used for a text search engine, the degree of relaxation is considered to be one. We compare the results of 0 of relaxation, to 1 of relaxation, and 2 of relaxation respectively. Fig.11 shows the example putted concrete keywords. 3.6 Merit of Query Relaxation A user wants to peruse only eﬀective information. The index of the search engine from which a type is diﬀerent can be used. The Web pages which have 15

23 0 of relaxation, { Mt. Fuji snow sunset} Using text search Using image search 1 of relaxation { Mt. Fuji},{ sunset snow} { sunset }, { Mt. Fuji snow} { snow },{ Mt. Fuji sunset} 2 of relaxation { Mt. Fuji snow},{ sunset} { Mt. Fuji sunset},{ snow} { snow sunset},{ Mt. Fuji} 3 of relaxation { Mt. Fuji snow sunset }, Figure 11: Lattice of query Q (Q = Mt. Fuji snow sunset) not been collected until now are also collected taking advantage of the diversity of media. Moreover, if there are many questions inputted in order to extract conditions, a reference result will not come out, If there are few questions in order to search broadly, it will increase to an excessive page to a reference result. It is difficult to adjust this well. Since an image search outputs the picture a user wants as a search result, it is not necessary to carry out text search, and it does not need to look for a picture out of Web pages. Therefore, we think that exact comprehensive and information will be collectable, because it uses mixing the picture reference with this high accuracy, and the text reference which can search many Web pages. We think that we create an encyclopedia using Web by using this technique. 16

24 Chapter 4 Experiments and Evaluation 4.1 The outline of experiment By using the method of Chapter 3, it is proved what result is obtained actually. We investigated the hit number of pages and the effective number of pages of search results at the time of inputting various keywords in the experiment. The keyword used for an experiment chooses what had become the title of the report of the news page which mainly exists from various fields. We use Google for text search, and Google image for image search. The user input keyword is set to three. This time, the reference results of the degree 0 of relaxation, the degree 1 of relaxation, and the degree 2 of relaxation are compared. Here, the result of the degree 1 of relaxation also includes the result of the degree 0 of relaxation. The result of the degree 2 of relaxation includes the result of the degree 0 of relaxation, and the degree 1 of relaxation. However, since we want the result using image search, we don t use the result of the degree 3 of relaxation of text search only. 4.2 Experiment results A part of experiment result is shown below The keywords of text search are inputted into a text search. The keywords of image search are inputted into an image search. The hit number shows the common web pages of a text search result and an image search result. The number of pertinent pages shows the page which we judge to be described the contents of a reference keyword appropriately in the hit pages. Fig.12, 13, 14 shows table about experiments. Fig.15, 16 shows graph about recall and precision for every reference keyword. Precision and recall are displayed for every degree of relaxation. Precision is the average precision to the result of every degree of relaxation. Since the whole solution set on Web is not understood, recall shall consider total of the effective page for every experiment as the whole solution set. Therefore, it takes cautions that the recall at the time of the degree 2 of relaxation is 100%. So this experiment is the evaluation relatively. 17

25 keywords soccer Nakata team of Japan The degree of relaxation 0 Keywords for image search soccer Nakata team of Japan Keywords for text search Number of hits 19 Number of Pertinent pages 17 soccer Nakata team of Japan soccer team of Japan Nakata Nakata team of Japan soccer soccer Nakata team of Japan Nakata soccer team of Japan team of Japan soccer Nakata 13 0 The degree of relaxation keywords Kyoto Autumnal leaves Koudai-ji Keywords for image Keywords for text Number of Number of search search hits Pertinent pages Kyoto Koudai-ji Autumnal leaves 6 6 Kyoto Autumnal leaves Koudai-ji Kyoto Koudai-ji Autumnal leaves Autumnal leaves Koudai-ji Kyoto 10 8 Kyoto Autumnal leaves Koudai-ji 7 1 Koudai-ji Kyoto Autumnal leaves 6 2 Autumnal leaves Kyoto Koudai-ji keywords musical Shiki theatre company performance The degree of relaxation 0 Keywords for image search musical performance Shiki theatre company Keywords for text search Number of hits 19 Number of Pertinent pages 17 musical Shiki theatre company performance musical performance Shiki theatre company Shiki theatre company performance musical musical Shiki theatre company performance Shiki theatre company musical performance performance musical Shiki theatre company 8 2 Figure 12: table of type A 18

26 keywords Mt. Fuji sunset snow The degree of relaxation Keywords for image search Keywords for text search Number of hits Number of Pertinent pages 0 Mt.Fuji sunset snow 0 0 Mt.Fuji sunset snow Mt.Fuji snow sunset sunset snow Mt.Fuji 3 3 Mt.Fuji sunset snow snow Mt.Fuji sunset 1 1 sunset Mt.Fuji snow 0 0 keywords typhoon flood heavy rain The degree of relaxation Keywords for image search Keywords for text search Number of hits Number of Pertinent pages 0 1 typhoon flood heavy rain typhoon heavy rain typhoon flood flood heavy rain heavy rain flood typhoon typhoon heavy rain heavy rain flood typhoon flood flood typhoon heavy rain 22 5 Figure 13: table of type B keywords SARS Pneumonia Condition The degree of relaxation Keywords for image search Keywords for text search Number of hits Number of Pertinent pages 0 SARS, Pneumonia, Condition 0 0 SARS, Pneumonia condition SARS, Condition Pneumonia, 0 0 Pneumonia, Condition SARS SARS Pneumonia Pneumonia, Condition SARS, Condition Condition SARS, Pneumonia 0 0 Figure 14: table of type C 19

27 Precision(%) average degrees 0 of relaxation degrees 1 of relaxation degrees 2 of relaxation Recall(%) Figure 15: graph of typep A Precision(%) average degrees 0 of relaxation degrees 1 of relaxation degrees 2 of relaxation Recall(%) Figure 16: graph of type B 20

28 4.3 Consideration The results of the experiment are shown in Fig.12, 13, 14. The recall and precision graphs for each keyword are also shown in Fig.15, 16. The recall ratio for the results increases from 0 to 1 for every degree of relaxation. That is, increasing the number of keywords for the text search engine is an effective means of obtaining pertinent Web pages. Moreover, compared with change of the recall from the degree 0 of relaxation to the degree 1 of relaxation, the change to the degree 2 of relaxation from the degree 1 of relaxation was small. In the graph, there are some patterns. Type (A) is normal. The recall ratio is increased and the precision ratio is decreased according to the degree of relaxation. Type (B) is especial. Type (B) shows from 0 to 1 of relaxation with and increased precision and recall ratio that is particularly apparent. No answers were obtained at 0 of relaxation, but by relaxing the degree, many answers were obtained. We think that it is the pattern in which the validity by relaxing the query is shown notably. Type (C) does not have any answers. When there is no keyword which can express a concrete picture like Type (C), even if it makes the degree of relaxation high, a reference result does not come out. When there are few especially results of the degree 0 of relaxation, this pattern appears in many cases. This shows that it is effective to relax a query, if there are few results of the degree 0 of relaxation. By the result of an experiment, query relaxation can improve recall considerably, maintaining precision to about 70% at one of degree. Through an experiment, even if a user inputs a few keyword into image search engine, it turns out that the poor field that there is little hit number of cases also exists. In the case of the words and phrases to which each keyword cannot express a concrete picture, even if it raises the degree of relaxation, there is very little hit number. It turns out that the remarkable difference is attached 21

29 to the number of hits, and the effective page by the keyword relaxed. Depending on the keyword, the hit number of a result of the degree 2 of relaxation become fewer than of the degree 1 of relaxation. When it leaves a keyword more concrete than such a result to image search, it is thought that the number of hits will increase. However, we think that the degree of relation of reference keywords, the degree of coincidence, the frequency for which a keyword is used as contents of the metadata of a picture, etc. are related. However, in the present evaluation, cautions are required for it to be dependent on the picture reference algorithm of Google too much. 22

30 Chapter 5 Improvement of Query relaxation We described the query relaxation method in Chapter 3. However, in the method of calculating all subsets, when a user inputs a lot of keywords, the efficiency of the query relaxation method will become very bad. When a question increases, it is because the number of subsets increases explosively. Therefore, we consider the increase in efficiency of the query relaxation method supposing the case where the reference question Q consists of a lot of keywords. That is the case where many keywords are inputted into a search engine in order that a user may extract and look for his information needed, the case where one sentence of a Web page is copied, etc. We describe the approaches of the query relaxation method for lots of keywords (N keywords) below. 5.1 Method of improvement Execution order of subset Many subsets are made by inputting many keywords. We focus on the point whether it is good to somewhere perform the subset. The lattice structure of a subset in the query relaxation method is shown in Fig.7. The number of search engines is two, the number of keywords is N. The element of the former of each subset in a figure is used for text search and the element of the latter is used for image search. That is, they are the degree 0 of relaxation, and the degree 1 of relaxation sequentially from a top. It is clear by looking at the lattice structure of a figure that the number of subsets of query will become a very huge quantity, If the number of keywords will be N even if the number of search engines is two. Since it is such, in order to collect solutions efficiently by Cross-media meta search, we think it important not to collect solution pages in all subsets, but to collect solutions sequentially. We think each search engine. About text search engine, The number of solutions decreases like the direction under a figure, that is, the degree of relaxation becomes large. Since a reference question increases and conditions are extracted, this is natural. About image search engine, The number of solutions 23

31 decreases like the direction up a figure, that is, the degree of relaxation becomes small. Thus, although it is clear about the solution of each search engine, since the question easing method is used simultaneously, such a relation is not realized. This is the point which makes it difficult to decide the execution order of the query relaxation method. Pruning of every search engine is considered by the easy method. The condition of the solution which put keyword K 1 K 2 K 3 into search engine E 1, that is Ans(K 1 K 2 K 3,E 1 ), is more severe than of Ans(K 1 K 2,E 1 ). Since the keyword is added, this is natural. Therefore, the number of the solutions of Ans(K 1 K 2 K 3,E 1 ) should become below Ans(K 1 K 2,E 1 ). When Ans(K 1 K 2,E 1 ) does not have a solution, Ans(K 1 K 2 K 3,E 1 ) does not have a solution. If this is used, the subset with the element without a solution of a subset does not have a solution. If it becomes so, it is not necessary to search one of the two any longer. It can specify not performing beforehand reference which does not have a solution or does not exceed a threshold with the hit number. Thus, by being simplified, pruning of each search engine is possible. The argument on an execution order is described below Breadth-first search and Depth-first search The foundations of the order of from which subset to perform are breadth-first search and depth-first search. breadth-first search The degree 0 of relaxation is performed first. Since what has high accuracy is good as for reference, Web pages which has high accuracy are showed sequentially. Second, every one degree of relaxation is raised. Finally, a query is relaxed until the contents which a user satisfies are obtained. By performing breadth-first search, it can see sequentially from what has high accuracy. We think that wide range of information can be acquired by seeing the sequence with the same degree of relaxation. depth-first search The degree 0 of relaxation is performed first. Second, a user considers which keyword it leaves and the subset which raised the degree of relaxation 24

32 is performed. For example, the case of the search Mt.F uji snow sunset, suppose that the user specified that he wanted to see the picture which is Mt. Fuji. Then, after showing that Mt.F uji snow is used for image search, the solution by which Mt.F uji is used for image search is displayed. When the same keyword can see what is used for picture reference, we think that it is suitable in the specific thing when you want a user. In order to perform these, the statistical technique is used in breadth-first search and the linguistic technique is used in depth-first search. The technique is shown below The statistical technique By the present query relaxation method, all subsets are to be performed. However, calculation time is in a starting result display, and they are stripes. The method of performing from the optimal subset and showing a solution one by one is required. In order to judge which subset is the optimal, there is a method of asking the hit number of cases of each keyword for the optimal subset. The reason using the hit number is that what it is easy to acquire as information which it had the front in the stage which begins reference is the hit number of cases. First, the one where the degree of relaxation is smaller, that is many keywords used for image search, has high accuracy. Image search is because the conditions are severe. Second, generally, there is so much hit number of image search of the word that there is much hit number of text search of a certain word. Third, considering the And search with two words, the way of a reference result which combined the word with much reference number of cases increases. Therefore, we think that the method of searching from a subset with the small degree of relaxation and with much hit number of cases of each word used for picture reference is the optimal. 25

33 0 of relaxation 1, { Mt. Fuji snow sunset } 1 of relaxation { Mt. Fuji},{ sunset snow} { sunset }, { Mt. Fuji snow} { snow},{ Mt. Fuji sunset} 2 of relaxation { Mt. Fuji snow},{ sunset} { Mt. Fuji sunset},{ snow} { snow sunset},{ Mt. Fuji} 3 of relaxation { Mt. Fuji snow sunset }, The number of hits of each keywords Mt. Fuji > snow > sunset Figure 17: Breadth-first search of the stastistical technic Fig.17 shows the execution order of the subset when performing breadthfirst search. As an example, the case where a keyword is inputted as Mt. Fuji and snow and sunset is considered. Here, the hit number of cases presupposes that it is the order of the Mt. Fuji snow sunset. The number in a figure is the number of an execution order. As shown in a figure, it carries out from 0 with the lowest degree of relaxation, and, next, the degree 1 of relaxation is performed. In the degree 1 of relaxation, it shall perform from a subset with most scores using the following formulas. Score sub is score of each subset text hits all is the sum total of the number of hits of all reference results which put each word of a reference keyword into text search, respectively, and was obtained. image hits all is the sum total of the number of hits of all reference results which put each word of a reference keyword into image search, respectively, and was obtained. text hits sub is the sum total of the hit number of the keyword used for text search and image hits sub is the sum total of the 26

34 hit number of the keyword used for image search of a subset to ask for a score. Score sub = α( text hits sub )+β( image hits sub ) (α + β = 1) (1) text hits all image hits all α and β are changed by to which reference importance is attached. This formula is a formula for choosing what has more hit number of cases based on the idea about the above-mentioned hit number. Thus, the query relaxation method shall be carried out to breadth-first search using a statistical technique. However, there is a possibility that accuracy will become bad and a noise will increase more as a reference result increases. Moreover, the information which the user meant may not come out. It becomes important to adjust by sorting out in the stage of a display of a reference result. The other statistical techniques that the technique of judging whether it uses for image search or it uses for text search, based on the information which the user judged in the past by saving the data of the past reference, is also considered. We experimented statistical method. Fig. 18 is an example of an experiment. Fig. 19 shows the graph as a result of an experiment. order keywords Kyoto Autumnal leaves Koudai-ji Keywords for image Keywords for text Number search search of hits Number of Pertinent pages Precision Recall 1 Kyoto Koudai-ji Autumnal leaves % 9% 2 Kyoto Autumnal leaves Koudai-ji % 29% 3 Kyoto Koudai-ji Autumnal leaves % 66% 4 5 Autumnal leaves Koudai-ji Kyoto Kyoto Autumnal leaves Koudai-ji % 60% 78% 80% 6 Autumnal leaves Kyoto Koudai-ji % 83% 7 Koudai-ji Kyoto Autumnal leaves % 100% Figure 18: Experiment of the stastistical technic 27

35 Precision(%) order degrees 0 of relaxation degrees 1 of relaxation degrees 2 of relaxation Recall(%) Figure 19: Graph of the stastistical technic We think from an experiment result that precision may fall only by judging from the hit number of cases. However, recall is high also in an early stage. We think that the statistical technique attaches importance to recall The linguistic technique This is the method of considering the optimal subset paying attention to the ornamentation relation of the word in a subset. It is thought that a picture which user wants is a word modified from other words in the reference keyword group. That is, it is thought that it is the word limited most. For example, the case of the search Kyoto autumnalleaves Kodai ji, generally it is thought that the relation of Kodai-ji where autumnal leaves are famous in Kyoto is realized. Thus, we think that it is that with which Kodai-ji is embellished from other keywords. In this case, when Kodai-ji is inputted into a picture surely the reference number of cases increased so that it might understand also from Fig.20. It can be said that this is harnessing the characteristic of each search engine. There is a word which is easy to search with image search. In this case, the number of hits of proper nouns such as a person and a 28

36 0 of relaxation 1, { Mt. Fuji snow sunset } 1 of relaxation 2 3 { Mt. Fuji},{ sunset snow} { sunset }, { Mt. Fuji snow} { snow },{ Mt. Fuji sunset} 2 of relaxation { Mt. Fuji snow},{ sunset } { Mt. Fuji sunset},{ snow} { snow sunset},{ Mt. Fuji} 4 3 of relaxation { Mt. Fuji snow sunset }, The keywords of image which a user wants Mt. Fuji Figure 20: Depth-first search of the linguistic technic building tends to increase in picture reference. Thus, the technique of judging a subset can be considered by considering the ornamentation relation between words. However, since this is a guess, the page different from the intention of a user may be judged to be the optimal page. Moreover, the linguistic thing is difficult for a system judging. We experimented the linguistic method. Fig. 21 is an example of an experiment. Fig. 22 shows the graph as a result of an experiment. order keywords Kyoto Autumnal leaves Koudai-ji Keywords for image Keywords for text search search Number Number of of hits Pertinent pages Precision Recall 1 Kyoto Koudai-ji Autumnal leaves % 9% 2 Autumnal leaves Koudai-ji Kyoto % 21% 3 Kyoto Koudai-ji Autumnal leaves % 58% 4 Koudai-ji Kyoto Autumnal leaves % 72% Figure 21: Experiment of the linguistic technic 29

37 Precision(%) degrees 0 of relaxation degrees 1 of relaxation degrees 2 of relaxation 20 order Recall(%) Figure 22: Graph of the linguistic technic We think from an experiment result that precision is better than the statistical technique. The linguistic technique has high precision, maintaining moderate recall. We think that the linguistic technique attaches importance to precision The extended proposal of the query relaxation It is enumerated as follows what enhancing idea you exist besides the description of the above-mentioned. Duplication of a reference keyword is not permitted by query relaxation method. That is, when dividing into a subset, it uses for either text search or image search. By permitting duplication, a reference keyword can be used by both of reference. All information can be covered by allowing duplication. The number of processing increases enormously so that the number of subsets increases more enormously. When searching for example, Mt.F uji snow sunset, The solution set which the common pages of Mt.F uji sunset inputted into image search and Mt.F uji snow inputted into text search increases. It is thought by allowing duplication of a reference keyword that a reference result can be 30

38 extracted more. However, since calculation time increases, it is important to perform in not all subsets, but to calculate only a suitable subset, taking the hit number of a keyword into consideration. The concept of intitle and intext is introduced into the partial question to a text search engine. As a role of each word in a subset, the role assignment of whether the word is used by intitle or intext is carried out. It can be decided the word in the reference keyword, make which word main and is considered. However, since a role divides a partial question further, the number of the groups of a partial question will increase and efficiency becomes bad. An unnecessary keyword is filtered and removed from query Q. When the reference question consists of too much a lot of keywords, The Web page of the solution to the reference question is very difficult for finding it. Then, an unimportant word is removed by setting importance as each word of a question of a user. Only an important word is extracted and the word is used as a reference keyword. As a method of deciding importance, we consider how to feed back from the hit number of cases, and the method of preparing a relevant dictionary as a database beforehand. In this way, the efficiency of reference is improved by generating new query Q. Adaptation of feedback. The word which is easy to be used for a picture, and the word used for a text are fed back judging from a reference result. That is, the history of the past reference is stored in a database, a user profile is created, and the query relaxation method is performed based on it. Feedback of image search. This is a picture-oriented method more. A user feeds back by choosing a thing more needed and the similar thing from the picture of a reference result. It is more desirable to perform this with a search engine which analyzes and searches a picture. However, those search engines do not exist in the present stage. Then, we consider how to extract a word including the coincidence relation from the surrounding text of the picture which the user chose, and to search again. 31

39 Chapter 6 Information Integration In order that a user may collect information efficiently, a Web page is not shown as it is, but only the information related from a Web page is extracted and displayed. The information integration is described below. 6.1 Extract information The Web page has collected as a solution set by the query relaxation method. However, in one Web page, the subject which is unrelated to a reference keyword is also included. By removing unrelated subject from a Web page, we think that efficient useful information is acquirable. Based on this, only the portion relevant to a reference keyword is extracted from the inside of a Web page by this paper. The frequency of the word in the text in a Web page is used in considering the relevance to a reference keyword. Moreover, since a possibility that many subjects about the picture are included is high, the surroundings of the picture collected by picture reference also take a distance relation with a picture into consideration. A procedure is described below. 1. Calculate the importance of a word based on the frequency of a word. 2. Calculate the distance of a picture and a paragraph. 3. Calculate the importance of a paragraph by using both the importance of a word and distance of paragraph A formula is expressed below. I s is importance of paragraph s. w t is importance of word of t in paragraph s. r is distance of paragraph s from picture. I s = Score s r 2 (2) Score s = w t (3) w s A paragraph is extracted based on this importance. Thus, only related subject is extracted from the Web page of a reference result. Fig.23 shows this method. The reference result is made into the paragraph 32

and picture which were extracted from the Web page instead of a Web page unit. Thus, it unifies automatically, new contents are generated and a user is shown by making this into a reference result.

40 and picture which were extracted from the Web page instead of a Web page unit. Thus, it unifies automatically, new contents are generated and a user is shown by making this into a reference result. It is the feature for a reference result to be settled as contents with new not the display of a URL list but picture of a reference result and text document unlike the conventional meta search engine, and to be displayed. extraction Figure 23: Extract information Web 3,776m As for the Web page of the reference result collected for each sub-queries, the Web page unit does not serve as a solution. Only the portion relevant to the reference keyword is extracted from each Web page. Thus, a reference result is not a Web page unit but the portion extracted from the Web page. Here, the information extracted from the Web page of the reference result of each partial question is unified automatically, and new contents are generated. A user is shown this as a reference result. Unlike the conventional meta-search engine, a reference result is not the display of a URL list. In Cross-media meta search, it is the feature that the picture and a text document of a reference result are collected as new contents, and are displayed. Fig.24 shows this information integration method. 33

6.2 Copyright problem Figure 24: information integration In Cross-media meta search, only one portion which is related on a page unlike the conventional reference is extracted.

These acts may also become violation of copyright.

Moreover, in Google image search, only the photograph is extracted and shown. These acts cannot become violation of copyright.

41 6.2 Copyright problem Figure 24: information integration In Cross-media meta search, only one portion which is related on a page unlike the conventional reference is extracted. Therefore, the system is adding processing to the work of others who are called a Web page. Moreover, the system shows Web page so that it may be made to peruse simultaneously with other pages. These acts may also become violation of copyright. However, in the conventional search engine such as Google, the contents of some pages containing the reference keyword other than URL are displayed on the display screen of a reference result. Moreover, in Google image search, only the photograph is extracted and shown. These acts cannot become violation of copyright. Based on these, in Cross-media meta search, a system specifies the source point. 6.3 Searching based on Web page Thus, the Cross-media meta search which extended the usual reference can be used for other uses. One of them is searching based on a Web page. When the user needs reference most, it is a time of there being information which he does not understand, and information knowing. It is not efficient that user continue 34

to peruse a Web page after visiting and searching one by one at this time.

It can be used with the feeling of seeing as an encyclopedia by unifying a result and displaying briefly. Fig.

4 Informational arrangement For information integration, an important thing is not displaying information as

42 to peruse a Web page after visiting and searching one by one at this time. Therefore, it is the purpose of this method that a user will search simultaneously with perusing Web pages without using Web search cite. When the user has patrolled the Web page and the language regarded as not understanding comes out, this is specified with the drug of a mouse etc. At this time, while a user searches, Cross-media meta search moves by the background. It can be used with the feeling of seeing as an encyclopedia by unifying a result and displaying briefly. Fig.25 shows this image. Figure 25: searching based on Web page 6.4 Informational arrangement For information integration, an important thing is not displaying information as mere enumeration, but is displaying in a form with a certain scenario. Since we want to create a thing like the encyclopedia using Web as motivation, we consider the display method like an encyclopedia. Then, we think that an important thing is showing the contents considered to be the most effective to the portion which the users chose. As an element like an encyclopedia, there are 35

43 explanation of a term, a photograph, usage, related subject, etc. It is adapted for this information integration in this. In order to decide the element like an encyclopedia, it is determined from a reference keyword what is subject. The example of the element in the subject of reference to display is raised to below. Name of a person A profile, career, and photograph. Name of a place Scenery, sightseeing spot, and the special feature. Term Meaning, relation term and photograph. News A report, a related article, and characters. We think it effective to show such an element. Moreover, it is required to also consider the degree of similar of a text. Since there is little amount of information even if the same text is displayed, as for the subject with the high degree of similar, it is good to display the subject which was omitted and is different. For example, a medical virus and the virus of a computer mainly exist in a virus. Completely, since this is another subject, it classifies and it displays. However, the text with the high degree of similar is summarized into a medical virus. For example, even if there are many texts about an influenza virus, it does not indicate all, but what has the high degree of similar is eliminated and displayed. 36

44 Chapter 7 Prototype System The prototype system was implemented according to the above-mentioned method. In this chapter, we describe the prototype system of Cross-Media meta search system. And, implementing environment is as follows; OS : Windows XP CPU : Pentium4 2.00GHz Memory : 2GB Development Environment : Microsoft Visual Studio.Net/ C 7.1 Implement First, a user inputs their query which is consisted of K 1,K 2,...,K n (n 2). That is query Q = K 1 K 2... K n. Second, we use Google text search engine as E 1. And, we use Google image search engine as E 2. Ans(Q) is Web page set of common Web pages of the result Web pages of E 1 and E 2. The reference result of Question Q is shown to a user by generating the new contents which unified information using this Ans (Q). Contents are written by html and a system creates a html file dynamically here according to a demand of a user. Dynamic integration was enabled when a web browser read it at any time. The system architecture of the prototype system of Cross-media meta search is shown in Fig.26. Proto type system of Cross-media meta search consists 4 parts. Interface part Interface part is a part that a user inputs a reference query and the screen of a reference result is displayed after reference is completed. A reference result is displayed by reading html generated by information integration part. 37

Cross-Media meta search System Interface part Execute query relaxation part Information Integration part Collecting Web pages part Google Google image Web space Figure 26: The system

is made into a subset using the query relaxation method and those subsets are inputted into Google and Google image of a search engine to be used.

45 Cross-Media meta search System Interface part Execute query relaxation part Information Integration part Collecting Web pages part Google Google image Web space Figure 26: The system architecture of the prototype system of Cross-media meta search Executing query relaxation part Executing query relaxation part is a part that the reference question which the user inputted is made into a subset using the query relaxation method and those subsets are inputted into Google and Google image of a search engine to be used. Collecting Web pages part Collecting Web pages part is a part that a system collects the reference results of the question which each search engine inputted, and matches and chooses a common Web page out of the Web page of the reference 38

46 result. Information integration part Information integration part is a part that a system extracts only an important portion from a common Web page, generates the html file for displaying a reference result by interfacepart based on the extracted information. The flow of processing of a prototype is as follows. 1. The query which the user inputted, and the number of keywords are read. A user input query can be carved for every word by using blank. Each keywords are held as K 1,K 2,...,K n (n 2). 2. K 1,...,K n is inputted to Google image search and image s URL and URL of result Web pages are collected. 3. K 1 is inputted to Google text search, URL of result Web pages is collected. 4. K 2,...,K n is inputted to Google image search and image s URL and URL of result Web pages are collected. 5. Calculate common Web pages of both results. The page which is not common is canceled. 6. Title and text except tab and Javascript are extracted from source code of common web pages 7. This procedure is in the same degree of relaxation. 8. The degree of relaxation is raised one and this is repeated. 9. In order to check tf value to each text document, a sentence is divided per word using a chasen[8]. 10. The importance of a paragraph is calculated by calculating the importance of a word. 11. A paragraph is extracted based on this importance. 12. The picture of the reference result of image search is mixed and new contents are created in html form. 13. The created html file is read and a user is shown. 39

7.2 Implemeting display Fig.27 shows implementing display. Thus, it is a display screen like an ordinary browser. However, the result displayed is not as a result of URL like before.

47 7.2 Implemeting display Fig.27 shows implementing display. Thus, it is a display screen like an ordinary browser. However, the result displayed is not as a result of URL like before. A user inputs a reference question and pushes a reference button. If it does so, it will be displayed in the form where the text of one portion of the picture as a result of image search and the Web page as a result of text search was extracted as shown in a figure. Retrieval Query Reference keyword Image search results Text extracted from a web page Figure 27: implementing display 40

48 Chapter 8 Related work Naver[9] is a type of meta-search engine that simultaneously searches for HTML pages, CG animations, images, and sounds etc., pertinent to the user input keyword(s) and then merges the retrieved content. These results are only a collection of the retrieved information for each search engine. Naver is a Crossmedia meta-search engine that can retrieve information from several different media-sources from a user-input keyword query. But a keyword(s) is inputted like the conventional search engine, and The domain of images of image search and the domain of the list of URL of text search are divided by integration. Although various media are treated, it differs from this research at this point. Reference result of image Reference result of text Figure 28: NAVER MIRADOR-Search[10] is service which unified the semantic reference in a text developed by FUJITSU, and the visual reference by the picture and which offers a crossing media reference function. If a user inputs a reference keyword, Web Crawler will analyze the Web page on the Internet, and a picture and explanation texts will be collected automatically. And as a reference result, the collected picture is mapped and displayed on three dimensions based on the picture features, such as a color of a picture, and a form. This system is the reference which specialized in the 41

information to near Figure 29: MIRADOR-Search Cyclone[11] is also a meta-search engine for searching ordinary Web pages.

49 picture very much. feature of text (frequency of word) Specify a kind of feature feature extraction feature of picture (color, figure) Set of feature Space of feature of N dimension 3D space SOM Arrange the similar information to near Figure 29: MIRADOR-Search Cyclone[11] is also a meta-search engine for searching ordinary Web pages. From a user-input keyword query, it searches several search engines and assembles the results into an encyclopedia-like form. The point of showing only a required portion, without displaying a Web page as a reference result is connected with this research. However, in this system, if one word is not inputted, a suitable solution cannot be obtained. That is, it is thought that it is a Japanese dictionary using Web. Title and field of Web page extracted paragraph Figure 30: CYCLONE 42

Mr. oyama[12] proposed the Web reference technique using the formation of hierarchical structure of the reference question at the time of inputting two or more keywords into a search engine.

50 Mr. oyama[12] proposed the Web reference technique using the formation of hierarchical structure of the reference question at the time of inputting two or more keywords into a search engine. Precision is raised by distinguishing a role for every keyword like the keyword showing the theme, and the keyword showing the contents about the theme. Moreover, the reference result set which the same keyword also compared the reference result which changed and searched the role, and had a big difference as the display method of a reference result is shown in parallel. In this research, this technique is used and the role is given to the question used for image search at a reference keyword, and the question used for text search. user input query role of keywords Figure 31: interface by Mr. oyama KartOO[13] is visual meta search engine. KartOO launches the query to a set of search engines, gathers the results, compiles them and represents them in a series of interactive maps through a proprietary algorithm In this map, the found sites are represented by more or less important size pages, depending on their relevance. When you move the pointer over these pages, t he concerned keywords are illuminated and a brief description of the site appears on the left side of the screen. 43

Since what supports the task of such a routine and Web browsing is not developed, it is going to support this.

51 Relation of a link Web pages Figure 32: KartOO Web Montagewebmontage unifies a page dynamically. Web access is a routine pattern. A user looks at a page in the same order, when the same every day. Since what supports the task of such a routine and Web browsing is not developed, it is going to support this. The score is attached to the always visited Web page from a user s history. And contents are arranged so that the sum total of the score of the contents arranged on a Web screen may become the maximum. a part of contents of web page Figure 33: Web montage 44

Information Gathering Support Interface by the Overview Presentation of Web Search Results

Information Gathering Support Interface by the Overview Presentation of Web Search Results Takumi Kobayashi Kazuo Misue Buntarou Shizuki Jiro Tanaka Graduate School of Systems and Information Engineering