Cross-Media Meta Search: Query Relaxation and Information Integration for Heterogeneous-Media Search Engines

Master Thesis

Supervisor: Professor Katsumi TANAKA
Department of Social Informatics
Graduate School of Informatics
Kyoto University

Akihiro KUWABARA
February 9, 2004

Abstract

In recent years, the quantity of multimedia content on the WWW has been increasing with faster transmission speeds and the spread of digital cameras. Since this content is distributed over many websites and expressed in various media, it is important to provide a function that searches it effectively. Conventional meta search engines retrieve results from several search engines and classify them automatically. However, they dispatch queries to search engines of the same type and therefore retrieve content of a single media type. Moreover, they present only links to Web pages, ranked as a unified search result. As a result, conventional meta search engines cannot collect information in various media at the same time, and a user obtains useful information only after following the links in the search result and browsing several Web pages.

To solve this problem, we propose a system that dispatches queries to appropriate search engines according to media type. We also propose a method that extracts, from the Web pages obtained as search results, only the information related to the query keywords and integrates it automatically. Information in various media relevant to the query keywords is thus searched and unified, producing output like an encyclopedia built from Web pages. Users only need to input query keywords to browse the various information about those keywords that is distributed on the Web. They are spared the effort of opening linked Web pages one by one, starting from the highest-ranked result, to discover useful information, and can see at a glance what the query keywords mean, much as when looking up a word in an encyclopedia.
In this system, we consider the case where the user inputs a query of two or more keywords. The search engines used here cover various media: text search, image search, music search, and so on. We focus on text search and image search, and propose a method of transforming a query to collect Web pages efficiently. Text search returns many hits, but it is difficult to obtain useful information from them. Image search has high precision and easily returns pictures for the query keywords, but it returns very few hits. To address this, we use query relaxation. The query relaxation method assigns each keyword of a query either to text search or to image search. Each search engine is then queried separately, and the Web pages common to both results become the answer. Query relaxation thus exploits the strengths of search engines for two different media; that is, it raises recall while maintaining precision. Next, from the set of Web pages collected by query relaxation, only the portions related to the query keywords are extracted. A related portion is selected by considering the frequency of the words that appear in the Web pages of the search result and the positional relationship between a picture found by image search and the surrounding paragraphs. By integrating only the important portions of Web pages instead of presenting the pages themselves, the system lets users browse effectively. We believe that cross-media meta search enables users to browse suitable information about their query keywords simply and to search multimedia content easily.

Contents

Chapter 1 Introduction
Chapter 2 Cross-Media Meta Search
Chapter 3 Query Relaxation
    Basic concept
    Query Relaxation method
    Answers by Query Relaxation
    Example of Query Relaxation
    The degree of Query Relaxation
    Merit of Query Relaxation
Chapter 4 Experiments and Evaluation
    The outline of experiment
    Experiment results
    Consideration
Chapter 5 Improvement of Query relaxation
    Method of improvement
    Execution order of subset
    Breadth-first search and Depth-first search
    The statistical technique
    The linguistic technique
    The extended proposal of the query relaxation
Chapter 6 Information Integration
    Extract information
    Copyright problem
    Searching based on Web page
    Informational arrangement
Chapter 7 Prototype System
    Implement
    Implementing display
Chapter 8 Related work
Chapter 9 Conclusion
Acknowledgments
References

Chapter 1 Introduction

As the Internet has spread, the number of Web pages has been increasing steadily, and the quantity of multimedia content on the WWW has grown with the spread of broadband connections, digital cameras, and so on. Because Web space is flooded with diverse information, it is becoming very difficult for users to collect only useful information. Users rely on search engines to find the Web pages they need among this huge number of pages. A conventional search engine takes a query and shows links to the Web pages in its search result. However, the amount of information each search engine holds is limited, and because each engine searches a fixed kind of media (a text search engine, an image search engine, and so on), the kinds of information that can be searched are also limited.

Meta search compensates for what a single search engine cannot do. A meta search increases the amount of information by using two or more search engines together, and it is also used as a means of searching information effectively and unifying it. In a conventional meta search engine, the keywords the user inputs are passed to each search engine, each engine performs its retrieval separately and collects Web pages, and the meta search engine then removes duplicates from the collected pages, classifies them automatically, and outputs the search result. However, conventional meta search engines have three problems.

1. They use search engines of the same type, which search Web pages based on text. Existing meta search engines mostly use only text search engines. Because only the text in a Web page is taken into account, we believe that sufficient search cannot be performed on today's Web pages, which contain various media.

2. The same query is sent to every search engine the meta search engine uses. When there are many query keywords, or when the keywords are unrelated to each other, the search result the user desires does not appear. To solve this problem, we consider it necessary not to use the given keywords as they are, but to transform them into a suitable form.

3. Finally, because the unified search results are links to Web pages, users have to browse them one by one. Almost all search engines display links to Web pages as the search result. Browsing the result is therefore very troublesome: the user has to open the Web pages of the search results repeatedly until finding one whose content can be judged useful. Moreover, since one Web page describes various contents, the user cannot collect only the useful information efficiently.

Thus, it is not easy for users to acquire useful information with conventional search engines. It is important that users can acquire various kinds of information, such as text and pictures, by inputting query keywords into a single search engine, and that they can easily find useful information. Search engines are improving rapidly in both performance and usability, and we believe that a meta search that combines them well is a very effective means. If a meta search that solves the problems described above existed, users could collect information very efficiently.

We therefore propose Cross-media meta search, which we define as collecting search results using search engines of different kinds. Using search engines of different kinds simultaneously solves the problem that mostly text search is used. We also believe that Cross-media meta search needs to transform the query in order to use search engines of different kinds, and that information processing and information integration are required to present the search results of each media efficiently.

[Figure 1: Conventional Meta Search. Diagram of the system: the input keywords are changed and sent to the search engines, the search engines collect Web pages from Web space, the search results are integrated, and the answer is returned.]

Fig. 1 shows an overview of the whole system. The system has the following functions:
1. it uses search engines of different kinds;
2. it changes the keywords the user inputs;
3. it integrates the information that each search engine collects.

The remainder of this thesis is organized as follows: Chapter 2 explains Cross-media meta search, Chapter 3 explains query relaxation, Chapter 4 discusses our experiments, Chapter 5 discusses improvements to query relaxation, Chapter 6 explains information integration, Chapter 7 explains the prototype system, Chapter 8 discusses related work, and Chapter 9 concludes.

Chapter 2 Cross-Media Meta Search

There is a great deal of information in Web space, and if we can use the Web well, we can benefit greatly. It is especially valuable for users to get only the information they want. We therefore think the Web can be presented like an encyclopedia: when we look up a word in an encyclopedia, we learn its meaning, see a photograph, and find related words. We apply this idea to Web retrieval, so that users only input query keywords and get various information. We use meta search to realize this. No single conventional search engine can cover the huge amount of information in Web space, and since each search engine specializes in one field, such as text search or image search, users cannot get information in various forms from one search engine. By using meta search, we can increase the amount of information and also obtain several kinds of media.

In this thesis, we propose Cross-media meta search. A Cross-media meta search engine intuitively collects several types of information from the Web based on a keyword query input by the user. The differences between conventional Web meta search engines and Cross-media meta search engines are shown in Figs. 2 and 3 and are summarized as follows.

Conventional meta search: A conventional Web meta search engine sends the user's keyword query Q (possibly with minor changes) directly to several search engines, without adapting it to each engine. As its result, it returns a list of pertinent URLs, with duplicates removed and with ranking scores.

Cross-Media meta search: A Cross-media meta search engine is designed to collect not only Web pages but also mixed types of multimedia content (images, sounds, etc.) by sending the query Q to several search engines, each dedicated to a specific type of media content. When sending Q, the engine may modify and/or relax it into $Q_1, \dots, Q_n$ according to the characteristics of each media type. Its output is not simply a list of URLs but a mixture of texts, images, and sounds, edited like an encyclopedia.

[Figure 2: Conventional Meta Search. The query Q is sent to search engines E1, E2, ..., Em, each returning a text search result; the results are combined into a list of URLs.]

[Figure 3: Cross-Media Meta Search. Q is transformed into Q1, Q2, ..., Qn and sent to search engines for different media (E1: text search, E2: image search, ..., Em: music and other search); the intersection of the returned Web pages gives the Web pages of the result, which are passed to information integration.]

Cross-media meta search has two characteristic features. First, it uses search engines for various media in order to collect various kinds of information at once. To use search engines for different media efficiently, we consider it important that the system transform the user's query into a form adapted to each search engine instead of using it as is. For this purpose, we propose the query relaxation method. Second, Cross-media meta search extracts portions of Web pages, integrates that information, and displays it to users. Simply displaying the URLs of the Web pages collected by each search engine is inefficient, because the information a user needs is usually only a part of each collected Web page. We therefore propose a method of integrating the information in the Web pages collected by each search engine.

Chapter 3 Query Relaxation

In this chapter, we describe the query relaxation used in Cross-media meta search. In conventional meta search, the user's input query is passed to each search engine as it is. In Cross-media meta search, each search engine is instead queried using the query relaxation method, so that each engine can be used efficiently. The query relaxation method works as follows:
1. The user's input query is divided into subsets.
2. Each subset is sent to a search engine.
3. The search results of each search engine are collected.
4. The Web pages common to the search results are the result of Cross-media meta search.
The method is described in detail below.

3.1 Basic concept

We assume that users wish to obtain multimedia content (texts, images, sounds, etc.) related to their keywords $K_1, K_2, \dots, K_n$ ($n \ge 2$). Conventionally, a user who wants text about these keywords uses a text search engine, and a user who wants pictures uses an image search engine. However, since both are related to the same keywords, we think a user could browse information very efficiently if text and pictures were searched, and the results browsed, simultaneously. Our Cross-media meta search engine therefore uses various search engines $E_1, E_2, \dots, E_m$ ($m \ge 2$). For example, $E_1$ is Google[5], $E_2$ is AltaVista[6], $E_3$ is Google image search[7], and so on. The key feature of our system is thus that it can use a mixture of search engines for different media types.

We now describe the search engines used by Cross-media meta search. Text search engines are typified by Google and AltaVista. Given keywords, such an engine returns, from the Web pages its crawler has collected into its database, the Web pages that contain all the input keywords. Many search engines work this way and are called robot-type search engines. A robot-type search engine stores many pages in its database and indexes them. Because it searches a large number of pages, it returns a very large number of hits, and each engine's own ranking algorithm (such as PageRank) places the best results at higher ranks. However, the large number of hits also means that many noisy, unrelated, or irrelevant pages are returned.

Image search engines are typified by Google image search. When a user inputs keywords related to a desired picture, the system judges the content of pictures by analyzing the text adjacent to a picture, its caption, and many other factors. Duplicates are eliminated with advanced algorithms, and the pictures of the highest quality are displayed first. Thus, an image search engine returns pictures for a text query, which makes it very efficient for collecting pictures. Since the image search engine looks only at a picture and its immediate surroundings, the relevance of each keyword, and of the keywords to the picture, is high. However, the number of hits may decrease greatly, and the search result may even be empty.

Animation and music search engines are typified by Naver. Such an engine specializes in the animation files on the Internet; animation files of various formats, such as rm, asf, mpg, mov, and swf, can be searched. It returns these files for a text query and displays them as links, so the user must visit each link and check the files. Figs. 4, 5, and 6 show the search results of each kind of search engine.
In this thesis, we focus on two kinds of search engines, text search engines and image search engines, which we consider the most effective for acquiring information.

[Figure 4: text search. Result page of a text search engine, showing the title and link, a part of the text of the Web page, and the link of the Web page.]
[Figure 5: image search. Result page of an image search engine, showing the pictures found by image search.]
[Figure 6: each search engine. Result page of a movie and music search engine, showing the link of the movie or music file, the link of the Web page, and the file type and size.]

We now describe how Cross-media meta search is realized with these two kinds of search engines.

3.2 Query Relaxation method

Most users do not know which keywords are most appropriate for each search engine. We therefore allow users to input their query as a conjunctive query $Q = K_1 \wedge K_2 \wedge \dots \wedge K_n$. Unfortunately, when an image search engine is used, it is difficult to obtain any images once the query contains three or more keywords, and in most cases no useful search result is returned. In other words, a query of three or more keywords is too restrictive a condition for image search. To solve this problem, we propose a method that relaxes the query conditions by sharing the keywords among different kinds of search engines, so that adequate results are obtained.

For example, when the three keywords "Mt. Fuji", "snow", and "sunset" are input to Google image search, no results are returned. However, when only the two keywords "Mt. Fuji" and "snow" are input to Google image search, four of the Web pages in the search result contain the text "sunset". In this case, by relaxing the conditions we can obtain pages related to all three keywords.

Let $\mathrm{Ans}(Q)$ be the set of answers for a user-input conjunctive keyword query $Q$. In our query relaxation approach, $Q$ is divided into a set of tuples of sub-queries:

$$
\begin{aligned}
& [\,\emptyset,\ \{K_1, \dots, K_n\}\,] \\
& [\,\{K_1\},\ \{K_2, \dots, K_n\}\,] \\
& [\,\{K_2\},\ \{K_1, K_3, \dots, K_n\}\,] \\
& \quad \vdots \\
& [\,\{K_1, K_2\},\ \{K_3, \dots, K_n\}\,] \\
& [\,\{K_1, K_3\},\ \{K_2, K_4, \dots, K_n\}\,] \\
& \quad \vdots \\
& [\,\{K_1, \dots, K_{n-1}\},\ \{K_n\}\,] \\
& [\,\{K_1, \dots, K_n\},\ \emptyset\,]
\end{aligned}
$$

Each tuple consists of two sub-queries because two kinds of search engines are used; in general, the number of sub-queries in a tuple depends on the number of search engines. The first sub-query of each tuple is sent to the text search engine and the second to the image search engine. For example, the first tuple, $[\emptyset, \{K_1, \dots, K_n\}]$, means that no keywords are sent to the text search engine and $K_1, K_2, \dots, K_n$ are sent to the image search engine. In general, the more keywords an image search query contains, the harder it is to obtain images. Tuples later in the list therefore yield more images, because their image sub-queries contain fewer keywords. Fig. 7 shows the tuples of sub-queries.
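The decomposition into tuples can be sketched in Python as follows. This is a minimal illustration rather than the thesis implementation: the function name `relaxation_tuples` is hypothetical, keywords are assumed to be distinct strings, and the tuples are listed in the same order as above (no text keywords first, then one, then two, and so on).

```python
from itertools import combinations


def relaxation_tuples(keywords):
    """Split a conjunctive query K_1 ^ ... ^ K_n into (text, image) tuples.

    The first element holds the keywords for the text search engine, the
    second those for the image search engine. Every keyword goes to exactly
    one engine, and tuples are ordered by the number of text keywords.
    Keywords are assumed to be distinct.
    """
    keywords = tuple(keywords)
    tuples = []
    for k in range(len(keywords) + 1):
        for text_part in combinations(keywords, k):
            # Every keyword not sent to text search goes to image search.
            image_part = tuple(w for w in keywords if w not in text_part)
            tuples.append((text_part, image_part))
    return tuples


for text_kw, image_kw in relaxation_tuples(["Mt. Fuji", "snow", "sunset"]):
    print("text:", list(text_kw), "| image:", list(image_kw))
```

For the three keywords of the example above, this yields 8 tuples: the first sends all keywords to image search and the last sends all of them to text search.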

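[Figure 7: sub-queries. Lattice of tuples from $[\emptyset, \{K_1, K_2, \dots, K_n\}]$ through $[\{K_1\}, \{K_2, \dots, K_n\}]$, ..., $[\{K_n\}, \{K_1, \dots, K_{n-1}\}]$, $[\{K_1, K_2\}, \{K_3, K_4, \dots, K_n\}]$, ..., $[\{K_{n-1}, K_n\}, \{K_1, K_2, \dots, K_{n-2}\}]$, ..., to $[\{K_1, K_2, \dots, K_n\}, \emptyset]$. The former element (blue) is used for text search; the latter element (pink) is used for image search.]

3.3 Answers by Query Relaxation

Answers for query $Q$ are retrieved as the union of the answers for all the sub-query tuples:

$$
\begin{aligned}
\mathrm{Ans}(Q) = {} & \mathrm{Ans}(K_1 \wedge \dots \wedge K_n, E_2) \\
& \cup \bigl(\mathrm{Ans}(K_1, E_1) \cap \mathrm{Ans}(K_2 \wedge \dots \wedge K_n, E_2)\bigr) \\
& \cup \bigl(\mathrm{Ans}(K_2, E_1) \cap \mathrm{Ans}(K_1 \wedge K_3 \wedge \dots \wedge K_n, E_2)\bigr) \cup \cdots \\
& \cup \bigl(\mathrm{Ans}(K_1 \wedge K_2, E_1) \cap \mathrm{Ans}(K_3 \wedge \dots \wedge K_n, E_2)\bigr) \\
& \cup \bigl(\mathrm{Ans}(K_1 \wedge K_3, E_1) \cap \mathrm{Ans}(K_2 \wedge K_4 \wedge \dots \wedge K_n, E_2)\bigr) \cup \cdots \\
& \cup \mathrm{Ans}(K_1 \wedge \dots \wedge K_n, E_1)
\end{aligned}
$$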

where $\mathrm{Ans}(Q, E_i)$ denotes the answers to query $Q$ returned by search engine $E_i$. Here, $E_1$ is a text search engine and $E_2$ is an image search engine; the engines return the URLs that match the queries. To compute $\mathrm{Ans}(K_1 \wedge K_2, E_1) \cap \mathrm{Ans}(K_3 \wedge \dots \wedge K_n, E_2)$, the answers $\mathrm{Ans}(K_1 \wedge K_2, E_1)$ and $\mathrm{Ans}(K_3 \wedge \dots \wedge K_n, E_2)$ are computed separately, and then the intersection of the two is calculated. The answers for all the other tuples are obtained in the same way, and $\mathrm{Ans}(Q)$ is finally the union of all these answers.

[Figure 8: Answers by Query Relaxation. For the tuple $[\{K_1, K_2\}, \{K_3, \dots, K_n\}]$, $K_1 \wedge K_2$ is input to the text search engine and $K_3 \wedge \dots \wedge K_n$ to the image search engine; the Web pages common to the text search results and to the image search results (Web pages containing the retrieved pictures) are the results by query relaxation.]
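The procedure above (query each engine separately, intersect, then take the union over all tuples) can be sketched as follows. This is an illustrative sketch, not the system's code: `relaxed_answers`, `make_engine`, and the toy indexes are hypothetical, and each toy engine simply returns the URLs whose keyword set contains every query keyword. The optional `max_degree` argument caps how many keywords go to text search; Chapter 4, for instance, leaves out the tuple that uses text search only.

```python
from itertools import combinations


def relaxed_answers(keywords, text_search, image_search, max_degree=None):
    """Ans(Q): union over all (text, image) tuples of the engines' common pages.

    text_search and image_search take a tuple of keywords (a conjunctive
    query) and return a set of URLs. max_degree caps how many keywords are
    sent to text search (None means no cap).
    """
    keywords = tuple(keywords)
    limit = len(keywords) if max_degree is None else min(max_degree, len(keywords))
    answers = set()
    for d in range(limit + 1):
        for text_part in combinations(keywords, d):
            image_part = tuple(w for w in keywords if w not in text_part)
            if not text_part:        # first tuple: image search only
                pages = image_search(image_part)
            elif not image_part:     # last tuple: text search only
                pages = text_search(text_part)
            else:                    # query both separately, keep common pages
                pages = text_search(text_part) & image_search(image_part)
            answers |= pages
    return answers


def make_engine(index):
    """Toy engine over {url: set of keywords the page matches} (hypothetical)."""
    def search(query):
        return {url for url, kws in index.items() if set(query) <= kws}
    return search


text_search = make_engine({
    "a.html": {"Mt. Fuji", "snow", "sunset"},
    "b.html": {"snow", "sunset"},
    "c.html": {"Mt. Fuji"},
})
image_search = make_engine({
    "b.html": {"Mt. Fuji", "snow"},
    "c.html": {"snow"},
})

kw = ["Mt. Fuji", "snow", "sunset"]
print(sorted(relaxed_answers(kw, text_search, image_search)))                # ['a.html', 'b.html']
print(sorted(relaxed_answers(kw, text_search, image_search, max_degree=2)))  # ['b.html']
```

In this toy data, b.html is found only through mixed tuples (for example, text search for "sunset" and image search for "Mt. Fuji" and "snow"), which is the kind of page that sending the full query to a single engine misses.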

3.4 Example of Query Relaxation

We now demonstrate the results obtained with the query relaxation approach for the query $Q = q_1 \wedge q_2$, where $q_1$ is "Mt. Fuji" and $q_2$ is "snow". Fig. 9 shows Venn diagrams of the results of $Q$. The left diagram shows the results of a conventional meta search engine, and the right diagram shows the results of the query relaxation approach. In the left diagram, the hatched areas show the results for $\mathrm{Ans}(q_1 \wedge q_2, SE_{\mathrm{text}})$ and $\mathrm{Ans}(q_1 \wedge q_2, SE_{\mathrm{img}})$, where $SE_{\mathrm{text}}$ is the text search engine and $SE_{\mathrm{img}}$ is the image search engine. In the right diagram, the top hatched area shows the results for $\mathrm{Ans}(q_1, SE_{\mathrm{text}}) \cap \mathrm{Ans}(q_2, SE_{\mathrm{img}})$, and the bottom hatched area shows the results for $\mathrm{Ans}(q_2, SE_{\mathrm{text}}) \cap \mathrm{Ans}(q_1, SE_{\mathrm{img}})$. Query relaxation therefore increases the possibility of obtaining pertinent results.

[Figure 9: Venn diagram of results of query Q (left: conventional results; right: query relaxation results)]

Thus, in addition to the results of conventional meta search, the Web pages common to the text search and image search results are also obtained with this technique. For example, the Web pages in Fig. 10 are also collected.
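Instantiating the general expression for $\mathrm{Ans}(Q)$ from Section 3.3 with $n = 2$, $K_1 = q_1$, $K_2 = q_2$, $E_1 = SE_{\mathrm{text}}$, and $E_2 = SE_{\mathrm{img}}$ makes the regions of Fig. 9 explicit. The first and last terms correspond to the conventional results in the left diagram, and the two intersections are the additional hatched regions in the right diagram.

$$
\begin{aligned}
\mathrm{Ans}(q_1 \wedge q_2) = {} & \mathrm{Ans}(q_1 \wedge q_2, SE_{\mathrm{img}}) \\
& \cup \bigl(\mathrm{Ans}(q_1, SE_{\mathrm{text}}) \cap \mathrm{Ans}(q_2, SE_{\mathrm{img}})\bigr) \\
& \cup \bigl(\mathrm{Ans}(q_2, SE_{\mathrm{text}}) \cap \mathrm{Ans}(q_1, SE_{\mathrm{img}})\bigr) \\
& \cup \mathrm{Ans}(q_1 \wedge q_2, SE_{\mathrm{text}})
\end{aligned}
$$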

[Figure 10: the sample pages which the system collects (text: "snow"; image: "Mt. Fuji")]

3.5 The degree of Query Relaxation

We define the degree of relaxation as the number of keywords in a user input query $Q$ that are actually sent to a Web text search engine. Why is this called the degree of relaxation? A text search engine returns many more hits than an image search engine, which means that image search imposes the stricter condition. Since moving a keyword to text search relaxes the search, we use the number of keywords sent to the text search engine as the degree of relaxation. For example, when all three keywords are used for an image search engine, the degree of relaxation is zero; when two keywords are used for an image search engine and the third for a text search engine, the degree of relaxation is one. We compare the results at degrees 0, 1, and 2 of relaxation. Fig. 11 shows an example with concrete keywords.

[Figure 11: Lattice of query Q (Q = Mt. Fuji ∧ snow ∧ sunset). Each node is {keywords for text search}, {keywords for image search}.
Degree 0: ∅, {Mt. Fuji, snow, sunset}
Degree 1: {Mt. Fuji}, {sunset, snow}; {sunset}, {Mt. Fuji, snow}; {snow}, {Mt. Fuji, sunset}
Degree 2: {Mt. Fuji, snow}, {sunset}; {Mt. Fuji, sunset}, {snow}; {snow, sunset}, {Mt. Fuji}
Degree 3: {Mt. Fuji, snow, sunset}, ∅]

3.6 Merit of Query Relaxation

Users want to browse only useful information. Query relaxation lets us use the indexes of search engines of different types, so Web pages that could not be collected before are also collected, taking advantage of the diversity of media. Moreover, if a user inputs many keywords to narrow the conditions, no search result appears, whereas if the user inputs few keywords to search broadly, the search result contains an excessive number of pages; it is difficult to balance the two. Since image search returns the pictures a user wants as its result, there is no need to perform a separate text search or to look for pictures within Web pages. We therefore think that accurate and comprehensive information can be collected by combining highly precise picture search with text search that covers many Web pages. We believe this technique allows us to create an encyclopedia from the Web.

Chapter 4 Experiments and Evaluation

4.1 The outline of experiment

We examined what results the method of Chapter 3 actually produces. In the experiment, we input various keyword sets and investigated the number of hit pages and the number of pertinent pages in the search results. The keywords were mainly taken from the titles of news reports in various fields. We used Google for text search and Google image search for image search. Each user input query consisted of three keywords. We compared the search results at degrees 0, 1, and 2 of relaxation. The result at degree 1 includes the result at degree 0, and the result at degree 2 includes the results at degrees 0 and 1. Since we want results that use image search, we did not use the result at degree 3, which uses text search only.

4.2 Experiment results

Part of the experiment results is shown below. The keywords for text search are input to the text search engine, and the keywords for image search are input to the image search engine. The number of hits is the number of Web pages common to the text search result and the image search result. The number of pertinent pages is the number of hit pages that we judged to describe the content of the query keywords appropriately. Figs. 12, 13, and 14 show tables of the experiments, and Figs. 15 and 16 show graphs of recall and precision for each keyword set at each degree of relaxation. Precision is the average precision of the results at each degree of relaxation. Because the whole set of correct answers on the Web is unknown, recall treats the total number of pertinent pages found in each experiment as the whole answer set. Note, therefore, that recall at degree 2 of relaxation is always 100%; this is a relative evaluation.
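The relative evaluation described above can be made concrete with a short sketch. It assumes (the tables do not state this explicitly) that `hits[d]` and `pertinent[d]` count the pages contributed by degree d alone; the function accumulates them because each degree includes the lower ones, and it uses the cumulative pertinent count at the highest degree as the whole answer set. The function name and the counts in the usage lines are hypothetical and are not the thesis's data.

```python
def relative_scores(hits, pertinent):
    """Relative precision and recall for degrees 0, 1, ... of relaxation.

    hits[d] / pertinent[d]: pages contributed by degree d alone (assumed).
    Results are cumulative, so counts are summed up to each degree, and the
    pertinent total at the highest degree serves as the whole answer set.
    Precision is None where no page was hit.
    """
    cum_hits, cum_pert = [], []
    total_hits = total_pert = 0
    for h, p in zip(hits, pertinent):
        total_hits += h
        total_pert += p
        cum_hits.append(total_hits)
        cum_pert.append(total_pert)
    precision = [p / h if h else None for p, h in zip(cum_pert, cum_hits)]
    recall = [p / total_pert if total_pert else 0.0 for p in cum_pert]
    return precision, recall


# Hypothetical counts for one keyword set at degrees 0, 1, 2 (not thesis data).
precision, recall = relative_scores([2, 10, 13], [2, 8, 3])
for d, (pr, rc) in enumerate(zip(precision, recall)):
    print(f"degree {d}: precision={pr:.2f} recall={rc:.2f}")
```

By construction, recall at the highest evaluated degree is 1.0, which is why the evaluation is only relative; per-degree values like these can then be averaged over keyword sets.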

keywords soccer Nakata team of Japan The degree of relaxation 0 Keywords for image search soccer Nakata team of Japan Keywords for text search Number of hits 19 Number of Pertinent pages 17 soccer Nakata team of Japan soccer team of Japan Nakata Nakata team of Japan soccer soccer Nakata team of Japan Nakata soccer team of Japan team of Japan soccer Nakata 13 0

The degree of relaxation keywords Kyoto Autumnal leaves Koudai-ji Keywords for image search Keywords for text search Number of hits Number of Pertinent pages Kyoto Koudai-ji Autumnal leaves 6 6 Kyoto Autumnal leaves Koudai-ji Kyoto Koudai-ji Autumnal leaves Autumnal leaves Koudai-ji Kyoto 10 8 Kyoto Autumnal leaves Koudai-ji 7 1 Koudai-ji Kyoto Autumnal leaves 6 2 Autumnal leaves Kyoto Koudai-ji

keywords musical Shiki theatre company performance The degree of relaxation 0 Keywords for image search musical performance Shiki theatre company Keywords for text search Number of hits 19 Number of Pertinent pages 17 musical Shiki theatre company performance musical performance Shiki theatre company Shiki theatre company performance musical musical Shiki theatre company performance Shiki theatre company musical performance performance musical Shiki theatre company 8 2

Figure 12: table of type A

keywords Mt. Fuji sunset snow The degree of relaxation Keywords for image search Keywords for text search Number of hits Number of Pertinent pages 0 Mt.Fuji sunset snow 0 0 Mt.Fuji sunset snow Mt.Fuji snow sunset sunset snow Mt.Fuji 3 3 Mt.Fuji sunset snow snow Mt.Fuji sunset 1 1 sunset Mt.Fuji snow 0 0

keywords typhoon flood heavy rain The degree of relaxation Keywords for image search Keywords for text search Number of hits Number of Pertinent pages 0 1 typhoon flood heavy rain typhoon heavy rain typhoon flood flood heavy rain heavy rain flood typhoon typhoon heavy rain heavy rain flood typhoon flood flood typhoon heavy rain 22 5

Figure 13: table of type B

keywords SARS Pneumonia Condition The degree of relaxation Keywords for image search Keywords for text search Number of hits Number of Pertinent pages 0 SARS, Pneumonia, Condition 0 0 SARS, Pneumonia condition SARS, Condition Pneumonia, 0 0 Pneumonia, Condition SARS SARS Pneumonia Pneumonia, Condition SARS, Condition Condition SARS, Pneumonia 0 0

Figure 14: table of type C

[Figure 15: graph of type A. Precision (%) versus recall (%) for degrees 0, 1, and 2 of relaxation, with the average.]
[Figure 16: graph of type B. Precision (%) versus recall (%) for degrees 0, 1, and 2 of relaxation, with the average.]

4.3 Consideration

The results of the experiment are shown in Fig.12, 13, and 14, and the recall-precision graphs for individual keywords are shown in Fig.15 and 16. Recall increases with each degree of relaxation. That is, increasing the number of keywords passed to the text search engine is an effective means of obtaining pertinent Web pages. Moreover, the change in recall from degree 1 to degree 2 of relaxation was smaller than the change from degree 0 to degree 1.

The graphs show three patterns.

Type (A) is the normal pattern: recall increases and precision decreases as the degree of relaxation rises.

Type (B) is a special pattern in which both precision and recall increase markedly from degree 0 to degree 1. No answers were obtained at degree 0, but relaxing the query yielded many. We consider this the pattern that shows the value of query relaxation most clearly. It appears in many cases where degree 0 of relaxation returns especially few results, which shows that relaxing the query is effective when there are few results at degree 0.

Type (C) yields no answers at all. When none of the keywords can describe a concrete picture, as in Type (C), no results are returned even at a high degree of relaxation.

The experiment shows that query relaxation can improve recall considerably while keeping precision at about 70% at degree 1 of relaxation. It also shows that, even when a user enters only a few keywords for image search, there are domains in which the number of hits is small: when none of the keywords can describe a concrete picture, the number of hits stays very small even at a higher degree of relaxation. The number of hits and the number of pertinent pages differ markedly depending on which keywords are relaxed. For some queries, degree 2 of relaxation returns fewer hits than degree 1; in such cases, leaving a more concrete keyword for image search is expected to increase the number of hits. We think that the degree of relation between the query keywords, their co-occurrence, and how often a keyword appears in the metadata of pictures also play a role. However, the present evaluation should be interpreted with caution, because it depends heavily on the image search algorithm of Google.
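The precision and recall values discussed above are simple ratios. The sketch below is a minimal illustration, assuming that the values at degree d are computed over all pages collected at degrees 0 through d (the text does not state this explicitly); the function name `cumulative_precision_recall` and the page IDs in the example are hypothetical.

```python
def cumulative_precision_recall(results_by_degree, pertinent):
    """Precision and recall after each degree of relaxation.

    results_by_degree[d] is the set of page URLs returned at degree d;
    pertinent is the set of URLs judged relevant (hypothetical ground truth).
    Pages are accumulated across degrees, so recall never decreases.
    """
    pertinent = set(pertinent)
    seen = set()
    rows = []
    for degree, retrieved in enumerate(results_by_degree):
        seen |= set(retrieved)
        hits = len(seen & pertinent)
        precision = hits / len(seen) if seen else 0.0
        recall = hits / len(pertinent) if pertinent else 0.0
        rows.append((degree, precision, recall))
    return rows


if __name__ == "__main__":
    # Hypothetical page IDs, not data from the experiment.
    results = [{"a"}, {"a", "b", "x"}, {"c", "y", "z"}]
    for degree, p, r in cumulative_precision_recall(results, {"a", "b", "c", "d"}):
        print(f"degree {degree}: precision {p:.2f}, recall {r:.2f}")
```

With these invented sets, the printed rows are precision 1.00, 0.67, 0.50 and recall 0.25, 0.50, 0.75, which reproduces the Type (A) shape: recall rises while precision falls.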

Chapter 5 Improvement of Query Relaxation

We described the query relaxation method in Chapter 3. However, the method that evaluates all subsets becomes very inefficient when a user inputs many keywords, because the number of subsets grows explosively with the size of the query. We therefore consider how to make query relaxation more efficient when the query Q consists of many keywords. This happens, for example, when a user enters many keywords to narrow down the information he needs, or when a whole sentence copied from a Web page is used as the query. Below we describe approaches to query relaxation for a query of N keywords.

5.1 Method of Improvement

Execution order of subsets

Many subsets are generated when many keywords are input, so the question is in which order the subsets should be executed. The lattice structure of the subsets in the query relaxation method is shown in Fig.7, for two search engines and N keywords. In each subset in the figure, the former element is used for text search and the latter for image search, and the rows correspond to degree 0, degree 1, and so on of relaxation from the top. The lattice makes clear that, even with only two search engines, the number of sub-queries becomes huge when there are N keywords. To collect solutions efficiently in Cross-media meta search, we therefore think it important not to collect solution pages for all subsets at once, but to collect solutions step by step.

Consider each search engine separately. For the text search engine, the number of solutions decreases toward the bottom of the figure, that is, as the degree of relaxation becomes larger. This is natural, because the text query gains keywords and its conditions become stricter. For the image search engine, the number of solutions decreases toward the top of the figure, that is, as the degree of relaxation becomes smaller. Although this relation is clear for each search engine on its own, it does not hold for the combined answer, because both engines are queried simultaneously in the query relaxation method. This is what makes it difficult to decide the execution order.

A simple form of pruning is possible for each search engine. The condition for Ans(K1 K2 K3, E1), the solution obtained by inputting keywords K1, K2, and K3 into search engine E1, is stricter than that for Ans(K1 K2, E1), because a keyword has been added. Therefore, Ans(K1 K2 K3, E1) should contain no more solutions than Ans(K1 K2, E1), and when Ans(K1 K2, E1) has no solution, Ans(K1 K2 K3, E1) has no solution either. Consequently, any subset whose sub-query for one engine contains a sub-query already known to have no solution also has no solution, and it need not be searched. In the same way, searches whose number of hits cannot exceed a threshold can be excluded in advance. Pruning can thus be done separately for each search engine.

The execution order is discussed below.

Breadth-first search and depth-first search

The two basic orders for executing subsets are breadth-first search and depth-first search.

Breadth-first search: Degree 0 of relaxation is executed first. Because highly accurate results are preferable, Web pages with high accuracy are shown first. Next, the degree of relaxation is raised one step at a time. Finally, the query is relaxed until the user is satisfied with the contents obtained. With breadth-first search, the user sees results in order of decreasing accuracy, and we think that viewing all subsets with the same degree of relaxation together provides a wide range of information.

Depth-first search: Degree 0 of relaxation is executed first.
Second, the user decides which keyword should remain in image search, and the subsets that raise the degree of relaxation while keeping that keyword are executed. For example, for the search "Mt. Fuji snow sunset", suppose the user specifies that he wants to see pictures of Mt. Fuji. Then, after the solutions in which "Mt. Fuji snow" is used for image search are shown, the solutions in which "Mt. Fuji" alone is used for image search are displayed. Because the same keyword is always used for image search, we think this order suits a user who wants something specific.

To realize these orders, a statistical technique is used for breadth-first search and a linguistic technique for depth-first search. The techniques are described below.

The statistical technique

In the present query relaxation method all subsets are executed, so it takes a long time before the first result is displayed. A method is therefore needed that executes subsets starting from the optimal one and shows solutions one by one. To judge which subset is optimal, we use the number of hits of each keyword, because the number of hits is the information most easily obtained before the actual search begins. Our reasoning is as follows. First, a subset with a smaller degree of relaxation, that is, with more keywords used for image search, has higher accuracy, because the conditions of image search are strict. Second, a word with many hits in text search generally also has many hits in image search. Third, for an AND search with two words, combining words with many hits yields more results. Therefore, we think the optimal order is to search first from subsets with a small degree of relaxation whose image-search keywords each have many hits.
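The lattice of Section 5.1 and the pruning rule based on Ans(K1 K2, E1) can be combined in a breadth-first loop. This is a minimal sketch, not the prototype's code: `search` is a hypothetical callable standing in for Google and Google image search, the function names are hypothetical, and the pruning assumes each engine treats a multi-keyword query as a strict AND, which real engines do not always guarantee.

```python
from itertools import combinations


def splits_by_degree(keywords):
    """Yield (degree, text_keywords, image_keywords) in breadth-first order.

    The degree of relaxation is the number of keywords moved from image
    search to text search; degree 0 sends every keyword to image search.
    Keywords are assumed to be distinct.
    """
    for degree in range(len(keywords) + 1):
        for text in combinations(keywords, degree):
            image = tuple(k for k in keywords if k not in text)
            yield degree, frozenset(text), frozenset(image)


def relax_breadth_first(keywords, search):
    """Run every split in breadth-first order, pruning hopeless sub-queries.

    search(engine, keywords) -> set of page URLs, with engine "text" or
    "image" (hypothetical interface). An empty side imposes no constraint.
    """
    known_empty = {"text": [], "image": []}  # sub-queries that had no hits
    results = []
    for degree, text_kw, image_kw in splits_by_degree(keywords):
        pages = None
        for engine, kws in (("image", image_kw), ("text", text_kw)):
            if not kws:
                continue
            # Adding keywords to an AND query cannot add hits, so a superset
            # of a query with no hits is skipped without searching.
            if any(empty <= kws for empty in known_empty[engine]):
                pages = set()
                break
            found = search(engine, kws)
            if not found:
                known_empty[engine].append(kws)
                pages = set()
                break
            # Ans for the split: pages common to both engines' results.
            pages = found if pages is None else pages & found
        results.append((degree, text_kw, image_kw, pages or set()))
    return results
```

Because text-side queries only grow with the degree, in breadth-first order the pruning fires mainly on the text engine; image-side pruning would help more in an order that visits high degrees first.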

Figure 17: Breadth-first search of the statistical technique. Query "Mt. Fuji snow sunset"; each subset is written as {keywords for text search}, {keywords for image search}; number of hits of each keyword: Mt. Fuji > snow > sunset.
Degree 0 of relaxation: { }, {Mt. Fuji snow sunset}
Degree 1 of relaxation: {Mt. Fuji}, {sunset snow}; {sunset}, {Mt. Fuji snow}; {snow}, {Mt. Fuji sunset}
Degree 2 of relaxation: {Mt. Fuji snow}, {sunset}; {Mt. Fuji sunset}, {snow}; {snow sunset}, {Mt. Fuji}
Degree 3 of relaxation: {Mt. Fuji snow sunset}, { }

Fig.17 shows the execution order of the subsets in breadth-first search. As an example, consider the input keywords "Mt. Fuji", "snow", and "sunset", and suppose that their numbers of hits are in the order Mt. Fuji > snow > sunset. The numbers in the figure give the execution order. As shown in the figure, degree 0 of relaxation, the lowest degree, is executed first, followed by degree 1. Within degree 1, subsets are executed in decreasing order of the score given by formula (1). Here Score_sub is the score of a subset; text_hits_all is the sum of the hit counts obtained by inputting each query keyword individually into text search; image_hits_all is the corresponding sum for image search; and text_hits_sub and image_hits_sub are the sums of the hit counts of the keywords that the scored subset uses for text search and for image search, respectively.

$$\mathrm{Score}_{sub} = \alpha\,\frac{\mathit{text\_hits}_{sub}}{\mathit{text\_hits}_{all}} + \beta\,\frac{\mathit{image\_hits}_{sub}}{\mathit{image\_hits}_{all}}, \qquad \alpha + \beta = 1 \qquad (1)$$

α and β are adjusted according to which search is given more importance. The formula selects subsets with larger numbers of hits, following the reasoning about hit counts above. In this way, breadth-first query relaxation is carried out with a statistical technique. However, as the number of results grows, accuracy may drop and noise may increase, and the information the user intended may not appear, so it is important to sort the results when they are displayed. Another statistical technique is also conceivable: storing past searches and deciding whether each word should be used for image search or text search based on the user's past judgments.

We evaluated the statistical technique experimentally. Fig.18 shows an example of the experiment, and Fig.19 shows the resulting graph.

Figure 18: Experiment of the statistical technique (query: Kyoto, Autumnal leaves, Koudai-ji)
Order  Keywords for image search          Keywords for text search     Recall
1      Kyoto, Koudai-ji, Autumnal leaves  (none)                       9%
2      Kyoto, Autumnal leaves             Koudai-ji                    29%
3      Kyoto, Koudai-ji                   Autumnal leaves              66%
4      Autumnal leaves, Koudai-ji         Kyoto                        78%
5      Kyoto                              Autumnal leaves, Koudai-ji   80%
6      Autumnal leaves                    Kyoto, Koudai-ji             83%
7      Koudai-ji                          Kyoto, Autumnal leaves       100%
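Formula (1) and the breadth-first order it induces can be sketched as follows. The sketch assumes the per-keyword hit counts have already been obtained by searching each keyword alone; the names `subset_score` and `statistical_order`, the parameter `max_degree`, and the hit counts in the example are hypothetical.

```python
from itertools import combinations


def subset_score(text_kw, image_kw, hits, alpha):
    """Score_sub of formula (1), with beta = 1 - alpha.

    hits["text"][k] and hits["image"][k] are the hit counts obtained by
    searching keyword k alone in text search and in image search.
    """
    beta = 1.0 - alpha
    text_all = sum(hits["text"].values())
    image_all = sum(hits["image"].values())
    text_sub = sum(hits["text"][k] for k in text_kw)
    image_sub = sum(hits["image"][k] for k in image_kw)
    text_part = text_sub / text_all if text_all else 0.0
    image_part = image_sub / image_all if image_all else 0.0
    return alpha * text_part + beta * image_part


def statistical_order(keywords, hits, alpha=0.5, max_degree=None):
    """Breadth-first order: by rising degree, then by descending Score_sub."""
    top = len(keywords) if max_degree is None else max_degree
    order = []
    for degree in range(top + 1):
        splits = []
        for text in combinations(keywords, degree):
            image = tuple(k for k in keywords if k not in text)
            splits.append((text, image))
        splits.sort(key=lambda s: subset_score(s[0], s[1], hits, alpha), reverse=True)
        order.extend((degree, text, image) for text, image in splits)
    return order


if __name__ == "__main__":
    hits = {  # invented counts, for illustration only
        "text": {"Kyoto": 1000, "Autumnal leaves": 500, "Koudai-ji": 100},
        "image": {"Kyoto": 800, "Autumnal leaves": 300, "Koudai-ji": 50},
    }
    for degree, text, image in statistical_order(list(hits["text"]), hits, alpha=0.2, max_degree=2):
        print(degree, "image:", image, "text:", text)
```

With these invented counts and alpha = 0.2, each degree lists first the split that keeps the higher-hit keywords in image search. With alpha = beta = 0.5 and text and image counts proportional to each other, every split in a degree would score exactly 0.5, so alpha below 0.5 is needed for image hits to dominate.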

Figure 19: Graph of the statistical technique (precision (%) against recall (%) in execution order; degrees 0, 1, and 2 of relaxation)

From the experimental results, we think that judging only from the number of hits may lower precision. However, recall is high even at an early stage. We think that the statistical technique favors recall.

The linguistic technique

This method chooses the optimal subset by paying attention to the modification relations among the words in a subset. We assume that the picture the user wants corresponds to the word modified by the other words in the query, that is, the most restricted word. For example, for the search "Kyoto autumnal leaves Koudai-ji", the relation generally understood is "Koudai-ji in Kyoto, famous for its autumnal leaves". Thus, we consider Koudai-ji to be the word modified by the other keywords. In this case, as Fig.20 also shows, inputting Koudai-ji into image search does indeed increase the number of results. This exploits the characteristics of each search engine: some words are easy to search with image search. In this case, the number of hits for proper nouns such as the names of people and buildings tends to increase in image search.

Figure 20: Depth-first search of the linguistic technique. Same lattice as in Fig.17 for the query "Mt. Fuji snow sunset"; the keyword for which the user wants images is Mt. Fuji. Execution order: 1 is { }, {Mt. Fuji snow sunset}; 2 and 3 are the degree-1 subsets that keep Mt. Fuji in image search, {sunset}, {Mt. Fuji snow} and {snow}, {Mt. Fuji sunset}; 4 is {snow sunset}, {Mt. Fuji}.

Thus, a subset can be chosen by considering the modification relations between words. However, since this is only a guess, a page that differs from the user's intention may be judged optimal. Moreover, it is difficult for a system to judge such linguistic relations automatically.

We evaluated the linguistic technique experimentally. Fig.21 shows an example of the experiment, and Fig.22 shows the resulting graph.

Figure 21: Experiment of the linguistic technique (query: Kyoto, Autumnal leaves, Koudai-ji)
Order  Keywords for image search          Keywords for text search     Recall
1      Kyoto, Koudai-ji, Autumnal leaves  (none)                       9%
2      Autumnal leaves, Koudai-ji         Kyoto                        21%
3      Kyoto, Koudai-ji                   Autumnal leaves              58%
4      Koudai-ji                          Kyoto, Autumnal leaves       72%
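The depth-first order of the linguistic technique can be sketched as enumerating only the branch of the lattice in which the chosen keyword stays in image search. This sketch assumes the target keyword is already given, for example specified by the user as in the Mt. Fuji example; choosing it automatically from modification relations is not shown, and `linguistic_order` is a hypothetical name.

```python
from itertools import combinations


def linguistic_order(keywords, target):
    """Splits that always keep `target` in image search, by rising degree.

    Only the other keywords are moved to text search, so the lattice is
    explored along the single branch that contains the target keyword.
    Keywords are assumed to be distinct.
    """
    others = [k for k in keywords if k != target]
    order = []
    for degree in range(len(others) + 1):
        for text in combinations(others, degree):
            image = tuple(k for k in keywords if k not in text)
            order.append((degree, text, image))
    return order


if __name__ == "__main__":
    for degree, text, image in linguistic_order(["Kyoto", "Autumnal leaves", "Koudai-ji"], "Koudai-ji"):
        print(degree, "image:", image, "text:", text)
```

For three keywords this visits four splits instead of eight, which is where the efficiency gain of the depth-first order comes from.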

Figure 22: Graph of the linguistic technique (precision (%) against recall (%) in execution order; degrees 0, 1, and 2 of relaxation)

From the experimental results, we think that precision is better than with the statistical technique: the linguistic technique achieves high precision while maintaining moderate recall. We think that the linguistic technique favors precision.

The extended proposal of query relaxation

Besides the methods described above, the following enhancements can be considered.

Duplication of query keywords. The current query relaxation method does not permit a keyword to be duplicated: when the query is divided into a subset, each keyword is used for either text search or image search. Permitting duplication lets a keyword be used for both, so that all information can be covered, but the number of subsets, and hence the amount of processing, increases enormously. For example, in the search "Mt. Fuji snow sunset", the solution set would then also include the pages common to "Mt. Fuji sunset" input into image search and "Mt. Fuji snow" input into text search. We think that allowing duplication of query keywords lets more results be extracted. However, since calculation time increases, it is important not to execute all subsets but to calculate only suitable ones, taking the number of hits of each keyword into account.

Introducing intitle and intext. The concepts of intitle and intext are introduced into the sub-query for the text search engine. Each word in a subset is assigned the role of being searched with intitle or with intext, which decides which words of the query are treated as the main ones. However, since these roles divide the sub-queries further, the number of sub-query groups increases and efficiency becomes worse.

Filtering unnecessary keywords. Unnecessary keywords are filtered out of query Q. When a query consists of too many keywords, it is very difficult to find Web pages that answer it. An importance is therefore assigned to each word of the user's query, unimportant words are removed, and only the important words are used as query keywords. As ways of deciding importance, we consider feedback from the numbers of hits and a relevant dictionary prepared in advance as a database. Generating a new query Q in this way improves search efficiency.

Adaptation of feedback. Words that tend to suit image search and words that suit text search are learned from search results. That is, past searches are stored in a database, a user profile is created, and query relaxation is performed based on it.

Feedback of image search. This is a more picture-oriented method: the user gives feedback by choosing, from the pictures in the result, those closest to what is needed. It would be preferable to do this with a search engine that analyzes picture content, but no such search engine is available at present.
Instead, we consider extracting words that co-occur in the text surrounding the picture the user chose, and searching again with them.
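The cost of the duplication proposal can be made concrete by counting splits. Assuming each keyword is assigned to text search, image search, or both, a query of N keywords has 3^N assignments instead of the 2^N splits of the non-duplicating lattice (both counts include the splits in which one side is empty). The sketch is illustrative only; `ROLES`, `splits_with_duplication`, and `count_splits` are hypothetical names.

```python
from itertools import product

# Hypothetical role labels: each keyword goes to text search, image search, or both.
ROLES = ("text", "image", "both")


def splits_with_duplication(keywords):
    """Yield every (text_keywords, image_keywords) pair when duplication is allowed."""
    for roles in product(ROLES, repeat=len(keywords)):
        text = tuple(k for k, r in zip(keywords, roles) if r in ("text", "both"))
        image = tuple(k for k, r in zip(keywords, roles) if r in ("image", "both"))
        yield text, image


def count_splits(n):
    """Number of splits for n keywords: (without duplication, with duplication)."""
    return 2 ** n, 3 ** n


if __name__ == "__main__":
    print(count_splits(10))
```

For 10 keywords this is 59,049 assignments instead of 1,024, which is why the proposal is paired with executing only the subsets judged suitable from the hit counts.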

Chapter 6 Information Integration

To let a user collect information efficiently, a Web page is not shown as it is; instead, only the related information is extracted from it and displayed. The information integration is described below.

6.1 Extract information

The query relaxation method collects Web pages as a solution set. However, a single Web page also contains subjects unrelated to the query keywords. We think that removing these unrelated subjects from a Web page makes useful information acquirable efficiently. Based on this, we extract only the portions of a Web page that are relevant to the query keywords. The frequency of words in the text of a Web page is used to estimate relevance to the query keywords. Moreover, because the text around a picture collected by image search is likely to contain much about that picture, the distance from the picture is also taken into account. The procedure is as follows.

1. Calculate the importance of each word based on its frequency.
2. Calculate the distance between the picture and each paragraph.
3. Calculate the importance of each paragraph from the importance of its words and its distance from the picture.

The formulas are given below, where I_s is the importance of paragraph s, w_t is the importance of word t in paragraph s, and r is the distance of paragraph s from the picture.

$$I_s = \frac{\mathrm{Score}_s}{r^2} \qquad (2)$$

$$\mathrm{Score}_s = \sum_{t \in s} w_t \qquad (3)$$

Paragraphs are extracted based on this importance, so that only related subjects are taken from the Web pages in the search result. Fig.23 shows this method.

Figure 23: Extract information

For the Web pages collected for each sub-query, the whole page is not the solution; only the portions relevant to the query keywords are extracted from each page. The search result therefore consists of the paragraphs and pictures extracted from Web pages rather than of whole Web pages. The information extracted from the result pages of every sub-query is integrated automatically, new contents are generated, and these are shown to the user as the search result. Unlike conventional meta search engines, which display a URL list, the distinguishing feature of Cross-media meta search is that the pictures and text documents of the search result are assembled into new contents and displayed. Fig.24 shows this information integration method.
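Formulas (2) and (3) can be sketched as below. Several details are not fixed by the text, so they are assumptions here: words are split on whitespace instead of with ChaSen, w_t is the relative frequency of word t over the whole page, Score_s sums w_t over the distinct words of paragraph s, and r is one plus the index distance between paragraph s and the paragraph holding the picture, which keeps r ≥ 1. The function names and the parameter `k` are hypothetical.

```python
from collections import Counter


def paragraph_importance(paragraphs, image_index):
    """I_s = Score_s / r**2 (formulas (2) and (3)) for every paragraph.

    Assumptions: whitespace tokenization, w_t = relative frequency of t
    on the whole page, r = |i - image_index| + 1.
    """
    tokens = [p.lower().split() for p in paragraphs]
    tf = Counter(t for toks in tokens for t in toks)
    total = sum(tf.values()) or 1
    weight = {t: c / total for t, c in tf.items()}  # w_t
    importance = []
    for i, toks in enumerate(tokens):
        score = sum(weight[t] for t in set(toks))  # Score_s over distinct words
        r = abs(i - image_index) + 1
        importance.append(score / r ** 2)
    return importance


def extract_paragraphs(paragraphs, image_index, k=2):
    """Keep the k most important paragraphs, in their original page order."""
    importance = paragraph_importance(paragraphs, image_index)
    best = sorted(range(len(paragraphs)), key=lambda i: importance[i], reverse=True)[:k]
    return [paragraphs[i] for i in sorted(best)]


if __name__ == "__main__":
    page = ["fuji snow", "fuji snow sunset", "ads"]
    print(extract_paragraphs(page, image_index=1))
```

Raw page frequency also rewards common function words, so a practical version would likely drop stop words or weight words by their relation to the query keywords before summing.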

Figure 24: Information integration

6.2 Copyright problem

Unlike conventional search, Cross-media meta search extracts only the related portion of a page. The system therefore processes the work of others, namely Web pages, and displays portions of a page so that they are read together with other pages. These acts could constitute copyright infringement. However, conventional search engines such as Google also display, on the result screen, parts of page contents containing the query keywords in addition to the URL, and Google image search extracts and shows only the photographs. These acts are not regarded as copyright infringement. Based on this, Cross-media meta search explicitly shows the source of each extracted portion.

6.3 Searching based on Web page

Cross-media meta search, which extends ordinary search, can also be used in other ways. One of them is searching based on a Web page. A user needs search most when he encounters information he does not understand or wants to know. At such times it is not efficient for the user to keep visiting a search site and browsing Web pages one by one. The purpose of this method is therefore to let the user search while reading Web pages, without visiting a Web search site. When a term the user does not understand appears while he is browsing, he specifies it, for example by dragging with the mouse. Cross-media meta search then runs in the background, and by integrating the results and displaying them concisely, it can be used like an encyclopedia. Fig.25 shows this idea.

Figure 25: Searching based on Web page

6.4 Informational arrangement

In information integration, it is important not to display information as a mere enumeration but in a form that follows a certain scenario. Since our motivation is to create something like an encyclopedia from the Web, we consider an encyclopedia-like display. We think it important to show the contents considered most useful for the part the user selected. Encyclopedia-like elements include an explanation of the term, photographs, usage, related subjects, and so on, and we apply them to information integration. To decide which elements to show, the subject type is determined from the query keywords. Examples of the elements to display for each subject type are:

Name of a person: a profile, career, and photograph.
Name of a place: scenery, sightseeing spots, and special features.
Term: meaning, related terms, and photograph.
News: a report, related articles, and the people involved.

We think it effective to show such elements. It is also necessary to consider the similarity between texts. Displaying nearly identical texts adds little information, so texts highly similar to one already shown should be omitted and different subjects displayed instead. For example, "virus" mainly refers to either a medical virus or a computer virus. Since these are completely different subjects, they are classified and displayed separately. Within the medical sense, highly similar texts are summarized: even if there are many texts about the influenza virus, not all of them are shown, and those highly similar to one already kept are eliminated before display.
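The elimination of highly similar texts described above could be implemented with a simple similarity threshold. The sketch below is one possible approach, not the method used in this thesis: it uses cosine similarity of word-count vectors and a greedy pass that keeps a text only if it is not too similar to any text already kept. The threshold 0.8 and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of the word-count vectors of two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def drop_near_duplicates(texts, threshold=0.8):
    """Greedily keep texts whose similarity to every kept text is below threshold."""
    kept = []
    for text in texts:
        if all(cosine(text, k) < threshold for k in kept):
            kept.append(text)
    return kept


if __name__ == "__main__":
    texts = [
        "influenza virus spreads in winter",
        "influenza virus spreads in winter months",
        "computer virus infects files",
    ]
    print(drop_near_duplicates(texts))
```

The greedy pass keeps the first text of each near-duplicate group, so the input should already be in display-priority order, for example sorted by paragraph importance.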

Chapter 7 Prototype System

We implemented a prototype system according to the method described above. In this chapter, we describe the prototype of the Cross-media meta search system. The implementation environment is as follows:

OS: Windows XP
CPU: Pentium 4, 2.00 GHz
Memory: 2 GB
Development environment: Microsoft Visual Studio .NET / C#

7.1 Implementation

First, a user inputs a query consisting of keywords K1, K2, ..., Kn (n ≥ 2), that is, Q = K1 K2 ... Kn. We use the Google text search engine as E1 and the Google image search engine as E2. Ans(Q) is the set of Web pages common to the result pages of E1 and E2. The search result for query Q is shown to the user as new contents that integrate information using Ans(Q). The contents are written in HTML: the system dynamically creates an HTML file according to the user's request, and the integrated result is displayed whenever a Web browser reads it. The system architecture of the prototype is shown in Fig.26. The prototype consists of four parts.

Interface part: the part where the user inputs a query and where the result screen is displayed after the search is completed. The result is displayed by reading the HTML generated by the information integration part.

Figure 26: The system architecture of the prototype system of Cross-media meta search (components: interface part, executing query relaxation part, collecting Web pages part, information integration part; external sources: Google, Google image search, Web space)

Executing query relaxation part: the part that divides the user's query into subsets using the query relaxation method and inputs those subsets into the search engines used, Google and Google image search.

Collecting Web pages part: the part that collects the search results of the sub-query input into each search engine and selects the Web pages common to those results.

Information integration part: the part that extracts only the important portions from the common Web pages and, based on the extracted information, generates the HTML file with which the interface part displays the search result.

The processing flow of the prototype is as follows.

1. The user's query and the number of keywords are read. The query is split into words at blanks, and the keywords are held as K1, K2, ..., Kn (n ≥ 2).
2. K1, ..., Kn are input into Google image search, and the image URLs and the URLs of the result Web pages are collected.
3. K1 is input into Google text search, and the URLs of the result Web pages are collected.
4. K2, ..., Kn are input into Google image search, and the image URLs and the URLs of the result Web pages are collected.
5. The Web pages common to both results are calculated; pages that are not common are discarded.
6. The title and the text, excluding tags and JavaScript, are extracted from the source code of the common Web pages.
7. This procedure is repeated for every subset with the same degree of relaxation.
8. The degree of relaxation is raised by one and the procedure is repeated.
9. To compute the tf value of each text document, the sentences are divided into words using ChaSen [8].
10. The importance of each paragraph is calculated from the importance of its words.
11. Paragraphs are extracted based on this importance.
12. The extracted text is combined with the pictures from the image search results, and new contents are created in HTML form.
13. The created HTML file is read and shown to the user.
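Step 6 of the flow (extracting the title and the text without tags or JavaScript) can be sketched with Python's standard `html.parser` module. This is an illustrative stand-in, not the prototype's own code; the class and function names are hypothetical, and `<style>` blocks are skipped as well on the assumption that they are not wanted either.

```python
from html.parser import HTMLParser


class TitleTextExtractor(HTMLParser):
    """Collects the <title> and the visible text, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip = 0          # depth inside <script>/<style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if self._skip or not text:
            return  # drop JavaScript/CSS and whitespace-only runs
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def extract_title_and_text(html):
    """Return (title, list of text chunks) for one common Web page."""
    parser = TitleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, parser.chunks


if __name__ == "__main__":
    page = ("<html><head><title>Mt. Fuji</title><script>var x = 1;</script></head>"
            "<body><p>Snow on the summit.</p><p>Sunset view.</p></body></html>")
    print(extract_title_and_text(page))
```

Each text chunk roughly corresponds to a block of text between tags, which is a convenient unit for the paragraph scoring of steps 10 and 11.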

7.2 Implementation display

Fig.27 shows the display of the implementation. The screen looks like an ordinary browser, but the result is not displayed as a list of URLs as before. When the user enters a query and presses the search button, the result is displayed as the pictures returned by image search together with text extracted from the Web pages returned by text search, as shown in the figure.

Figure 27: Implementation display (labels: retrieval query, query keywords, image search results, text extracted from a Web page)

Chapter 8 Related work

Naver [9] is a meta-search engine that simultaneously searches for HTML pages, CG animations, images, sounds, and so on pertinent to the user's input keywords, and then merges the retrieved content. Its results, however, are only a collection of the information retrieved by each search engine. Naver is a cross-media meta-search engine that can retrieve information from several different media sources with a single keyword query. However, keywords are input as in a conventional search engine, and in the integrated result the area for image search results and the area for the URL list of text search results are kept separate. Although Naver handles various media, it differs from our research in this respect.

Figure 28: NAVER (labels: search result of image, search result of text)

MIRADOR-Search [10], developed by FUJITSU, is a service that offers a cross-media search function combining semantic search over text with visual search over pictures. When a user inputs a query keyword, a Web crawler analyzes Web pages on the Internet and automatically collects pictures and their explanatory texts. The collected pictures are then mapped and displayed in three dimensions based on picture features such as color and shape. This system is highly specialized for picture search.

Figure 29: MIRADOR-Search (diagram: text features (word frequency) and picture features (color, figure) are extracted according to the specified kind of feature; the feature set forms an N-dimensional feature space, which a SOM maps into a 3D space that arranges similar information close together)

Cyclone [11] is also a meta-search engine for ordinary Web pages. From a user's keyword query, it searches several search engines and assembles the results into an encyclopedia-like form. Showing only the required portion instead of whole Web pages as the search result is the point it shares with our research. However, this system cannot obtain a suitable solution unless a single word is input; that is, it can be regarded as a Japanese dictionary built from the Web.

Figure 30: CYCLONE (labels: title and field of Web page, extracted paragraph)

Oyama [12] proposed a Web search technique that builds a hierarchical structure of the query when two or more keywords are input into a search engine. Precision is raised by distinguishing the role of each keyword, such as a keyword expressing the theme and keywords expressing contents related to that theme. In addition, as a display method, result sets obtained from the same keywords with different roles are compared, and those that differ greatly are shown side by side. In our research, this idea is used to give the query keywords roles: those used for image search and those used for text search.

Figure 31: Interface by Oyama (labels: user input query, role of keywords)

KartOO [13] is a visual meta search engine. KartOO launches the query to a set of search engines, gathers the results, compiles them, and represents them in a series of interactive maps through a proprietary algorithm. In these maps, the found sites are represented as pages of larger or smaller size depending on their relevance. When the pointer is moved over these pages, the related keywords are highlighted and a brief description of the site appears on the left side of the screen.

Figure 32: KartOO (labels: relation of a link, Web pages)

Web Montage dynamically integrates pages. Web access often follows a routine: a user looks at the same pages in the same order at the same time every day. Because tools supporting such routine Web browsing had not been developed, Web Montage aims to support it. A score is attached to frequently visited Web pages based on the user's history, and the contents are arranged so that the total score of the contents placed on the screen is maximized.

Figure 33: Web Montage (label: a part of contents of Web page)

Information Gathering Support Interface by the Overview Presentation of Web Search Results

Information Gathering Support Interface by the Overview Presentation of Web Search Results Information Gathering Support Interface by the Overview Presentation of Web Search Results Takumi Kobayashi Kazuo Misue Buntarou Shizuki Jiro Tanaka Graduate School of Systems and Information Engineering

More information

Drawing Bipartite Graphs as Anchored Maps

Drawing Bipartite Graphs as Anchored Maps Drawing Bipartite Graphs as Anchored Maps Kazuo Misue Graduate School of Systems and Information Engineering University of Tsukuba 1-1-1 Tennoudai, Tsukuba, 305-8573 Japan misue@cs.tsukuba.ac.jp Abstract

More information

Searching User-generated Video Content based on Link Relationships between Videos and Blogs

Searching User-generated Video Content based on Link Relationships between Videos and Blogs Searching User-generated Video Content based on Relationships between Videos and Blogs User-generated Video Blogs Video Search Searching User-generated Video Content based on Relationships between Videos

More information

Semantic Website Clustering

Semantic Website Clustering Semantic Website Clustering I-Hsuan Yang, Yu-tsun Huang, Yen-Ling Huang 1. Abstract We propose a new approach to cluster the web pages. Utilizing an iterative reinforced algorithm, the model extracts semantic

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

2.3 Algorithms Using Map-Reduce

2.3 Algorithms Using Map-Reduce 28 CHAPTER 2. MAP-REDUCE AND THE NEW SOFTWARE STACK one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. Dealing with a failure

More information

AN OVERVIEW OF SEARCHING AND DISCOVERING WEB BASED INFORMATION RESOURCES

AN OVERVIEW OF SEARCHING AND DISCOVERING WEB BASED INFORMATION RESOURCES Journal of Defense Resources Management No. 1 (1) / 2010 AN OVERVIEW OF SEARCHING AND DISCOVERING Cezar VASILESCU Regional Department of Defense Resources Management Studies Abstract: The Internet becomes

More information

Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD

Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD World Transactions on Engineering and Technology Education Vol.13, No.3, 2015 2015 WIETE Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical

More information

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015 University of Virginia Department of Computer Science CS 4501: Information Retrieval Fall 2015 5:00pm-6:15pm, Monday, October 26th Name: ComputingID: This is a closed book and closed notes exam. No electronic

More information

2015 Search Ranking Factors

2015 Search Ranking Factors 2015 Search Ranking Factors Introduction Abstract Technical User Experience Content Social Signals Backlinks Big Picture Takeaway 2 2015 Search Ranking Factors Here, at ZED Digital, our primary concern

More information

DRACULA. CSM Turner Connor Taylor, Trevor Worth June 18th, 2015

DRACULA. CSM Turner Connor Taylor, Trevor Worth June 18th, 2015 DRACULA CSM Turner Connor Taylor, Trevor Worth June 18th, 2015 Acknowledgments Support for this work was provided by the National Science Foundation Award No. CMMI-1304383 and CMMI-1234859. Any opinions,

More information

Extraction of Semantic Text Portion Related to Anchor Link

Extraction of Semantic Text Portion Related to Anchor Link 1834 IEICE TRANS. INF. & SYST., VOL.E89 D, NO.6 JUNE 2006 PAPER Special Section on Human Communication II Extraction of Semantic Text Portion Related to Anchor Link Bui Quang HUNG a), Masanori OTSUBO,

More information

Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach

Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach Abstract Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in content-based

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

An Analysis of Image Retrieval Behavior for Metadata Type and Google Image Database

An Analysis of Image Retrieval Behavior for Metadata Type and Google Image Database An Analysis of Image Retrieval Behavior for Metadata Type and Google Image Database Toru Fukumoto Canon Inc., JAPAN fukumoto.toru@canon.co.jp Abstract: A large number of digital images are stored on the

More information

Interactive Video Retrieval System Integrating Visual Search with Textual Search

Interactive Video Retrieval System Integrating Visual Search with Textual Search From: AAAI Technical Report SS-03-08. Compilation copyright 2003, AAAI (www.aaai.org). All rights reserved. Interactive Video Retrieval System Integrating Visual Search with Textual Search Shuichi Shiitani,

More information

Purpose, features and functionality

Purpose, features and functionality Topic 6 Purpose, features and functionality In this topic you will look at the purpose, features, functionality and range of users that use information systems. You will learn the importance of being able

More information

INTRODUCTION. Chapter GENERAL

INTRODUCTION. Chapter GENERAL Chapter 1 INTRODUCTION 1.1 GENERAL The World Wide Web (WWW) [1] is a system of interlinked hypertext documents accessed via the Internet. It is an interactive world of shared information through which

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Site Audit SpaceX

Site Audit SpaceX Site Audit 217 SpaceX Site Audit: Issues Total Score Crawled Pages 48 % -13 3868 Healthy (649) Broken (39) Have issues (276) Redirected (474) Blocked () Errors Warnings Notices 4164 +3311 1918 +7312 5k

More information

Elimination of Duplicate Videos in Video Sharing Sites

Elimination of Duplicate Videos in Video Sharing Sites Elimination of Duplicate Videos in Video Sharing Sites Narendra Kumar S, Murugan S, Krishnaveni R Abstract - In some social video networking sites such as YouTube, there exists large numbers of duplicate

More information

EXTRACTION OF RELEVANT WEB PAGES USING DATA MINING

EXTRACTION OF RELEVANT WEB PAGES USING DATA MINING Chapter 3 EXTRACTION OF RELEVANT WEB PAGES USING DATA MINING 3.1 INTRODUCTION Generally web pages are retrieved with the help of search engines which deploy crawlers for downloading purpose. Given a query,

More information

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai. UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index

More information

VIDEO SEARCHING AND BROWSING USING VIEWFINDER

VIDEO SEARCHING AND BROWSING USING VIEWFINDER VIDEO SEARCHING AND BROWSING USING VIEWFINDER By Dan E. Albertson Dr. Javed Mostafa John Fieber Ph. D. Student Associate Professor Ph. D. Candidate Information Science Information Science Information Science

More information

Finding Neighbor Communities in the Web using Inter-Site Graph

Finding Neighbor Communities in the Web using Inter-Site Graph Finding Neighbor Communities in the Web using Inter-Site Graph Yasuhito Asano 1, Hiroshi Imai 2, Masashi Toyoda 3, and Masaru Kitsuregawa 3 1 Graduate School of Information Sciences, Tohoku University

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information

NTUBROWS System for NTCIR-7. Information Retrieval for Question Answering

NTUBROWS System for NTCIR-7. Information Retrieval for Question Answering NTUBROWS System for NTCIR-7 Information Retrieval for Question Answering I-Chien Liu, Lun-Wei Ku, *Kuang-hua Chen, and Hsin-Hsi Chen Department of Computer Science and Information Engineering, *Department

More information

UNIT II Requirements Analysis and Specification & Software Design

UNIT II Requirements Analysis and Specification & Software Design UNIT II Requirements Analysis and Specification & Software Design Requirements Analysis and Specification Many projects fail: because they start implementing the system: without determining whether they

More information

THE WEB SEARCH ENGINE

THE WEB SEARCH ENGINE International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR) Vol.1, Issue 2 Dec 2011 54-60 TJPRC Pvt. Ltd., THE WEB SEARCH ENGINE Mr.G. HANUMANTHA RAO hanu.abc@gmail.com

More information

Site Audit Virgin Galactic

Site Audit Virgin Galactic Site Audit 27 Virgin Galactic Site Audit: Issues Total Score Crawled Pages 59 % 79 Healthy (34) Broken (3) Have issues (27) Redirected (3) Blocked (2) Errors Warnings Notices 25 236 5 3 25 2 Jan Jan Jan

More information

A motion planning method for mobile robot considering rotational motion in area coverage task

A motion planning method for mobile robot considering rotational motion in area coverage task Asia Pacific Conference on Robot IoT System Development and Platform 018 (APRIS018) A motion planning method for mobile robot considering rotational motion in area coverage task Yano Taiki 1,a) Takase

More information

Site Audit Boeing

Site Audit Boeing Site Audit 217 Boeing Site Audit: Issues Total Score Crawled Pages 48 % 13533 Healthy (3181) Broken (231) Have issues (9271) Redirected (812) Errors Warnings Notices 15266 41538 38 2k 5k 4 k 11 Jan k 11

More information

Lecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science

Lecture 9: I: Web Retrieval II: Webology. Johan Bollen Old Dominion University Department of Computer Science Lecture 9: I: Web Retrieval II: Webology Johan Bollen Old Dominion University Department of Computer Science jbollen@cs.odu.edu http://www.cs.odu.edu/ jbollen April 10, 2003 Page 1 WWW retrieval Two approaches

More information

A Filtering System Based on Personal Profiles

A  Filtering System Based on Personal Profiles A E-mail Filtering System Based on Personal Profiles Masami Shishibori, Kazuaki Ando and Jun-ichi Aoe Department of Information Science & Intelligent Systems, The University of Tokushima 2-1 Minami-Jhosanjima-Cho,

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

Capturing Window Attributes for Extending Web Browsing History Records

Capturing Window Attributes for Extending Web Browsing History Records Capturing Window Attributes for Extending Web Browsing History Records Motoki Miura 1, Susumu Kunifuji 1, Shogo Sato 2, and Jiro Tanaka 3 1 School of Knowledge Science, Japan Advanced Institute of Science

More information

Usage Guide to Handling of Bayesian Class Data

Usage Guide to Handling of Bayesian Class Data CAMELOT Security 2005 Page: 1 Usage Guide to Handling of Bayesian Class Data 1. Basics Classification of textual data became much more importance in the actual time. Reason for that is the strong increase

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM CHAPTER THREE INFORMATION RETRIEVAL SYSTEM 3.1 INTRODUCTION Search engine is one of the most effective and prominent method to find information online. It has become an essential part of life for almost

More information

SE Workshop PLAN. What is a Search Engine? Components of a SE. Crawler-Based Search Engines. How Search Engines (SEs) Work?

SE Workshop PLAN. What is a Search Engine? Components of a SE. Crawler-Based Search Engines. How Search Engines (SEs) Work? PLAN SE Workshop Ellen Wilson Olena Zubaryeva Search Engines: How do they work? Search Engine Optimization (SEO) optimize your website How to search? Tricks Practice What is a Search Engine? A page on

More information

Knowing something about how to create this optimization to harness the best benefits will definitely be advantageous.

Knowing something about how to create this optimization to harness the best benefits will definitely be advantageous. Blog Post Optimizer Contents Intro... 3 Page Rank Basics... 3 Using Articles And Blog Posts... 4 Using Backlinks... 4 Using Directories... 5 Using Social Media And Site Maps... 6 The Downfall Of Not Using

More information

CHAPTER-23 MINING COMPLEX TYPES OF DATA

CHAPTER-23 MINING COMPLEX TYPES OF DATA CHAPTER-23 MINING COMPLEX TYPES OF DATA 23.1 Introduction 23.2 Multidimensional Analysis and Descriptive Mining of Complex Data Objects 23.3 Generalization of Structured Data 23.4 Aggregation and Approximation

More information

Four Keys to Creating a Winning SEO Strategy for Healthcare

Four Keys to Creating a Winning SEO Strategy for Healthcare Four Keys to Creating a Winning SEO Strategy for Healthcare SEO Tactics Technical SEO On-Page SEO Off-Site SEO Measure Your SEO Technical SEO What is technical SEO? A strong technical foundation will give

More information

A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System

A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System Takashi Yukawa Nagaoka University of Technology 1603-1 Kamitomioka-cho, Nagaoka-shi Niigata, 940-2188 JAPAN

More information

SELECTED TOPICS in APPLIED COMPUTER SCIENCE

SELECTED TOPICS in APPLIED COMPUTER SCIENCE A Tool for Detecting Detects on Class Implementation in Object Oriented Program on the Basis of the Law of Demeter: Focusing on Dependency between Packages RYOTA CHIBA, HIROAKI HASHIURA and SEIICHI KOMIYA

More information

Worksheet Answer Key: Scanning and Mapping Projects > Mine Mapping > Investigation 2

Worksheet Answer Key: Scanning and Mapping Projects > Mine Mapping > Investigation 2 Worksheet Answer Key: Scanning and Mapping Projects > Mine Mapping > Investigation 2 Ruler Graph: Analyze your graph 1. Examine the shape formed by the connected dots. i. Does the connected graph create

More information

Order from Chaos. University of Nebraska-Lincoln Discrete Mathematics Seminar

Order from Chaos. University of Nebraska-Lincoln Discrete Mathematics Seminar Order from Chaos University of Nebraska-Lincoln Discrete Mathematics Seminar Austin Mohr Department of Mathematics Nebraska Wesleyan University February 8, 20 The (, )-Puzzle Start by drawing six dots

More information

Research on the value of search engine optimization based on Electronic Commerce WANG Yaping1, a

Research on the value of search engine optimization based on Electronic Commerce WANG Yaping1, a 6th International Conference on Machinery, Materials, Environment, Biotechnology and Computer (MMEBC 2016) Research on the value of search engine optimization based on Electronic Commerce WANG Yaping1,

More information

1 The range query problem

1 The range query problem CS268: Geometric Algorithms Handout #12 Design and Analysis Original Handout #12 Stanford University Thursday, 19 May 1994 Original Lecture #12: Thursday, May 19, 1994 Topics: Range Searching with Partition

More information

Heuristic Evaluation Project

Heuristic Evaluation Project INFSCI 2470: Interactive System Design Heuristic Evaluation Project Evaluated System: Course Agent http://wwwsispittedu/~cagent Group Members Abdul Raqeeb Abdul Azeez Arash Farsi Sriranjani Mandayam Denis

More information

Domain Specific Search Engine for Students

Domain Specific Search Engine for Students Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam

More information

Profile Based Information Retrieval

Profile Based Information Retrieval Profile Based Information Retrieval Athar Shaikh, Pravin Bhjantri, Shankar Pendse,V.K.Parvati Department of Information Science and Engineering, S.D.M.College of Engineering & Technology, Dharwad Abstract-This

More information

CREATING A MULTIMEDIA NARRATIVE WITH 1001VOICES

CREATING A MULTIMEDIA NARRATIVE WITH 1001VOICES CREATING A MULTIMEDIA NARRATIVE WITH 1001VOICES Preschool, primary school, junior high and high school March 2015 TALES Comenius Multilateral project, 1 November 2013 1 November 2015. This project has

More information

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the

More information

Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm

Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm K.Parimala, Assistant Professor, MCA Department, NMS.S.Vellaichamy Nadar College, Madurai, Dr.V.Palanisamy,

More information

A Session-based Ontology Alignment Approach for Aligning Large Ontologies

A Session-based Ontology Alignment Approach for Aligning Large Ontologies Undefined 1 (2009) 1 5 1 IOS Press A Session-based Ontology Alignment Approach for Aligning Large Ontologies Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University,

More information

CS264: Homework #1. Due by midnight on Thursday, January 19, 2017

CS264: Homework #1. Due by midnight on Thursday, January 19, 2017 CS264: Homework #1 Due by midnight on Thursday, January 19, 2017 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. See the course site for submission

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Semantic Search in s

Semantic Search in  s Semantic Search in Emails Navneet Kapur, Mustafa Safdari, Rahul Sharma December 10, 2010 Abstract Web search technology is abound with techniques to tap into the semantics of information. For email search,

More information

4.1 COMPUTATIONAL THINKING AND PROBLEM-SOLVING

4.1 COMPUTATIONAL THINKING AND PROBLEM-SOLVING 4.1 COMPUTATIONAL THINKING AND PROBLEM-SOLVING 4.1.2 ALGORITHMS ALGORITHM An Algorithm is a procedure or formula for solving a problem. It is a step-by-step set of operations to be performed. It is almost

More information

Mathematics of networks. Artem S. Novozhilov

Mathematics of networks. Artem S. Novozhilov Mathematics of networks Artem S. Novozhilov August 29, 2013 A disclaimer: While preparing these lecture notes, I am using a lot of different sources for inspiration, which I usually do not cite in the

More information

Applying Human-Centered Design Process to SystemDirector Enterprise Development Methodology

Applying Human-Centered Design Process to SystemDirector Enterprise Development Methodology Applying Human-Centered Design Process to SystemDirector Enterprise Development HIRAMATSU Takeshi, FUKUZUMI Shin ichi Abstract Human-centered design process is specified in ISO13407 international standard,

More information

Web Page Recommender System based on Folksonomy Mining for ITNG 06 Submissions

Web Page Recommender System based on Folksonomy Mining for ITNG 06 Submissions Web Page Recommender System based on Folksonomy Mining for ITNG 06 Submissions Satoshi Niwa University of Tokyo niwa@nii.ac.jp Takuo Doi University of Tokyo Shinichi Honiden University of Tokyo National

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

XETA: extensible metadata System

XETA: extensible metadata System XETA: extensible metadata System Abstract: This paper presents an extensible metadata system (XETA System) which makes it possible for the user to organize and extend the structure of metadata. We discuss

More information

CS47300 Web Information Search and Management

CS47300 Web Information Search and Management CS47300 Web Information Search and Management Search Engine Optimization Prof. Chris Clifton 31 October 2018 What is Search Engine Optimization? 90% of search engine clickthroughs are on the first page

More information

DEVELOPMENT AND EVALUATION OF A SYSTEM FOR CHECKING FOR IMPROPER SENDING OF PERSONAL INFORMATION IN ENCRYPTED

DEVELOPMENT AND EVALUATION OF A SYSTEM FOR CHECKING FOR IMPROPER SENDING OF PERSONAL INFORMATION IN ENCRYPTED DEVELOPMENT AND EVALUATION OF A SYSTEM FOR CHECKING FOR IMPROPER SENDING OF PERSONAL INFORMATION IN ENCRYPTED E-MAIL Kenji Yasu 1, Yasuhiko Akahane 2, Masami Ozaki 1, Koji Semoto 1, Ryoichi Sasaki 1 1

More information

Chapter 3. Set Theory. 3.1 What is a Set?

Chapter 3. Set Theory. 3.1 What is a Set? Chapter 3 Set Theory 3.1 What is a Set? A set is a well-defined collection of objects called elements or members of the set. Here, well-defined means accurately and unambiguously stated or described. Any

More information

Improving the Performance of the Peer to Peer Network by Introducing an Assortment of Methods

Improving the Performance of the Peer to Peer Network by Introducing an Assortment of Methods Journal of Computer Science 7 (1): 32-38, 2011 ISSN 1549-3636 2011 Science Publications Improving the Performance of the Peer to Peer Network by Introducing an Assortment of Methods 1 M. Sadish Sendil

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

DATA MINING - 1DL105, 1DL111

DATA MINING - 1DL105, 1DL111 1 DATA MINING - 1DL105, 1DL111 Fall 2007 An introductory class in data mining http://user.it.uu.se/~udbl/dut-ht2007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

CS 160: Evaluation. Professor John Canny Spring /15/2006 1

CS 160: Evaluation. Professor John Canny Spring /15/2006 1 CS 160: Evaluation Professor John Canny Spring 2006 2/15/2006 1 Outline User testing process Severity and Cost ratings Discount usability methods Heuristic evaluation HE vs. user testing 2/15/2006 2 Outline

More information

sitecrafting.com

sitecrafting.com sitecrafting.com hello@sitecrafting.com SEARCH ENGINE OPTIMIZATION In its simplest form, Search Engine Optimization is communicating to search engines the intentions of your website so that your website

More information

II TupleRank: Ranking Discovered Content in Virtual Databases 2

II TupleRank: Ranking Discovered Content in Virtual Databases 2 I Automatch: Database Schema Matching Using Machine Learning with Feature Selection 1 II TupleRank: Ranking Discovered Content in Virtual Databases 2 Jacob Berlin and Amihai Motro 1. Proceedings of CoopIS

More information

Providing quality search in the electronic catalog of scientific library via Yandex.Server

Providing quality search in the electronic catalog of scientific library via Yandex.Server 42 Providing quality search in the electronic catalog of scientific library via Yandex.Server Boldyrev Petr 1[0000-0001-7346-6993] and Krylov Ivan 1[0000-0002-8377-1489] 1 Orenburg State University, Pobedy

More information

Effective Data Transmission in Distributed Visual Sensor Systems

Effective Data Transmission in Distributed Visual Sensor Systems Wei Song Effective Data Transmission in Distributed Visual Sensor Systems Wei Song (weisong@cs.brown.edu) Abstract: Objective of this research project is to design an effective and stable data transmission

More information

3.7 Denotational Semantics

3.7 Denotational Semantics 3.7 Denotational Semantics Denotational semantics, also known as fixed-point semantics, associates to each programming language construct a well-defined and rigorously understood mathematical object. These

More information

Digital Marketing In The Kingdom By Ciaran Doyle for Brains

Digital Marketing In The Kingdom By Ciaran Doyle for Brains Digital Marketing In The Kingdom By Ciaran Doyle for Brains Hold on Cambodia? According to TNS Cambodia, Internet penetration is at 38%. Nearly a half in urban locations access the web, while only a third

More information

Design and Realization of Data Mining System based on Web HE Defu1, a

Design and Realization of Data Mining System based on Web HE Defu1, a 4th International Conference on Machinery, Materials and Computing Technology (ICMMCT 2016) Design and Realization of Data Mining System based on Web HE Defu1, a 1 Department of Quartermaster, Wuhan Economics

More information

Development of Hybrid-Type Navigation System

Development of Hybrid-Type Navigation System Tomohiro SAKAI Yohei OKAMOTO Shoji FUJIMOTO Katsuyuki NAKAI Abstract The environment surrounding in-vehicle devices is going through a revolutionary change. Along with the popularization of smartphones,

More information

Order from Chaos. Nebraska Wesleyan University Mathematics Circle

Order from Chaos. Nebraska Wesleyan University Mathematics Circle Order from Chaos Nebraska Wesleyan University Mathematics Circle Austin Mohr Department of Mathematics Nebraska Wesleyan University February 2, 20 The (, )-Puzzle Start by drawing six dots at the corners

More information

I. INTRODUCTION. Fig Taxonomy of approaches to build specialized search engines, as shown in [80].

I. INTRODUCTION. Fig Taxonomy of approaches to build specialized search engines, as shown in [80]. Focus: Accustom To Crawl Web-Based Forums M.Nikhil 1, Mrs. A.Phani Sheetal 2 1 Student, Department of Computer Science, GITAM University, Hyderabad. 2 Assistant Professor, Department of Computer Science,

More information

MATRIX BASED INDEXING TECHNIQUE FOR VIDEO DATA

MATRIX BASED INDEXING TECHNIQUE FOR VIDEO DATA Journal of Computer Science, 9 (5): 534-542, 2013 ISSN 1549-3636 2013 doi:10.3844/jcssp.2013.534.542 Published Online 9 (5) 2013 (http://www.thescipub.com/jcs.toc) MATRIX BASED INDEXING TECHNIQUE FOR VIDEO

More information

The Design of Model for Tibetan Language Search System

The Design of Model for Tibetan Language Search System International Conference on Chemical, Material and Food Engineering (CMFE-2015) The Design of Model for Tibetan Language Search System Wang Zhong School of Information Science and Engineering Lanzhou University

More information

Divisibility Rules and Their Explanations

Divisibility Rules and Their Explanations Divisibility Rules and Their Explanations Increase Your Number Sense These divisibility rules apply to determining the divisibility of a positive integer (1, 2, 3, ) by another positive integer or 0 (although

More information

Marking Scheme Class X INFORMATION TECHNOLOGY(402)

Marking Scheme Class X INFORMATION TECHNOLOGY(402) Marking Scheme-2018-19 Class X INFORMATION TECHNOLOGY(402) Time: 2.5 Hrs Maximum Marks: 50 -------------------------------------------------------------------------------------------------------------------------------

More information

6 TOOLS FOR A COMPLETE MARKETING WORKFLOW

6 TOOLS FOR A COMPLETE MARKETING WORKFLOW 6 S FOR A COMPLETE MARKETING WORKFLOW 01 6 S FOR A COMPLETE MARKETING WORKFLOW FROM ALEXA DIFFICULTY DIFFICULTY MATRIX OVERLAP 6 S FOR A COMPLETE MARKETING WORKFLOW 02 INTRODUCTION Marketers use countless

More information

Pedestrian Detection Using Correlated Lidar and Image Data EECS442 Final Project Fall 2016

Pedestrian Detection Using Correlated Lidar and Image Data EECS442 Final Project Fall 2016 edestrian Detection Using Correlated Lidar and Image Data EECS442 Final roject Fall 2016 Samuel Rohrer University of Michigan rohrer@umich.edu Ian Lin University of Michigan tiannis@umich.edu Abstract

More information

Recitation 4: Elimination algorithm, reconstituted graph, triangulation

Recitation 4: Elimination algorithm, reconstituted graph, triangulation Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Recitation 4: Elimination algorithm, reconstituted graph, triangulation

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Lecture #3: PageRank Algorithm The Mathematics of Google Search Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,

More information

International Journal of Scientific & Engineering Research Volume 2, Issue 12, December ISSN Web Search Engine

International Journal of Scientific & Engineering Research Volume 2, Issue 12, December ISSN Web Search Engine International Journal of Scientific & Engineering Research Volume 2, Issue 12, December-2011 1 Web Search Engine G.Hanumantha Rao*, G.NarenderΨ, B.Srinivasa Rao+, M.Srilatha* Abstract This paper explains

More information

Similarity search in multimedia databases

Similarity search in multimedia databases Similarity search in multimedia databases Performance evaluation for similarity calculations in multimedia databases JO TRYTI AND JOHAN CARLSSON Bachelor s Thesis at CSC Supervisor: Michael Minock Examiner:

More information

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012 Python & Web Mining Lecture 6 10-10-12 Old Dominion University Department of Computer Science CS 495 Fall 2012 Hany SalahEldeen Khalil hany@cs.odu.edu Scenario So what did Professor X do when he wanted

More information

DATA MINING II - 1DL460. Spring 2014"

DATA MINING II - 1DL460. Spring 2014 DATA MINING II - 1DL460 Spring 2014" A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt14 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Dr. Relja Vulanovic Professor of Mathematics Kent State University at Stark c 2008

Dr. Relja Vulanovic Professor of Mathematics Kent State University at Stark c 2008 MATH-LITERACY MANUAL Dr. Relja Vulanovic Professor of Mathematics Kent State University at Stark c 2008 1 Real Numbers 1.1 Sets 1 1.2 Constants and Variables; Real Numbers 7 1.3 Operations with Numbers

More information

9.4 SOME CHARACTERISTICS OF INTEGER PROGRAMS A SAMPLE PROBLEM

9.4 SOME CHARACTERISTICS OF INTEGER PROGRAMS A SAMPLE PROBLEM 9.4 SOME CHARACTERISTICS OF INTEGER PROGRAMS A SAMPLE PROBLEM Whereas the simplex method is effective for solving linear programs, there is no single technique for solving integer programs. Instead, a

More information

The Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce Website Bo Liu

The Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce Website Bo Liu International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) The Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce

More information

Query Evaluation Strategies

Query Evaluation Strategies Introduction to Search Engine Technology Term-at-a-Time and Document-at-a-Time Evaluation Ronny Lempel Yahoo! Labs (Many of the following slides are courtesy of Aya Soffer and David Carmel, IBM Haifa Research

More information