A NEW PERFORMANCE EVALUATION TECHNIQUE FOR WEB INFORMATION RETRIEVAL SYSTEMS

Fidel Cacheda, Francisco Puentes, Victor Carneiro
Department of Information and Communications Technologies, University of A Coruña
Facultad de Informática, Campus de Elviña s/n, A Coruña, Spain
{fidel, fpuentes, vicar}@udc.es

ABSTRACT

The performance evaluation of an information retrieval system is decisive for measuring improvements in search technology. Our work intends to provide a framework to compare and contrast the performance of search engines in a research environment. To this end, we have designed and developed USim, a tool for the performance evaluation of Web IR systems based on the simulation of user behavior. This simulation tool contributes to the performance evaluation process in two ways: estimating the saturation threshold of the system and comparing different search algorithms or engines. The latter point is the most interesting because, as we demonstrate, a comparison under different workload environments achieves more accurate results (avoiding erroneous conclusions derived from ideal environments). From a general point of view, USim intends to be an approximation to some new performance evaluation techniques specifically developed for Internet search engines.

KEYWORDS

Web Information Retrieval, performance evaluation, simulation.

1. INTRODUCTION

With the exponential growth of the Web there has also been a growing interest in the study of a variety of topics related to its use. There is a special interest in finding patterns in, and computing statistics of, the users of Internet search engines, and several articles that analyze the query logs of commercial search engines have been published (Cacheda and Viña, 2001), (Jansen et al., 1998), (Kirsch, 1998), (Silverstein et al., 1999) and (Spink et al., 2002).
These studies examine the search process followed by Web users: terms per query, top terms, operators and modifiers, sessions, etc. Moreover, the performance evaluation of an Information Retrieval (IR) system is decisive for measuring improvements in search technology, as reflected in the Text Retrieval Conference (TREC) and the WEB-TREC conferences. Zobel, Moffat and Ramamohanarao (1996) describe guidelines for evaluating the performance of, and comparing, several indexing and retrieval techniques. The main criteria for the comparison are the following: the scalability of the system, the response time of the search process (which is perhaps the single crucial test for an indexing scheme), the disk space used by the index data structures, CPU time, disk traffic and memory requirements.

Nevertheless, the response time is not easy to estimate because it is the aggregate of many other parameters. The same problem arises in Web IR systems, and it is complicated by the fact that these systems must operate under different workload situations, especially with a high number of concurrent users. In fact, one of the measures to obtain in WEB-TREC is the response time of each request sent to the system (Hawking et al., 1999). However, the response times are computed in an ideal situation: without any workload on the system.

This work focuses on the development and testing of search engines in a research environment, in order to provide a framework to compare and contrast the performance obtained after a change in the IR system. The change may affect any part of the search engine: the index organization, the search algorithm, the system architecture or the system configuration. For this purpose, we have designed and developed USim, a tool for the performance evaluation of Web IR systems based on the simulation of user behavior, in order to compare more accurately the performance

of different search systems. Moreover, we tested this performance evaluation tool in a real environment in two ways: estimating the saturation threshold of the system and comparing the performance of different search algorithms or engines.

This paper is structured as follows. It starts with an overview of the related research. Next we describe USim, the proposed simulation tool, and the following section details the results obtained in the performance evaluation of Web IR systems. Finally, the main conclusions are presented.

2. RELATED STUDIES

Recently there have been some studies that examine the behavior of Web search users while they are using an Internet search engine or a Web directory. The first study was performed by Kirsch (1998), who presented some search statistics of Infoseek usage. Shortly afterwards, Jansen et al. (1998) presented a study of queries, sessions and search terms obtained from the query logs of Excite. Spink et al. (2002) analyzed the changes in the search topics for Excite over several years. Silverstein et al. (1999) examined a very large number of queries taken from AltaVista logs, studying not only the queries but also the co-occurrences among them. Cacheda and Viña (2001) investigated the queries, the categories browsed and the documents retrieved by the users of a Web directory.

These studies show that most Web users enter few queries consisting of few search terms, have difficulty using effective keyword or Boolean queries and conduct little query reformulation. Also, Jansen and Pooch (2001) provide a framework for the development and comparison of future query log analyses, comparing the searching characteristics of Web users with those of users of other systems. More recently, Spink and Ozmultu (2002) analyzed the characteristics of question format web queries, and Ozmultu, Spink and Ozmultu (2002) explored a sampling technique for the correct statistical analysis of large data sets of web queries.
On the other hand, the importance of performance evaluation is well known, and it is fundamental to obtain the effectiveness and response time measures of an IR system. For traditional IR systems, this evaluation is performed using the methodology created in the TREC evaluation program undertaken by the US National Institute of Standards and Technology (TREC, 2004). In the case of Web IR systems, WEB-TREC was created, allowing the measurement of speed (using the response time) and effectiveness (using the precision and recall parameters).

But Web IR systems must operate under different workload situations, especially with a high number of concurrent users. Consequently, it is important to measure the response time of a Web IR system under different workload situations. Clearly, a search engine cannot be put into production before being evaluated, so no real workload (generated by real users) can be used. Therefore, it is fundamental to obtain a user profile that can be simulated, by means of the analysis of the query logs of a Web IR system, and then to examine the simulation of different workloads (as different numbers of requests) and their effect on the response time of a search engine, in order to improve the performance evaluation methodology.

The user profile is based on the work by Cacheda and Viña (2001), where two basic conclusions are obtained for the simulation of user behavior:
- The searches, categories browsed and documents visited fit an Exponential distribution, with a mean that varies over time.
- There is a linear relationship between the number of searches, categories browsed and documents visited in a period of time.

3. USIM: A PERFORMANCE EVALUATION TOOL

USim (Users Simulator) is an application that simulates the behavior of the users of a search engine, using the results obtained in previous works. It has been designed to operate with any type of search engine over the HTTP protocol.
USim will send multiple requests to an IR system in the same way that a group of users would have done. Three types of requests are supported: searches, browsed categories and visited documents. The requests will be sent following an Exponential distribution, as derived from the previous section and the number of requests per minute is defined as a parameter. In this way, USim can generate different supervised workload environments over an IR system.
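The Exponential timing of requests can be sketched with inverse-transform sampling, one standard way of drawing Exponential waiting times from uniform random numbers. The sketch below is in Python rather than USim's Java, purely for brevity; the function names and the `seed` parameter are hypothetical and do not come from USim.

```python
import math
import random


def interarrival_seconds(requests_per_minute: float, u: float) -> float:
    """Map a uniform draw u in [0, 1) to an Exponential waiting time in seconds."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    rate_per_second = requests_per_minute / 60.0
    # Inversion of F(t) = 1 - exp(-rate * t) gives t = -ln(1 - u) / rate.
    return -math.log(1.0 - u) / rate_per_second


def sample_schedule(requests_per_minute: float, n: int, seed: int = 0) -> list:
    """Return n cumulative send times, in seconds from the start of the run."""
    rng = random.Random(seed)
    now = 0.0
    times = []
    for _ in range(n):
        now += interarrival_seconds(requests_per_minute, rng.random())
        times.append(now)
    return times
```

With a rate of 12 requests per minute the mean gap between requests is 60 / 12 = 5 seconds, although individual gaps vary widely, which is what makes the generated load resemble independent users rather than a fixed-interval loop.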

USim was designed and implemented for research purposes in order to evaluate the performance of a Web IR system in a LAN, before it is made available on the Web. This guarantees that the network latency is negligible, so the response times obtained measure only the search engine response times.

3.1 Design and implementation

USim was designed using an object-oriented methodology and developed entirely in Java, an object-oriented language, in order to build a multiplatform application and facilitate its operation in any environment.

The simulation tool is composed of three main modules that operate concurrently, associated with the three types of requests: searches, categories and documents. The three types of requests are generated in a similar way. Each type of request is managed by its own process and, at the same time, each request is processed independently by its own thread. As described in the previous section, the time between two consecutive requests fits an Exponential distribution, and the simulation is done using the inversion method over the distribution function of the Exponential distribution. Then a request is generated and submitted to the search system, and the response is processed when it is received.

The generation of the request is handled in different ways depending on the type of request. In the case of searches a whole query must be created. Starting with the search string, the analysis carried out by Cacheda and Viña (2001) showed that the search strings examined did not fit Zipf's law (Zipf, 1949), confirming the results of Jansen et al. (1998). Consequently, a mathematical model cannot be defined and an empirical distribution is used instead. USim was designed to operate with any empirical search string distribution; in our experiments the distribution obtained by Cacheda and Viña (2001) was used, consisting of 26,654 search strings with their respective frequencies. For the rest of the search parameters (e.g.
number of results displayed per page) the default values are used. The previous descriptive works by Jansen et al. (1998) and Cacheda and Viña (2001) conclude that the great majority of users do not change the default value of any search parameter.

When browsing a category or visiting a document, only an identifier is needed (of a category or a document, respectively). The best way to obtain a list of category or document identifiers is through the simulation process itself, because there is no mathematical distribution that fits the categories or documents visited by the users (Cacheda and Viña, 2001). Thus, the simulation tool obtains the category and document identifiers from the searching or browsing processes and stores them in their respective caches. Each identifier has a finite life in the cache, which is a configuration parameter (typically the average length of a user's session). When a category or document request is performed, the corresponding identifier is randomly selected from the cache.

Once the request is generated it is sent to the search engine using the HTTP protocol. The HTTPClient API (Tschalar, 2003) for Java was used for this purpose. The URL for each type of request and the names of the required parameters must be defined in the configuration of USim.

When the HTTP response has been received, a simple and configurable parsing process is used to extract some relevant information from the HTML document received, but only for search or category requests. As previously described, the category and document identifiers are extracted from this response and later stored in their respective simulation caches. Two further values are also obtained from the result page: the number of categories and the total number of documents retrieved in the answer.
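The two request-generation mechanisms described above, drawing a search string from an empirical frequency distribution and picking a random identifier from a cache with a finite lifetime, can be sketched as follows. This is an illustrative Python sketch, not USim's Java code: the class names, the `clock` and `rng` parameters, and the choice to refresh an identifier's lifetime when it is seen again are all assumptions.

```python
import bisect
import itertools
import random
import time


class EmpiricalSearchStrings:
    """Draws search strings in proportion to their observed frequencies."""

    def __init__(self, weighted_strings, rng=None):
        # weighted_strings: iterable of (search_string, frequency) pairs.
        pairs = [(s, f) for s, f in weighted_strings if f > 0]
        if not pairs:
            raise ValueError("need at least one string with a positive frequency")
        self._strings = [s for s, _ in pairs]
        self._cumulative = list(itertools.accumulate(f for _, f in pairs))
        self._rng = rng or random.Random()

    def draw(self):
        # Choose a point in [0, total) and locate the interval that contains it.
        point = self._rng.random() * self._cumulative[-1]
        index = bisect.bisect_right(self._cumulative, point)
        # Guard against floating-point rounding at the upper end.
        return self._strings[min(index, len(self._strings) - 1)]


class IdentifierCache:
    """Holds identifiers for a limited time and serves them at random."""

    def __init__(self, lifetime_seconds, clock=time.monotonic, rng=None):
        self.lifetime_seconds = lifetime_seconds
        self._clock = clock
        self._rng = rng or random.Random()
        self._expiry = {}  # identifier -> time at which it expires

    def add(self, identifier):
        # Assumption: seeing an identifier again refreshes its lifetime.
        self._expiry[identifier] = self._clock() + self.lifetime_seconds

    def pick(self):
        """Return a random live identifier, or None if none is left."""
        now = self._clock()
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}
        if not self._expiry:
            return None
        return self._rng.choice(list(self._expiry))
```

Passing the clock in as a parameter keeps the cache testable without real waiting; a simulator would normally leave the default in place.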
Finally, for each request sent to the IR system the following information is stored in an output file for subsequent analysis:
- Timestamp: date and time when the answer was received from the retrieval system.
- Request identifier: depending on the type of request, the search string or the category or document identifier.
- Response time: the time from when the request was sent until the response was received (that is, until the HTML document was completely received).
- Images: the number of images included in the answer.
- Response time images: the additional time needed to download the images.
- For search and category requests, the number of categories and documents retrieved, obtained from the parsing process.
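The fields listed above map naturally onto one record per request. The following Python sketch shows one possible layout written as CSV; the field names, the millisecond units and the CSV format are assumptions made for illustration, since the actual format of USim's output file is not specified here.

```python
import csv
import io
from dataclasses import dataclass, astuple, fields
from typing import Optional


@dataclass
class RequestRecord:
    timestamp: str                     # when the answer was received
    request_id: str                    # search string, or category/document identifier
    response_time_ms: float            # request sent until HTML fully received
    images: int                        # number of images in the answer
    images_time_ms: float              # additional time to download the images
    categories: Optional[int] = None   # only for search and category requests
    documents: Optional[int] = None    # only for search and category requests


def write_records(records, out):
    """Write a header row followed by one CSV row per record."""
    writer = csv.writer(out)
    writer.writerow([f.name for f in fields(RequestRecord)])
    for record in records:
        # Missing counts (document requests) are written as empty cells.
        writer.writerow(["" if v is None else v for v in astuple(record)])
```

Keeping the two optional counts at the end means document requests simply leave the last two columns empty, so all three request types can share one file layout if desired.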

Figure 1: General configuration for USim
Figure 2: Searches configuration for USim

3.2 Operation

In this section we briefly describe the main functional characteristics of the simulation tool, in order to make the performance evaluation scenarios described in the next section easier to understand.

First, it is important to mention that USim can operate with a user interface or in batch mode, using proprietary configuration files. Figure 1 shows the graphical interface used to configure the general parameters that apply to the whole simulation process. Here the user can set the length of the simulation and the cache lifetime of the category and document identifiers gathered by USim during the simulation process. The configuration can be stored in order to use the simulation tool in batch mode.

When the simulation starts, the application checks which types of requests must be sent to the IR system, and each module is started independently. In this way, USim can be used with the main types of IR system: Web directories, search engines and metasearch engines. If the system analyzed is a Web directory, USim must send searches and category accesses, whereas if the system analyzed is a search engine or a metasearch engine, only searches must be sent. The module of visits to documents is included because some IR systems place an intermediate page between the search results and the final document, which also increases the load on the system.

The rest of the user interface is used to configure the parameters of each type of request. These parameters are very similar for the three types of requests, so we only describe the search configuration (see Figure 2). The main parameters are the number of searches per minute and the URL of the search system.
The names of the parameters needed to perform the search must also be configured: commonly the search string, the number of results to obtain and the position of the first result. The simulation tool generates the values associated with these parameters and invokes the search engine, passing these parameters and the generated values using either the GET or the POST method. The searches file contains the empirical distribution of the search strings that will be used by USim, and the output file stores all the information for the search requests (timestamp, request identifier, response time and so on).

It is important to mention that the number of searches per minute is not static: it can change dynamically during the simulation using the parameters "Increase in" and "every minute". This helps in the evaluation of Web IR systems under different workloads using only one simulation process.

The configuration of categories and documents is quite similar. In this case, the number of categories browsed and documents visited per minute are automatically estimated using the linear relationship obtained in (Cacheda and Viña, 2001), although these and the rest of the parameters can be directly modified by the user.

4. PERFORMANCE EVALUATION RESULTS

USim is a simulation tool designed and implemented for the performance evaluation of any type of Web IR system, especially for research purposes in a local environment with negligible network latency.

USim can be used and configured in two different ways to measure two different aspects of the performance of an IR system. The first is the classical denial of service method, which measures the maximum number of requests supported by the system (named the saturation threshold). The second is more interesting for research purposes because it measures the response time of a search engine under different workload situations. This leads directly to the comparison of the performance of variations of the same search engine or even of completely different IR systems.

4.1 Saturation threshold

One of the critical measures for any Web IR system is the maximum number of requests supported per minute. Beyond a certain threshold the performance of the system drops suddenly, and the response times increase up to the point of denial of service. This point is named the saturation threshold. Establishing the saturation threshold is fundamental in order to take preventive actions (such as application management techniques or the incorporation of new hardware) before denial of service occurs. USim can be easily configured to simulate the effect of multiple simultaneous users sending requests to the IR system in a controlled environment.

For this purpose, an experiment was designed to measure the saturation threshold of a prototype of a basic Web directory installed on a Sun Ultra Enterprise 250, with one 300 MHz CPU and 768 MB of main memory. This basic Web directory consists of approximately 1 categories and more than 5, classified documents, and its architecture is described in (Cacheda and Viña, 2003). USim is configured to send requests to the IR system and to periodically increase the workload. The response time is measured for every request. The simulation tool is configured to start with 5 searches per minute (and 4.1 browsed categories per minute and 7.5 viewed documents per minute).
The initial value of searches per minute is increased every 1 minutes by 1 search per minute (and the equivalent increase is applied to the number of categories and documents). The results are shown in Figure 3 and Figure 4. The first graph (Figure 3) presents the response times of the searches sent to the system throughout the simulation. The picture is quite clear: above approximately 21 searches per minute the response times start to grow rapidly. At this point, every new query sent to the system increases its load and worsens the situation. This condition only ends when the number of requests per minute decreases. Figure 4 illustrates the number of error pages returned by the system: the IR system operates without errors until the number of searches exceeds 21 requests per minute. The simulation process also obtains the response times for the browsed categories and the viewed documents, but these graphs are quite similar and do not contribute any new relevant information. Therefore, using USim the saturation threshold is established at 21 searches per minute (and the corresponding values for the categories browsed and documents viewed).

Figure 3: Response time (ms) vs. searches per minute
Figure 4: System errors vs. searches per minute
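The stepwise workload used in this experiment, together with the matching category and document rates, can be sketched as below. This is a hedged Python illustration and not USim's implementation. The parameter names are hypothetical, and the linear coefficients are not taken from Cacheda and Viña: they are simply the straight line through the Low and High workload values reported in Section 4.2, which happens to reproduce the Medium and Saturation values after rounding to one decimal.

```python
def searches_per_minute(elapsed_minutes, start, increase, every_minutes):
    """Stepwise workload: `start`, raised by `increase` after each full period."""
    if every_minutes <= 0:
        raise ValueError("every_minutes must be positive")
    steps = int(elapsed_minutes // every_minutes)
    return start + increase * steps


# Reference points from the Low and High workloads reported in Section 4.2:
# (searches, browsed categories, viewed documents) per minute.
LOW = (5.0, 4.1, 7.5)
HIGH = (19.0, 12.9, 24.9)


def companion_rates(searches):
    """Linear estimate of (categories/min, documents/min) for a search rate."""
    span = HIGH[0] - LOW[0]
    categories = LOW[1] + (HIGH[1] - LOW[1]) * (searches - LOW[0]) / span
    documents = LOW[2] + (HIGH[2] - LOW[2]) * (searches - LOW[0]) / span
    return categories, documents
```

For example, `companion_rates(12)` gives about 8.5 categories and 16.2 documents per minute, matching the Medium workload. The ramp interval is left as an argument; a call such as `searches_per_minute(25, start=5, increase=1, every_minutes=10)` uses an interval chosen only for illustration.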

4.2 Performance comparison

The main goal of the performance evaluation of an IR system is to measure the response times in order to compare and analyze the effect of changes in the search system and to identify the real improvements obtained. Web IR systems face different workload levels over time, from periods of low workload to situations of high workload or even saturation. Obviously, the performance of the system depends on the workload at each moment. Therefore, the performance evaluation must consider different workload situations in order to produce a more complete and representative study.

In the work described in (Cacheda and Viña, 2003) the performance of three different indexing techniques had to be compared. In this case, the performance of a type of search characteristic of Web directories, named restricted search, was analyzed. A restricted search is a common search whose results must belong to one category or any of its descendants. The first indexing technique uses a basic architecture based on inverted files and constitutes our baseline. The other two indexing techniques (named hybrid model with total information and hybrid model with partial information) are based on a hybrid model of signature files embedded into inverted files. The hybrid model with total information uses the hybrid data structure for all the categories of the Web directory, increasing the size of the index by 1%, whereas the hybrid model with partial information only applies the hybrid data structure to the categories of the first levels of the Web directory, reducing the size of the index by approximately 5%. For more details about the hybrid data model and the two variants defined, refer to (Cacheda and Viña, 2003) and (Cacheda and Baeza-Yates, 2004). Each of these search algorithms has been developed and tested on a Sun Ultra Enterprise 250, with one 300 MHz CPU and 768 MB of main memory.
The methodology for the performance comparison is based on two instances of USim, which are executed simultaneously. The first one generates the workload on the IR system and the second one sends the restricted queries to test the performance of the system. The performance evaluation is performed over five different workload situations: null, low, medium, high and saturation. The first instance of USim was configured to generate these static workloads with the following average values:
- Null: 0 searches/minute, 0 browsed categories/minute and 0 viewed documents/minute.
- Low: 5 searches/minute, 4.1 browsed categories/minute and 7.5 viewed documents/minute.
- Medium: 12 searches/minute, 8.5 browsed categories/minute and 16.2 viewed documents/minute.
- High: 19 searches/minute, 12.9 browsed categories/minute and 24.9 viewed documents/minute.
- Saturation: 23 searches/minute, 15.4 browsed categories/minute and 29.9 viewed documents/minute.

For each workload, the first instance of USim sends requests to the IR system and, after a stabilization period, the second instance of USim is executed to measure the response time of the restricted queries. This second instance uses a reduced set of queries specifically designed to analyze the effects of some relevant parameters on the response time. In this case, two parameters are considered: the number of results obtained by the query and the number of documents associated with the restricted category. Therefore, a set of eight queries retrieving from to 2 documents was selected and three different categories were selected to restrict the queries (with 2, 1 and 5 documents associated, respectively).

Figure 5: Response time (null workload)
Figure 6: Response time (low workload)
Figure 7: Response time (medium workload)
Figure 8: Response time (high workload)
(Each figure plots the response time in ms against the number of results for the basic, hybrid total information and hybrid partial information models.)

In Figure 5, Figure 6, Figure 7 and Figure 8 we show the results obtained for the comparison of the three algorithms. All the experiments were analyzed using the ANOVA test. Three factors are defined in the ANOVA: the type of model (basic, hybrid model with total information and hybrid model with partial information), the number of results retrieved by the query and the number of documents associated with the restricted category. Obviously, the number of results and the number of documents associated with the restricted category are relevant factors for the response time, and the objective is to determine whether the type of model is also relevant, comparing the performance of the hybrid models with that of the basic model.

Figure 5 represents an ideal situation, and it is clear that the hybrid models perform much better than the baseline. In fact, if the query gets more than 5 results, the response times of the hybrid models are reduced by 5% with respect to the basic model. The ANOVA test considered the three factors analyzed relevant (R square = .988). The same situation is represented in Figure 6 and Figure 7, but with low and medium workload, respectively. The behavior of the three algorithms is equivalent to the previous one, except that the response times are slightly higher (the three factors are also relevant in the ANOVA test, with R square = .923 and R square = .92, respectively). But the situation changes in Figure 8, where a high workload is generated in the IR system. The ANOVA test still considers the three factors relevant, with a high R square (R square = .857).
The most relevant aspect is that the performance of the hybrid model with total information deteriorates, becoming similar to that of the baseline, while the hybrid model with partial information keeps its performance improvement of 5% over the baseline, and now also over the hybrid model with total information. The hybrid data structure defined in both hybrid models seems to clearly improve the performance for restricted queries. However, in a high workload environment the disk operations are the bottleneck, so the hybrid model with total information, with its larger index, is penalized. The saturation workload is not described because its ANOVA test shows that only a small part of the variation in the response times is explained (R square = .536), and therefore its results are not significant.

This experiment demonstrates the importance of considering the workload in any IR system, and specifically in Web IR systems. Initially, both hybrid models behaved in a similar way, performing 5% better than the baseline. In the end, however, we found that only the hybrid model with partial information is able to keep the performance improvement in all circumstances, whereas the performance of the hybrid model with total information decreases in high workload situations, due to its higher disk requirements.

5. CONCLUSION

With the emergence of Web IR systems some new measures were defined for the evaluation of retrieval (relative precision, relevant/useful pages, etc.). But this must also involve the performance evaluation of these retrieval systems. Therefore, in this paper we have presented USim, a simulation tool for the performance evaluation of Web IR systems based on the simulation of user behavior. This simulation

tool helps in the performance evaluation of Web IR systems in two different ways: estimating the saturation threshold of the system and comparing the performance of different search algorithms or engines.

Establishing the saturation threshold is fundamental before any Internet search engine is put into production because it estimates the maximum load that the system will be able to bear before its performance degrades. So, before this point is reached, some preventive actions can be taken to increase the processing capacity of the system or to avoid this threshold using application management techniques.

The second point is the most interesting because, traditionally, measuring the response time of an IR system has been fundamental in performance evaluation. However, Internet search engines must operate under different workload situations, with a high number of concurrent users. The response times must therefore be measured considering different workload situations; otherwise erroneous conclusions can be reached. The results obtained show that a comparison only in null workload environments is not enough: we observed that the performance of a search algorithm seemed appropriate in a low workload environment, whereas it decreased suddenly in a high workload situation.

From a general point of view, with USim we intend to create an approximation to some new performance evaluation techniques specifically developed for Internet search engines. The use of simulation for the performance evaluation of Internet search engines seems promising, mainly because the response times can be estimated more accurately by considering different workload environments.

For further research, an interesting point is the extension of USim to operate on a WAN. At the moment, the main limitation of our simulation tool is that it operates on a LAN, where the network latency is negligible.
However, on a WAN the network latency must be estimated to obtain the actual search engine response time. Also, in our future work we will concentrate on the improvement of USim in order to make it publicly available, trying to develop a more generic application suitable for any type of retrieval system. This implies that progress must be made in the extraction of information from the result Web pages (where the use of XML seems promising) and in handling the different parameters used in search engine URLs.

REFERENCES

Cacheda, F. and Viña, A., 2001. Experiencies retrieving information in the World Wide Web. Proceedings of the 6th IEEE Symposium on Computers and Communications. Hammamet, Tunisia.
Cacheda, F. and Viña, A., 2003. Optimization of Restricted Searches in Web Directories Using Hybrid Data Structures. In Lecture Notes on Computer Science, Vol. 2633.
Cacheda, F. and Baeza-Yates, R., 2004. An Optimistic Model for Searching Web Directories. In Lecture Notes on Computer Science, Vol. 2997.
Hawking, D. et al., 1999. Results and challenges in Web search evaluation. In Proceedings of the 8th World Wide Web Conference, Toronto, Canada.
Jansen, B. and Pooch, U., 2001. Web User Studies: A Review and Framework for Future Work. In Journal of the American Society of Information Science and Technology, Vol. 52, No. 3.
Jansen, B. et al., 1998. Real Life Information Retrieval: A Study Of User Queries On The Web. In SIGIR FORUM, Vol. 32, No. 1.
Kirsch, S., 1998. Infoseek's experiences searching the Internet. In SIGIR FORUM, Vol. 32, No. 2.
TREC, 2004. Text REtrieval Conference, NIST, TREC home page.
Ozmultu, H.C., Spink, A. and Ozmultu, S., 2002. Analysis of large data logs: an application of Poisson sampling on Excite web queries. In Information Processing and Management, Vol. 38.
Silverstein, C. et al., 1999. Analysis of a Very Large Web Search Engine Query Log. In SIGIR FORUM, Vol. 33, No. 1.
Spink, A. and Ozmultu, H. C., 2002. Characteristics of question format web queries: an exploratory study. In Information Processing and Management, Vol. 38.
Spink, A., Jansen, B., Wolfram, D. and Saracevic, T., 2002. From E-sex to E-commerce: Web Search Changes. IEEE Computer, 35(3).
Tschalar, R., 2003. HTTPClient.
Zipf, G., 1949. Human behaviour and the principle of least effort. Ed. Addison-Wesley.
Zobel, J., Moffat, A. and Ramamohanarao, K., 1996. Guidelines for Presentation and Comparison of Indexing Techniques. In ACM SIGMOD Record, Vol. 25, No. 3.


More information

Using Clusters on the Vivisimo Web Search Engine

Using Clusters on the Vivisimo Web Search Engine Using Clusters on the Vivisimo Web Search Engine Sherry Koshman and Amanda Spink School of Information Sciences University of Pittsburgh 135 N. Bellefield Ave., Pittsburgh, PA 15237 skoshman@sis.pitt.edu,

More information

Repeat Visits to Vivisimo.com: Implications for Successive Web Searching

Repeat Visits to Vivisimo.com: Implications for Successive Web Searching Repeat Visits to Vivisimo.com: Implications for Successive Web Searching Bernard J. Jansen School of Information Sciences and Technology, The Pennsylvania State University, 329F IST Building, University

More information

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM

CHAPTER THREE INFORMATION RETRIEVAL SYSTEM CHAPTER THREE INFORMATION RETRIEVAL SYSTEM 3.1 INTRODUCTION Search engine is one of the most effective and prominent method to find information online. It has become an essential part of life for almost

More information

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk

More information

RMIT University at TREC 2006: Terabyte Track

RMIT University at TREC 2006: Terabyte Track RMIT University at TREC 2006: Terabyte Track Steven Garcia Falk Scholer Nicholas Lester Milad Shokouhi School of Computer Science and IT RMIT University, GPO Box 2476V Melbourne 3001, Australia 1 Introduction

More information

A User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces

A User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces A user study on features supporting subjective relevance for information retrieval interfaces Lee, S.S., Theng, Y.L, Goh, H.L.D., & Foo, S. (2006). Proc. 9th International Conference of Asian Digital Libraries

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of

More information

Parallel Algorithms on Clusters of Multicores: Comparing Message Passing vs Hybrid Programming

Parallel Algorithms on Clusters of Multicores: Comparing Message Passing vs Hybrid Programming Parallel Algorithms on Clusters of Multicores: Comparing Message Passing vs Hybrid Programming Fabiana Leibovich, Laura De Giusti, and Marcelo Naiouf Instituto de Investigación en Informática LIDI (III-LIDI),

More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

6.2 DATA DISTRIBUTION AND EXPERIMENT DETAILS

6.2 DATA DISTRIBUTION AND EXPERIMENT DETAILS Chapter 6 Indexing Results 6. INTRODUCTION The generation of inverted indexes for text databases is a computationally intensive process that requires the exclusive use of processing resources for long

More information

Performance Evaluation of XHTML encoding and compression

Performance Evaluation of XHTML encoding and compression Performance Evaluation of XHTML encoding and compression Sathiamoorthy Manoharan Department of Computer Science, University of Auckland, Auckland, New Zealand Abstract. The wireless markup language (WML),

More information

Web document summarisation: a task-oriented evaluation

Web document summarisation: a task-oriented evaluation Web document summarisation: a task-oriented evaluation Ryen White whiter@dcs.gla.ac.uk Ian Ruthven igr@dcs.gla.ac.uk Joemon M. Jose jj@dcs.gla.ac.uk Abstract In this paper we present a query-biased summarisation

More information

Improving Range Query Performance on Historic Web Page Data

Improving Range Query Performance on Historic Web Page Data Improving Range Query Performance on Historic Web Page Data Geng LI Lab of Computer Networks and Distributed Systems, Peking University Beijing, China ligeng@net.pku.edu.cn Bo Peng Lab of Computer Networks

More information

Multicast Transport Protocol Analysis: Self-Similar Sources *

Multicast Transport Protocol Analysis: Self-Similar Sources * Multicast Transport Protocol Analysis: Self-Similar Sources * Mine Çağlar 1 Öznur Özkasap 2 1 Koç University, Department of Mathematics, Istanbul, Turkey 2 Koç University, Department of Computer Engineering,

More information

BEx Front end Performance

BEx Front end Performance BUSINESS INFORMATION WAREHOUSE BEx Front end Performance Performance Analyses of BEx Analyzer and Web Application in the Local and Wide Area Networks Environment Document Version 1.1 March 2002 Page 2

More information

Improving object cache performance through selective placement

Improving object cache performance through selective placement University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2006 Improving object cache performance through selective placement Saied

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

Domain Specific Search Engine for Students

Domain Specific Search Engine for Students Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam

More information

VIDEO SEARCHING AND BROWSING USING VIEWFINDER

VIDEO SEARCHING AND BROWSING USING VIEWFINDER VIDEO SEARCHING AND BROWSING USING VIEWFINDER By Dan E. Albertson Dr. Javed Mostafa John Fieber Ph. D. Student Associate Professor Ph. D. Candidate Information Science Information Science Information Science

More information

Contents Overview of the Compression Server White Paper... 5 Business Problem... 7

Contents Overview of the Compression Server White Paper... 5 Business Problem... 7 P6 Professional Compression Server White Paper for On-Premises Version 17 July 2017 Contents Overview of the Compression Server White Paper... 5 Business Problem... 7 P6 Compression Server vs. Citrix...

More information

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER Hardware Sizing Using Amazon EC2 A QlikView Scalability Center Technical White Paper June 2013 qlikview.com Table of Contents Executive Summary 3 A Challenge

More information

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL

IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL IMPROVING THE RELEVANCY OF DOCUMENT SEARCH USING THE MULTI-TERM ADJACENCY KEYWORD-ORDER MODEL Lim Bee Huang 1, Vimala Balakrishnan 2, Ram Gopal Raj 3 1,2 Department of Information System, 3 Department

More information

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Information Processing and Management 43 (2007) 1044 1058 www.elsevier.com/locate/infoproman Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Anselm Spoerri

More information

A Query-Level Examination of End User Searching Behaviour on the Excite Search Engine. Dietmar Wolfram

A Query-Level Examination of End User Searching Behaviour on the Excite Search Engine. Dietmar Wolfram A Query-Level Examination of End User Searching Behaviour on the Excite Search Engine Dietmar Wolfram University of Wisconsin Milwaukee Abstract This study presents an analysis of selected characteristics

More information

A JAVA META-REGISTRY FOR REMOTE SERVICE OBJECTS

A JAVA META-REGISTRY FOR REMOTE SERVICE OBJECTS A JAVA META-REGISTRY FOR REMOTE SERVICE OBJECTS Dott. Marco Bianchi Netlab, Istituto di Analisi dei Sistemi ed Informatica del C.N.R Viale Manzoni, 30 00185 Rome Italy Dott. Carlo Gaibisso Istituto di

More information

Load Balancing Algorithm over a Distributed Cloud Network

Load Balancing Algorithm over a Distributed Cloud Network Load Balancing Algorithm over a Distributed Cloud Network Priyank Singhal Student, Computer Department Sumiran Shah Student, Computer Department Pranit Kalantri Student, Electronics Department Abstract

More information

Dietmar Wolfram University of Wisconsin--Milwaukee. Abstract

Dietmar Wolfram University of Wisconsin--Milwaukee. Abstract ,QIRUPLQJ 6FLHQFH 6SHFLDO,VVXH RQ,QIRUPDWLRQ 6FLHQFH 5HVHDUFK 9ROXPH 1R APPLICATIONS OF INFORMETRICS TO INFORMATION RETRIEVAL RESEARCH Dietmar Wolfram University of Wisconsin--Milwaukee dwolfram@uwm.edu

More information

Designing Issues For Distributed Computing System: An Empirical View

Designing Issues For Distributed Computing System: An Empirical View ISSN: 2278 0211 (Online) Designing Issues For Distributed Computing System: An Empirical View Dr. S.K Gandhi, Research Guide Department of Computer Science & Engineering, AISECT University, Bhopal (M.P),

More information

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2 Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1907-1911 1907 Web-Based Data Mining in System Design and Implementation Open Access Jianhu

More information

Integrating VVVVVV Caches and Search Engines*

Integrating VVVVVV Caches and Search Engines* Global Internet: Application and Technology Integrating VVVVVV Caches and Search Engines* W. Meira Jr. R. Fonseca M. Cesario N. Ziviani Department of Computer Science Universidade Federal de Minas Gerais

More information

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,

More information

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services SEDA: An Architecture for Well-Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University of California, Berkeley Operating Systems Principles

More information

Automatic Query Type Identification Based on Click Through Information

Automatic Query Type Identification Based on Click Through Information Automatic Query Type Identification Based on Click Through Information Yiqun Liu 1,MinZhang 1,LiyunRu 2, and Shaoping Ma 1 1 State Key Lab of Intelligent Tech. & Sys., Tsinghua University, Beijing, China

More information

P6 Compression Server White Paper Release 8.2 December 2011 Copyright Oracle Primavera P6 Compression Server White Paper Copyright 2005, 2011, Oracle and/or its affiliates. All rights reserved. Oracle

More information

Speed and Accuracy using Four Boolean Query Systems

Speed and Accuracy using Four Boolean Query Systems From:MAICS-99 Proceedings. Copyright 1999, AAAI (www.aaai.org). All rights reserved. Speed and Accuracy using Four Boolean Query Systems Michael Chui Computer Science Department and Cognitive Science Program

More information

Efficient World-Wide-Web Information Gathering. Tian Fanjiang Wang Xidong Wang Dingxing

Efficient World-Wide-Web Information Gathering. Tian Fanjiang Wang Xidong Wang Dingxing Efficient World-Wide-Web Information Gathering Tian Fanjiang Wang Xidong Wang Dingxing (Department of Computer Science and Technology, Tsinghua University, Beijing 100084,tfj@www.cs.tsinghua.edu.cn) Abstract

More information

A PRELIMINARY STUDY ON THE EXTRACTION OF SOCIO-TOPICAL WEB KEYWORDS

A PRELIMINARY STUDY ON THE EXTRACTION OF SOCIO-TOPICAL WEB KEYWORDS A PRELIMINARY STUDY ON THE EXTRACTION OF SOCIO-TOPICAL WEB KEYWORDS KULWADEE SOMBOONVIWAT Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033,

More information

Internet Usage Transaction Log Studies: The Next Generation

Internet Usage Transaction Log Studies: The Next Generation Internet Usage Transaction Log Studies: The Next Generation Sponsored by SIG USE Dietmar Wolfram, Moderator. School of Information Studies, University of Wisconsin-Milwaukee Milwaukee, WI 53201. dwolfram@uwm.edu

More information

WHAT HAPPENS IF WE SWITCH THE DEFAULT LANGUAGE OF A WEBSITE?

WHAT HAPPENS IF WE SWITCH THE DEFAULT LANGUAGE OF A WEBSITE? WHAT HAPPENS IF WE SWITCH THE DEFAULT LANGUAGE OF A WEBSITE? Te Taka Keegan, Sally Jo Cunningham Computer Science Department, University of Waikato,Hamilton, New Zealand Email: tetaka@cs.waikato.ac.nz,

More information

A Capacity Planning Methodology for Distributed E-Commerce Applications

A Capacity Planning Methodology for Distributed E-Commerce Applications A Capacity Planning Methodology for Distributed E-Commerce Applications I. Introduction Most of today s e-commerce environments are based on distributed, multi-tiered, component-based architectures. The

More information

Spoken Document Retrieval (SDR) for Broadcast News in Indian Languages

Spoken Document Retrieval (SDR) for Broadcast News in Indian Languages Spoken Document Retrieval (SDR) for Broadcast News in Indian Languages Chirag Shah Dept. of CSE IIT Madras Chennai - 600036 Tamilnadu, India. chirag@speech.iitm.ernet.in A. Nayeemulla Khan Dept. of CSE

More information

Tuning WebHound 4.0 and SAS 8.2 for Enterprise Windows Systems James R. Lebak, Unisys Corporation, Malvern, PA

Tuning WebHound 4.0 and SAS 8.2 for Enterprise Windows Systems James R. Lebak, Unisys Corporation, Malvern, PA Paper 272-27 Tuning WebHound 4.0 and SAS 8.2 for Enterprise Windows Systems James R. Lebak, Unisys Corporation, Malvern, PA ABSTRACT Windows is SAS largest and fastest growing platform. Windows 2000 Advanced

More information

Revisiting Join Site Selection in Distributed Database Systems

Revisiting Join Site Selection in Distributed Database Systems Revisiting Join Site Selection in Distributed Database Systems Haiwei Ye 1, Brigitte Kerhervé 2, and Gregor v. Bochmann 3 1 Département d IRO, Université de Montréal, CP 6128 succ Centre-Ville, Montréal

More information

Compressing and Decoding Term Statistics Time Series

Compressing and Decoding Term Statistics Time Series Compressing and Decoding Term Statistics Time Series Jinfeng Rao 1,XingNiu 1,andJimmyLin 2(B) 1 University of Maryland, College Park, USA {jinfeng,xingniu}@cs.umd.edu 2 University of Waterloo, Waterloo,

More information

Deep Web Content Mining

Deep Web Content Mining Deep Web Content Mining Shohreh Ajoudanian, and Mohammad Davarpanah Jazi Abstract The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased

More information

Configuration Management for Component-based Systems

Configuration Management for Component-based Systems Configuration Management for Component-based Systems Magnus Larsson Ivica Crnkovic Development and Research Department of Computer Science ABB Automation Products AB Mälardalen University 721 59 Västerås,

More information

This is a repository copy of A Rule Chaining Architecture Using a Correlation Matrix Memory.

This is a repository copy of A Rule Chaining Architecture Using a Correlation Matrix Memory. This is a repository copy of A Rule Chaining Architecture Using a Correlation Matrix Memory. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/88231/ Version: Submitted Version

More information

I. INTRODUCTION FACTORS RELATED TO PERFORMANCE ANALYSIS

I. INTRODUCTION FACTORS RELATED TO PERFORMANCE ANALYSIS Performance Analysis of Java NativeThread and NativePthread on Win32 Platform Bala Dhandayuthapani Veerasamy Research Scholar Manonmaniam Sundaranar University Tirunelveli, Tamilnadu, India dhanssoft@gmail.com

More information

Design and Implementation of A P2P Cooperative Proxy Cache System

Design and Implementation of A P2P Cooperative Proxy Cache System Design and Implementation of A PP Cooperative Proxy Cache System James Z. Wang Vipul Bhulawala Department of Computer Science Clemson University, Box 40974 Clemson, SC 94-0974, USA +1-84--778 {jzwang,

More information

Flash Drive Emulation

Flash Drive Emulation Flash Drive Emulation Eric Aderhold & Blayne Field aderhold@cs.wisc.edu & bfield@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison Abstract Flash drives are becoming increasingly

More information

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK Qing Guo 1, 2 1 Nanyang Technological University, Singapore 2 SAP Innovation Center Network,Singapore ABSTRACT Literature review is part of scientific

More information

Ch. 7: Benchmarks and Performance Tests

Ch. 7: Benchmarks and Performance Tests Ch. 7: Benchmarks and Performance Tests Kenneth Mitchell School of Computing & Engineering, University of Missouri-Kansas City, Kansas City, MO 64110 Kenneth Mitchell, CS & EE dept., SCE, UMKC p. 1/3 Introduction

More information

DATA MINING II - 1DL460. Spring 2014"

DATA MINING II - 1DL460. Spring 2014 DATA MINING II - 1DL460 Spring 2014" A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt14 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL

QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL QUERY EXPANSION USING WORDNET WITH A LOGICAL MODEL OF INFORMATION RETRIEVAL David Parapar, Álvaro Barreiro AILab, Department of Computer Science, University of A Coruña, Spain dparapar@udc.es, barreiro@udc.es

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

Anatomy of a Semantic Virus

Anatomy of a Semantic Virus Anatomy of a Semantic Virus Peyman Nasirifard Digital Enterprise Research Institute National University of Ireland, Galway IDA Business Park, Lower Dangan, Galway, Ireland peyman.nasirifard@deri.org Abstract.

More information

GENERATING SUPPLEMENTARY INDEX RECORDS USING MORPHOLOGICAL ANALYSIS FOR HIGH-SPEED PARTIAL MATCHING ABSTRACT

GENERATING SUPPLEMENTARY INDEX RECORDS USING MORPHOLOGICAL ANALYSIS FOR HIGH-SPEED PARTIAL MATCHING ABSTRACT GENERATING SUPPLEMENTARY INDEX RECORDS USING MORPHOLOGICAL ANALYSIS FOR HIGH-SPEED PARTIAL MATCHING Masahiro Oku NTT Affiliated Business Headquarters 20-2 Nishi-shinjuku 3-Chome Shinjuku-ku, Tokyo 163-1419

More information

A Tree-based Inverted File for Fast Ranked-Document Retrieval

A Tree-based Inverted File for Fast Ranked-Document Retrieval A Tree-based Inverted File for Fast Ranked-Document Retrieval Wann-Yun Shieh Tien-Fu Chen Chung-Ping Chung Department of Computer Science and Information Engineering National Chiao Tung University Hsinchu,

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

COVER SHEET. Accessed from Copyright 2003 Elsevier.

COVER SHEET. Accessed from   Copyright 2003 Elsevier. COVER SHEET Ozmutlu, Seda and Spink, Amanda and Ozmutlu, Huseyin C. (2003) Multimedia web searching trends: 1997-2001. Information Processing and Management 39(4):pp. 611-621. Accessed from http://eprints.qut.edu.au

More information

ON NEW STRATEGY FOR PRIORITISING THE SELECTED FLOW IN QUEUING SYSTEM

ON NEW STRATEGY FOR PRIORITISING THE SELECTED FLOW IN QUEUING SYSTEM ON NEW STRATEGY FOR PRIORITISING THE SELECTED FLOW IN QUEUING SYSTEM Wojciech Burakowski, Halina Tarasiuk,RyszardSyski Warsaw University of Technology, Poland Institute of Telecommunications 00-665-Warsaw,

More information

code pattern analysis of object-oriented programming languages

code pattern analysis of object-oriented programming languages code pattern analysis of object-oriented programming languages by Xubo Miao A thesis submitted to the School of Computing in conformity with the requirements for the degree of Master of Science Queen s

More information

Inverted Indexes. Indexing and Searching, Modern Information Retrieval, Addison Wesley, 2010 p. 5

Inverted Indexes. Indexing and Searching, Modern Information Retrieval, Addison Wesley, 2010 p. 5 Inverted Indexes Indexing and Searching, Modern Information Retrieval, Addison Wesley, 2010 p. 5 Basic Concepts Inverted index: a word-oriented mechanism for indexing a text collection to speed up the

More information

Equivalence Detection Using Parse-tree Normalization for Math Search

Equivalence Detection Using Parse-tree Normalization for Math Search Equivalence Detection Using Parse-tree Normalization for Math Search Mohammed Shatnawi Department of Computer Info. Systems Jordan University of Science and Tech. Jordan-Irbid (22110)-P.O.Box (3030) mshatnawi@just.edu.jo

More information

International Jmynal of Intellectual Advancements and Research in Engineering Computations

International Jmynal of Intellectual Advancements and Research in Engineering Computations www.ijiarec.com ISSN:2348-2079 DEC-2015 International Jmynal of Intellectual Advancements and Research in Engineering Computations VIRTUALIZATION OF DISTIRIBUTED DATABASES USING XML 1 M.Ramu ABSTRACT Objective

More information

Review of. Amanda Spink. and her work in. Web Searching and Retrieval,

Review of. Amanda Spink. and her work in. Web Searching and Retrieval, Review of Amanda Spink and her work in Web Searching and Retrieval, 1997-2004 Larry Reeve for Dr. McCain INFO861, Winter 2004 Term Project Table of Contents Background of Spink 2 Web Search and Retrieval

More information

A Firewall Architecture to Enhance Performance of Enterprise Network

A Firewall Architecture to Enhance Performance of Enterprise Network A Firewall Architecture to Enhance Performance of Enterprise Network Hailu Tegenaw HiLCoE, Computer Science Programme, Ethiopia Commercial Bank of Ethiopia, Ethiopia hailutegenaw@yahoo.com Mesfin Kifle

More information

Subjective Relevance: Implications on Interface Design for Information Retrieval Systems

Subjective Relevance: Implications on Interface Design for Information Retrieval Systems Subjective : Implications on interface design for information retrieval systems Lee, S.S., Theng, Y.L, Goh, H.L.D., & Foo, S (2005). Proc. 8th International Conference of Asian Digital Libraries (ICADL2005),

More information

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction Adaptable and Adaptive Web Information Systems School of Computer Science and Information Systems Birkbeck College University of London Lecture 1: Introduction George Magoulas gmagoulas@dcs.bbk.ac.uk October

More information

A Model for Information Retrieval Agent System Based on Keywords Distribution

A Model for Information Retrieval Agent System Based on Keywords Distribution A Model for Information Retrieval Agent System Based on Keywords Distribution Jae-Woo LEE Dept of Computer Science, Kyungbok College, 3, Sinpyeong-ri, Pocheon-si, 487-77, Gyeonggi-do, Korea It2c@koreaackr

More information

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE Wei-ning Qian, Hai-lei Qian, Li Wei, Yan Wang and Ao-ying Zhou Computer Science Department Fudan University Shanghai 200433 E-mail: wnqian@fudan.edu.cn

More information

Okapi in TIPS: The Changing Context of Information Retrieval

Okapi in TIPS: The Changing Context of Information Retrieval Okapi in TIPS: The Changing Context of Information Retrieval Murat Karamuftuoglu, Fabio Venuti Centre for Interactive Systems Research Department of Information Science City University {hmk, fabio}@soi.city.ac.uk

More information

Overview of the INEX 2009 Link the Wiki Track

Overview of the INEX 2009 Link the Wiki Track Overview of the INEX 2009 Link the Wiki Track Wei Che (Darren) Huang 1, Shlomo Geva 2 and Andrew Trotman 3 Faculty of Science and Technology, Queensland University of Technology, Brisbane, Australia 1,

More information

A Visualization Program for Subset Sum Instances

A Visualization Program for Subset Sum Instances A Visualization Program for Subset Sum Instances Thomas E. O Neil and Abhilasha Bhatia Computer Science Department University of North Dakota Grand Forks, ND 58202 oneil@cs.und.edu abhilasha.bhatia@my.und.edu

More information

Analyzing Web Multimedia Query Reformulation Behavior

Analyzing Web Multimedia Query Reformulation Behavior Analyzing Web Multimedia Query Reformulation Behavior Liang-Chun Jack Tseng Faculty of Science and Queensland University of Brisbane, QLD 4001, Australia ntjack.au@hotmail.com Dian Tjondronegoro Faculty

More information

SCALING UP VS. SCALING OUT IN A QLIKVIEW ENVIRONMENT

SCALING UP VS. SCALING OUT IN A QLIKVIEW ENVIRONMENT SCALING UP VS. SCALING OUT IN A QLIKVIEW ENVIRONMENT QlikView Technical Brief February 2012 qlikview.com Introduction When it comes to the enterprise Business Discovery environments, the ability of the

More information

CRAWLING THE CLIENT-SIDE HIDDEN WEB

CRAWLING THE CLIENT-SIDE HIDDEN WEB CRAWLING THE CLIENT-SIDE HIDDEN WEB Manuel Álvarez, Alberto Pan, Juan Raposo, Ángel Viña Department of Information and Communications Technologies University of A Coruña.- 15071 A Coruña - Spain e-mail

More information

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014. A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish

More information

Information Retrieval Spring Web retrieval

Information Retrieval Spring Web retrieval Information Retrieval Spring 2016 Web retrieval The Web Large Changing fast Public - No control over editing or contents Spam and Advertisement How big is the Web? Practically infinite due to the dynamic

More information

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate

More information

Evaluation of Performance of Cooperative Web Caching with Web Polygraph

Evaluation of Performance of Cooperative Web Caching with Web Polygraph Evaluation of Performance of Cooperative Web Caching with Web Polygraph Ping Du Jaspal Subhlok Department of Computer Science University of Houston Houston, TX 77204 {pdu, jaspal}@uh.edu Abstract This

More information

WebSphere Application Server 6.1 Base Performance September WebSphere Application Server 6.1 Base Performance

WebSphere Application Server 6.1 Base Performance September WebSphere Application Server 6.1 Base Performance WebSphere Application Server 6.1 Base Performance September 2008 WebSphere Application Server 6.1 Base Performance Table of Contents Introduction to the WebSphere Application Server performance tests...

More information
