An Agent for Semi-automatic Management of Emails

Fangfang Xia (a) and Liu Wenyin (b)
(a) Dept. of Computer Science & Technology, Tsinghua University, Beijing, China
(b) Dept. of Computer Science, City University of Hong Kong, Hong Kong SAR, China
xff99@mails.tsinghua.edu.cn; csliuwy@cityu.edu.hk

ABSTRACT

Recent growth in the use of emails for communication, and the corresponding growth in the volume of emails, have made automatic processing of emails desirable. However, most existing systems have failed to work in practice because of low classification accuracy and inconvenient user interfaces. In this paper, we present an adaptive Personal Email Agent (PEA) that learns the mail handling preferences of its user and automatically categorizes and manages the user's emails. One of the key ideas in this approach is extracting both high-level semantic features (e.g., concept information) from the body text and low-level features (e.g., sender, time, importance) from the entire message, and using them for similarity assessment based on the standard Information Retrieval (IR) approach. Another main contribution of our work is establishing both global and local information space models for building relevance categories based on the user's folders. In addition, a query refinement strategy is incorporated so that the agent acts as an incremental learner: it adjusts its working strategy based only on new examples and avoids a total re-training on all previous examples. To test the effectiveness of our system, we ran experiments on its two main functions, retrieval and relevance categorization, and obtained promising preliminary results.

Keywords: Email Overload, Email Management, Example-based Learning, Information Retrieval, Content-based Retrieval, Relevance Categories, Query Refinement, Personal Email Agent (PEA)

1. INTRODUCTION

The explosion in electronic communication is dramatically changing the way people interact with one another.
Email overload [1,2] has become a growing problem as more and more users embrace online technologies. According to Forrester Research, 7 trillion emails were sent per day in 2002, and an estimated 81 percent of organizations that introduced email to improve their efficiency now complain that email is becoming a victim of its own success. IDC estimates that in 2002 the average business user spent over 2.4 hours a day just dealing with an average of 30 work-related messages [2]. These numbers continue to grow.

To address the problem of email overload, many researchers have evaluated common manual email management strategies, including prioritizers and archivers [3]; no filers, spring cleaners, and frequent filers [2]; and folderless cleaners [4]. Whittaker and Sidner [2] found that a major aim of filing is to reduce the huge number of undifferentiated inbox items to a relatively small set of folders, each containing multiple related messages. Bälter [5] developed a mathematical model showing that storage time is the major time consumer for users with more than a thousand stored messages, and that the best long-term strategy is to use folders sparingly (4 to 20) in combination with the search functionality. He suggests that users who want to use folders rely on agents that automatically suggest folders for archiving, since such agents could reduce the storage time drastically and a larger number of folders may help reduce the time needed to retrieve a message.

Hence, early research focused on a variety of machine learning techniques for classifying emails into folders. Among the well-known prototypes, SwiftFile used shortcut buttons to archive messages into folders, but only when initiated by the user [6]. Mock used a nearest-neighbor classifier to group inbox emails into categories in his experimental framework [7].
Some projects, such as Enfish Onespace and Metastorm's infowise, use information retrieval techniques to measure similarities among folders or individual messages [8]. Other companies, such as Abridge, Plumtree, and Tacit, use rules or user-supplied categories to group emails. There are also flexible email organizers. For example, the Gnus news and mail reading system [9], distributed with recent versions of GNU Emacs, has hooks that allow installation of arbitrary programs for filtering and foldering news and mail. Furthermore, there are several open-source email readers that could be modified to include a hook for arbitrary classifiers [10].

With the vast amount of interest and research devoted to automatic email categorization, why hasn't the concept been incorporated into existing email readers? The current difficulties with automatic email organization lie in the following aspects. First, the user's folders are usually not well organized, and they change over time as new messages are received; this inbox irregularity is a hurdle to accurate classification. Second, most of the learning algorithms are statistical, and for them to perform well a large amount of data must be on hand; the training time is usually considerable. Third, many of the current algorithms do not learn incrementally: updating them requires a complete re-training on all data, including the original training messages. Fourth, most existing systems provide limited user-oriented functions; they do not allow classification into multiple categories, and they use implicit rules that users cannot adjust.

In this paper, we focus on automatic email categorization to save the time spent on archiving (when there are a large number of folders) and present an example-based, semi-automatic learning approach for this purpose. A prototype system, the Personal Email Agent (PEA), is built on this approach. It adapts to an individual user by learning his or her email management preferences from examples of interaction between the user and the system. Based on the user's preferences, PEA can automatically categorize and manage incoming and stored emails. One of the key ideas in this approach is extracting both high-level semantic features (e.g., concept information) from the text and low-level features (e.g., sender, time, importance) from the entire message for similarity assessment. Another main contribution of our work is establishing both global and local information space models for building relevance categories based on the user's folders. In addition, a query refinement strategy is incorporated so that the agent acts as an incremental learner. Experiments have shown the effectiveness of the proposed approach.

The remainder of this paper is structured as follows.
In Section 2, we present our solution, the Personal Email Agent, and describe its system architecture and user interface. We then present the core algorithms and other implementation details in Section 3. Section 4 shows the preliminary experimental results. Finally, we conclude and present some directions for future work.

2. SOLUTIONS

Many of the difficulties described for email classification may be alleviated through better classifiers, while another way to resolve them is to sidestep the entire problem with an alternate technology. We adopt one such technology, Relevance Categories [8], which addresses some of the same information management issues as automatic classification while avoiding many of the problems discussed in the previous section.

To use as much detailed information as possible, we extract all useful features from an email message, including sender, recipient, time, topic, and body. A different method is used to compute the similarity for each feature. The overall similarity between two messages is the weighted sum of these per-feature similarities. Note that a different set of weights is assigned to the features in each folder. By learning from the user's feedback, the agent can adjust the weights automatically so that they represent more exactly the user's preferences among the features within one folder, and thus refine the query of that folder.

2.1 Architecture

The architecture of our agent system is shown in Figure 1. The system consists of two components: the user interface and the core component of the Personal Email Agent. The user interface is divided into four parts: two functional parts and two peripheral ones. The functional parts are an email retrieval interface and an email classification interface, both of which provide user feedback interfaces. The system configuration part is where the user can set the parameters and manually adjust part of the folder space coefficients.
The non-feedback function part consists of auxiliary functions such as event logging and message filing according to category.

In the core component, we have three spaces (the weights space, the local information space, and the global information space), five modules (the feature extractor, the nearest-neighbor similarity evaluator, the inverted indexer, the matcher, and the relevance categorizer), and two databases that store low-level features and high-level semantic features, respectively. They work together to perform both function and feedback routines.

A typical scenario is as follows. Upon installation of the agent, the feature extractor scans all the emails in the user's personal folder; both low-level and high-level features of the emails are extracted and the corresponding databases are constructed. Then the nearest-neighbor similarity evaluator and the inverted indexer work simultaneously. The indexer builds the global information space for each folder according to the existing inbox structure; the evaluator compares emails within each folder to set up the local information space and decide the initial weights for the features. Once the three space models are available, the matcher compares the user's query with the local space model of emails to yield the retrieval results, and the outcome is given as a ranked list. The user can mark irrelevant emails that are ranked improperly high, and negative feedback is thus applied. The relevance categorizer is triggered when a new message comes in or the user adjusts the inbox structure, e.g., by moving emails from one folder to another or creating new folders. On these occasions, the agent first updates its database and space models and then refreshes its classification. The agent learns from user feedback by refining its internal space models to yield more accurate results in the future.

Figure 1. Architecture of the PEA

2.2 User Interface

We implement our Personal Email Agent as an add-in for Microsoft Outlook 2002 on Windows XP. The basic interface is a supplemental command bar, indicated by the red (or gray) rectangle (containing the Retrieval, Archive, and Settings buttons) in the upper-right part of Figure 2. On first startup, a scanning process automatically creates a category out of every folder the user maintains. The messages in the folder are then associated with that category. While the agent is enabled, new emails are automatically classified into the best matching folder. They are only grouped together, not moved immediately. The user can view inbox emails grouped into categories and move them to their assigned folders simply by clicking the Archive button. When the user manually adjusts the categorization result in the inbox or moves mail from one folder to another, relevance feedback is provided and the learning process is triggered. On these occasions, the agent automatically shows the accompanying changes it made, and the user can cancel some of them.

The Retrieval button helps users who wish to search emails. This function quickly displays a list of messages ranked by relevance (using the similarity metrics) to the selected messages. In this manner, other messages in the same thread or on the same topic are displayed at the top of the list. The feedback mechanism is also provided for the retrieval function.

Finally, the Settings button lets users access and change the agent's parameters, such as constants and feature weights. Users can also enable or disable some non-feedback functions and change the running modes there.

Figure 2. User Interface of the PEA

3. ALGORITHMS AND IMPLEMENTATIONS

3.1 Feature Extracting and Similarity Assessment

Two kinds of features can be used in our agent. One is the low-level feature, such as sender, time, or importance. The other is the high-level semantic feature extracted from the subject and body of an email. We first compute the similarity between two emails at each level and then calculate the weighted sum as the overall similarity.

We implement the relevant email retrieval functionality of our agent by similarity assessment. All the emails are compared with the query email and then sorted in descending order of similarity. A high rank usually indicates significant relevancy.

3.1.1 Low-level features

We have extracted eight basic low-level features in our agent: sender, recipients, creation time, importance, body format, and three Boolean variables (IsRead, IsReplied, and IsWithAttachment). To compute the similarity, we also incorporate an additional feature, sender-recipients, which is useful on some particular occasions.
This is not another independent feature; we add it mostly because of the following concern. Quite frequently, a user wants to keep all his correspondence with one person in the same folder. However, neither the sender feature nor the recipients feature alone can help him. For example, two emails, one from A to B and the other from B to A, are obviously related, but the similarities calculated on sender and on recipients are both 0. In such a case, the sender-recipients feature merges the sender and recipients into one set, and the similarity calculated on it should be 1. This feature is also useful for work groups.

The similarity for each feature is computed differently; the detailed calculation methods will be presented in an extended version of this paper.
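The A-to-B and B-to-A example can be checked with a short sketch. The paper defers its actual per-feature formulas to an extended version, so the set-overlap (Jaccard) measure, the exact-match sender comparison, and the dictionary layout of a message used here are all assumptions made for illustration only.

```python
def jaccard(a, b):
    """Set overlap |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sender_similarity(m1, m2):
    """Assumed rule: 1.0 when the two senders are identical, else 0.0."""
    return 1.0 if m1["sender"] == m2["sender"] else 0.0


def recipients_similarity(m1, m2):
    """Assumed rule: Jaccard overlap of the two recipient sets."""
    return jaccard(set(m1["recipients"]), set(m2["recipients"]))


def sender_recipients_similarity(m1, m2):
    """Merge sender and recipients into one participant set per message."""
    p1 = {m1["sender"], *m1["recipients"]}
    p2 = {m2["sender"], *m2["recipients"]}
    return jaccard(p1, p2)


# Hypothetical messages: one from A to B, the other from B to A.
a_to_b = {"sender": "A", "recipients": ["B"]}
b_to_a = {"sender": "B", "recipients": ["A"]}
```

Under these assumptions the sender and recipients similarities of the two messages are both 0, while the merged sender-recipients similarity is 1, which matches the behaviour described in the text.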

3.1.2 High-level features

We have extracted two high-level features in our agent: subject and body. Since both are text features, we use the same method to compare them. Our implementation is based on an inverted index with integrated TF/IDF [11] values. The detailed algorithm will be presented in an extended version of this paper.

3.1.3 The overall similarity

Although many sophisticated similarity assessment methods exist, we use the simplest similarity model to obtain the overall similarity. With the high-level and low-level similarities calculated separately, the overall similarity is simply their linear combination. Note that different folders are assigned different sets of weights, and these weights are continually refined by the user's feedback. This is the key to how our agent gains intelligence and is discussed further in the following sections.

3.2 Folder Space and Relevance Categories

A key function of our agent is to classify emails according to existing folders. Section 3.1 gives an algorithm for computing the similarity between two individual messages. To assess the similarity between a message and a folder, we must also build a user folder space model that characterizes the nature of the different folders. Many existing systems achieve this by assigning each folder a vector compatible with the email vector. Since such a vector is usually the average of all the emails in the folder, its weakness in classification is obvious, as described in Section 1. To use as much detailed information as possible, we explore both global and local properties of a folder in establishing its space model. (More exactly, "folder" here should be replaced by "relevance category", a concept that will be discussed soon.)
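The linear combination of Section 3.1.3, with a separate weight vector per folder, can be sketched as follows. The folder names are borrowed from the experiments later in the paper, but the weight values, the three-feature subset, and the per-feature similarities are hypothetical numbers chosen only to show the mechanics.

```python
# Hypothetical per-folder weight vectors; in the paper these are learned
# from user feedback rather than set by hand.
FOLDER_WEIGHTS = {
    "FROM HER": {"sender": 0.8, "subject": 0.1, "body": 0.1},
    "PROJECTS": {"sender": 0.1, "subject": 0.4, "body": 0.5},
}


def overall_similarity(feature_sims, weights):
    """Linear combination: sum over features of weight * similarity."""
    return sum(weights.get(name, 0.0) * sim for name, sim in feature_sims.items())


# Hypothetical per-feature similarities between a new message and a stored one.
sims = {"sender": 1.0, "subject": 0.5, "body": 0.2}

# The same feature similarities score differently under each folder's weights.
scores = {
    folder: overall_similarity(sims, weights)
    for folder, weights in FOLDER_WEIGHTS.items()
}
best_folder = max(scores, key=scores.get)
```

With these made-up numbers, a folder that emphasizes the sender scores the pair much higher than a folder that emphasizes text, which is the effect the per-folder weights are meant to capture.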
Global Information: The global information of a folder is the semantic information of all the messages in that folder (once relevance categories are introduced below, the messages in the category linked with this folder are also included). The messages are concatenated and treated as a single document. The N most frequent terms (from either the body or the subject field) and their term frequencies are extracted. (In our agent, N is set to 50 by default.) The resulting terms comprise part of the query for the category that the folder represents. Note that as the set of messages changes, the queries are simple to update: all that is required is to re-compute the term frequencies.

Local Information: The local information of a folder is obtained by the simple nearest-neighbor method. Given a target message to classify, its features are extracted and compared to all messages in the folder using the algorithm introduced in Section 3.1. The top M matches are averaged as the local measure for the category. M is set to 3 by default in our agent. Local information helps because some users maintain overly generic folders (e.g., "Projects") that encompass multiple unrelated sub-categories. It is also useful when dealing with topic drift.

The basic concept of Relevance Categories [8] is to provide the same functionality as regular folders or categories. Users can assign emails to categories, or remove them from categories, just as they are used to doing. Relevance Categories are initially built from the existing folders in the user's inbox. When new emails come in, they are automatically assigned to one category by our agent. The user can manually correct wrong classifications or assign one email to multiple categories. On these occasions, our agent refines the queries based on the feedback, trying to approach the user's subjective intention more precisely.
Otherwise, the newly assigned emails are regarded as members of their category from then on, even though they are not actually moved to the destination folders until the user explicitly performs the Archive function of our agent.

In computing the email-category similarity, a unique weight vector indicating the user's preference among the different features is assigned to each category to obtain the weighted feature sum. Apart from the global and local information, this weight vector is another important part of the folder space model, and it alone makes up the Weights Space. How to compute the weight vector and adjust it based on user feedback thus becomes the central problem in our query refinement strategy.

3.3 Query Refinement Strategy

Queries are created for each relevance category. Corresponding to the folder space model, the query refinement strategy for our agent can also be divided into two parts: global query refinement and local query refinement.
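The global and local parts of the folder space model from Section 3.2 can be sketched in a few lines. The defaults N = 50 and M = 3 come from the text; the whitespace tokenizer, the lower-casing, and the function names are assumptions for illustration, and the pairwise similarities passed to the local measure are expected to come from an overall similarity function such as the one in Section 3.1.

```python
from collections import Counter


def global_query(messages, n=50):
    """Concatenate a folder's messages, treat them as one document, and keep
    the n most frequent terms with their frequencies.
    Tokenization by lower-casing and whitespace splitting is an assumption."""
    counts = Counter(" ".join(messages).lower().split())
    return dict(counts.most_common(n))


def local_measure(similarities, m=3):
    """Average the m highest similarities between a target message and the
    messages of one folder (nearest-neighbor style)."""
    top = sorted(similarities, reverse=True)[:m]
    return sum(top) / len(top) if top else 0.0


# Hypothetical folder contents and target-to-message similarities.
soccer_folder = ["match report soccer", "soccer tickets", "soccer match"]
soccer_query = global_query(soccer_folder, n=2)
soccer_local = local_measure([0.9, 0.1, 0.8, 0.7, 0.2])
```

Because the global query is just a term-frequency table, updating it when messages are added or removed only requires recounting, as the text notes. The local measure ignores all but the closest few messages, which is why a generic folder with unrelated sub-topics can still match a message that resembles only a handful of its members.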

Global query refinement aims at a precise representation of the global semantic feature of a category. Negative training can be employed for emails that the user explicitly marks as not belonging to the category. These might arise in the agent's retrieval function if the user wishes to apply corrective action to highly ranked messages so that they are displayed toward the bottom of the list. To apply negative training, the N most frequent terms are extracted from the negative examples and subtracted from the N most frequent terms of the positive examples. This may result in some terms with negative frequencies.

Local query refinement is mainly the adjustment of the weight vector mentioned in Section 3.2. Our agent learns from user feedback so that the weight vector agrees more and more closely with the user's subjective emphasis on the features. The detailed algorithm is presented in an extended version of this paper.

4. PERFORMANCE EVALUATION

To test the two main functions of our agent, retrieval and classification, we designed two corresponding experiments. Since the effectiveness of relevance categories on the purely semantic feature, i.e., our global information space, has been tested by Mock on the Reuters corpus [8], we concentrate only on the overall performance of our agent on the multi-feature basis. The test data are mainly the daily emails of the authors. The volume is not very large (about 1000 emails), but it represents a typical user's situation well.

4.1 Retrieval Accuracy

In this experiment, we randomly select a number of emails belonging to the same category as query (positive feedback) examples and perform retrieval. The number is less than 20, since a user usually does not have the patience to select more than 5 emails in each iteration or to go through more than 4 iterations. Since we use exactly 100 emails as the ground truth for each query and check only the first 100 retrieved emails, the values of precision and recall are the same.
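A short derivation makes this equality explicit. The symbols are introduced here for illustration and are not the paper's notation: R is the set of ground-truth relevant emails for a query and A is the set of retrieved emails that are checked.

```latex
\[
\text{precision} = \frac{|R \cap A|}{|A|}, \qquad
\text{recall} = \frac{|R \cap A|}{|R|}, \qquad
|A| = |R| = 100 \;\Longrightarrow\;
\text{precision} = \text{recall} = \frac{|R \cap A|}{100}.
\]
```

The two measures share a numerator, so they coincide whenever the number of checked results equals the number of relevant items.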
Therefore, we use the term accuracy to refer to both. The results are shown in Figure 3, with the x axis being the number of query (positive feedback) emails and the y axis the average retrieval accuracy. As the figure shows, the average retrieval accuracy exceeds 50% when the number of query emails reaches 10.

Figure 3. Retrieval Accuracy (x axis: number of query emails; y axis: accuracy)

4.2 Categorization Accuracy and Feature Abilities

The second experiment evaluates the performance of the categorizer in learning a user's mail sorting preferences from hand-sorted mails. The input data are six months of the first author's sorted mails. Table 1 shows the folders and the distribution of messages in the data set. These data pose an interesting challenge for a learning system. Not only is the distribution of messages over the folders highly non-uniform, but the selection of folders for messages is also strongly idiosyncratic. While the content of the folder FROM HER was determined exclusively by a single keyword match (sender = "Arendt"), other folders were not determined by a single keyword match on the from or to fields, but rather by the first author's subjective judgment of which folder would be the best mnemonic for later retrieval of the message, based on its content, time, recipients, etc. For example, the REMINDER folder holds only emails received within the most recent week, while the E-MAGAZINE folder contains various HTML messages that the first author subscribed to from various websites. In this case, the task of the agent is to learn a model of the user's sorting preferences.

Table 1. Hand-Archived Emails in Our Experiments.

Folder Name        Count    Percentage
CS                          %
E-MAGAZINE                  %
FROM HER                    %
MISCELLANEOUS               %
PERSONAL                    %
PHILOSPHY GROOP             %
PROJECTS                    %
REMINDER                    %
SOCCER                      %
Total Examples              %

Figure 4. (a) Categorization Accuracy and (b) Feature Discrimination Abilities

The results of this experiment are shown in Figures 4(a) and 4(b). Through learning, the agent achieves 82% test accuracy after 100 training examples and 87% after 200. As the number of training examples increases, the feature weights begin to reflect the user's different emphasis on each feature. We show only three of the features in the figure, but the trends are clear, which indicates that the agent is capable of learning a user's preferences through our query refinement strategy.

The strategy of our agent has many advantages. First, relevance categories are not hard folders; they are merely an add-on to existing categories and can be ignored and used exactly like a normal category without impacting performance, so errors made by our agent are more likely to be tolerated by users. Second, because it is based on a simple similarity-computing algorithm, email management with our agent remains possible in the presence of sparse data. Third, since both high-level and low-level features are extracted, the agent can handle diverse situations well. In dealing with categories like FROM HER in the above experiment, our agent clearly surpasses traditional classifiers that focus only on text features. Fourth, the incorporation of global and local information enables the agent to fit user inboxes that are not well organized. In addition, query refinement is fast and hence avoids the intensive computation that most classifiers require at the adjusting stage.

5. CONCLUSION AND FUTURE WORK

We present an intelligent email agent that learns from the user's interactions with the system and hence can semi-automatically manage the user's emails. Four features distinguish our system from existing email retrieval or management approaches. First, different features of emails are extracted, each with a corresponding similarity assessment method. The use of both high-level semantic features and low-level features enables our agent to handle content-driven and metadata-driven folders equally well. Second, the adoption of relevance categories for our UI sidesteps some of the common hurdles that its peer systems normally face. Though the concept of relevance categories is really a step back from pure categorization, it allows for multiple or overlapping categories and is more likely to be tolerated by users when classification errors occur.
Third, a unique space model is established for each user folder based on both global and local information of the emails it contains. This makes it possible for the agent to fit a user's sorting habits, which may be extremely idiosyncratic. Fourth, an efficient query refinement strategy is presented to facilitate the learning process.

The next phase is to further refine our space models. For example, noun phrase extraction, better term selection, use of more terms, support for languages other than English and for mixed languages, variation of test parameters and assumptions, and different similarity metrics might significantly improve the categorization accuracy. Additional work is also required to quantify the performance of the current classification algorithms with both test data and user studies. In addition, much work remains in code enhancements such as hooking into more Outlook events, database integration for classifiers, and MS .NET upgrades. Finally, new experiments that integrate classification and information retrieval techniques across and into calendaring, notes, or other types of data may also be explored.

REFERENCES

1. Email overload--facts and figures: an e-mountain of _overload.htm
2. Whittaker S and Sidner C. Email overload: exploring personal information management of email. SIGCHI 96, pp
3. Pliskin N. Interacting with electronic mail can be a dream or a nightmare: a user's point of view. Interacting with Computers 1(3):
4. Bälter O. Strategies for organizing email messages. SIGCHI 97, pp
5. Bälter O. Keystroke level analysis of email message organization. SIGCHI 2000, pp
6. Segal R and Kephart J. Incremental learning in SwiftFile. ICML
7. Mock K. An experimental framework for email categorization and management. SIGIR
8. Mock K. Dynamic email organization via relevance categories. ICTAI
9. Ingebrigsten LM. Gnus network user services
10. Malone TW, Lai KY, and Fry C. Experiments with oval: a radically tailorable tool for cooperative work. ACM TOIS 13(2):
11. Salton G. Automatic Text Processing, Addison-Wesley,


More information

NSL Technical Note TN-10. MECCA: A Message-Enabled Communication and Information System

NSL Technical Note TN-10. MECCA: A Message-Enabled Communication and Information System /"Ark-T- r\ r^ _x-^, -. r~ NOVEMBER 1992 NSL Technical Note TN-10 MECCA: A Message-Enabled Communication and Information System Anita Borg Distribution UnlfmiSd / DUO SUALIJ7 DWBEOTED 20000411 149 SuSuDSD

More information

The 4/5 Upper Bound on the Game Total Domination Number

The 4/5 Upper Bound on the Game Total Domination Number The 4/ Upper Bound on the Game Total Domination Number Michael A. Henning a Sandi Klavžar b,c,d Douglas F. Rall e a Department of Mathematics, University of Johannesburg, South Africa mahenning@uj.ac.za

More information

A Novel PAT-Tree Approach to Chinese Document Clustering

A Novel PAT-Tree Approach to Chinese Document Clustering A Novel PAT-Tree Approach to Chinese Document Clustering Kenny Kwok, Michael R. Lyu, Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong

More information

Indexing by Shape of Image Databases Based on Extended Grid Files

Indexing by Shape of Image Databases Based on Extended Grid Files Indexing by Shape of Image Databases Based on Extended Grid Files Carlo Combi, Gian Luca Foresti, Massimo Franceschet, Angelo Montanari Department of Mathematics and ComputerScience, University of Udine

More information

Feature Selecting Model in Automatic Text Categorization of Chinese Financial Industrial News

Feature Selecting Model in Automatic Text Categorization of Chinese Financial Industrial News Selecting Model in Automatic Text Categorization of Chinese Industrial 1) HUEY-MING LEE 1 ), PIN-JEN CHEN 1 ), TSUNG-YEN LEE 2) Department of Information Management, Chinese Culture University 55, Hwa-Kung

More information

A User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces

A User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces A user study on features supporting subjective relevance for information retrieval interfaces Lee, S.S., Theng, Y.L, Goh, H.L.D., & Foo, S. (2006). Proc. 9th International Conference of Asian Digital Libraries

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

HarePoint HelpDesk for SharePoint. User Guide

HarePoint HelpDesk for SharePoint. User Guide HarePoint HelpDesk for SharePoint For SharePoint Server 2016, SharePoint Server 2013, SharePoint Foundation 2013, SharePoint Server 2010, SharePoint Foundation 2010 User Guide Product version: 16.2.0.0

More information

Image retrieval based on bag of images

Image retrieval based on bag of images University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 Image retrieval based on bag of images Jun Zhang University of Wollongong

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

A Semi-Supervised Approach for Web Spam Detection using Combinatorial Feature-Fusion

A Semi-Supervised Approach for Web Spam Detection using Combinatorial Feature-Fusion A Semi-Supervised Approach for Web Spam Detection using Combinatorial Feature-Fusion Ye Tian, Gary M. Weiss, Qiang Ma Department of Computer and Information Science Fordham University 441 East Fordham

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

CHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION

CHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION CHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called cluster)

More information

EMMA: An Management Assistant

EMMA: An  Management Assistant EMMA: An E-Mail Management Assistant Van Ho, Wayne Wobcke and Paul Compton School of Computer Science and Engineering University of New South Wales Sydney NSW 2052, Australia {vanho wobcke compton}@cse.unsw.edu.au

More information

I. INTRODUCTION. Fig Taxonomy of approaches to build specialized search engines, as shown in [80].

I. INTRODUCTION. Fig Taxonomy of approaches to build specialized search engines, as shown in [80]. Focus: Accustom To Crawl Web-Based Forums M.Nikhil 1, Mrs. A.Phani Sheetal 2 1 Student, Department of Computer Science, GITAM University, Hyderabad. 2 Assistant Professor, Department of Computer Science,

More information

VisoLink: A User-Centric Social Relationship Mining

VisoLink: A User-Centric Social Relationship Mining VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.

More information

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Comment Extraction from Blog Posts and Its Applications to Opinion Mining Comment Extraction from Blog Posts and Its Applications to Opinion Mining Huan-An Kao, Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan

More information

Information Extraction Techniques in Terrorism Surveillance

Information Extraction Techniques in Terrorism Surveillance Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism

More information

Math Information Retrieval: User Requirements and Prototype Implementation. Jin Zhao, Min Yen Kan and Yin Leng Theng

Math Information Retrieval: User Requirements and Prototype Implementation. Jin Zhao, Min Yen Kan and Yin Leng Theng Math Information Retrieval: User Requirements and Prototype Implementation Jin Zhao, Min Yen Kan and Yin Leng Theng Why Math Information Retrieval? Examples: Looking for formulas Collect teaching resources

More information

The Effect of Inverse Document Frequency Weights on Indexed Sequence Retrieval. Kevin C. O'Kane. Department of Computer Science

The Effect of Inverse Document Frequency Weights on Indexed Sequence Retrieval. Kevin C. O'Kane. Department of Computer Science The Effect of Inverse Document Frequency Weights on Indexed Sequence Retrieval Kevin C. O'Kane Department of Computer Science The University of Northern Iowa Cedar Falls, Iowa okane@cs.uni.edu http://www.cs.uni.edu/~okane

More information

Image retrieval based on region shape similarity

Image retrieval based on region shape similarity Image retrieval based on region shape similarity Cheng Chang Liu Wenyin Hongjiang Zhang Microsoft Research China, 49 Zhichun Road, Beijing 8, China {wyliu, hjzhang}@microsoft.com ABSTRACT This paper presents

More information

Product Documentation SAP Business ByDesign February Marketing

Product Documentation SAP Business ByDesign February Marketing Product Documentation PUBLIC Marketing Table Of Contents 1 Marketing.... 5 2... 6 3 Business Background... 8 3.1 Target Groups and Campaign Management... 8 3.2 Lead Processing... 13 3.3 Opportunity Processing...

More information

THE TITLE OF MY THESIS GOES HERE: USE ALL CAPS AND PUT THE SUBTITLE ON THE SECOND LINE

THE TITLE OF MY THESIS GOES HERE: USE ALL CAPS AND PUT THE SUBTITLE ON THE SECOND LINE THE TITLE OF MY THESIS GOES HERE: USE ALL CAPS AND PUT THE SUBTITLE ON THE SECOND LINE Student Name (as it appears in One.IU) Submitted to the faculty of the School of Informatics in partial fulfillment

More information

Automatic Query Type Identification Based on Click Through Information

Automatic Query Type Identification Based on Click Through Information Automatic Query Type Identification Based on Click Through Information Yiqun Liu 1,MinZhang 1,LiyunRu 2, and Shaoping Ma 1 1 State Key Lab of Intelligent Tech. & Sys., Tsinghua University, Beijing, China

More information

Semi supervised clustering for Text Clustering

Semi supervised clustering for Text Clustering Semi supervised clustering for Text Clustering N.Saranya 1 Assistant Professor, Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore 1 ABSTRACT: Based on clustering

More information

X. A Relevance Feedback System Based on Document Transformations. S. R. Friedman, J. A. Maceyak, and S. F. Weiss

X. A Relevance Feedback System Based on Document Transformations. S. R. Friedman, J. A. Maceyak, and S. F. Weiss X-l X. A Relevance Feedback System Based on Document Transformations S. R. Friedman, J. A. Maceyak, and S. F. Weiss Abstract An information retrieval system using relevance feedback to modify the document

More information

Information Retrieval

Information Retrieval Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

More information

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl

More information

CMPSCI 646, Information Retrieval (Fall 2003)

CMPSCI 646, Information Retrieval (Fall 2003) CMPSCI 646, Information Retrieval (Fall 2003) Midterm exam solutions Problem CO (compression) 1. The problem of text classification can be described as follows. Given a set of classes, C = {C i }, where

More information

Information Retrieval CSCI

Information Retrieval CSCI Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1

More information

2 Experimental Methodology and Results

2 Experimental Methodology and Results Developing Consensus Ontologies for the Semantic Web Larry M. Stephens, Aurovinda K. Gangam, and Michael N. Huhns Department of Computer Science and Engineering University of South Carolina, Columbia,

More information

Performance Improvement of Hardware-Based Packet Classification Algorithm

Performance Improvement of Hardware-Based Packet Classification Algorithm Performance Improvement of Hardware-Based Packet Classification Algorithm Yaw-Chung Chen 1, Pi-Chung Wang 2, Chun-Liang Lee 2, and Chia-Tai Chan 2 1 Department of Computer Science and Information Engineering,

More information

Tessy Frequently Asked Questions (FAQs)

Tessy Frequently Asked Questions (FAQs) Tessy Frequently Asked Questions (FAQs) General Q1 What is the main objective of Tessy? Q2 What is a unit for Tessy? Q3 What is a module for Tessy? Q4 What is unit testing? Q5 What is integration testing?

More information

Data Mining and Data Warehousing Classification-Lazy Learners

Data Mining and Data Warehousing Classification-Lazy Learners Motivation Data Mining and Data Warehousing Classification-Lazy Learners Lazy Learners are the most intuitive type of learners and are used in many practical scenarios. The reason of their popularity is

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Analysis on the technology improvement of the library network information retrieval efficiency

Analysis on the technology improvement of the library network information retrieval efficiency Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):2198-2202 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Analysis on the technology improvement of the

More information

System Models. 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models. Nicola Dragoni Embedded Systems Engineering DTU Informatics

System Models. 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models. Nicola Dragoni Embedded Systems Engineering DTU Informatics System Models Nicola Dragoni Embedded Systems Engineering DTU Informatics 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models Architectural vs Fundamental Models Systems that are intended

More information

(Refer Slide Time: 2:20)

(Refer Slide Time: 2:20) Data Communications Prof. A. Pal Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Lecture-15 Error Detection and Correction Hello viewers welcome to today s lecture

More information

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA A taxonomy of race conditions. D. P. Helmbold, C. E. McDowell UCSC-CRL-94-34 September 28, 1994 Board of Studies in Computer and Information Sciences University of California, Santa Cruz Santa Cruz, CA

More information

Predicting Messaging Response Time in a Long Distance Relationship

Predicting Messaging Response Time in a Long Distance Relationship Predicting Messaging Response Time in a Long Distance Relationship Meng-Chen Shieh m3shieh@ucsd.edu I. Introduction The key to any successful relationship is communication, especially during times when

More information

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Lecture #3: PageRank Algorithm The Mathematics of Google Search Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,

More information

A Miniature-Based Image Retrieval System

A Miniature-Based Image Retrieval System A Miniature-Based Image Retrieval System Md. Saiful Islam 1 and Md. Haider Ali 2 Institute of Information Technology 1, Dept. of Computer Science and Engineering 2, University of Dhaka 1, 2, Dhaka-1000,

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of

More information

Improving Range Query Performance on Historic Web Page Data

Improving Range Query Performance on Historic Web Page Data Improving Range Query Performance on Historic Web Page Data Geng LI Lab of Computer Networks and Distributed Systems, Peking University Beijing, China ligeng@net.pku.edu.cn Bo Peng Lab of Computer Networks

More information

Information Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science Information Retrieval CS 6900 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Boolean Retrieval vs. Ranked Retrieval Many users (professionals) prefer

More information

INTRODUCTION. Chapter GENERAL

INTRODUCTION. Chapter GENERAL Chapter 1 INTRODUCTION 1.1 GENERAL The World Wide Web (WWW) [1] is a system of interlinked hypertext documents accessed via the Internet. It is an interactive world of shared information through which

More information

K-Means Clustering Using Localized Histogram Analysis

K-Means Clustering Using Localized Histogram Analysis K-Means Clustering Using Localized Histogram Analysis Michael Bryson University of South Carolina, Department of Computer Science Columbia, SC brysonm@cse.sc.edu Abstract. The first step required for many

More information

Warrick County School Corp.

Warrick County School Corp. Warrick County School Corp. Ou Microsoft Outlook Web Access Guide Getting StartedStyarted Go to the Warrick County School Corp. Home Page (www.warrick.k12.in.us) and click the Web Mail link. Logging In

More information

Comodo Antispam Gateway Software Version 2.1

Comodo Antispam Gateway Software Version 2.1 Comodo Antispam Gateway Software Version 2.1 User Guide Guide Version 2.1.010215 Comodo Security Solutions 1255 Broad Street Clifton, NJ, 07013 Table of Contents 1 Introduction to Comodo Antispam Gateway...

More information

Developing ArXivSI to Help Scientists to Explore the Research Papers in ArXiv

Developing ArXivSI to Help Scientists to Explore the Research Papers in ArXiv Submitted on: 19.06.2015 Developing ArXivSI to Help Scientists to Explore the Research Papers in ArXiv Zhixiong Zhang National Science Library, Chinese Academy of Sciences, Beijing, China. E-mail address:

More information

CHAPTER-26 Mining Text Databases

CHAPTER-26 Mining Text Databases CHAPTER-26 Mining Text Databases 26.1 Introduction 26.2 Text Data Analysis and Information Retrieval 26.3 Basle Measures for Text Retrieval 26.4 Keyword-Based and Similarity-Based Retrieval 26.5 Other

More information

6. Relational Algebra (Part II)

6. Relational Algebra (Part II) 6. Relational Algebra (Part II) 6.1. Introduction In the previous chapter, we introduced relational algebra as a fundamental model of relational database manipulation. In particular, we defined and discussed

More information

The Research on the Method of Process-Based Knowledge Catalog and Storage and Its Application in Steel Product R&D

The Research on the Method of Process-Based Knowledge Catalog and Storage and Its Application in Steel Product R&D The Research on the Method of Process-Based Knowledge Catalog and Storage and Its Application in Steel Product R&D Xiaodong Gao 1,2 and Zhiping Fan 1 1 School of Business Administration, Northeastern University,

More information

NUSIS at TREC 2011 Microblog Track: Refining Query Results with Hashtags

NUSIS at TREC 2011 Microblog Track: Refining Query Results with Hashtags NUSIS at TREC 2011 Microblog Track: Refining Query Results with Hashtags Hadi Amiri 1,, Yang Bao 2,, Anqi Cui 3,,*, Anindya Datta 2,, Fang Fang 2,, Xiaoying Xu 2, 1 Department of Computer Science, School

More information

Tree-Based Minimization of TCAM Entries for Packet Classification

Tree-Based Minimization of TCAM Entries for Packet Classification Tree-Based Minimization of TCAM Entries for Packet Classification YanSunandMinSikKim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington 99164-2752, U.S.A.

More information

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,

More information

On characterizing BGP routing table growth

On characterizing BGP routing table growth University of Massachusetts Amherst From the SelectedWorks of Lixin Gao 00 On characterizing BGP routing table growth T Bu LX Gao D Towsley Available at: https://works.bepress.com/lixin_gao/66/ On Characterizing

More information

Data Mining and Machine Learning: Techniques and Algorithms

Data Mining and Machine Learning: Techniques and Algorithms Instance based classification Data Mining and Machine Learning: Techniques and Algorithms Eneldo Loza Mencía eneldo@ke.tu-darmstadt.de Knowledge Engineering Group, TU Darmstadt International Week 2019,

More information

In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most

In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100

More information

Mini-Lectures by Section

Mini-Lectures by Section Mini-Lectures by Section BEGINNING AND INTERMEDIATE ALGEBRA, Mini-Lecture 1.1 1. Learn the definition of factor.. Write fractions in lowest terms.. Multiply and divide fractions.. Add and subtract fractions..

More information

CS 6320 Natural Language Processing

CS 6320 Natural Language Processing CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic

More information

3.2 Circle Charts Line Charts Gantt Chart Inserting Gantt charts Adjusting the date section...

3.2 Circle Charts Line Charts Gantt Chart Inserting Gantt charts Adjusting the date section... / / / Page 0 Contents Installation, updates & troubleshooting... 1 1.1 System requirements... 2 1.2 Initial installation... 2 1.3 Installation of an update... 2 1.4 Troubleshooting... 2 empower charts...

More information

Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

More information

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

An Empirical Study of Behavioral Characteristics of Spammers: Findings and Implications

An Empirical Study of Behavioral Characteristics of Spammers: Findings and Implications An Empirical Study of Behavioral Characteristics of Spammers: Findings and Implications Zhenhai Duan, Kartik Gopalan, Xin Yuan Abstract In this paper we present a detailed study of the behavioral characteristics

More information

Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task

Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Walid Magdy, Gareth J.F. Jones Centre for Next Generation Localisation School of Computing Dublin City University,

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system. Introduction to Information Retrieval Ethan Phelps-Goodman Some slides taken from http://www.cs.utexas.edu/users/mooney/ir-course/ Information Retrieval (IR) The indexing and retrieval of textual documents.

More information

Data Structure Optimization of AS_PATH in BGP

Data Structure Optimization of AS_PATH in BGP Data Structure Optimization of AS_PATH in BGP Weirong Jiang Research Institute of Information Technology, Tsinghua University, Beijing, 100084, P.R.China jwr2000@mails.tsinghua.edu.cn Abstract. With the

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,

More information

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES 6.1 INTRODUCTION The exploration of applications of ANN for image classification has yielded satisfactory results. But, the scope for improving

More information

MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS

MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS J.I. Serrano M.D. Del Castillo Instituto de Automática Industrial CSIC. Ctra. Campo Real km.0 200. La Poveda. Arganda del Rey. 28500

More information

INFSCI 2140 Information Storage and Retrieval Lecture 6: Taking User into Account. Ad-hoc IR in text-oriented DS

INFSCI 2140 Information Storage and Retrieval Lecture 6: Taking User into Account. Ad-hoc IR in text-oriented DS INFSCI 2140 Information Storage and Retrieval Lecture 6: Taking User into Account Peter Brusilovsky http://www2.sis.pitt.edu/~peterb/2140-051/ Ad-hoc IR in text-oriented DS The context (L1) Querying and

More information

An Empirical Performance Comparison of Machine Learning Methods for Spam Categorization

An Empirical Performance Comparison of Machine Learning Methods for Spam  Categorization An Empirical Performance Comparison of Machine Learning Methods for Spam E-mail Categorization Chih-Chin Lai a Ming-Chi Tsai b a Dept. of Computer Science and Information Engineering National University

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval (Supplementary Material) Zhou Shuigeng March 23, 2007 Advanced Distributed Computing 1 Text Databases and IR Text databases (document databases) Large collections

More information

Detection and Extraction of Events from s

Detection and Extraction of Events from  s Detection and Extraction of Events from Emails Shashank Senapaty Department of Computer Science Stanford University, Stanford CA senapaty@cs.stanford.edu December 12, 2008 Abstract I build a system to

More information

Routing and Ad-hoc Retrieval with the. Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers. University of Dortmund, Germany.

Routing and Ad-hoc Retrieval with the. Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers. University of Dortmund, Germany. Routing and Ad-hoc Retrieval with the TREC-3 Collection in a Distributed Loosely Federated Environment Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers University of Dortmund, Germany

More information