An Agent for Semi-automatic Management of Emails
Fangfang Xia (a) and Liu Wenyin (b)
(a) Dept. of Computer Science & Technology, Tsinghua University, Beijing, China
(b) Dept. of Computer Science, City University of Hong Kong, Hong Kong SAR, China
xff99@mails.tsinghua.edu.cn; csliuwy@cityu.edu.hk

ABSTRACT

Recent growth in the use of emails for communication and the corresponding growth in the volume of emails have made automatic processing of emails desirable. However, most existing systems have failed to work in practice due to low classification accuracy and inconvenient user interfaces. In this paper, we present an adaptive Personal Email Agent (PEA) which can learn the mail handling preferences of its user and automatically categorize and manage its user's emails. One of the key ideas in this approach is extracting both the high-level semantic features (e.g., concept information) from the body text and other low-level features (e.g., sender, time, importance, etc.) from the entire message for similarity assessment based on the standard Information Retrieval (IR) approach. Another main contribution of our work is establishing both global and local information space models for building relevance categories based on the user's folders. In addition, a query refinement strategy is incorporated to make the agent act as an incremental learner. That is, it can adjust its working strategy based only on the new examples and avoid a total re-training using all previous examples. To test the effectiveness of our system, we ran experiments on its two main functions, email retrieval and relevance categorization, and obtained promising preliminary results.

Keywords: Email Overload, Email Management, Example-based Learning, Information Retrieval, Content-based Retrieval, Relevance Categories, Query Refinement, Personal Email Agent (PEA)

1. INTRODUCTION

The explosion in electronic communication is dramatically changing the way people interact with one another.
Email overload [1,2] has become a growing problem as more and more users have embraced online technologies in recent years. According to Forrester Research, 7 trillion emails are sent per day in 2002 and an estimated 81 percent of organizations that introduced email to improve their efficiency now complain that email is becoming a victim of its own success. IDC estimates that in 2002 the average business user spends over 2.4 hours a day just dealing with an average of 30 work-related messages [2]. These numbers are still increasing. To address the problem of email overload, many researchers have evaluated some common manual management strategies for emails, including Prioritizers, Archivers [3], No filers, Spring cleaners, Frequent filers [2], and Folderless cleaners [4]. Whittaker and Sidner [2] have found that a major aim of filing is to reduce the huge number of undifferentiated inbox items into a relatively small set of folders, each containing multiple related messages. Bälter [5] has developed a mathematical model to illustrate that storage time is the major time consumer for users with more than a thousand stored messages and that the best long-term strategy is to use folders sparsely (4 to 20) in combination with the search functionality. He suggests that users who want to use folders should use agents that can automatically suggest folders for archiving, since such agents could drastically reduce the storage time, and a larger number of folders may help reduce the time to retrieve a message. Hence, early research focused on a variety of machine learning techniques to classify emails into folders. Among the well-known prototypes, SwiftFile used shortcut buttons to archive messages into folders, but only when initiated by the user [6]. Mock used a nearest-neighbor classifier to group inbox emails into categories in his experimental framework [7].
Some projects, such as Enfish Onespace and Metastorm's infowise, use information retrieval techniques to measure similarities among folders or individual messages [8]. Other companies, such as Abridge, Plumtree, and Tacit, use rules or user-supplied categories to group emails. There are also flexible email organizers. For example, the Gnus news and mail reading system [9], distributed with recent versions of GNU Emacs, has hooks that allow installation of arbitrary programs for filtering and foldering news and mail. Furthermore, there are several open-source email readers which could be modified to include a hook for arbitrary classifiers [10]. With the vast amount of interest and research devoted to automatic email categorization, why hasn't the concept been incorporated into existing email readers? The current difficulties with automatic email
organization exist in the following aspects. First, the user's folders are usually not well organized and they change over time as new messages are received; this inbox irregularity is a hurdle for accurate classification. Second, most of the learning algorithms are based on statistics, and for the algorithms to perform well, a large amount of data must be on hand; the training time is usually considerable. Third, many of the current algorithms do not learn incrementally: they update by requiring a complete re-training based upon all data, including the original training messages. Fourth, most existing systems provide limited user-oriented functions; they do not allow classification into multiple categories and use implicit rules that users cannot adjust.

In this paper, we focus on the issue of automatic email categorization to save the time spent on archiving (when there are a large number of folders) and present an example-based semi-automatic learning approach for this purpose. A prototype system, the Personal Email Agent (PEA), is built based on this approach. It can adapt to an individual user by learning his/her email management preferences from the interaction examples between the user and the system. Based on the user's preferences, PEA can automatically categorize and manage his/her incoming and/or stored emails. One of the key ideas in this approach is extracting both the high-level semantic features (e.g., concept information) from the text and other low-level features (e.g., sender, time, importance, etc.) from the entire message for similarity assessment. Another main contribution of our work is establishing both global and local information space models for building relevance categories based on the user's folders. In addition, a query refinement strategy is incorporated to make the agent act as an incremental learner. Experiments have shown the effectiveness of the proposed approach. The remainder of this paper is structured as follows.
In Section 2, we present our solution, the Personal Email Agent, and describe its system architecture and user interface. We then present the core algorithms and other implementation details in Section 3. We show the preliminary experimental results of the agent in Section 4. Finally, we conclude and present some directions for future work.

2. SOLUTIONS

Many of the difficulties described with email classification may be alleviated through better classifiers, while another way to resolve these difficulties is to sidestep the entire problem with an alternate technology. We adopt one alternate technology, Relevance Categories [8], which addresses some of the same information management issues as automatic classification while avoiding many of the problems discussed in the previous section. In order to utilize as much detailed information as possible, we extract all useful features from an email message, including sender, recipients, time, topic, body, etc. Different methods are then employed to compute the similarity for each feature. The overall similarity between two messages is the weighted sum of these per-feature similarities. Note that different sets of weights are assigned to the features in different folders. Learning from the user's feedback, the weights can be adjusted automatically to represent more exactly the user's preferences for the diverse features within one folder and thus refine the query of this folder.

2.1 Architecture

The architecture of our agent system is shown in Figure 1. The system consists of two components: the user interface and the core component of the Personal Email Agent. The user interface is divided into four parts: two functional parts and two peripheral ones. The functional parts include an email retrieval interface and an email classification interface, both of which provide user feedback interfaces. The system configuration part is where the user can set the parameters and manually adjust part of the folder space coefficients.
The non-feedback function part consists of some auxiliary functions such as event logging and message filing according to category. In the core component, we have three spaces (the weights space, the local information space and the global information space), five modules (the feature extractor, the nearest-neighbor similarity evaluator, the inverted indexer, the matcher and the relevance categorizer), and two databases which store low-level features and high-level semantic features, respectively. They work together to perform both function and feedback routines. A typical scenario of the system is as follows. Upon installation of the agent, the feature extractor scans all the emails in the user's personal folder; both low-level and high-level features of the emails are extracted and the corresponding databases are constructed. Then, the nearest-neighbor similarity evaluator and the inverted indexer work simultaneously. The indexer builds the global information space for each folder according to the existing inbox structure; the evaluator compares emails within each folder to set up the local information space and decide the initial weights for the features. Once the three space models are available, the matcher compares the user's query with the local space model of emails to
yield the retrieval results, and the outcome is given in the form of a ranked list. The user can mark irrelevant emails that are ranked improperly high, and thus negative feedback is applied. The relevance categorizer is triggered when a new message comes in or the user adjusts the inbox structure, e.g., by moving emails from one folder to another or creating new folders. On these occasions, the agent first updates its database and space models and then refreshes its classification. The agent learns from user feedback by refining its internal space models to yield more accurate results in the future.

Figure 1. Architecture of the PEA

2.2 User Interface

We implement our Personal Email Agent as an add-in for Microsoft Outlook 2002 on Windows XP. The basic interface is a supplemental command bar, indicated by the red (or gray) rectangle (containing the Retrieval, Archive, and Settings buttons) in the upper-right part of Figure 2. At first startup, a scanning process automatically creates a category out of every folder the user maintains. The messages in the folder are then associated with that category. While the agent is enabled, new emails are automatically classified into the best matching folder. They are only grouped together and are not moved immediately. The user can view inbox emails that are grouped into categories and send the mails to their assigned folders simply by clicking the Archive button. When the user manually adjusts the categorization result in the inbox or moves mail from one folder to another, relevance feedback is provided and the learning process is then
triggered. On these occasions, the agent will automatically show the accompanying changes it made and the user can cancel some of them. The Retrieval button aids users who wish to search emails. This function provides the capability to quickly display a list of messages ranked by relevance (using the similarity metrics) to the selected messages. In this manner, other messages in the same thread or on the same topic will be displayed at the top of the list. The feedback mechanism is also provided for the retrieval function. Finally, the Settings button lets users access and change the agent's parameters such as constants and feature weights. Users can also enable or disable some non-feedback functions and change the running modes there.

Figure 2. User Interface of the PEA

3. ALGORITHMS AND IMPLEMENTATIONS

3.1 Feature Extraction and Similarity Assessment

There are two kinds of features that can be used in our agent. One is the low-level feature, such as sender, time, importance, etc. The other is the high-level semantic feature extracted from the subject and body of an email. We first compute the similarity between two emails at each level and then calculate their weighted sum as the overall similarity. We implement the relevant email retrieval functionality of our agent by similarity assessment. All the emails are compared with the query email and then sorted in descending order of similarity. A high rank usually indicates significant relevance.

3.1.1 Low-level features

We have extracted eight basic low-level features in our agent. They are sender, recipients, creation time, importance, body format and three Boolean variables (IsRead, IsReplied and IsWithAttachment). To compute the similarity, we also incorporate an additional feature, sender-recipients, which is useful on some particular occasions.
This is not another independent feature; we add it mostly because of the following concern. Quite frequently, a user wants to keep all correspondence with a person in the same folder. However, neither the sender nor the recipients feature alone can help here. For example, two emails, one from A to B and the other from B to A, are obviously related, but the similarities calculated based on sender and on recipients are both 0. In such a case, the sender-recipients feature merges the sender and recipients into one set, and the similarity calculated on it is 1. This feature is also useful for work groups. The similarity for each feature is computed differently; the detailed calculation methods will be presented in an extended version of this paper.
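The A-to-B / B-to-A example above can be reproduced with a small sketch. The paper defers its actual calculation methods to an extended version, so the Jaccard set overlap used here, the dictionary layout of a message, and all function names are assumptions chosen only for illustration.

```python
from typing import Dict, List, Set, Union

Message = Dict[str, Union[str, List[str]]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; defined as 0.0 when both sets are empty (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sender_similarity(m1: Message, m2: Message) -> float:
    """1.0 when both messages have the same sender, otherwise 0.0."""
    return 1.0 if m1["sender"] == m2["sender"] else 0.0


def recipients_similarity(m1: Message, m2: Message) -> float:
    """Overlap between the two recipient sets."""
    return jaccard(set(m1["recipients"]), set(m2["recipients"]))


def sender_recipients_similarity(m1: Message, m2: Message) -> float:
    """Merge sender and recipients into one participant set per message, then compare."""
    p1 = {m1["sender"], *m1["recipients"]}
    p2 = {m2["sender"], *m2["recipients"]}
    return jaccard(p1, p2)


# Hypothetical messages: one from A to B, the other from B to A.
a_to_b: Message = {"sender": "A", "recipients": ["B"]}
b_to_a: Message = {"sender": "B", "recipients": ["A"]}

print(sender_similarity(a_to_b, b_to_a))             # 0.0
print(recipients_similarity(a_to_b, b_to_a))         # 0.0
print(sender_recipients_similarity(a_to_b, b_to_a))  # 1.0
```

With any overlap-based measure of this kind, the merged participant sets of the two messages coincide, which is why the combined feature recovers the relationship that the two separate features miss.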
3.1.2 High-level features

We have extracted two high-level features in our agent. They are subject and body. Since they are both text features, we use the same method to compare them. Our implementation is based upon an inverted index with integrated TF/IDF [11] values. The detailed algorithm will be presented in an extended version of this paper.

3.1.3 The overall similarity

Although there may be many sophisticated similarity assessment methods, we use the simplest similarity model to obtain the overall similarity. With high-level and low-level similarities calculated separately, the overall similarity is simply calculated as their linear combination. Note that different folders are assigned different sets of weights and they are consistently refined by the user's feedback. This is the key point for our agent to gain intelligence and will be further discussed in the following sections.

3.2 Folder Space and Relevance Categories

A key function of our agent is to classify emails according to existing folders. Section 3.1 gives an algorithm for computing the similarity between two individual messages. In order to assess the similarity between a message and a folder, we should also build a user folder space model, through which the nature of different folders can be well characterized. Many existing systems achieve this goal by assigning each folder a vector compatible with the email vector. Since such a vector is usually the average of all the emails in the folder, its weakness in classifying is obvious, as described in Section 1. To utilize as much detailed information as possible, we explore both global and local properties of a folder in establishing its space model. (More exactly, folder here should be replaced by relevance category, a concept that will be discussed soon.)
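The linear combination of Section 3.1, with a separate weight set per folder, might be sketched as follows. The feature names, the weight values, and the function name are hypothetical; the two folder names are borrowed from the experiment in Section 4 purely as labels.

```python
from typing import Dict


def overall_similarity(feature_sims: Dict[str, float],
                       weights: Dict[str, float]) -> float:
    """Weighted sum of per-feature similarities; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * sim for name, sim in feature_sims.items())


# Hypothetical per-folder weight sets: one folder driven mostly by the sender,
# the other mostly by the text features.
FOLDER_WEIGHTS: Dict[str, Dict[str, float]] = {
    "FROM HER": {"sender": 0.8, "subject": 0.1, "body": 0.1},
    "PROJECTS": {"sender": 0.1, "subject": 0.4, "body": 0.5},
}

# Hypothetical per-feature similarities between two messages.
sims = {"sender": 1.0, "subject": 0.2, "body": 0.1}

for folder, w in FOLDER_WEIGHTS.items():
    print(folder, round(overall_similarity(sims, w), 2))
```

The same pair of messages scores about 0.83 under the first weight set and about 0.23 under the second, which illustrates how per-folder weights let one feature dominate in one folder and matter little in another.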
Global Information: Global information of a folder is the semantic information of all the messages in that folder. (Once the relevance categories concept is introduced below, the messages in the category linked with this folder should also be included.) The messages are concatenated and treated as a single document. The N most frequent terms (either from the body or the subject field) and their term frequencies are extracted. (In our agent, N was set to 50 by default.) The resulting terms comprise part of the query for the category that it represents. Note that as the set of messages changes, the queries are simple to update. All that is required is to re-compute the term frequencies.

Local Information: Local information of a folder is obtained by the simple nearest-neighbor method. Given a target message to classify, its features are extracted and compared to all messages in the folder using the algorithm introduced in Section 3.1. The top M matches are averaged as the local measure for the category. M was set to 3 by default in our agent. The introduction of local information should be helpful since some users maintain overly generic folders (e.g., "Projects") encompassing multiple unrelated sub-categories. It is also useful when dealing with topic drift.

The basic concept of Relevance Categories [8] is to provide the same functionality as regular folders or categories. Users can assign emails to categories, or remove them from categories, just as they are used to. Relevance Categories are initially built based on the existing folders in the user's inbox. When new emails come in, they are automatically assigned to one category by our agent. The user can manually correct wrong classifications or assign one email to multiple categories. On these occasions, our agent will refine the queries based on the feedback, trying to approximate the user's subjective intention more precisely.
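The global and local information of a category can be sketched as below. This is a simplified illustration under stated assumptions: the tokenizer is a plain regular expression with no stop-word handling, messages are plain strings, and the `similarity` argument stands in for the message-to-message similarity of Section 3.1. The function names are hypothetical; the defaults N = 50 and M = 3 come from the text.

```python
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def global_query(messages: Sequence[str], n: int = 50) -> List[Tuple[str, int]]:
    """Concatenate a category's messages into one document and return
    its n most frequent terms with their frequencies."""
    document = " ".join(messages).lower()
    terms = re.findall(r"[a-z0-9]+", document)  # assumed, simplistic tokenizer
    return Counter(terms).most_common(n)


def local_measure(target: str,
                  folder_messages: Sequence[str],
                  similarity: Callable[[str, str], float],
                  m: int = 3) -> float:
    """Average the m highest similarities between the target and the folder's messages."""
    scores = sorted((similarity(target, msg) for msg in folder_messages), reverse=True)
    top = scores[:m]
    return sum(top) / len(top) if top else 0.0


# Hypothetical folder content.
folder = ["budget meeting on Monday", "budget report attached"]
print(global_query(folder, n=1))  # [('budget', 2)]

# Stand-in similarity with fixed scores, to show the top-M averaging.
fixed = {"a": 0.9, "b": 0.6, "c": 0.3, "d": 0.0}
print(local_measure("x", ["a", "b", "c", "d"], lambda t, msg: fixed[msg], m=3))
```

Updating the global query after messages are added or removed amounts to calling `global_query` again on the new message set, in line with the remark that only the term frequencies need to be re-computed.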
If the user makes no correction, the newly assigned emails are regarded as members of their category from then on, even though they are not actually moved to the destination folders until the user explicitly invokes the Archive function of our agent. In the computation of the email-category similarity, a unique weight vector indicating the user's preference for different features is assigned to each category to obtain the weighted feature sum. Apart from the global and local information, this weight vector is another important part of the folder space model, and it alone builds up the Weights Space. How to compute the weight vector and adjust it based on user feedback thus becomes the central problem in our query refinement strategy.

3.3 Query Refinement Strategy

Queries are created for each relevance category. Corresponding to the folder space model, the query refinement strategy for our agent can also be divided into two parts, the global query refinement and the local query refinement.
Global query refinement aims at a more precise representation of the global semantic features of a category. Negative training can be employed for emails the user explicitly marks as not belonging to the category. These may arise in the agent's retrieval function if the user wishes to apply corrective action to highly ranked messages so that they are displayed toward the bottom of the list. To apply negative training, the N most frequent terms are extracted from the negative examples and subtracted from the N most frequent terms from the positive examples. This may result in some terms with negative frequencies. Local query refinement is mainly the adjustment of the weight vector mentioned in Section 3.2. Our agent learns from user feedback so that the weight vector increasingly matches the user's subjective emphasis on the features. The detailed algorithm will be presented in an extended version of this paper.

4. PERFORMANCE EVALUATION

In order to test the two main functions of our agent, retrieval and classification, we designed two corresponding experiments. Since the effectiveness of the relevance categories on the purely semantic feature, i.e., our global information space, has been tested by Mock over the Reuters corpus [8], we concentrate only on the overall performance of our agent on the multi-feature basis. The test data we use are mainly the daily emails of the authors. The volume is not very large (about 1000). However, it represents a typical user's situation well.

4.1 Retrieval Accuracy

In this experiment, we randomly select a number of emails belonging to the same category as query (positive feedback) examples and perform retrieval. The number is less than 20, since a user usually does not have the patience to select more than 5 emails in each iteration or go over more than 4 iterations. Since we use exactly 100 emails as our ground truth for each query and check only the first 100 retrieved emails, the values of precision and recall are the same.
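To make this step explicit, let $R$ denote the set of retrieved emails that are checked and $G$ the ground-truth set for a query (both symbols are introduced here only for illustration). With $|R| = |G| = 100$:

```latex
\[
\text{precision} \;=\; \frac{|R \cap G|}{|R|}
\;=\; \frac{|R \cap G|}{100}
\;=\; \frac{|R \cap G|}{|G|}
\;=\; \text{recall}.
\]
```

The equality holds for any query as long as the number of checked results equals the size of the ground-truth set.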
Therefore, we use the term accuracy to refer to both. The results are shown in Figure 3, with the x axis being the number of query (positive feedback) emails and the y axis the average retrieval accuracy. As the figure shows, the average accuracy of retrieval exceeds 50% when the number of query emails reaches 10.

Figure 3. Retrieval Accuracy (x axis: number of query emails; y axis: accuracy)

4.2 Categorization Accuracy and Feature Abilities

The second experiment evaluates the performance of the categorizer on learning a user's mail sorting preferences from hand-sorted mails. The input data are six months of the first author's sorted mails. Table 1 shows the folders and distribution of messages in the data set. These data pose an interesting challenge for a learning system. Not only is the distribution of messages in the folders highly non-uniform, but the selection of folders for messages is also strongly idiosyncratic. While the content of the folder FROM HER was exclusively determined by a single keyword match (sender = "Arendt"), other folders were not determined by a single keyword match with the from or to fields, but rather by the first author's subjective judgment of what folder would be the best mnemonic for later retrieval of the message based on its content, time, recipients, etc. For example, the REMINDER folder only maintains
emails received within the past week, while the E-MAGAZINE folder contains various HTML messages that the first author subscribed to from various websites. In this case, the task of the agent is to learn a model of the user's sorting preferences.

Table 1. Hand Archived Emails in Our Experiments.
Folder Name        Count    Percentage
CS                          %
E-MAGAZINE                  %
FROM HER                    %
MISCELLANEOUS               %
PERSONAL                    %
PHILOSOPHY GROUP            %
PROJECTS                    %
REMINDER                    %
SOCCER                      %
Total Examples              %

Figure 4. (a) Categorization Accuracy and (b) Feature Discrimination Abilities

The results of this experiment are shown in Figures 4 (a) and (b). Through learning, the agent achieves 82% test accuracy after 100 training examples and 87% after 200. As the number of training examples increases, the feature weights begin to reflect the user's different emphasis on each feature. We only show three of the features in the figure. However, the trends of the features are clear, which indicates that the agent is capable of learning a user's preferences through our query refinement strategy.
The strategy of our agent has many advantages. First, relevance categories are not hard folders; they are merely an add-on to existing categories and can be ignored or used exactly like a normal category without impacting performance; therefore, errors made by our agent are more likely to be tolerated by users. Second, because the similarity-computing algorithm is simple, email management by our agent remains possible in the presence of sparse data. Third, since both high-level and low-level features are extracted, the agent can handle diverse situations well. Our agent has a clear advantage over traditional classifiers that focus only on text features when dealing with categories like FROM HER in the above experiment. Fourth, the incorporation of global and local information enables the agent to fit the various user inboxes that are not well organized. In addition, the query refinement can be done quickly and hence avoids the intensive computation that most classifiers require at the adjusting stage.

5. CONCLUSION AND FUTURE WORK

We present an intelligent email agent which can learn from the user's interactions with the system and hence can semi-automatically manage the user's emails. The features that distinguish our system from existing email retrieval or management approaches are fourfold. First, different features of emails are extracted with corresponding similarity assessment methods designed for them. The use of both high-level semantic features and other low-level features makes our agent versatile. Second, the adoption of relevance categories for our UI sidesteps some of the common hurdles that its peer systems normally face. Though the concept of relevance categories is really a step back from pure categorization, it allows for multiple or overlapping categories and is more likely to be tolerated by users when classification errors occur.
Third, a unique space model is established for each user folder based on both global and local information of the emails it contains. This makes it possible for the agent to fit a user's sorting habits, which may be extremely idiosyncratic. Fourth, an efficient query refinement strategy is presented to facilitate the learning process.

The next phase is to further refine our space models. For example, noun phrase extraction, better term selection, use of more terms, support for languages other than English and for mixed languages, variation of test parameters and assumptions, and different similarity metrics might significantly improve the categorization accuracy. Additional work is also required to quantify the performance of current classification algorithms with both test data and user studies. In addition, much work remains to be completed in code enhancements such as latching into more Outlook events, database integration for classifiers, or MS .NET upgrades. Finally, new experiments that integrate classification and information retrieval techniques across email and into calendaring, notes, or other types of data may also be explored.

REFERENCES

1. Email overload--facts and figures: an e-mountain of _overload.htm
2. Whittaker S and Sidner C. Email overload: exploring personal information management of email. SIGCHI 96, pp
3. Pliskin N. Interacting with electronic mail can be a dream or a nightmare: a user's point of view. Interacting with Computers 1(3):
4. Bälter O. Strategies for organizing email messages. SIGCHI 97, pp
5. Bälter O. Keystroke level analysis of email message organization. SIGCHI 2000, pp
6. Segal R and Kephart J. Incremental learning in SwiftFile. ICML
7. Mock K. An experimental framework for email categorization and management. SIGIR
8. Mock K. Dynamic email organization via relevance categories. ICTAI
9. Ingebrigtsen LM. Gnus network user services
10. Malone TW, Lai KY, and Fry C. Experiments with Oval: a radically tailorable tool for cooperative work. ACM TOIS 13(2):
11. Salton G. Automatic Text Processing, Addison-Wesley,
Selecting Model in Automatic Text Categorization of Chinese Industrial 1) HUEY-MING LEE 1 ), PIN-JEN CHEN 1 ), TSUNG-YEN LEE 2) Department of Information Management, Chinese Culture University 55, Hwa-Kung
More informationA User Study on Features Supporting Subjective Relevance for Information Retrieval Interfaces
A user study on features supporting subjective relevance for information retrieval interfaces Lee, S.S., Theng, Y.L, Goh, H.L.D., & Foo, S. (2006). Proc. 9th International Conference of Asian Digital Libraries
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationHarePoint HelpDesk for SharePoint. User Guide
HarePoint HelpDesk for SharePoint For SharePoint Server 2016, SharePoint Server 2013, SharePoint Foundation 2013, SharePoint Server 2010, SharePoint Foundation 2010 User Guide Product version: 16.2.0.0
More informationImage retrieval based on bag of images
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 Image retrieval based on bag of images Jun Zhang University of Wollongong
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationA Semi-Supervised Approach for Web Spam Detection using Combinatorial Feature-Fusion
A Semi-Supervised Approach for Web Spam Detection using Combinatorial Feature-Fusion Ye Tian, Gary M. Weiss, Qiang Ma Department of Computer and Information Science Fordham University 441 East Fordham
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationCHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION
CHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called cluster)
More informationEMMA: An Management Assistant
EMMA: An E-Mail Management Assistant Van Ho, Wayne Wobcke and Paul Compton School of Computer Science and Engineering University of New South Wales Sydney NSW 2052, Australia {vanho wobcke compton}@cse.unsw.edu.au
More informationI. INTRODUCTION. Fig Taxonomy of approaches to build specialized search engines, as shown in [80].
Focus: Accustom To Crawl Web-Based Forums M.Nikhil 1, Mrs. A.Phani Sheetal 2 1 Student, Department of Computer Science, GITAM University, Hyderabad. 2 Assistant Professor, Department of Computer Science,
More informationVisoLink: A User-Centric Social Relationship Mining
VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.
More informationComment Extraction from Blog Posts and Its Applications to Opinion Mining
Comment Extraction from Blog Posts and Its Applications to Opinion Mining Huan-An Kao, Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan
More informationInformation Extraction Techniques in Terrorism Surveillance
Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism
More informationMath Information Retrieval: User Requirements and Prototype Implementation. Jin Zhao, Min Yen Kan and Yin Leng Theng
Math Information Retrieval: User Requirements and Prototype Implementation Jin Zhao, Min Yen Kan and Yin Leng Theng Why Math Information Retrieval? Examples: Looking for formulas Collect teaching resources
More informationThe Effect of Inverse Document Frequency Weights on Indexed Sequence Retrieval. Kevin C. O'Kane. Department of Computer Science
The Effect of Inverse Document Frequency Weights on Indexed Sequence Retrieval Kevin C. O'Kane Department of Computer Science The University of Northern Iowa Cedar Falls, Iowa okane@cs.uni.edu http://www.cs.uni.edu/~okane
More informationImage retrieval based on region shape similarity
Image retrieval based on region shape similarity Cheng Chang Liu Wenyin Hongjiang Zhang Microsoft Research China, 49 Zhichun Road, Beijing 8, China {wyliu, hjzhang}@microsoft.com ABSTRACT This paper presents
More informationProduct Documentation SAP Business ByDesign February Marketing
Product Documentation PUBLIC Marketing Table Of Contents 1 Marketing.... 5 2... 6 3 Business Background... 8 3.1 Target Groups and Campaign Management... 8 3.2 Lead Processing... 13 3.3 Opportunity Processing...
More informationTHE TITLE OF MY THESIS GOES HERE: USE ALL CAPS AND PUT THE SUBTITLE ON THE SECOND LINE
THE TITLE OF MY THESIS GOES HERE: USE ALL CAPS AND PUT THE SUBTITLE ON THE SECOND LINE Student Name (as it appears in One.IU) Submitted to the faculty of the School of Informatics in partial fulfillment
More informationAutomatic Query Type Identification Based on Click Through Information
Automatic Query Type Identification Based on Click Through Information Yiqun Liu 1,MinZhang 1,LiyunRu 2, and Shaoping Ma 1 1 State Key Lab of Intelligent Tech. & Sys., Tsinghua University, Beijing, China
More informationSemi supervised clustering for Text Clustering
Semi supervised clustering for Text Clustering N.Saranya 1 Assistant Professor, Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore 1 ABSTRACT: Based on clustering
More informationX. A Relevance Feedback System Based on Document Transformations. S. R. Friedman, J. A. Maceyak, and S. F. Weiss
X-l X. A Relevance Feedback System Based on Document Transformations S. R. Friedman, J. A. Maceyak, and S. F. Weiss Abstract An information retrieval system using relevance feedback to modify the document
More informationInformation Retrieval
Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have
More informationBetter Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web
Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl
More informationCMPSCI 646, Information Retrieval (Fall 2003)
CMPSCI 646, Information Retrieval (Fall 2003) Midterm exam solutions Problem CO (compression) 1. The problem of text classification can be described as follows. Given a set of classes, C = {C i }, where
More informationInformation Retrieval CSCI
Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1
More information2 Experimental Methodology and Results
Developing Consensus Ontologies for the Semantic Web Larry M. Stephens, Aurovinda K. Gangam, and Michael N. Huhns Department of Computer Science and Engineering University of South Carolina, Columbia,
More informationPerformance Improvement of Hardware-Based Packet Classification Algorithm
Performance Improvement of Hardware-Based Packet Classification Algorithm Yaw-Chung Chen 1, Pi-Chung Wang 2, Chun-Liang Lee 2, and Chia-Tai Chan 2 1 Department of Computer Science and Information Engineering,
More informationTessy Frequently Asked Questions (FAQs)
Tessy Frequently Asked Questions (FAQs) General Q1 What is the main objective of Tessy? Q2 What is a unit for Tessy? Q3 What is a module for Tessy? Q4 What is unit testing? Q5 What is integration testing?
More informationData Mining and Data Warehousing Classification-Lazy Learners
Motivation Data Mining and Data Warehousing Classification-Lazy Learners Lazy Learners are the most intuitive type of learners and are used in many practical scenarios. The reason of their popularity is
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationAnalysis on the technology improvement of the library network information retrieval efficiency
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):2198-2202 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Analysis on the technology improvement of the
More informationSystem Models. 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models. Nicola Dragoni Embedded Systems Engineering DTU Informatics
System Models Nicola Dragoni Embedded Systems Engineering DTU Informatics 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models Architectural vs Fundamental Models Systems that are intended
More information(Refer Slide Time: 2:20)
Data Communications Prof. A. Pal Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Lecture-15 Error Detection and Correction Hello viewers welcome to today s lecture
More informationA taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA
A taxonomy of race conditions. D. P. Helmbold, C. E. McDowell UCSC-CRL-94-34 September 28, 1994 Board of Studies in Computer and Information Sciences University of California, Santa Cruz Santa Cruz, CA
More informationPredicting Messaging Response Time in a Long Distance Relationship
Predicting Messaging Response Time in a Long Distance Relationship Meng-Chen Shieh m3shieh@ucsd.edu I. Introduction The key to any successful relationship is communication, especially during times when
More informationLecture #3: PageRank Algorithm The Mathematics of Google Search
Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,
More informationA Miniature-Based Image Retrieval System
A Miniature-Based Image Retrieval System Md. Saiful Islam 1 and Md. Haider Ali 2 Institute of Information Technology 1, Dept. of Computer Science and Engineering 2, University of Dhaka 1, 2, Dhaka-1000,
More informationEnhancing Cluster Quality by Using User Browsing Time
Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of
More informationImproving Range Query Performance on Historic Web Page Data
Improving Range Query Performance on Historic Web Page Data Geng LI Lab of Computer Networks and Distributed Systems, Peking University Beijing, China ligeng@net.pku.edu.cn Bo Peng Lab of Computer Networks
More informationInformation Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Information Retrieval CS 6900 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Boolean Retrieval vs. Ranked Retrieval Many users (professionals) prefer
More informationINTRODUCTION. Chapter GENERAL
Chapter 1 INTRODUCTION 1.1 GENERAL The World Wide Web (WWW) [1] is a system of interlinked hypertext documents accessed via the Internet. It is an interactive world of shared information through which
More informationK-Means Clustering Using Localized Histogram Analysis
K-Means Clustering Using Localized Histogram Analysis Michael Bryson University of South Carolina, Department of Computer Science Columbia, SC brysonm@cse.sc.edu Abstract. The first step required for many
More informationWarrick County School Corp.
Warrick County School Corp. Ou Microsoft Outlook Web Access Guide Getting StartedStyarted Go to the Warrick County School Corp. Home Page (www.warrick.k12.in.us) and click the Web Mail link. Logging In
More informationComodo Antispam Gateway Software Version 2.1
Comodo Antispam Gateway Software Version 2.1 User Guide Guide Version 2.1.010215 Comodo Security Solutions 1255 Broad Street Clifton, NJ, 07013 Table of Contents 1 Introduction to Comodo Antispam Gateway...
More informationDeveloping ArXivSI to Help Scientists to Explore the Research Papers in ArXiv
Submitted on: 19.06.2015 Developing ArXivSI to Help Scientists to Explore the Research Papers in ArXiv Zhixiong Zhang National Science Library, Chinese Academy of Sciences, Beijing, China. E-mail address:
More informationCHAPTER-26 Mining Text Databases
CHAPTER-26 Mining Text Databases 26.1 Introduction 26.2 Text Data Analysis and Information Retrieval 26.3 Basle Measures for Text Retrieval 26.4 Keyword-Based and Similarity-Based Retrieval 26.5 Other
More information6. Relational Algebra (Part II)
6. Relational Algebra (Part II) 6.1. Introduction In the previous chapter, we introduced relational algebra as a fundamental model of relational database manipulation. In particular, we defined and discussed
More informationThe Research on the Method of Process-Based Knowledge Catalog and Storage and Its Application in Steel Product R&D
The Research on the Method of Process-Based Knowledge Catalog and Storage and Its Application in Steel Product R&D Xiaodong Gao 1,2 and Zhiping Fan 1 1 School of Business Administration, Northeastern University,
More informationNUSIS at TREC 2011 Microblog Track: Refining Query Results with Hashtags
NUSIS at TREC 2011 Microblog Track: Refining Query Results with Hashtags Hadi Amiri 1,, Yang Bao 2,, Anqi Cui 3,,*, Anindya Datta 2,, Fang Fang 2,, Xiaoying Xu 2, 1 Department of Computer Science, School
More informationTree-Based Minimization of TCAM Entries for Packet Classification
Tree-Based Minimization of TCAM Entries for Packet Classification YanSunandMinSikKim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington 99164-2752, U.S.A.
More informationDesign and Implementation of Search Engine Using Vector Space Model for Personalized Search
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,
More informationOn characterizing BGP routing table growth
University of Massachusetts Amherst From the SelectedWorks of Lixin Gao 00 On characterizing BGP routing table growth T Bu LX Gao D Towsley Available at: https://works.bepress.com/lixin_gao/66/ On Characterizing
More informationData Mining and Machine Learning: Techniques and Algorithms
Instance based classification Data Mining and Machine Learning: Techniques and Algorithms Eneldo Loza Mencía eneldo@ke.tu-darmstadt.de Knowledge Engineering Group, TU Darmstadt International Week 2019,
More informationIn = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most
In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100
More informationMini-Lectures by Section
Mini-Lectures by Section BEGINNING AND INTERMEDIATE ALGEBRA, Mini-Lecture 1.1 1. Learn the definition of factor.. Write fractions in lowest terms.. Multiply and divide fractions.. Add and subtract fractions..
More informationCS 6320 Natural Language Processing
CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic
More information3.2 Circle Charts Line Charts Gantt Chart Inserting Gantt charts Adjusting the date section...
/ / / Page 0 Contents Installation, updates & troubleshooting... 1 1.1 System requirements... 2 1.2 Initial installation... 2 1.3 Installation of an update... 2 1.4 Troubleshooting... 2 empower charts...
More informationChapter 15 Introduction to Linear Programming
Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of
More informationBetter Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web
Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl
More informationDomain-specific Concept-based Information Retrieval System
Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical
More informationThe Encoding Complexity of Network Coding
The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network
More informationDesigning and Building an Automatic Information Retrieval System for Handling the Arabic Data
American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far
More informationAn Empirical Study of Behavioral Characteristics of Spammers: Findings and Implications
An Empirical Study of Behavioral Characteristics of Spammers: Findings and Implications Zhenhai Duan, Kartik Gopalan, Xin Yuan Abstract In this paper we present a detailed study of the behavioral characteristics
More informationApplying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task
Applying the KISS Principle for the CLEF- IP 2010 Prior Art Candidate Patent Search Task Walid Magdy, Gareth J.F. Jones Centre for Next Generation Localisation School of Computing Dublin City University,
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationInformation Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.
Introduction to Information Retrieval Ethan Phelps-Goodman Some slides taken from http://www.cs.utexas.edu/users/mooney/ir-course/ Information Retrieval (IR) The indexing and retrieval of textual documents.
More informationData Structure Optimization of AS_PATH in BGP
Data Structure Optimization of AS_PATH in BGP Weirong Jiang Research Institute of Information Technology, Tsinghua University, Beijing, 100084, P.R.China jwr2000@mails.tsinghua.edu.cn Abstract. With the
More informationEnhancing Cluster Quality by Using User Browsing Time
Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,
More informationCHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES
CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES 6.1 INTRODUCTION The exploration of applications of ANN for image classification has yielded satisfactory results. But, the scope for improving
More informationMODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS
MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS J.I. Serrano M.D. Del Castillo Instituto de Automática Industrial CSIC. Ctra. Campo Real km.0 200. La Poveda. Arganda del Rey. 28500
More informationINFSCI 2140 Information Storage and Retrieval Lecture 6: Taking User into Account. Ad-hoc IR in text-oriented DS
INFSCI 2140 Information Storage and Retrieval Lecture 6: Taking User into Account Peter Brusilovsky http://www2.sis.pitt.edu/~peterb/2140-051/ Ad-hoc IR in text-oriented DS The context (L1) Querying and
More informationAn Empirical Performance Comparison of Machine Learning Methods for Spam Categorization
An Empirical Performance Comparison of Machine Learning Methods for Spam E-mail Categorization Chih-Chin Lai a Ming-Chi Tsai b a Dept. of Computer Science and Information Engineering National University
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval (Supplementary Material) Zhou Shuigeng March 23, 2007 Advanced Distributed Computing 1 Text Databases and IR Text databases (document databases) Large collections
More informationDetection and Extraction of Events from s
Detection and Extraction of Events from Emails Shashank Senapaty Department of Computer Science Stanford University, Stanford CA senapaty@cs.stanford.edu December 12, 2008 Abstract I build a system to
More informationRouting and Ad-hoc Retrieval with the. Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers. University of Dortmund, Germany.
Routing and Ad-hoc Retrieval with the TREC-3 Collection in a Distributed Loosely Federated Environment Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers University of Dortmund, Germany
More information