The WEBFIND tool for finding scientific papers over the worldwide web

Alvaro E. Monge and Charles P. Elkan
Department of Computer Science and Engineering
University of California, San Diego
La Jolla, California
Phone: (619) Fax: (619)

1 Introduction

Information retrieval in the worldwide web environment poses unique challenges. The worldwide web is a distributed, always changing, and ever expanding collection of documents. These features of the web make it difficult to find information about a specific topic. The most common approaches involve indexing, but indexes introduce centralization and can never be up-to-date. Available information retrieval software has been designed for very different environments, with typical tools [Salton and McGill, 1983; Salton and Buckley, 1988] working on an unchanging corpus, with the entire corpus available for direct access.

This paper describes WEBFIND, an application that discovers scientific papers made available by their authors on the web. WEBFIND uses a novel approach to performing information retrieval on the worldwide web. The approach is to use a combination of external information sources as a guide for locating where to look for information on the web.

The external information sources used by WEBFIND are MELVYL and NETFIND. MELVYL is a University of California library service that includes comprehensive databases of bibliographic records, including a science and engineering database called INSPEC [University of California, 1996]. NETFIND is a white pages service that gives internet host addresses and people's addresses [Schwartz and Pu, 1994]. Separately, these services do not provide enough information to locate papers on the web. WEBFIND integrates the information provided by each in order to find a path for discovering on the web the information actually wanted by a user.

2 Approaches to worldwide web information retrieval

The most common approach to resource discovery over the web is to use an index to store information about web documents. This approach involves periodic automatic searching of the web and gathering of information about all the documents found in these searches. AltaVista [Digital Equipment Corporation, 1996], WebCrawler [Pinkerton, 1994], Lycos [Mauldin and Leavitt, 1994], Infoseek [Randall, 1995], and Inktomi [Brewer and Gauthier, 1995] are the most important examples of applications that use this indexing approach. WebCrawler can also use its own index to suggest starting points for online searches.

The main alternative to resource discovery based on offline indexing is to perform automated online searching. Such online searching requires sophisticated heuristic reasoning to be sufficiently focused. The most developed example of this approach is the so-called Internet Softbot [Etzioni and Weld, 1994]. The
Softbot is a software agent that transforms a user's query into a goal and applies a planning algorithm to generate a sequence of actions that should achieve the goal. The Softbot planner possesses extensive knowledge (some acquired through learning) about the information sources available to it.

The WEBFIND approach to resource discovery is similar to the WebCrawler and Softbot approaches, in that WEBFIND performs online searching of the web. However, unlike the WebCrawler, the starting points used by WEBFIND are suggested by inference from information provided by reliable external sources, not by a precomputed index of the web. Unlike the Softbot, WEBFIND uses application-specific algorithms for reasoning with its information sources. In principle a planning algorithm could generate the reasoning algorithm used by WEBFIND, but in practice the WEBFIND algorithms are more sophisticated than it is feasible to synthesize automatically.

3 The design of WEBFIND

This section describes the protocol followed by WEBFIND when retrieving a scientific paper over the worldwide web. The two main phases are, first, integrating information provided by INSPEC and NETFIND, and second, discovering a worldwide web server, an author's home page, and finally the location of the wanted paper.

3.1 INSPEC and NETFIND integration

A WEBFIND search starts with the user providing keywords to identify the paper, exactly as he or she would in searching INSPEC directly. A paper can be identified using any combination of the names of its authors, words from its title or abstract, or other bibliographic information. After the user confirms that the right paper has been identified, WEBFIND queries INSPEC to find the institutional affiliation of the principal author of the paper. Then, WEBFIND uses NETFIND to provide the internet address of a host computer with the same institutional affiliation. A query to NETFIND is a set of keywords describing an affiliation.
Useful keywords are typically words in the name of the institution or in the name of the city, state, and/or country where it is located [Schwartz and Pu, 1994]. The NETFIND query engine is incapable of processing abbreviations, so WEBFIND chooses full words found in the affiliation given by INSPEC, with a few common abbreviations expanded, such as univ. to university.

In general, the result of a NETFIND query is all hosts whose affiliation contains the keywords in the query. There can be many such hosts, and WEBFIND must determine which of them is best. Since institutions are designated very differently in INSPEC and NETFIND, it is non-trivial to decide when an INSPEC institution corresponds to a NETFIND institution. WEBFIND uses the recursive field matching algorithm described in Monge and Elkan [1996] to do this. The algorithm returns a score between 0.0 and 1.0, where 1.0 means certain equivalence and 0.0 means certain non-equivalence. The internet host selected is the one whose NETFIND affiliation has the highest matching score with the INSPEC affiliation.

3.2 Discovery phase

The searching of the worldwide web done by WEBFIND is real-time in two senses. First, the search takes place while the user is waiting for a response to his or her query. Second, information gathered from one retrieved document is analyzed and used to guide what documents are retrieved next.
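
As an illustration of the integration phase of Section 3.1, the following Python sketch builds NETFIND keywords from an INSPEC affiliation and selects the host whose description scores highest. It is a minimal sketch under stated assumptions, not the WEBFIND implementation: the abbreviation table, the rule for skipping short words, the host descriptions in the usage note, and all function names are hypothetical, and `match_score` is a simplified prefix-based stand-in for the recursive field matching algorithm of Monge and Elkan [1996].

```python
import re

# Hypothetical abbreviation table; the text only mentions "univ." -> "university".
ABBREVIATIONS = {"univ": "university"}


def netfind_keywords(affiliation):
    """Build query keywords: keep full words, expand known abbreviations."""
    keywords = []
    for raw in re.split(r"[\s,&]+", affiliation):
        word = raw.lower()
        if not word:
            continue
        if word.endswith("."):
            # Abbreviated word: keep it only if an expansion is known.
            expanded = ABBREVIATIONS.get(word.rstrip("."))
            if expanded:
                keywords.append(expanded)
        elif len(word) > 2:
            # Assumption: very short words such as "of" are not useful keywords.
            keywords.append(word)
    return keywords


def tokens(text):
    """Lowercase alphabetic tokens of a field."""
    return re.findall(r"[a-z]+", text.lower())


def token_match(a, b):
    """1.0 if the tokens are equal or one is a prefix of the other, else 0.0."""
    return 1.0 if a.startswith(b) or b.startswith(a) else 0.0


def match_score(field_a, field_b):
    """Simplified stand-in score in [0.0, 1.0]: the mean, over tokens of
    field_a, of the best match found among the tokens of field_b."""
    ta, tb = tokens(field_a), tokens(field_b)
    if not ta or not tb:
        return 0.0
    return sum(max(token_match(x, y) for y in tb) for x in ta) / len(ta)


def best_host(inspec_affiliation, netfind_hosts):
    """Pick the host whose NETFIND description has the highest score.

    netfind_hosts maps a host name to its affiliation description.
    """
    return max(
        netfind_hosts,
        key=lambda host: match_score(inspec_affiliation, netfind_hosts[host]),
    )
```

For example, calling `best_host` with the affiliation "Dept. of Comput. Sci. & Eng., California Univ., San Diego, La Jolla, CA, USA" and a dictionary of candidate hosts returns the host whose description shares the most tokens or token prefixes with the affiliation. The prefix test lets "comput" match "computer", but it does not handle abbreviations such as "dept" for "department", which is one reason a more flexible matching algorithm is needed in practice.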

The first step in the discovery phase is to find a worldwide web server on the chosen internet host. This step uses heuristics based on common patterns for naming servers. The most widely used convention is to use the prefix www. or www-. WEBFIND tests the existence of a server named with either of these prefixes by calling the Unix ping utility. If either prefix yields a server, then WEBFIND continues with the next step of the discovery phase. Otherwise, WEBFIND strips off the first segment of the internet host name and applies the same heuristics again. For example, cs.ucsd.edu is transformed to ucsd.edu, and then the potential servers formed with these prefixes, such as www-ucsd.edu, are pinged.

Once a worldwide web server has been identified, WEBFIND follows links until the wanted article is found. This search proceeds in two stages: find a web page for the principal author, and find a web page that is the wanted article. Each stage of the search uses a priority queue whose entries are candidate links to follow. The priority of each link in the queue is equal to the estimated relevance of the link. For the first stage, the priority queue initially has a single link, the link for the main page of the server. For the second stage, the priority queue initially contains just the result of the first stage.

When a link is added to the priority queue, its relevance is estimated using the recursive field matching algorithm applied to the context of the link and each of two sets of keywords, a primary set and a secondary set. The context of a link is its anchor text and the two lines before and two after the line containing the link, provided no other link appears in those lines. Links are ranked lexicographically, first using degree of match to the primary set, and then using degree of match to the secondary set. In the first stage of search, the primary set of keywords is the name of the principal author, while the secondary set is {staff, people, faculty}.
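
The discovery-phase heuristics described in this section can be sketched in Python as follows. This is a sketch under stated assumptions rather than the WEBFIND code: `exists`, `fetch`, and `is_wanted` are hypothetical callables standing in for the ping test, page retrieval with link-context extraction, and the test for the wanted page; `overlap` is a simple keyword-overlap stand-in for the recursive field matching score; and the `seen` set, which avoids revisiting pages, is an addition not mentioned in the text.

```python
import heapq
import itertools


def candidate_servers(host):
    """Yield server names to probe, most specific first.

    For each host name, try the www. and www- prefixes, then strip the
    first segment of the host name and try again.
    """
    parts = host.split(".")
    while len(parts) >= 2:
        domain = ".".join(parts)
        yield "www." + domain
        yield "www-" + domain
        parts = parts[1:]


def find_server(host, exists):
    """Return the first candidate for which exists(name) is true, else None."""
    for name in candidate_servers(host):
        if exists(name):
            return name
    return None


def overlap(context, keywords):
    """Fraction of keywords that occur as words in the link context."""
    words = set(context.lower().split())
    return sum(k.lower() in words for k in keywords) / len(keywords)


def best_first_search(start, fetch, is_wanted, primary, secondary):
    """Follow links in order of estimated relevance.

    fetch(url) returns a list of (link, context) pairs for the page.
    Links are ranked lexicographically: primary score first, then
    secondary score. Returns the wanted URL, or None if the queue empties.
    """
    order = itertools.count()  # tie-breaker so heap entries stay comparable
    queue = [((0.0, 0.0), next(order), start)]
    seen = {start}
    while queue:
        _, _, url = heapq.heappop(queue)
        if is_wanted(url):
            return url
        for link, context in fetch(url):
            if link not in seen:
                seen.add(link)
                p = overlap(context, primary)
                s = overlap(context, secondary)
                # heapq is a min-heap, so negate scores to pop the best link first.
                heapq.heappush(queue, ((-p, -s), next(order), link))
    return None


# A toy in-memory "web" for illustration only; page names are made up.
TOY_WEB = {
    "home": [("people", "faculty and staff people"), ("news", "department news")],
    "people": [("elkan", "Charles Elkan"), ("other", "Some Other")],
    "news": [],
    "elkan": [],
    "other": [],
}
```

With `TOY_WEB`, a first-stage search for the author keywords `["charles", "elkan"]` and the secondary set `["staff", "people", "faculty"]` follows the people page before the news page, because the people link matches the secondary set while neither link matches the primary set. Encoding the priority as a tuple gives the lexicographic ranking directly, since Python compares tuples element by element.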
Intuitively, the main objective is to find a home page for the author, while the fall-back objective is to find a page with a list of people at the institution. In the second stage of search, the primary set of keywords is the title of the wanted article, while the secondary set has keywords {publications, papers, reports}. Here, the main objective is to find the actual wanted paper, while the fall-back objective is to find a page with pointers to papers in general.

At each stage, the search procedure is to repeatedly remove the first link from the priority queue, and to retrieve the pointed-to web page. The search succeeds when this page is the wanted page. The search fails when the queue becomes empty. If the page is not the wanted page, all links on it are added to the priority queue with their relevance estimated as just described.

Even if a stage of the search fails, the user still receives useful information. If the first stage fails, the user is given the web page of the author's institution. If the second stage fails, the user is given the web page of the author's institution and the author's own home page.

4 Experimental results

This section reports on experiments performed with the initial implementation of WEBFIND. The aim of the experiments was to identify which aspects of this first version of WEBFIND are the limiting factors in its ability to locate authors and their papers on the worldwide web. Figure 1 shows an example of a WEBFIND discovery session. The experiments discussed here used queries in different areas of computer science concerning papers by authors at ten different institutions.

Figure 1: Results from a WEBFIND discovery session

Dept. of Comput. Sci. & Eng., California Univ., San Diego, La Jolla, CA, USA
Dept. of Cognitive Sci., California Univ., San Diego, La Jolla, CA, USA
Dept. of Electr. Eng. & Comput. Sci., California Univ., Berkeley, CA, USA
Dept. Comput. Sci. Eng., Washington Univ., Seattle, WA, USA
Lab. for Comput. Sci., MIT, Cambridge, MA, USA
Dept. of Comput. Sci., Cornell Univ., Ithaca, NY, USA
Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
Dept. of Electr. Eng. & Comput. Sci., Illinois Univ., Chicago, IL, USA
Dept. of Comput. Sci., Waterloo Univ., Ont., Canada
Dept. of Comput. Sci., Columbia Univ., New York, NY, USA

INSPEC author affiliations.

We report on the ability of WEBFIND to map affiliations to internet hosts, to discover worldwide web servers, to discover home pages for authors, and finally to discover the wanted paper.

WEBFIND correctly associated eight of the ten INSPEC affiliations to internet hosts in NETFIND. The first affiliation that WEBFIND did not correctly identify was Dept. of Cognitive Sci., California Univ., San
Diego. The reason for this failure was that NETFIND does not have an entry for this department, although its internet host is cogsci.ucsd.edu. In future work we intend to quantify the comprehensiveness of the coverage of NETFIND, and if necessary we will extend WEBFIND to use additional white pages resources.

The other affiliation that WEBFIND did not find a correct host for was Lab. for Comput. Sci., MIT. The reason here is that fifteen different internet hosts all have a NETFIND description equivalent to Lab. for Comput. Sci., MIT. Each of these hosts corresponds to a different research group (for example, cag.lcs.mit.edu belongs to the computer architecture group), but this information is not available in either the INSPEC or NETFIND affiliation descriptions. The next version of WEBFIND will overcome this problem by adding keywords to the INSPEC and/or the NETFIND affiliations, if necessary. Added INSPEC affiliation keywords will be subject keywords, while added NETFIND affiliation keywords will be host name segments. For example, adding the subject keywords computer architecture to Lab. for Comput. Sci., MIT would give a specific match to the host name cag.lcs.mit.edu. Note that this will often involve matching of abbreviations, e.g. of cag and computer architecture.

Of the eight internet hosts that WEBFIND found correctly, there was only one that it could not find a worldwide web server for. Given the simple heuristic used for finding a server (Section 3.2), this is encouraging.

WEBFIND found the home page for five principal authors on the seven worldwide web servers it searched. The other two principal authors did not have home pages of any kind on the servers found by WEBFIND. In these two cases, the authors were no longer affiliated with the institution that INSPEC provided. We will solve this problem in the next version of WEBFIND by using the most recent information that INSPEC can provide.
Finally, WEBFIND successfully discovered two papers starting from the five authors' home pages found. The low rate is due to the type of author pages that were discovered. Two of the five pages were not personal home pages, but rather annual reports or research statements that did not provide any outgoing links, so the wanted papers were not available through their authors' home pages.

In summary, our experiments show that WEBFIND is successful at finding worldwide web servers and finding web pages designated for authors. WEBFIND is less successful at finding actual papers, most of all because many authors have not yet published their papers on the worldwide web.

5 Conclusion

This paper describes a novel approach to the task of finding information relevant to a user's inquiry on the worldwide web. Existing approaches, namely the indexing of worldwide web pages, are plagued with problems caused by the size and the distributed, dynamic nature of the worldwide web. Our approach uses external information sources to restrict the part of the worldwide web that is searched. This integration requires a flexible heuristic algorithm for detecting equivalence between alternative ways of writing the names of entities such as people and institutions. A first experimental evaluation indicates that our approach is effective, and that its present limitations are not fundamental.

References

[Brewer and Gauthier, 1995] Eric Brewer and Paul Gauthier. Inktomi search engine. URL,
[Digital Equipment Corporation, 1996] Digital Equipment Corporation. AltaVista search engine. URL,
[Etzioni and Weld, 1994] Oren Etzioni and Daniel Weld. A softbot-based interface to the internet. Communications of the ACM, 37(7):72-76, July
[Mauldin and Leavitt, 1994] Michael Mauldin and John Leavitt. Web-agent related research at the center for machine translation. In Proceedings of the ACM Special Interest Group on Networked Information Discovery and Retrieval, McLean, VA, August
[Monge and Elkan, 1996] Alvaro E. Monge and Charles P. Elkan. The field matching problem: Algorithms and applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, To appear.
[Pinkerton, 1994] Brian Pinkerton. Finding what people want: Experiences with the WebCrawler. In Electronic Proceedings of the Second International Conference on the World Wide Web, Chicago, October Elsevier Science BV.
[Randall, 1995] Neil Randall. The search engine that could. (locating world wide web sites through search engines). PC/Computing, 8(9):165 (4 pages), September
[Salton and Buckley, 1988] Gerard Salton and Chris Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5): ,
[Salton and McGill, 1983] Gerard Salton and Michael J. McGill. Introduction to Modern Information Retrieval. McGraw Hill,
[Schwartz and Pu, 1994] Michael Schwartz and Calton Pu. Applying an information gathering architecture to Netfind: a white pages tool for a changing and growing internet. Technical Report 5, Department of Computer Science, University of Colorado, October
[University of California, 1996] Division of Library Automation, University of California. Melvyl system welcome page. URL, May
More informationFILTERING OF URLS USING WEBCRAWLER
FILTERING OF URLS USING WEBCRAWLER Arya Babu1, Misha Ravi2 Scholar, Computer Science and engineering, Sree Buddha college of engineering for women, 2 Assistant professor, Computer Science and engineering,
More information2017 CMU FIRST DESTINATION OUTCOMES Information Networking Institute, Information Networking (M.S.)
DESTINATION OUTCOMES 2017 CMU FIRST DESTINATION OUTCOMES Information Networking Institute, Information Networking (M.S.) SALARIES Employed 52 Total Graduates 52 AVERAGE SALARY = $117,445 MEDIAN SALARY
More informationUsing the Functional Information Processing Model (FIPM) to Learn how a Library Catalog Works. Daniel A. Sabol Teachers College, Columbia University
Running head: Using the Functional Information Processing Model (FIPM) Using the Functional Information Processing Model (FIPM) to Learn how a Library Catalog Works. Daniel A. Sabol Teachers College, Columbia
More informationWEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS
1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,
More informationNavigating Product Catalogs Through OFDAV Graph Visualization
Navigating Product Catalogs Through OFDAV Graph Visualization Mao Lin Huang Department of Computer Systems Faculty of Information Technology University of Technology, Sydney NSW 2007, Australia maolin@it.uts.edu.au
More information2015, IJARCSSE All Rights Reserved Page 31
Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Procedural Cognitive
More informationNUS-I2R: Learning a Combined System for Entity Linking
NUS-I2R: Learning a Combined System for Entity Linking Wei Zhang Yan Chuan Sim Jian Su Chew Lim Tan School of Computing National University of Singapore {z-wei, tancl} @comp.nus.edu.sg Institute for Infocomm
More informationChina Building Energy Performance Rating Workshop 28 October 2013
Adam Hinge Sustainable Energy Partnerships China Building Energy Performance Rating Workshop 28 October 2013 IPEEC BEET Goals The key focus for the BEET: Improve energy efficiency of buildings in IPEEC
More informationWeb Usage Mining: A Research Area in Web Mining
Web Usage Mining: A Research Area in Web Mining Rajni Pamnani, Pramila Chawan Department of computer technology, VJTI University, Mumbai Abstract Web usage mining is a main research area in Web mining
More informationHow Primo Works VE. 1.1 Welcome. Notes: Published by Articulate Storyline Welcome to how Primo works.
How Primo Works VE 1.1 Welcome Welcome to how Primo works. 1.2 Objectives By the end of this session, you will know - What discovery, delivery, and optimization are - How the library s collections and
More informationColumbia University (office) Computer Science Department (mobile) Amsterdam Avenue
Wisam Dakka Columbia University (office) 212-939-7116 Computer Science Department (mobile) 646-643-1306 1214 Amsterdam Avenue wisam@cs.columbia.edu New York, New York, 10027 www.cs.columbia.edu/~wisam
More informationSciVerse Scopus. Date: 21 Sept, Coen van der Krogt Product Sales Manager Presented by Andrea Kmety
SciVerse Scopus Date: 21 Sept, 2011 Coen van der Krogt Product Sales Manager Presented by Andrea Kmety Agenda What is SciVerse? SciVerse Scopus at a glance Supporting Researchers Supporting the Performance
More informationA Fast Algorithm for Optimal Alignment between Similar Ordered Trees
Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221
More informationInteractive Machine Learning (IML) Markup of OCR Generated Text by Exploiting Domain Knowledge: A Biodiversity Case Study
Interactive Machine Learning (IML) Markup of OCR Generated by Exploiting Domain Knowledge: A Biodiversity Case Study Several digitization projects such as Google books are involved in scanning millions
More informationA Novel PAT-Tree Approach to Chinese Document Clustering
A Novel PAT-Tree Approach to Chinese Document Clustering Kenny Kwok, Michael R. Lyu, Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong
More informationSyskill & Webert: Identifying interesting web sites
Syskill & Webert Page 1 of 10 Syskill & Webert: Identifying interesting web sites Abstract Michael Pazzani, Jack Muramatsu & Daniel Billsus Department of Information and Computer Science University of
More informationClustering Documents in Large Text Corpora
Clustering Documents in Large Text Corpora Bin He Faculty of Computer Science Dalhousie University Halifax, Canada B3H 1W5 bhe@cs.dal.ca http://www.cs.dal.ca/ bhe Yongzheng Zhang Faculty of Computer Science
More informationHire Counsel + ACEDS. Unified Team, National Footprint Offices. ediscovery Centers
Unified Team, National Footprint Offices Boston, MA Charlotte, NC Chicago, IL Darien, CT Los Angeles, CA Miami, FL Morrisville, NC New York, NY Philadelphia, PA San Francisco, CA Southfield, MI Washington,
More informationLink Recommendation Method Based on Web Content and Usage Mining
Link Recommendation Method Based on Web Content and Usage Mining Przemys law Kazienko and Maciej Kiewra Wroc law University of Technology, Wyb. Wyspiańskiego 27, Wroc law, Poland, kazienko@pwr.wroc.pl,
More informationFeature Subset Selection Utilizing BioMechanical Characteristics for Hand Gesture Recognition
Feature Subset Selection Utilizing BioMechanical Characteristics for Hand Gesture Recognition Farid Parvini Computer Science Department University of Southern California Los Angeles, USA Dennis McLeod
More informationAn Integrated Framework to Enhance the Web Content Mining and Knowledge Discovery
An Integrated Framework to Enhance the Web Content Mining and Knowledge Discovery Simon Pelletier Université de Moncton, Campus of Shippagan, BGI New Brunswick, Canada and Sid-Ahmed Selouani Université
More informationA SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 A SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS Satwinder Kaur 1 & Alisha Gupta 2 1 Research Scholar (M.tech
More informationWeb Data mining-a Research area in Web usage mining
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,
More informationHOMEPAGES OF INDIAN UNIVERSITIES WEBSITES : A STUDY
333 HOMEPAGES OF INDIAN UNIVERSITIES WEBSITES : A STUDY M. Chandrashekara M. Nandish Kumar Abstract The rapid development of information and communication technology has made it very easy to access information
More informationUsing Query History to Prune Query Results
Using Query History to Prune Query Results Daniel Waegel Ursinus College Department of Computer Science dawaegel@gmail.com April Kontostathis Ursinus College Department of Computer Science akontostathis@ursinus.edu
More informationComparison of Online Record Linkage Techniques
International Research Journal of Engineering and Technology (IRJET) e-issn: 2395-0056 Volume: 02 Issue: 09 Dec-2015 p-issn: 2395-0072 www.irjet.net Comparison of Online Record Linkage Techniques Ms. SRUTHI.
More informationMinimizing Collateral Damage by Proactive Surge Protection
Minimizing Collateral Damage by Proactive Surge Protection Jerry Chou, Bill Lin University of California, San Diego Subhabrata Sen, Oliver Spatscheck AT&T Labs-Research ACM SIGCOMM LSAD Workshop, Kyoto,
More information1 Introduction Web search services such as Lycos and WebCrawler have proven both useful and popular. As the Web grows, the number and variety of searc
Multi-Service Search and Comparison Using the MetaCrawler Erik Selberg Oren Etzioni Department of Computer Science and Engineering University of Washington Seattle, WA 98195 fselberg, etzionig@cs.washington.edu
More informationAdvances in Natural and Applied Sciences. Information Retrieval Using Collaborative Filtering and Item Based Recommendation
AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Information Retrieval Using Collaborative Filtering and Item Based Recommendation
More informationOracle Database 10g Resource Manager. An Oracle White Paper October 2005
Oracle Database 10g Resource Manager An Oracle White Paper October 2005 Oracle Database 10g Resource Manager INTRODUCTION... 3 SYSTEM AND RESOURCE MANAGEMENT... 3 ESTABLISHING RESOURCE PLANS AND POLICIES...
More informationOntology-Based Web Query Classification for Research Paper Searching
Ontology-Based Web Query Classification for Research Paper Searching MyoMyo ThanNaing University of Technology(Yatanarpon Cyber City) Mandalay,Myanmar Abstract- In web search engines, the retrieval of
More informationEnhancing Cluster Quality by Using User Browsing Time
Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of
More informationPoirot: a relevance-based web search agent
Poirot: a relevance-based web search agent From: AAAI Technical Report WS-00-01. Compilation copyright 2000, AAAI (www.aaai.org). All rights reserved. José M. Ramírez Jordi Donadeu Francisco J. Neves Grupo
More informationNYC Metro Area Oracle Users Group Day
The New York Oracle Users Group NYC Metro Area Oracle Users Group Day September 10, 2008 Welcome! This is the 6 th Metro Area Meeting Are You a Member? NYOUG NJOUG CTOUG IOUG ODTUG Other Oracle User Group
More informationCS506/606 - Topics in Information Retrieval
CS506/606 - Topics in Information Retrieval Instructors: Class time: Steven Bedrick, Brian Roark, Emily Prud hommeaux Tu/Th 11:00 a.m. - 12:30 p.m. September 25 - December 6, 2012 Class location: WCC 403
More informationSeek and Ye shall Find
Seek and Ye shall Find The continuum of computer intelligence COS 116, Spring 2010 Adam Finkelstein Final tally: Computer $77,147, Ken Jennings $24,000, Brad Rutter $21,600. Jennings: I, for one, welcome
More informationIdentifying Web Spam With User Behavior Analysis
Identifying Web Spam With User Behavior Analysis Yiqun Liu, Rongwei Cen, Min Zhang, Shaoping Ma, Liyun Ru State Key Lab of Intelligent Tech. & Sys. Tsinghua University 2008/04/23 Introduction simple math
More informationEnterprise Chat and Supervisor s Guide, Release 11.5(1)
Enterprise Chat and Email Supervisor s Guide, Release 11.5(1) For Unified Contact Center Enterprise August 2016 Americas Headquarters Cisco Systems, Inc. 170 West Tasman Drive San Jose, CA 95134-1706 USA
More informationIntegrating Image Content and its Associated Text in a Web Image Retrieval Agent
From: AAAI Technical Report SS-97-03. Compilation copyright 1997, AAAI (www.aaai.org). All rights reserved. Integrating Image Content and its Associated Text in a Web Image Retrieval Agent Victoria Meza
More information