AN EFFECTIVE SEARCH TOOL FOR LOCATING RESOURCE IN NETWORK
G. Mohammad Rafi 1, K. Sreenivasulu 2, K. Anjaneyulu 3
1. M.Tech (CSE, pursuing), Madina Engineering College, Kadapa, AP
2. Professor & HOD, Dept. of CSE, Madina Engineering College, Kadapa, AP
3. Asst. Prof. & HOD, Dept. of Computer Applications, Sri Sai Institute of Technology & Science, Rayachoty, Kadapa, AP

Abstract: A good search engine ensures that users find what they are looking for the first time, regardless of the format or location of the information. This means that a wide variety of information can be effectively dispersed and made available to staff, without the need for complex navigation systems or filing conventions. Most intranets evolve over time, and adding search functionality need not be a daunting task. A search tool can be implemented quickly, and then refined as the intranet grows and the needs of the organization change. Importantly, a flexible search engine that is cost-effective and expands to suit growing requirements can be a much better choice. It is important to recognize that every intranet is different, with its own objectives, requirements and environment.

INTRODUCTION
Desktop search is a tool that uses keywords to search various data sources on local disk storage, such as web browser histories, archives, text documents, or the metadata of mp3 files. After the search tool is installed, its desktop search engine parses all files stored on the local disks and maintains an index database. The index keeps performance reasonable, especially when local storage holds several hundred gigabytes or even terabytes of data. The key techniques used by the desktop search engine are the ability to parse files in various formats and the ability to run full-text search over the parsed contents.

Search all contents in the intranet: A desktop search tool makes it convenient to find information hidden in the myriad files on local hard drives. However, in some enterprise applications, useful information such as meeting minutes or work schedules may be concealed on another person's computer in the intranet.

Some salient features of the intranet-based search engine are:
It is FAST: For fast search it creates indexes of the shared folders. Creating indexes considerably reduces the search time for a query.
The application is DISTRIBUTED: Documents to be searched may not reside on your local machine. Being distributed, it can send queries to other machines in the network.
The application uses P2P sharing: No special server is required; just install and get connected. Each peer serves others.

1. LUCENE PACKAGE:
Lucene is a high-performance, scalable Information Retrieval (IR) library. Information retrieval refers to the process of searching for documents, information within documents, or metadata about documents. Lucene lets us add searching capabilities to applications.

Fig: Typical components of a search application; the shaded components show which parts Lucene handles.

Lucene provides a simple yet powerful core API that requires minimal understanding of full-text indexing and searching. Because Lucene is just a Java library, it makes no assumptions about what it indexes and searches, which gives it an advantage over a number of other search applications. Its design is compact and simple, allowing Lucene to be easily embedded into desktop applications.
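To illustrate why creating an index reduces query time, the following is a minimal sketch of an inverted index in plain Python. It is not Lucene's API or its on-disk format; the file names and contents are hypothetical, and the tokenization (lowercasing and splitting on whitespace) is an assumption made only for this example.

```python
from collections import defaultdict


def build_index(files):
    """Map each lowercase word to the set of file names that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, word):
    """Look the word up directly instead of rescanning every file."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical shared-folder contents, used only for illustration.
shared_files = {
    "minutes.txt": "Meeting minutes for the search project",
    "schedule.txt": "Work schedule for the project team",
    "notes.txt": "Personal notes",
}

index = build_index(shared_files)
print(search(index, "project"))
print(search(index, "Minutes"))
```

The index is built once, after which each query is a dictionary lookup rather than a scan over every file. That trade-off is the one the FAST feature above relies on.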
Lucene can index and make searchable any data from which text can be extracted. As seen in the following figure, Lucene doesn't care about the source of the data, its format, or even its language, as long as we can derive text from it. This means we can index and search data stored in files: web pages on remote web servers, documents stored in local file systems, simple text files, Microsoft Word documents, XML, HTML or PDF files, or any other format from which textual information can be extracted.

2. IMPLEMENTATION:
We have used the Lucene package for indexing the content of files.

3.1 Indexing and searching:
Index document: During the indexing step, the document is added to the index. Lucene provides everything necessary for this step, and works quite a bit of magic under a surprisingly simple API.

Components for searching: Searching is the process of looking up words in an index to find the documents where they appear. The quality of a search is typically described using precision and recall metrics. Recall measures how well the search system finds relevant documents, whereas precision measures how well the system filters out the irrelevant documents.

Search Query: This is the process of consulting the search index and retrieving the documents matching the Query, sorted in the requested sort order. There are three common models of search:

Pure boolean model -- Documents either match or do not match the provided query, and no scoring is done. In this model there are no relevance scores associated with matching documents; a query simply identifies a subset of the overall corpus as matching the query.

Vector space model -- Both queries and documents are modeled as vectors in a very high dimensional space, where each unique term is a dimension. Relevance, or similarity, between a query and a document is computed by a vector distance measure between these vectors.

Probabilistic model -- Computes the probability that a document is a good match to a query using a full probabilistic approach.

Lucene's approach combines the vector space and pure Boolean models. Lucene returns documents, which we must then render in a consumable way for users.

Render Results: Once we have the raw set of documents that match the query, sorted in the right order, we render them to the user in an intuitive, consumable manner.

3.2 The core indexing classes:
In our Indexer class, we need the following classes to perform the simplest indexing procedure:
IndexWriter
Directory
Analyzer
Document
Field

Fig: Classes used when indexing documents with Lucene

IndexWriter: IndexWriter is the central component of the indexing process. This class creates a new index or opens an existing one, and then adds, removes or updates documents in the index.
Directory: The Directory class represents the location of a Lucene index. It's an abstract class that allows its subclasses to store the index as they see fit. In our Indexer example, we created an FSDirectory, which stores real files in a directory in the filesystem, and passed it to IndexWriter's constructor.

Analyzer: Before text is indexed, it's passed through an Analyzer. The Analyzer, specified in the IndexWriter constructor, is in charge of extracting from the text those tokens that should be indexed, and eliminating the rest.

Document: A Document represents a collection of fields. Think of it as a virtual document: a chunk of data, such as a web page, an email message, or a text file, that you want to make retrievable at a later time. Fields of a document represent the document or metadata associated with that document.

Field: Each Document in an index contains one or more named fields, embodied in a class called Field. Each field has a name and a corresponding value, and a set of options that control precisely how Lucene will index the Field's value.

3.3 The core searching classes:
The basic search interface that Lucene provides is as straightforward as the one for indexing. Only a few classes are needed to perform the basic search operation:
IndexSearcher
Term
Query
TermQuery
TopDocs

IndexSearcher: IndexSearcher is to searching what IndexWriter is to indexing: the central link to the index that exposes several search methods.

Term: A Term is the basic unit for searching. Similar to the Field object, it consists of a pair of string elements: the name of the field and the word (text value) of that field.
Query: Lucene comes with a number of concrete Query subclasses. The most basic Lucene Query is TermQuery. Other Query types are BooleanQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, RangeQuery, FilteredQuery, and SpanQuery.

TermQuery: TermQuery is the most basic type of query supported by Lucene, and it's one of the primitive query types. It's used for matching documents that contain fields with specific values.

TopDocs: The TopDocs class is a simple container of pointers to the top N ranked search results, that is, the documents that match a given query.

In order to index PDF documents, we first need to parse them to extract the text that we want to index. Here are some of the PDF parsers:

PDFBox is a Java API from Ben Litchfield that will let us access the contents of a PDF document. It comes with integration classes for Lucene to translate a PDF into a Lucene document.

XPDF is an open source tool that is licensed under the GPL. It's not a Java tool, but there is a utility called pdftotext that can translate PDF files into text files on most platforms from the command line.

JPedal is a Java API for extracting text and images from PDF documents.

Simple Text Extractor Library for use with PDF documents. Relies on PDFBox.

Fig: Indexing with Lucene breaks down into three main operations: extracting text from source documents, analyzing it, and saving it to the index.
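The roles of Analyzer, Document, Field, Term, TermQuery and TopDocs can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Lucene's API: the class name TinyIndex, the stop-word list, and ranking by raw term frequency are all hypothetical stand-ins (Lucene's real scoring is more elaborate).

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}  # illustrative list only


def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class TinyIndex:
    """Hypothetical in-memory index keyed by (field name, term text)."""

    def __init__(self):
        # (field name, term text) -> {document id: term frequency}
        self.postings = defaultdict(dict)
        self.docs = []

    def add_document(self, fields):
        """Store a document (a dict of named fields) and index every field."""
        doc_id = len(self.docs)
        self.docs.append(fields)
        for name, value in fields.items():
            for term, freq in Counter(analyze(value)).items():
                self.postings[(name, term)][doc_id] = freq
        return doc_id

    def term_query(self, field, text, n=10):
        """Return the top n (doc id, term frequency) pairs for one term."""
        hits = self.postings.get((field, text.lower()), {})
        ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


index = TinyIndex()
index.add_document({"filename": "a.txt", "contents": "Lucene indexing and Lucene searching"})
index.add_document({"filename": "b.txt", "contents": "Searching the intranet with Lucene"})
index.add_document({"filename": "c.txt", "contents": "FTP transfers a file"})

top = index.term_query("contents", "lucene", n=2)
print(top)
```

Here a document is a dictionary of named fields, a term is a (field name, word) pair, and the query result is a short ranked list of document ids. These play the same roles that Document, Term and TopDocs play in the descriptions above.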
During indexing, the text is first extracted from the original content and used to create an instance of Document, containing Field instances that hold the content. The text in the fields is then analyzed to produce a stream of tokens. Finally, those tokens are added to the index in a segmented architecture.

3.4 FTP Protocol:
File Transfer Protocol (FTP) is a standard network protocol used to copy a file from one host to another over a TCP/IP-based network, such as the Internet. FTP is built on a client-server architecture and uses separate control and data connections between the client and server applications, which solves the problem of different end host configurations (i.e. operating system, file names). FTP is used with user-based password authentication or with anonymous user access.

FTP itself uses the TCP transport protocol exclusively; in other words, it never uses UDP for its transport needs. Typically an application layer protocol will use one or the other. One notable exception to that is DNS (Domain Name System). FTP is also unusual in that it uses two ports to accomplish its task. It typically uses port 20 for data transfer and port 21 to listen for commands. However, data is not always transferred over port 20; a different port may be used, and this is the part that many people find confusing. There are two modes of FTP, namely active and passive mode. These two modes are initiated by the FTP client, and then acted upon by the FTP server.

Some of the terminology regarding File Transfer Protocol:

1. Control connection: The communication path between the USER-PI and SERVER-PI for the exchange of commands and replies. This connection follows the Telnet Protocol.

2. Data connection: A full duplex connection over which data is transferred, in a specified mode and type. The data transferred may be a part of a file, an entire file or a number of files. The path may be between a server-DTP and a user-DTP, or between two server-DTPs.

3. Data port: The passive data transfer process "listens" on the data port for a connection from the active transfer process in order to open the data connection.

4. DTP: The data transfer process establishes and manages the data connection. The DTP can be passive or active.

5. End-of-Line: The end-of-line sequence defines the separation of printing lines. The sequence is Carriage Return, followed by Line Feed.

6. EOF: The end-of-file condition that defines the end of a file being transferred.

7. EOR: The end-of-record condition that defines the end of a record being transferred.

8. Error recovery: A procedure that allows a user to recover from certain errors such as failure of either host system or transfer process. In FTP, error recovery may involve restarting a file transfer at a given checkpoint.

The query entered is fired simultaneously on both the local machine and the remote machine, and the results obtained from both are displayed. By selecting a particular file, we can retrieve it from the remote machine using the FTP protocol.
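The earlier remark that data does not always travel over port 20 can be made concrete with passive mode. In that mode the server answers the PASV command with a 227 reply of the form (h1,h2,h3,h4,p1,p2), and the client connects to port p1 * 256 + p2 on the given host. The sketch below only parses such a reply; the function name and the sample reply text are hypothetical, and no network connection is made.

```python
import re


def parse_pasv_reply(reply):
    """Return (host, port) from a '227 ... (h1,h2,h3,h4,p1,p2)' reply line."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError("not a passive-mode reply: %r" % reply)
    numbers = [int(group) for group in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    # The data port is sent as two bytes: high byte first, then low byte.
    port = numbers[4] * 256 + numbers[5]
    return host, port


# Hypothetical server reply, for illustration only.
reply = "227 Entering Passive Mode (192,168,1,20,19,137)"
host, port = parse_pasv_reply(reply)
print(host, port)
```

For the sample reply, the data connection would go to port 19 * 256 + 137 = 5001 rather than port 20. This is why firewalls and clients need to treat the data port as negotiated per transfer.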
3.5 Project Screenshots
Fig: Indexing Files
Fig: Settings for Indexing
Fig: Results of search query "moblin"
Fig: Search Settings
4 CONCLUSION:
Desktop search can be sped up with the help of other computers in the intranet. Rather than searching files by name, we have developed an application that searches based on the content of the files and, most importantly, uses the Lucene package to index the contents of the files rather than the files themselves. We made sure that simplicity is maintained for the user. The semantics of the local and remote search procedures are properly abstracted, so the user gets results from a remote machine with the same semantics as a local query. As this is a distributed application, it can be further enhanced to search across different subnets in an organization by incorporating a range of IP addresses.

First Author: G. Mohammad Rafi, pursuing M.Tech (CSE), Madina Engineering College, Kadapa, Kadapa (Dt), Andhra Pradesh. I am interested in research in the network and wireless network areas.
Second Author: K. Sreenivasulu, HOD, Dept. of CSE, Madina Engineering College, Kadapa, AP.
Third Author: K. Anjaneyulu, Asst. Prof. & HOD, Dept. of Computer Applications, Sri Sai Institute of Technology & Science, Rayachoty, Kadapa, AP.
More informationEFFECTIVE EFFICIENT BOOLEAN RETRIEVAL
EFFECTIVE EFFICIENT BOOLEAN RETRIEVAL J Naveen Kumar 1, Dr. M. Janga Reddy 2 1 jnaveenkumar6@gmail.com, 2 pricipalcmrit@gmail.com 1 M.Tech Student, Department of Computer Science, CMR Institute of Technology,
More informationThinAir Server Platform White Paper June 2000
ThinAir Server Platform White Paper June 2000 ThinAirApps, Inc. 1999, 2000. All Rights Reserved Copyright Copyright 1999, 2000 ThinAirApps, Inc. all rights reserved. Neither this publication nor any part
More information21.1 FTP. Connections
21.1 FTP File Transfer Protocol (FTP) is the standard mechanism provided by TCP/IP for copying a file from one host to another. Although transferring files from one system to another seems simple and straightforward,
More informationNetworked Applications: Sockets. Goals of Todayʼs Lecture. End System: Computer on the ʻNet. Client-server paradigm End systems Clients and servers
Networked Applications: Sockets CS 375: Computer Networks Spring 2009 Thomas Bressoud 1 Goals of Todayʼs Lecture Client-server paradigm End systems Clients and servers Sockets and Network Programming Socket
More informationLayered Architecture
1 Layered Architecture Required reading: Kurose 1.7 CSE 4213, Fall 2006 Instructor: N. Vlajic Protocols and Standards 2 Entity any device capable of sending and receiving information over the Internet
More informationInternational Jmynal of Intellectual Advancements and Research in Engineering Computations
www.ijiarec.com ISSN:2348-2079 DEC-2015 International Jmynal of Intellectual Advancements and Research in Engineering Computations VIRTUALIZATION OF DISTIRIBUTED DATABASES USING XML 1 M.Ramu ABSTRACT Objective
More informationDesign and Implementation of a Service Discovery Architecture in Pervasive Systems
Design and Implementation of a Service Discovery Architecture in Pervasive Systems Vincenzo Suraci 1, Tiziano Inzerilli 2, Silvano Mignanti 3, University of Rome La Sapienza, D.I.S. 1 vincenzo.suraci@dis.uniroma1.it
More informationSOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES
SOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES Introduction to Information Retrieval CS 150 Donald J. Patterson This content based on the paper located here: http://dx.doi.org/10.1007/s10618-008-0118-x
More informationPerformance evaluation of searching using various indexing techniques in Lucene with Relational Databases
Performance evaluation of searching using various indexing techniques in Lucene with Relational Databases Chetan Khilosiya 1, H. P. Channe 2 Abstract The Organizations commonly use relational databases
More informationTOSHIBA GA Printing from Windows
TOSHIBA GA-1211 Printing from Windows 2008 Electronics for Imaging, Inc. The information in this publication is covered under Legal Notices for this product. 45075925 24 October 2008 CONTENTS 3 CONTENTS
More informationEclipse as a Web 2.0 Application Position Paper
Eclipse Summit Europe Server-side Eclipse 11 12 October 2006 Eclipse as a Web 2.0 Application Position Paper Automatic Web 2.0 - enabling of any RCP-application with Xplosion Introduction If todays Web
More informationCSE Lecture 24 Review and Recap. High-Level Overview of the Course!! L1-7: I. Programming Basics!
CSE 1710 Lecture 24 Review and Recap High-Level Overview of the Course L1-7: I. Programming Basics Ch1, 2, 5, sec 3.2.4 (JBA) L8, L9: II. Working with Images APIs + Classes L10: Midterm L11-14: III. Object
More informationCHAPTER 7 WEB SERVERS AND WEB BROWSERS
CHAPTER 7 WEB SERVERS AND WEB BROWSERS Browser INTRODUCTION A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information
More informationIndexing HTML files in Solr 1
Indexing HTML files in Solr 1 This tutorial explains how to index html files in Solr using the built-in post tool, which leverages Apache Tika and auto extracts content from html files. You should have
More informationWeb-based File Upload and Download System
COMP4905 Honor Project Web-based File Upload and Download System Author: Yongmei Liu Student number: 100292721 Supervisor: Dr. Tony White 1 Abstract This project gives solutions of how to upload documents
More informationEMC Ionix Network Configuration Manager Version 4.1.1
EMC Ionix Network Configuration Manager Version 4.1.1 RSA Token Service Installation Guide 300-013-088 REVA01 EMC Corporation Corporate Headquarters: Hopkinton, MA 01748-9103 1-508-435-1000 www.emc.com
More informationContents. A Recommended Reading...21 Index iii
Contents Installing SAS Information Retrieval Studio...1 1.1 About This Book... 1 1.1.1 Audience... 1 1.1.2 Prerequisites... 1 1.1.3 Typographical Conventions... 2 1.2 Introduction to SAS Information Retrieval
More informationSoftware installation and configuration IEC-line series. update:
Software installation and configuration IEC-line series update: 16-06-2017 IEC-line by OVERDIGIT overdigit.com Table of contents 1. Installing the software... 3 1.1. Installing CoDeSys... 4 1.2. Installing
More informationCHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS
82 CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS In recent years, everybody is in thirst of getting information from the internet. Search engines are used to fulfill the need of them. Even though the
More information: Semantic Web (2013 Fall)
03-60-569: Web (2013 Fall) University of Windsor September 4, 2013 Table of contents 1 2 3 4 5 Definition of the Web The World Wide Web is a system of interlinked hypertext documents accessed via the Internet
More informationCN1047 INTRODUCTION TO COMPUTER NETWORKING CHAPTER 6 OSI MODEL TRANSPORT LAYER
CN1047 INTRODUCTION TO COMPUTER NETWORKING CHAPTER 6 OSI MODEL TRANSPORT LAYER Transport Layer The Transport layer ensures the reliable arrival of messages and provides error checking mechanisms and data
More informationInstructor: Stefan Savev
LECTURE 2 What is indexing? Indexing is the process of extracting features (such as word counts) from the documents (in other words: preprocessing the documents). The process ends with putting the information
More informationWeb Ontology for Software Package Management
Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 2. pp. 331 338. Web Ontology for Software Package Management Péter Jeszenszky Debreceni
More informationARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES
ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES Fidel Cacheda, Alberto Pan, Lucía Ardao, Angel Viña Department of Tecnoloxías da Información e as Comunicacións, Facultad
More informationElixir Domain Configuration and Administration
Elixir Domain Configuration and Administration Release 3.5.0 Elixir Technology Pte Ltd Elixir Domain Configuration and Administration: Release 3.5.0 Elixir Technology Pte Ltd Published 2014 Copyright 2014
More informationCOMMUNITIES USER MANUAL. Satori Team
COMMUNITIES USER MANUAL Satori Team Table of Contents Communities... 2 1. Introduction... 4 2. Roles and privileges.... 5 3. Process flow.... 6 4. Description... 8 a) Community page.... 9 b) Creating community
More informationLAB 7: Search engine: Apache Nutch + Solr + Lucene
LAB 7: Search engine: Apache Nutch + Solr + Lucene Apache Nutch Apache Lucene Apache Solr Crawler + indexer (mainly crawler) indexer + searcher indexer + searcher Lucene vs. Solr? Lucene = library, more
More informationCoveo Platform 7.0. Oracle UCM Connector Guide
Coveo Platform 7.0 Oracle UCM Connector Guide Notice The content in this document represents the current view of Coveo as of the date of publication. Because Coveo continually responds to changing market
More informationCOMMUNICATION PROTOCOLS
COMMUNICATION PROTOCOLS Index Chapter 1. Introduction Chapter 2. Software components message exchange JMS and Tibco Rendezvous Chapter 3. Communication over the Internet Simple Object Access Protocol (SOAP)
More informationThe Google File System
October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single
More informationIntroduction to TCP/IP
Introduction to TCP/IP Properties and characteristics of TCP/IP IPv4 IPv6 Public vs private vs APIPA/link local Static vs dynamic Client-side DNS settings Client-side DHCP Subnet mask vs CIDR Gateway TCP/IP
More informationX100 ARCHITECTURE REFERENCES:
UNION SYSTEMS GLOBAL This guide is designed to provide you with an highlevel overview of some of the key points of the Oracle Fusion Middleware Forms Services architecture, a component of the Oracle Fusion
More information