AN EFFECTIVE SEARCH TOOL FOR LOCATING RESOURCE IN NETWORK

G. Mohammad Rafi (1), K. Sreenivasulu (2), K. Anjaneyulu (3)

1. M.Tech (CSE, pursuing), Madina Engineering College, Kadapa, AP
2. Professor & HOD, Dept. of CSE, Madina Engineering College, Kadapa, AP
3. Asst. Prof. & HOD, Dept. of Computer Applications, Sri Sai Institute of Technology & Science, Rayachoty, Kadapa, AP

Abstract: A good search engine ensures that users find what they are looking for the first time, regardless of the format or location of the information. This means that a wide variety of information can be effectively dispersed and made available to staff without the need for complex navigation systems or filing conventions. Most intranets evolve over time, and adding search functionality need not be a daunting task. A search tool can be implemented quickly and then refined as the intranet grows and the needs of the organization change. Importantly, a flexible search engine that is cost-effective and expands to suit growing requirements can be a much better choice. It is important to recognize that every intranet is different, with its own objectives, requirements and environment.

INTRODUCTION

Desktop search is a tool that uses keywords to search various data sources on local disk storage, such as web browser histories, archives, text documents or the metadata of mp3 files. After the search tool is installed, its desktop search engine parses all files stored on the local disks and maintains an index database. The index is what makes reasonable performance achievable, especially when the local storage holds several hundred gigabytes or even terabytes of data. The key techniques used by a desktop search engine are the ability to parse files in various formats and the ability to run full-text search over the parsed contents.

Search all contents in the intranet: With the help of a desktop search tool, it is convenient to find information that is hidden in the myriad files on the local hard drives.
However, in some enterprise applications, useful information such as meeting minutes or work schedules may be concealed on another person's computer in the intranet.

Some salient features of an intranet-based search engine are:

It is FAST: For fast search it creates indexes of the shared folders. Creating indexes considerably reduces the search time for a query.

The application is DISTRIBUTED: Documents to be searched may not reside on your local machine. Being distributed, the application can make queries to other machines in the network.

The application uses P2P sharing: There are no special requirements for a server; just install and get connected. Each peer serves the others.

1. LUCENE PACKAGE:

Lucene is a high-performance, scalable Information Retrieval (IR) library. Information retrieval refers to the process of searching for documents, information within documents or metadata about documents. Lucene lets us add searching capabilities to applications.

Fig: Typical components of a search application; the shaded components show which parts Lucene handles.

Lucene provides a simple yet powerful core API that requires minimal understanding of full-text indexing and searching. Because Lucene is a Java library, it doesn't make assumptions about what it indexes and searches, which gives it an advantage over a number of other search applications. Its design is compact and simple, allowing Lucene to be easily embedded into desktop applications.
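The claim that creating an index reduces query time can be illustrated with a small sketch. This is plain Python, not Lucene's API; the file names and contents are invented for illustration, and the functions `build_index`, `scan_search` and `index_search` are hypothetical helpers. The scan reads every document on every query, while the indexed lookup is a single dictionary access once the index has been built.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def scan_search(docs, word):
    """Baseline without an index: read every document on every query."""
    word = word.lower()
    return {name for name, text in docs.items() if word in text.lower().split()}

def index_search(index, word):
    """Indexed lookup: one dictionary access per query term."""
    return set(index.get(word.lower(), set()))

# Hypothetical shared-folder contents, used only for illustration.
docs = {
    "minutes.txt": "Meeting minutes for the intranet search project",
    "schedule.txt": "Work schedule for the project team",
    "notes.txt": "Lucene indexing notes",
}
index = build_index(docs)
print(sorted(index_search(index, "project")))
```

Both functions return the same set of documents; the difference is that the index pays the parsing cost once, at build time, rather than on every query.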

Lucene can index and make searchable any data from which text can be extracted. As seen in the following figure, Lucene doesn't care about the source of the data, its format, or even its language, as long as we can derive text from it. This means we can index and search data stored in files: web pages on remote web servers, documents stored in local file systems, simple text files, Microsoft Word documents, XML, HTML or PDF files, or any other format from which textual information can be extracted.

2. IMPLEMENTATION:

We have used the Lucene package for indexing the content of files.

3.1 Indexing and searching:

Index document: During the indexing step, the document is added to the index. Lucene provides everything necessary for this step, and works quite a bit of magic under a surprisingly simple API.

Components for searching: Searching is the process of looking up words in an index to find documents where they appear. The quality of a search is typically described using precision and recall metrics. Recall measures how well the search system finds relevant documents, whereas precision measures how well the system filters out the irrelevant documents.

Search Query: This is the process of consulting the search index and retrieving the documents matching the Query, sorted in the requested sort order.

There are three common models of search:

Pure boolean model -- Documents either match or do not match the provided query, and no scoring is done. In this model there are no relevance scores associated with matching documents; a query simply identifies a subset of the overall corpus as matching the query.

Vector space model -- Both queries and documents are modeled as vectors in a very high dimensional space, where each unique term is a dimension. Relevance, or similarity, between a query and a document is computed by a vector distance measure between these vectors.

Probabilistic model -- Computes the probability that a document is a good match to a query using a full probabilistic approach.

Lucene's approach combines the vector space and pure Boolean models. Lucene returns documents, which we must then render in a consumable way for users.

Render Results: Once we have the raw set of documents that match the query, sorted in the right order, we then render them to the user in an intuitive, consumable manner.

3.2 The core indexing classes:

In our Indexer class, we need the following classes to perform the simplest indexing procedure:

IndexWriter
Directory
Analyzer
Document
Field

Fig: Classes used when indexing documents with Lucene

IndexWriter: IndexWriter is the central component of the indexing process. This class creates a new index or opens an existing one, and then adds, removes or updates documents in the index.

Directory: The Directory class represents the location of a Lucene index. It's an abstract class that allows its subclasses to store the index as they see fit. In our Indexer example, we created an FSDirectory, which stores real files in a directory in the filesystem, and passed it to IndexWriter's constructor.

Analyzer: Before text is indexed, it's passed through an Analyzer. The Analyzer, specified in the IndexWriter constructor, is in charge of extracting from the text those tokens that should be indexed, and eliminating the rest.

Document: A Document represents a collection of fields. Think of it as a virtual document: a chunk of data, such as a web page, an e-mail message, or a text file, that you want to make retrievable at a later time. Fields of a document represent the document or metadata associated with that document.

Field: Each Document in an index contains one or more named fields, embodied in a class called Field. Each field has a name and a corresponding value, and a set of options that control precisely how Lucene will index the field's value.

3.3 The core searching classes:

The basic search interface that Lucene provides is as straightforward as the one for indexing. Only a few classes are needed to perform the basic search operation:

IndexSearcher
Term
Query
TermQuery
TopDocs

IndexSearcher: IndexSearcher is to searching what IndexWriter is to indexing: the central link to the index that exposes several search methods.

Term: A Term is the basic unit for searching. Similar to the Field object, it consists of a pair of string elements: the name of the field and the word (text value) of that field.

Query: Lucene comes with a number of concrete Query subclasses. The most basic Lucene Query is TermQuery. Other Query types are BooleanQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, RangeQuery, FilteredQuery, and SpanQuery.

TermQuery: TermQuery is the most basic type of query supported by Lucene, and it's one of the primitive query types. It's used for matching documents that contain fields with specific values.

TopDocs: The TopDocs class is a simple container of pointers to the top N ranked search results, that is, the documents that match a given query.

In order to index PDF documents, we first need to parse them to extract the text that we want to index. Here are some of the PDF parsers:

PDFBox is a Java API from Ben Litchfield that lets us access the contents of a PDF document. It comes with integration classes for Lucene to translate a PDF into a Lucene document.

XPDF is an open source tool that is licensed under the GPL. It's not a Java tool, but there is a utility called pdftotext that can translate PDF files into text files on most platforms from the command line.

JPedal is a Java API for extracting text and images from PDF documents.

Simple Text Extractor is a library for use with PDF documents. It relies on PDFBox.

Fig: Indexing with Lucene breaks down into three main operations: extracting text from source documents, analyzing it and saving it to the index.
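To make the roles of the core classes more concrete, the following is a minimal sketch in plain Python. It does not use Lucene and is not how Lucene is implemented; `analyze`, `TinyIndex`, the stop-word list and the sample documents are all hypothetical names chosen for illustration. The score here is the raw term frequency, a deliberately simple stand-in for Lucene's actual scoring.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "the", "of", "to", "and", "is", "in"}  # illustrative list

def analyze(text):
    """Analyzer role: turn raw text into the tokens worth indexing."""
    tokens = (t.strip(".,;:!?()\"'").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

class TinyIndex:
    """IndexWriter and IndexSearcher roles over an in-memory postings table."""

    def __init__(self):
        # (field name, term text) -> {doc id: term frequency}
        self.postings = defaultdict(dict)
        self.documents = []

    def add_document(self, fields):
        """Document role: `fields` maps a field name to its text value."""
        doc_id = len(self.documents)
        self.documents.append(fields)
        for name, value in fields.items():
            for term, freq in Counter(analyze(value)).items():
                self.postings[(name, term)][doc_id] = freq
        return doc_id

    def term_query(self, field, word, n=10):
        """TermQuery and TopDocs roles: top-n (doc id, score) pairs for one term."""
        hits = self.postings.get((field, word.lower()), {})
        ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:n]

index = TinyIndex()
index.add_document({"path": "a.txt", "contents": "Lucene indexes text. Lucene searches text."})
index.add_document({"path": "b.txt", "contents": "FTP copies a file to another host."})
index.add_document({"path": "c.txt", "contents": "The search tool uses Lucene."})

for doc_id, score in index.term_query("contents", "lucene"):
    print(index.documents[doc_id]["path"], score)
```

The key of the postings table is a (field name, term text) pair, which mirrors the description of a Term as the name of a field plus a word in that field.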

During indexing, the text is first extracted from the original content and used to create an instance of Document, containing Field instances that hold the content. The text in the fields is then analyzed to produce a stream of tokens. Finally, those tokens are added to the index in a segmented architecture.

3.4 FTP Protocol:

File Transfer Protocol (FTP) is a standard network protocol used to copy a file from one host to another over a TCP/IP-based network, such as the Internet. FTP is built on a client-server architecture and utilizes separate control and data connections between the client and server applications, which solves the problem of different end host configurations (i.e. operating system, file names). FTP is used with user-based password authentication or with anonymous user access.

FTP itself uses the TCP transport protocol exclusively; in other words, it never uses UDP for its transport needs. Typically an application layer protocol will use one or the other. One notable exception to that is DNS, the Domain Name System.

FTP is also unusual in that it uses two ports to accomplish its task. It typically uses port 20 for data transfer and port 21 to listen for commands. However, data is not always transferred over port 20; a different port may be used. That is where the confusion arises for many people. There are two modes of FTP, namely active and passive mode. These two modes are initiated by the FTP client, and then acted upon by the FTP server.

Some of the terminology regarding File Transfer Protocol:

1. Control connection: The communication path between the USER-PI and SERVER-PI for the exchange of commands and replies. This connection follows the Telnet Protocol.

2. Data connection: A full duplex connection over which data is transferred, in a specified mode and type. The data transferred may be a part of a file, an entire file or a number of files. The path may be between a server-DTP and a user-DTP, or between two server-DTPs.

3. Data port: The passive data transfer process "listens" on the data port for a connection from the active transfer process in order to open the data connection.

4. DTP: The data transfer process establishes and manages the data connection. The DTP can be passive or active.

5. End-of-Line: The end-of-line sequence defines the separation of printing lines. The sequence is Carriage Return, followed by Line Feed.

6. EOF: The end-of-file condition that defines the end of a file being transferred.

7. EOR: The end-of-record condition that defines the end of a record being transferred.

8. Error recovery: A procedure that allows a user to recover from certain errors such as failure of either host system or transfer process. In FTP, error recovery may involve restarting a file transfer at a given checkpoint.

The query entered is fired simultaneously on both the local machine and the remote machine, and the results obtained from both are displayed. By selecting a particular file, we can retrieve it from the remote machine using the FTP protocol.
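The query flow described above can be sketched as follows. This is a hedged illustration, not the application's actual code: `search_all`, `fetch_file`, `local_search` and `remote_search` are hypothetical names, the two searchers are stand-ins that return fixed paths, and the peer label `peer-1` is invented. The FTP helper uses Python's standard `ftplib` in passive mode and is defined but not called, since it needs a reachable FTP server.

```python
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP

def search_all(query, searchers):
    """Run the same query on every machine at once and merge the hits.

    `searchers` maps a host label to a callable that takes the query and
    returns a list of matching file paths on that host.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(searchers))) as pool:
        futures = {host: pool.submit(fn, query) for host, fn in searchers.items()}
        return [(host, path) for host, fut in futures.items() for path in fut.result()]

def fetch_file(host, remote_path, local_path, user="anonymous", password=""):
    """Retrieve one selected file over FTP in passive mode (not called here)."""
    with FTP(host, timeout=30) as ftp:
        ftp.login(user, password)
        ftp.set_pasv(True)
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + remote_path, out.write)

# Stand-in searchers; a real peer would query its own index over the network.
def local_search(query):
    return ["C:/docs/moblin-notes.txt"] if query == "moblin" else []

def remote_search(query):
    return ["/shared/moblin-setup.pdf"] if query == "moblin" else []

results = search_all("moblin", {"local": local_search, "peer-1": remote_search})
for host, path in results:
    print(host, path)
```

Each hit keeps the label of the machine it came from, so the user interface can decide whether a selected file is opened locally or fetched with `fetch_file`.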

3.5 Project Screenshots

Fig: Indexing Files
Fig: Settings for Indexing
Fig: Results of search query "moblin"
Fig: Search Settings

4 CONCLUSION:

Desktop search can be sped up with the help of other computers in the intranet. Rather than searching for files by name, we have developed an application that searches based on the content of the files and, most importantly, uses the Lucene package to index the contents of the files rather than the files themselves. We made sure that simplicity is maintained for the user. The semantics of the local and remote search procedures are properly abstracted, so the user gets results from a remote machine with the same semantics as those of a local query. As this is a distributed application, it can be further enhanced to search across different subnets in an organization by incorporating a range of IP addresses.

First Author: G. Mohammad Rafi is pursuing an M.Tech (CSE) at Madina Engineering College, Kadapa, Kadapa (Dt), Andhra Pradesh. His research interests are networks and wireless networks.

Second Author: K. Sreenivasulu, HOD, Dept. of CSE, Madina Engineering College, Kadapa, AP.

Third Author: K. Anjaneyulu, Asst. Prof. & HOD, Dept. of Computer Applications, Sri Sai Institute of Technology & Science, Rayachoty, Kadapa, AP.


More information

TFTP and FTP Basics BUPT/QMUL

TFTP and FTP Basics BUPT/QMUL TFTP and FTP Basics BUPT/QMUL 2017-04-24 Agenda File transfer and access TFTP (Trivial File Transfer Protocol) FTP (File Transfer Protocol) NFS (Network File System) 2 File Transfer And Access 3 File Transfer

More information

Chapter 2 Layer Architecture of Network Protocols. School of Info. Sci. & Eng. Shandong Univ.

Chapter 2 Layer Architecture of Network Protocols. School of Info. Sci. & Eng. Shandong Univ. Chapter 2 Architecture of Network Protocols School of Info. Sci. & Eng. Shandong Univ. Outline 2.1 Examples of ing 2.2 OSI Reference Model (Continued from last time) 2.3. TCP/IP Architecture 2.4 Berkeley

More information

Developing A Web-based User Interface for Semantic Information Retrieval

Developing A Web-based User Interface for Semantic Information Retrieval Developing A Web-based User Interface for Semantic Information Retrieval Daniel C. Berrios 1, Richard M. Keller 2 1 Research Institute for Advanced Computer Science, MS 269-2, NASA Ames Research Center,

More information

BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI

BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI Paper BI09-2012 BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI ABSTRACT Enterprise Guide is not just a fancy program editor! EG offers a whole new window onto

More information

Fiery X3eTY2 65_55C-KM Color Server. Utilities

Fiery X3eTY2 65_55C-KM Color Server. Utilities Fiery X3eTY2 65_55C-KM Color Server Utilities 2008 Electronics for Imaging, Inc. The information in this publication is covered under Legal Notices for this product. 45072888 14 March 2008 CONTENTS 3 CONTENTS

More information

PROJECT 6: PINTOS FILE SYSTEM. CS124 Operating Systems Winter , Lecture 25

PROJECT 6: PINTOS FILE SYSTEM. CS124 Operating Systems Winter , Lecture 25 PROJECT 6: PINTOS FILE SYSTEM CS124 Operating Systems Winter 2015-2016, Lecture 25 2 Project 6: Pintos File System Last project is to improve the Pintos file system Note: Please ask before using late tokens

More information

Computer Network : Lecture Notes Nepal Engineering College Compiled by: Junior Professor: Daya Ram Budhathoki Nepal Engineering college, Changunarayan

Computer Network : Lecture Notes Nepal Engineering College Compiled by: Junior Professor: Daya Ram Budhathoki Nepal Engineering college, Changunarayan Computer Network : Lecture Notes Nepal Engineering College Compiled by: Junior Professor: Daya Ram Budhathoki Nepal Engineering college, Changunarayan Chapter3: OSI Reference Model: Network Software: Network

More information

Parametric Search using In-memory Auxiliary Index

Parametric Search using In-memory Auxiliary Index Parametric Search using In-memory Auxiliary Index Nishant Verman and Jaideep Ravela Stanford University, Stanford, CA {nishant, ravela}@stanford.edu Abstract In this paper we analyze the performance of

More information

EFI Fiery Utilities Technical Reference. Part Number: , Rev. 1.0

EFI Fiery Utilities Technical Reference. Part Number: , Rev. 1.0 EFI Fiery Utilities Technical Reference Part Number: 59308805, Rev. 1.0 15 March 2008 CONTENTS 3 CONTENTS INTRODUCTION 5 Terminology and conventions 6 About this document 7 About Help 7 Preparing for installation

More information

Transport layer Internet layer

Transport layer Internet layer Lecture 2-bis. 2 Transport Protocols As seen by the application developer point of view The primary (in principle unique) role of transport protocols!" # $ % "!"& Transport httpd 25 80 3211... My app 131.175.15.1

More information

SCAM Portfolio Scalability

SCAM Portfolio Scalability SCAM Portfolio Scalability Henrik Eriksson Per-Olof Andersson Uppsala Learning Lab 2005-04-18 1 Contents 1 Abstract 3 2 Suggested Improvements Summary 4 3 Abbreviations 5 4 The SCAM Portfolio System 6

More information

Information Management Platform Release Date Version Highlights compared to previous version

Information Management Platform Release Date Version Highlights compared to previous version For over 30 years ZyLAB has been working with professionals in the litigation, auditing, security and intelligence communities to develop the best solutions for investigating and managing large sets of

More information

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017 Hadoop File System 1 S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y Moving Computation is Cheaper than Moving Data Motivation: Big Data! What is BigData? - Google

More information

EFFECTIVE EFFICIENT BOOLEAN RETRIEVAL

EFFECTIVE EFFICIENT BOOLEAN RETRIEVAL EFFECTIVE EFFICIENT BOOLEAN RETRIEVAL J Naveen Kumar 1, Dr. M. Janga Reddy 2 1 jnaveenkumar6@gmail.com, 2 pricipalcmrit@gmail.com 1 M.Tech Student, Department of Computer Science, CMR Institute of Technology,

More information

ThinAir Server Platform White Paper June 2000

ThinAir Server Platform White Paper June 2000 ThinAir Server Platform White Paper June 2000 ThinAirApps, Inc. 1999, 2000. All Rights Reserved Copyright Copyright 1999, 2000 ThinAirApps, Inc. all rights reserved. Neither this publication nor any part

More information

21.1 FTP. Connections

21.1 FTP. Connections 21.1 FTP File Transfer Protocol (FTP) is the standard mechanism provided by TCP/IP for copying a file from one host to another. Although transferring files from one system to another seems simple and straightforward,

More information

Networked Applications: Sockets. Goals of Todayʼs Lecture. End System: Computer on the ʻNet. Client-server paradigm End systems Clients and servers

Networked Applications: Sockets. Goals of Todayʼs Lecture. End System: Computer on the ʻNet. Client-server paradigm End systems Clients and servers Networked Applications: Sockets CS 375: Computer Networks Spring 2009 Thomas Bressoud 1 Goals of Todayʼs Lecture Client-server paradigm End systems Clients and servers Sockets and Network Programming Socket

More information

Layered Architecture

Layered Architecture 1 Layered Architecture Required reading: Kurose 1.7 CSE 4213, Fall 2006 Instructor: N. Vlajic Protocols and Standards 2 Entity any device capable of sending and receiving information over the Internet

More information

International Jmynal of Intellectual Advancements and Research in Engineering Computations

International Jmynal of Intellectual Advancements and Research in Engineering Computations www.ijiarec.com ISSN:2348-2079 DEC-2015 International Jmynal of Intellectual Advancements and Research in Engineering Computations VIRTUALIZATION OF DISTIRIBUTED DATABASES USING XML 1 M.Ramu ABSTRACT Objective

More information

Design and Implementation of a Service Discovery Architecture in Pervasive Systems

Design and Implementation of a Service Discovery Architecture in Pervasive Systems Design and Implementation of a Service Discovery Architecture in Pervasive Systems Vincenzo Suraci 1, Tiziano Inzerilli 2, Silvano Mignanti 3, University of Rome La Sapienza, D.I.S. 1 vincenzo.suraci@dis.uniroma1.it

More information

SOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES

SOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES SOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES Introduction to Information Retrieval CS 150 Donald J. Patterson This content based on the paper located here: http://dx.doi.org/10.1007/s10618-008-0118-x

More information

Performance evaluation of searching using various indexing techniques in Lucene with Relational Databases

Performance evaluation of searching using various indexing techniques in Lucene with Relational Databases Performance evaluation of searching using various indexing techniques in Lucene with Relational Databases Chetan Khilosiya 1, H. P. Channe 2 Abstract The Organizations commonly use relational databases

More information

TOSHIBA GA Printing from Windows

TOSHIBA GA Printing from Windows TOSHIBA GA-1211 Printing from Windows 2008 Electronics for Imaging, Inc. The information in this publication is covered under Legal Notices for this product. 45075925 24 October 2008 CONTENTS 3 CONTENTS

More information

Eclipse as a Web 2.0 Application Position Paper

Eclipse as a Web 2.0 Application Position Paper Eclipse Summit Europe Server-side Eclipse 11 12 October 2006 Eclipse as a Web 2.0 Application Position Paper Automatic Web 2.0 - enabling of any RCP-application with Xplosion Introduction If todays Web

More information

CSE Lecture 24 Review and Recap. High-Level Overview of the Course!! L1-7: I. Programming Basics!

CSE Lecture 24 Review and Recap. High-Level Overview of the Course!! L1-7: I. Programming Basics! CSE 1710 Lecture 24 Review and Recap High-Level Overview of the Course L1-7: I. Programming Basics Ch1, 2, 5, sec 3.2.4 (JBA) L8, L9: II. Working with Images APIs + Classes L10: Midterm L11-14: III. Object

More information

CHAPTER 7 WEB SERVERS AND WEB BROWSERS

CHAPTER 7 WEB SERVERS AND WEB BROWSERS CHAPTER 7 WEB SERVERS AND WEB BROWSERS Browser INTRODUCTION A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information

More information

Indexing HTML files in Solr 1

Indexing HTML files in Solr 1 Indexing HTML files in Solr 1 This tutorial explains how to index html files in Solr using the built-in post tool, which leverages Apache Tika and auto extracts content from html files. You should have

More information

Web-based File Upload and Download System

Web-based File Upload and Download System COMP4905 Honor Project Web-based File Upload and Download System Author: Yongmei Liu Student number: 100292721 Supervisor: Dr. Tony White 1 Abstract This project gives solutions of how to upload documents

More information

EMC Ionix Network Configuration Manager Version 4.1.1

EMC Ionix Network Configuration Manager Version 4.1.1 EMC Ionix Network Configuration Manager Version 4.1.1 RSA Token Service Installation Guide 300-013-088 REVA01 EMC Corporation Corporate Headquarters: Hopkinton, MA 01748-9103 1-508-435-1000 www.emc.com

More information

Contents. A Recommended Reading...21 Index iii

Contents. A Recommended Reading...21 Index iii Contents Installing SAS Information Retrieval Studio...1 1.1 About This Book... 1 1.1.1 Audience... 1 1.1.2 Prerequisites... 1 1.1.3 Typographical Conventions... 2 1.2 Introduction to SAS Information Retrieval

More information

Software installation and configuration IEC-line series. update:

Software installation and configuration IEC-line series. update: Software installation and configuration IEC-line series update: 16-06-2017 IEC-line by OVERDIGIT overdigit.com Table of contents 1. Installing the software... 3 1.1. Installing CoDeSys... 4 1.2. Installing

More information

CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS

CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS 82 CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS In recent years, everybody is in thirst of getting information from the internet. Search engines are used to fulfill the need of them. Even though the

More information

: Semantic Web (2013 Fall)

: Semantic Web (2013 Fall) 03-60-569: Web (2013 Fall) University of Windsor September 4, 2013 Table of contents 1 2 3 4 5 Definition of the Web The World Wide Web is a system of interlinked hypertext documents accessed via the Internet

More information

CN1047 INTRODUCTION TO COMPUTER NETWORKING CHAPTER 6 OSI MODEL TRANSPORT LAYER

CN1047 INTRODUCTION TO COMPUTER NETWORKING CHAPTER 6 OSI MODEL TRANSPORT LAYER CN1047 INTRODUCTION TO COMPUTER NETWORKING CHAPTER 6 OSI MODEL TRANSPORT LAYER Transport Layer The Transport layer ensures the reliable arrival of messages and provides error checking mechanisms and data

More information

Instructor: Stefan Savev

Instructor: Stefan Savev LECTURE 2 What is indexing? Indexing is the process of extracting features (such as word counts) from the documents (in other words: preprocessing the documents). The process ends with putting the information

More information

Web Ontology for Software Package Management

Web Ontology for Software Package Management Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 2. pp. 331 338. Web Ontology for Software Package Management Péter Jeszenszky Debreceni

More information

ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES

ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES Fidel Cacheda, Alberto Pan, Lucía Ardao, Angel Viña Department of Tecnoloxías da Información e as Comunicacións, Facultad

More information

Elixir Domain Configuration and Administration

Elixir Domain Configuration and Administration Elixir Domain Configuration and Administration Release 3.5.0 Elixir Technology Pte Ltd Elixir Domain Configuration and Administration: Release 3.5.0 Elixir Technology Pte Ltd Published 2014 Copyright 2014

More information

COMMUNITIES USER MANUAL. Satori Team

COMMUNITIES USER MANUAL. Satori Team COMMUNITIES USER MANUAL Satori Team Table of Contents Communities... 2 1. Introduction... 4 2. Roles and privileges.... 5 3. Process flow.... 6 4. Description... 8 a) Community page.... 9 b) Creating community

More information

LAB 7: Search engine: Apache Nutch + Solr + Lucene

LAB 7: Search engine: Apache Nutch + Solr + Lucene LAB 7: Search engine: Apache Nutch + Solr + Lucene Apache Nutch Apache Lucene Apache Solr Crawler + indexer (mainly crawler) indexer + searcher indexer + searcher Lucene vs. Solr? Lucene = library, more

More information

Coveo Platform 7.0. Oracle UCM Connector Guide

Coveo Platform 7.0. Oracle UCM Connector Guide Coveo Platform 7.0 Oracle UCM Connector Guide Notice The content in this document represents the current view of Coveo as of the date of publication. Because Coveo continually responds to changing market

More information

COMMUNICATION PROTOCOLS

COMMUNICATION PROTOCOLS COMMUNICATION PROTOCOLS Index Chapter 1. Introduction Chapter 2. Software components message exchange JMS and Tibco Rendezvous Chapter 3. Communication over the Internet Simple Object Access Protocol (SOAP)

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

Introduction to TCP/IP

Introduction to TCP/IP Introduction to TCP/IP Properties and characteristics of TCP/IP IPv4 IPv6 Public vs private vs APIPA/link local Static vs dynamic Client-side DNS settings Client-side DHCP Subnet mask vs CIDR Gateway TCP/IP

More information

X100 ARCHITECTURE REFERENCES:

X100 ARCHITECTURE REFERENCES: UNION SYSTEMS GLOBAL This guide is designed to provide you with an highlevel overview of some of the key points of the Oracle Fusion Middleware Forms Services architecture, a component of the Oracle Fusion

More information