Software Requirement Specification Version 1.0.0


Project Title: BATSS - Search Engine for Animation
Team Title: BATSS
Team Guide (KreSIT) and College: Vijyalakshmi, V.J.T.I., Mumbai
Group Members: Basesh Gala, Avani Vadera, Tejal Bhatt, Swapnil Mhatre, Sneha Rajan

1. INTRODUCTION

The purpose of this section is to provide the reader with general background information about the software, BATSS Search Engine for Animation.

1.1. PURPOSE

This document is the Software Requirement Specification for the BATSS Search Engine version 1.0.0. This SRS describes the functions and performance requirements of the BATSS Search Engine. The BATSS Search Engine crawls and indexes only animation web pages in Java Applet and Flash formats.

1.2. DOCUMENT CONVENTIONS

Throughout this document, the following conventions have been used:

Font: Times New Roman
- Size 16 for main headings
- Size 14 for sub-headings
- Size 12 for the rest of the document

Words in bold are important terms and have been formatted to draw the reader's attention.

1.3. INTENDED AUDIENCE AND READING SUGGESTIONS

This document is meant for users, developers, project managers, testers, and documentation writers. The SRS aims to explain, in a simple manner, the basic idea behind the BATSS Search Engine and how the developers aim to achieve their goals. It also introduces users to the main features of the BATSS Search Engine and what makes it different from other search engines such as Google and Yahoo!.

1.4. SCOPE

With a 10-month time constraint, we students have looked into the analysis, design, and implementation of the search engine (including the integration of modules). An animation-based search engine is the specific area we will be dealing with over the next 9 months, prior to the implementation details. To gain insight into how existing search engines work, a comparative study of the features that several engines offer has been made. A survey of existing search engines has also been conducted in order to understand the additional expectations from a current search engine. The planning and requirement-gathering stages form the base for further analysis and design; hence they have been allotted a time period of 2 months.

1.5. DEFINITIONS, ACRONYMS, AND ABBREVIATIONS

1. Search engine: An information retrieval system designed to help find information stored on a computer system, such as on the World Wide Web, inside a corporate or proprietary network, or on a personal computer.

2. Crawler: A web crawler (also known as a Web spider or Web robot) is a program or automated script which browses the World Wide Web in a methodical, automated manner. Other, less frequently used names for web crawlers are ants, automatic indexers, bots, and worms.

3. Indexing: Search engine indexing covers how data is collected, parsed, and stored to facilitate fast and accurate retrieval.

4. Web directories: A web directory is a directory on the World Wide Web that specializes in linking to other web sites and categorizing those links. A web directory is not a search engine and does not display lists of web pages based on keywords; instead, it lists web sites by category and subcategory. The categorization is usually based on the whole web site rather than one page or a set of keywords, and sites are often limited to inclusion in only one or two categories. Web directories often allow site owners to submit their site directly for inclusion, and have editors review submissions for fitness.

5. URL normalization: URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical form, so that it is possible to determine whether two syntactically different URLs are equivalent.

6. Lexicon: A list of words together with additional word-specific information, i.e., a dictionary.

1.6. REFERENCES

[1] Herbert Schildt, Java: The Complete Reference.
[2] General information on crawlers: http://en.wikipedia.org/wiki/web_crawler
[3] URL normalisation: http://dblab.ssu.ac.kr/publication/leki05a.pdf
[4] Google's ranking scheme: www.google.com/technology/
[5] Features of search engines: www.searchengineshowdown.com/features/
[6] Features of search engines, a comparative analysis: http://www.searchengineshowdown.com/features/google/review.html
[7] Query evaluation techniques: www.google.com/corporate/tech.htm
[8] Aris Anagnostopoulos, Andrei Z. Broder, and David Carmel, a paper on sampling search-engine results, Brown University and IBM.

1.7. OVERVIEW OF DOCUMENT

In the rest of the document, we first define the overall product. Then we give the external interface requirements, followed by a brief description of the product components and features. In the last section, we provide the non-functional requirements of the product.
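The URL normalization process defined in section 1.5 (item 5) can be sketched in Java. This is an illustrative sketch using the JDK's java.net.URI, not the project's actual implementation; it covers only a few common normalization steps (lower-casing the scheme and host, dropping default ports, and resolving "." and ".." path segments).

```java
import java.net.URI;

public class UrlNormalizer {
    // Illustrative normalization: lower-case scheme and host, drop the
    // default port, resolve "." and ".." path segments, keep the query.
    static String normalize(String url) {
        URI u = URI.create(url).normalize(); // resolves "." and ".." in the path
        String scheme = u.getScheme().toLowerCase();
        String host = u.getHost().toLowerCase();
        int port = u.getPort();
        boolean defaultPort = (port == -1)
                || (scheme.equals("http") && port == 80)
                || (scheme.equals("https") && port == 443);
        String path = u.getPath().isEmpty() ? "/" : u.getPath();
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (!defaultPort) sb.append(':').append(port);
        sb.append(path);
        if (u.getQuery() != null) sb.append('?').append(u.getQuery());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two syntactically different URLs map to the same canonical form.
        System.out.println(normalize("HTTP://Example.com:80/a/./b/../c")); // http://example.com/a/c
        System.out.println(normalize("http://example.com/a/c"));           // http://example.com/a/c
    }
}
```

With this canonical form, the crawler can detect that two differently spelled URLs point to the same page and avoid downloading it twice.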

2. OVERALL DESCRIPTION

2.1. PRODUCT PERSPECTIVE

BATSS Search Engine is a standalone system. It provides modules for crawling, indexing, sorting, and searching animation web pages in Applet and Flash formats.

2.2. PRODUCT FUNCTIONS

The main function of the BATSS Search Engine is to allow its users to search for animation pages throughout the WWW. It also allows users to specify queries using Boolean operators and to search for phrases.

2.3. USER CLASSES AND CHARACTERISTICS

The major user classes that are expected to use this product are as follows.

2.3.1. Students and Teachers

The BATSS Search Engine has been developed for the OSCAR Project at IITB. This project is used by students to view animations on various topics in physics, mathematics, networking, etc. Hence they can use this search engine to efficiently search for animations on a required topic from OSCAR as well as from the WWW.

2.3.2. General Users

General users who are interested in viewing animations on different topics can use the BATSS Search Engine to search for animation pages.

2.3.3. Website Developers

A person who develops a website containing animations registers with the search engine so that the animations are crawled and indexed in the future.

2.4. OPERATING ENVIRONMENT

2.4.1. Client-Side Requirements

OS: Linux, Windows
Software Packages: Browser

2.4.2. Server-Side Requirements

OS: Linux, Windows
Software Packages: Java Runtime Environment 1.6

2.5. DESIGN AND IMPLEMENTATION CONSTRAINTS

Due to the lack of time, we are unable to implement indexing of other animation formats such as PPT, GIF, etc.

2.6. ASSUMPTIONS

1. Pages containing the query term inside the animations are more relevant to the query, and are hence given more importance in the final ranking process.
2. Most interactive animations are written using Java Applets or Macromedia Flash.
3. The page containing an animation gives relevant information regarding that animation, and is hence indexed along with the information present in the animation.

2.7. DEPENDENCIES

1. Apache Commons HttpClient package, used in the crawling process of the software.
2. JAD 1.2, a Java decompiler, used for extracting metadata from Applet animations.
3. HTML Parser package, used to parse the HTML pages.
4. Transform SWF package, used for extracting metadata from Flash animations.
5. Fast MD5 package, for creating hash values of files, words, and URLs.

2.8. GENERAL CONSTRAINTS, ASSUMPTIONS, DEPENDENCIES, GUIDELINES

This product is a web-based application. Hence a major constraint on performance will be the bandwidth of the server's web connection. A faster connection will result in faster crawling of web pages.
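As an illustration of item 5 of section 2.7 (hashing words and URLs), the following sketch computes an MD5 hash using the JDK's built-in MessageDigest rather than the Fast MD5 package the project actually depends on; both produce standard MD5 digests. The URL shown is illustrative only.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class UrlHash {
    // Hash a URL (or word) into a fixed-length hex string, usable as a
    // compact identifier in the index. Sketch using the JDK's MessageDigest,
    // not the Fast MD5 library itself.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
        System.out.println(md5Hex("http://example.com/pendulum.swf"));
    }
}
```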

2.9. USER VIEW OF PRODUCT USE

When the user submits a query in the query box provided, the server invokes the searcher process, which searches for the relevant documents in the Reverse Index. The result is then forwarded to the presenter module, which encodes the result in HTML format and sends it back to the client (browser).

3. EXTERNAL INTERFACE REQUIREMENTS

3.1. USER INTERFACE

There will be two user interfaces provided by the software:

1. A screen containing a search panel providing an area for the user to input the search query.
2. A results page which lists links to the documents relevant to the given query.

3.2. SOFTWARE INTERFACE

The software will require the following libraries:
- Apache Commons HttpClient package
- JAD 1.2, a Java decompiler
- HTML Parser package
- Transform SWF package
- Fast MD5 package

3.3. COMMUNICATION INTERFACES

The crawler module of the search engine uses the HTTP protocol to download pages from the WWW. The user accesses the search engine through a browser.

4. SYSTEM FEATURES

Crawling

The crawling module accepts URLs from the URL Server. It first downloads the robots.txt file, which states the permissions for crawling that particular web site. If permission is granted, the crawler downloads the given web page and stores it in the Store Server.
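The robots.txt permission check described above can be sketched as follows. This is a simplified illustration, not the project's crawler code: it assumes only the "*" user-agent group matters and treats Disallow rules as plain path prefixes, ignoring Allow rules and wildcards that a full robots.txt parser would handle.

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsCheck {
    // Collect the Disallow path prefixes from the "*" user-agent group of
    // a robots.txt file's contents.
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> rules = new ArrayList<>();
        boolean inStarGroup = false;
        for (String line : robotsTxt.split("\n")) {
            String l = line.trim();
            if (l.toLowerCase().startsWith("user-agent:")) {
                inStarGroup = l.substring(11).trim().equals("*");
            } else if (inStarGroup && l.toLowerCase().startsWith("disallow:")) {
                String path = l.substring(9).trim();
                if (!path.isEmpty()) rules.add(path);
            }
        }
        return rules;
    }

    // A path is allowed unless it falls under some disallowed prefix.
    static boolean allowed(String path, List<String> disallowed) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\nDisallow: /tmp/\n";
        List<String> rules = disallowedPrefixes(robots);
        System.out.println(allowed("/animations/bounce.swf", rules)); // true
        System.out.println(allowed("/private/draft.html", rules));    // false
    }
}
```

The crawler would run this check against each candidate URL's path before downloading the page and handing it to the Store Server.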

Indexing

The indexer module indexes the crawled pages in the Store Server and prepares a forward index. A forward index consists of each document along with all the words present in the document and their context. This forward index is then sorted by word by the sorter module to form the Reverse Index. The Reverse Index contains all the words, each with a list of the documents containing that word and their rank.

Searching

When the user submits a query, the searcher module is invoked. The searcher module searches for the relevant documents in the Reverse Index and ranks them. The result page is provided to the Presenter module, which in turn provides it to the user.

5. OTHER NON-FUNCTIONAL REQUIREMENTS

5.1. PERFORMANCE REQUIREMENTS

The number of crawlers working at a time is determined dynamically, depending on the available bandwidth. The average response time for a user is 0.36 sec. The expected accuracy of output is 90%.

5.2. SAFETY REQUIREMENTS

If the speed of the crawler is higher than what a web server can handle, it may cause the web server to crash. Hence a website developer should specify the supported crawl speed.

5.3. SOFTWARE QUALITY ATTRIBUTES

All software modules are developed in Java, which makes the system platform independent and robust. Secondly, the system provides the user with an easy-to-use and understandable GUI.

5.4. BUSINESS RULES

The administrator can use the administrator GUI to start and stop any of the internal modules, such as the Crawler, Store Server, Extractor, Indexer, and Web Server.
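The forward-to-reverse index transformation described in section 4 (Indexing) can be sketched as follows. The document names and words are illustrative only, and the real indexer also stores each word's context and rank, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReverseIndexDemo {
    // Invert a forward index (document -> words) into a reverse index
    // (word -> documents). A TreeMap keeps the words sorted, mirroring
    // the sorter module's word-ordered Reverse Index.
    static Map<String, List<String>> invert(Map<String, List<String>> forward) {
        Map<String, List<String>> reverse = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : forward.entrySet()) {
            for (String word : e.getValue()) {
                reverse.computeIfAbsent(word, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return reverse;
    }

    public static void main(String[] args) {
        Map<String, List<String>> forward = new LinkedHashMap<>();
        forward.put("doc1", List.of("pendulum", "applet"));
        forward.put("doc2", List.of("pendulum", "flash"));
        // The searcher answers a one-word query with a single lookup.
        System.out.println(invert(forward).get("pendulum")); // [doc1, doc2]
    }
}
```

This lookup structure is what lets the searcher module answer a query without scanning every document: it fetches the posting list for each query word and ranks the documents it finds there.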

5.6. SPECIAL USER REQUIREMENTS

5.6.1. Installation

1. Extract the distribution zip file at any location.
2. Run the config.sh file (config.bat in the case of Windows). A window will appear.
3. Set the General->Root-Path variable to the directory where the distribution zip file was extracted.
4. Set the Web-Server->Web-Server-IP variable to the IP address of the host on which the search engine will run.
5. You can also set other variables, but it is highly recommended that you do not modify them, as the default values are optimized for most systems.