Software Requirement Specification
Version 1.0.0
Project Title: BATSS - Search Engine for Animation
Team Title: BATSS
Team Guide (KreSIT) and College: Vijyalakshmi, V.J.T.I., Mumbai
Group Members: Basesh Gala, Avani Vadera, Tejal Bhatt, Swapnil Mhatre, Sneha Rajan

1. INTRODUCTION
The purpose of this section is to provide the reader with general background information about the BATSS Search Engine for Animation software.

1.1. PURPOSE
This document is the Software Requirement Specification for the BATSS Search Engine, version 1.0.0. This SRS describes the functional and performance requirements of the BATSS Search Engine. The BATSS Search Engine crawls and indexes only animation web pages in Java Applet and Flash formats.

1.2. DOCUMENT CONVENTIONS
Throughout this document, the following conventions have been used:
- Font: Times New Roman
  - Size 16 for main headings
  - Size 14 for sub-headings
  - Size 12 for the rest of the document
- Words in bold are important terms and have been formatted to draw the reader's attention.
1.3. INTENDED AUDIENCE AND READING SUGGESTIONS
This document is meant for users, developers, project managers, testers, and documentation writers. The SRS aims to explain, in a simple manner, the basic idea behind the BATSS Search Engine and how the developers aim to achieve their goals. It also introduces users to the main features of the BATSS Search Engine and what makes it different from other search engines such as Google, Yahoo!, etc.

1.4. SCOPE
With a time constraint of 10 months, we have looked into the analysis, design, and implementation of the search engine (including the integration of modules). An animation-based search engine is the specific area we will be dealing with over the next 9 months, prior to the implementation details. To gain insight into how existing search engines work, a comparative study of the features offered by several engines has been made. A survey of existing search engines has also been conducted in order to understand what additional capabilities users expect from current search engines. The planning and requirement-gathering stage forms the basis for further analysis and design; hence this stage has been allotted a time period of 2 months.

1.5. DEFINITIONS, ACRONYMS, AND ABBREVIATIONS
1. Search engine: An information retrieval system designed to help find information stored on a computer system, such as on the World Wide Web, inside a corporate or proprietary network, or on a personal computer.
2. Crawler: A web crawler (also known as a web spider or web robot) is a program or automated script that browses the World Wide Web in a methodical, automated manner. Less frequently used names for web crawlers are ants, automatic indexers, bots, and worms.
3. Indexing: Search engine indexing entails how data is collected, parsed, and stored to facilitate fast and accurate retrieval.
4.
Web directories: A web directory is a directory on the World Wide Web that specializes in linking to other web sites and categorizing those links. A web directory is not a search engine and does not display lists of web pages based on keywords; instead, it lists web sites by category and subcategory. The categorization is usually based on the whole web site rather than on one page or a set of keywords, and sites are often limited to inclusion in only one or two categories. Web directories often allow site owners to submit their site directly for inclusion, and have editors review submissions for fitness.
5. URL normalization: URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL, so that it is possible to determine whether two syntactically different URLs are equivalent.
6. Lexicon: A lexicon is a list of words together with additional word-specific information, i.e., a dictionary.

1.6. REFERENCES
[1] Herbert Schildt, Java: The Complete Reference.
[2] General information on crawlers: http://en.wikipedia.org/wiki/web_crawler
[3] URL normalization: http://dblab.ssu.ac.kr/publication/leki05a.pdf
[4] Google's ranking scheme: www.google.com/technology/
[5] Features of search engines: www.searchengineshowdown.com/features/
[6] Features of search engines, a comparative analysis: http://www.searchengineshowdown.com/features/google/review.html
[7] Query evaluation techniques: www.google.com/corporate/tech.htm
[8] Aris Anagnostopoulos, Andrei Z. Broder, David Carmel, "Sampling Search-Engine Results," Brown University, IBM.

1.7. OVERVIEW OF DOCUMENT
In the remainder of this document, we first define the overall product. We then give the external interface requirements, followed by a brief description of the product components and features. In the last section, we provide the non-functional requirements of the product.
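As an illustration of URL normalization (definition 5 above), the following is a minimal sketch using the standard java.net.URI class. The UrlNormalizer class name and the exact set of normalization steps (lowercasing scheme and host, removing default ports, resolving "." and ".." segments, adding a trailing "/" to an empty path) are assumptions for illustration, not the project's actual implementation.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative URL-normalization sketch; not the actual BATSS code.
class UrlNormalizer {

    public static String normalize(String url) {
        URI uri;
        try {
            // normalize() resolves "." and ".." path segments
            uri = new URI(url.trim()).normalize();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("malformed URL: " + url, e);
        }

        String scheme = uri.getScheme() == null ? "http" : uri.getScheme().toLowerCase();
        String host   = uri.getHost() == null ? "" : uri.getHost().toLowerCase();
        int port      = uri.getPort();

        // Drop the default port for the scheme (80 for http, 443 for https).
        if (("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443)) {
            port = -1;
        }

        // An empty path is canonicalized to "/".
        String path = (uri.getPath() == null || uri.getPath().isEmpty()) ? "/" : uri.getPath();

        StringBuilder out = new StringBuilder(scheme).append("://").append(host);
        if (port != -1) out.append(':').append(port);
        out.append(path);
        if (uri.getQuery() != null) out.append('?').append(uri.getQuery());
        return out.toString();
    }
}
```

With these steps, "HTTP://Example.COM:80/a/../b" and "http://example.com/b" normalize to the same canonical URL, which is exactly the equivalence test the definition describes.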
2. OVERALL DESCRIPTION

2.1. PRODUCT PERSPECTIVE
The BATSS Search Engine is a standalone system. It provides modules for crawling, indexing, sorting, and searching animation web pages in Applet and Flash formats.

2.2. PRODUCT FUNCTIONS
The main function of the BATSS Search Engine is to allow its users to search for animation pages throughout the WWW. It also allows users to specify queries using Boolean operators, and supports searching for phrases.

2.3. USER CLASSES AND CHARACTERISTICS
The major user classes expected to use this product are as follows:

2.3.1. Students and Teachers
The BATSS Search Engine has been developed for the OSCAR Project at IITB. This project is used by students to view animations on various topics in physics, mathematics, networking, etc. They can therefore use this Search Engine to efficiently search for animations on a required topic from OSCAR as well as from the WWW.

2.3.2. General Users
General users who are interested in viewing animations on different topics can use the BATSS Search Engine to search for animation pages.

2.3.3. Website Developer
A person who develops a website containing animations registers with the Search Engine so that the animations are crawled and indexed in the future.

2.4. OPERATING ENVIRONMENT

2.4.1. Client-Side Requirements
OS: Linux, Windows
Software Packages: Browser
2.4.2. Server-Side Requirements
OS: Linux, Windows
Software Packages: Java Runtime Environment 1.6

2.5. DESIGN AND IMPLEMENTATION CONSTRAINTS
Due to lack of time, we are unable to implement indexing of other animation formats such as PPT, GIF, etc.

2.6. ASSUMPTIONS
1. Pages containing the query term inside their animations are more relevant to the query, and are hence given more importance in the final ranking process.
2. Most interactive animations are written using Java Applets or Macromedia Flash.
3. The page containing an animation gives relevant information about that animation, and is hence indexed along with the information present in the animation.

2.7. DEPENDENCIES
1. The Apache Commons HttpClient package is used in the crawling process.
2. JAD 1.2, a Java decompiler, is used for extracting metadata from Applet animations.
3. The HTML Parser package is used to parse HTML pages.
4. The Transform SWF package is used for extracting metadata from Flash animations.
5. The Fast MD5 package is used for creating hash values of files, words, and URLs.

2.8. GENERAL CONSTRAINTS, ASSUMPTIONS, DEPENDENCIES, GUIDELINES
This product is a web-based application; hence a major constraint on performance will be the bandwidth of the server's web connection. A higher bandwidth will result in faster crawling of web pages.
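Dependency 5 in section 2.7 uses the Fast MD5 package to hash files, words, and URLs (for example, to detect duplicates). The same idea can be sketched with the standard java.security.MessageDigest API instead; the HashUtil class below is an illustrative name and is not part of the actual code base.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative MD5 hashing sketch using the standard JDK API,
// shown in place of the Fast MD5 package the project depends on.
class HashUtil {

    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // Format the 16-byte digest as a zero-padded 32-character hex string.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Two URLs (or words, or file contents) hash to the same 32-character string exactly when their bytes are identical, which is what makes the hash usable as a compact key in the lexicon and URL tables.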
2.9. USER VIEW OF PRODUCT USE
When the user submits a query in the query box provided, the server invokes the searcher process, which searches for the relevant documents in the Reverse Index. The result is then forwarded to the presenter module, which encodes the result in HTML format and sends it back to the client (browser).

3. EXTERNAL INTERFACE REQUIREMENTS

3.1. USER INTERFACE
The software provides two user interfaces:
1. A screen containing a search panel providing an area for the user to input a search query.
2. A results page that lists links to the documents relevant to the given query.

3.2. SOFTWARE INTERFACE
The software will require the following libraries:
- Apache Commons HttpClient package
- JAD 1.2, a Java decompiler
- HTML Parser package
- Transform SWF package
- Fast MD5 package

3.3. COMMUNICATION INTERFACES
The crawler module of the search engine uses the HTTP protocol to download pages from the WWW. The user accesses the search engine through a browser.

4. SYSTEM FEATURES

Crawling
The crawling module accepts URLs from the URL Server. It first downloads the robots.txt file, which contains the permissions for indexing that particular web site. If permission is granted, the crawler downloads the given web page and stores it in the Store Server.
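The robots.txt permission check described above can be sketched as follows. This is a deliberately simplified illustration: it honours only "User-agent: *" records and "Disallow:" prefix rules, and the RobotsCheck class name is our own. A real crawler would typically also handle per-agent records, "Allow:" rules, and crawl-delay directives.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified robots.txt check: only "User-agent: *" records and
// "Disallow:" prefix rules are considered.
class RobotsCheck {

    public static boolean isAllowed(String robotsTxt, String path) {
        List<String> disallowed = new ArrayList<>();
        boolean applies = false; // are we inside a "User-agent: *" record?
        for (String line : robotsTxt.split("\n")) {
            line = line.trim();
            if (line.toLowerCase().startsWith("user-agent:")) {
                applies = line.substring(11).trim().equals("*");
            } else if (applies && line.toLowerCase().startsWith("disallow:")) {
                String rule = line.substring(9).trim();
                if (!rule.isEmpty()) disallowed.add(rule); // empty Disallow means "allow all"
            }
        }
        // A path is blocked if any disallow rule is a prefix of it.
        for (String rule : disallowed) {
            if (path.startsWith(rule)) return false;
        }
        return true;
    }
}
```

The crawler would call isAllowed before downloading a page; only if it returns true is the page fetched and handed to the Store Server.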
Indexing
The indexer module indexes the crawled pages in the Store Server and prepares a forward index. A forward index consists of each document along with all the words present in the document and their context. This forward index is then sorted by word by the sorter module to form the Reverse Index. The Reverse Index contains all the words, each with a list of the documents containing the word and their ranks.

Searching
When a user submits a query, the searcher module is invoked. The searcher module searches for the relevant documents in the Reverse Index and ranks them. The result page is provided to the Presenter module, which in turn provides it to the user.

5. OTHER NON-FUNCTIONAL REQUIREMENTS

5.1. PERFORMANCE REQUIREMENTS
The number of crawlers working at a time is created dynamically depending on the available bandwidth. The average response time for a user is 0.36 seconds. The expected accuracy of the output is 90%.

5.2. SAFETY REQUIREMENTS
If the crawler's request rate is higher than a web server can handle, it may lead the web server to crash. Hence a website developer should specify the supported crawl rate.

5.3. SOFTWARE QUALITY ATTRIBUTES
All the software modules are developed in Java, which makes the system platform-independent and robust. Secondly, the system provides the user with an easy-to-use and understandable GUI.

5.4. BUSINESS RULES
The administrator can use the administrator GUI to start and stop any of the internal modules, such as the Crawler, Store Server, Extractor, Indexer, and Web Server.
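The forward-to-reverse index transformation described in section 4 (Indexing) can be sketched as follows. The data structures are simplified assumptions for illustration: a real Reverse Index entry would also carry the word context and document rank mentioned above, and the Indexer class name is our own.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: invert a forward index (document -> words) into a
// reverse index (word -> documents). Real entries would also
// store word context and rank.
class Indexer {

    public static Map<String, Set<String>> invert(Map<String, List<String>> forward) {
        // A TreeMap keeps the reverse index sorted by word,
        // mirroring the sorter module's role.
        Map<String, Set<String>> inverted = new TreeMap<>();
        for (Map.Entry<String, List<String>> doc : forward.entrySet()) {
            for (String word : doc.getValue()) {
                inverted.computeIfAbsent(word.toLowerCase(), w -> new TreeSet<>())
                        .add(doc.getKey());
            }
        }
        return inverted;
    }
}
```

Given a forward index { doc1 -> [flash, animation], doc2 -> [applet, animation] }, the inverted result maps "animation" to both documents, which is what lets the searcher module answer a query with a single lookup per query term.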
5.6. SPECIAL USER REQUIREMENTS

5.6.1. Installation
1. Extract the distribution zip file at any location.
2. Run the config.sh file (config.bat in the case of Windows). A window will appear.
3. Set the General->Root-Path variable to the directory where the distribution zip file was extracted.
4. Set the Web-Server->Web-Server-IP variable to the IP address of the host on which the search engine will run.
5. You can also set other variables, but it is highly recommended that you do not modify them, as the default values are optimized for most systems.