Contents

Installing SAS Information Retrieval Studio
  1.1 About This Book
    1.1.1 Audience
    1.1.2 Prerequisites
    1.1.3 Typographical Conventions
  1.2 Introduction to SAS Information Retrieval Studio
    1.2.1 What Is SAS Information Retrieval Studio
    1.2.2 Requirements and Processes Overview
    1.2.3 Prerequisite System Requirements
  1.3 Installing the Prerequisite Software
    1.3.1 Before You Install the Prerequisite Software
    1.3.2 Installing Flash
    1.3.3 Java
    1.3.4 Installing Python
    1.3.5 Installing the Web Crawler
    1.3.6 Installing SAS Search and Indexing
    1.3.7 Installing Other SAS Programs
      1.3.7.A Process Documents
      1.3.7.B Export Processed Documents
  1.4 Installing and Uninstalling SAS Information Retrieval Studio
    1.4.1 Before You Install on Either Windows or UNIX
    1.4.2 Install on Windows
    1.4.3 Install on UNIX
    1.4.4 Perform an Advanced Installation on UNIX
    1.4.5 Uninstall on Windows
    1.4.6 Uninstall on UNIX
A Recommended Reading
iv SAS Information Retrieval Studio: Installation Guide
Installing SAS Information Retrieval Studio

1.1 About This Book

1.1.1 Audience

SAS Information Retrieval Studio: Installation Guide is designed for administrators who perform the following operations:

- maintain servers
- set up custom applications
- assign users to projects

Administrators install all of the software explained in this guide.

1.1.2 Prerequisites

Here are the prerequisites for installing SAS Information Retrieval Studio:

- Use only the supported hardware.
- Use a supported browser that is installed on your desktop client.
- Obtain the necessary permissions and passwords.
- Install the prerequisite software, if you have not already done so.
1.1.3 Typographical Conventions

This manual uses the following typographical conventions:

Convention        Description
installkit.tar    Program names, filenames, command names, and code examples are shown in a fixed-width font. Variable portions are italicized.
Browse            The names of the components are displayed in a bold font.
www.sas.com       Hypertext links are shown in a light blue, fixed-width font, and are underlined.

1.2 Introduction to SAS Information Retrieval Studio

1.2.1 What Is SAS Information Retrieval Studio

In many organizations, diverse information consumers need to quickly access specific data. In an environment where data, and its types, grow exponentially, there is a need to automate the related processes. SAS Information Retrieval Studio combines several key technologies to provide a comprehensive solution for data collection, indexing, searching, and so on. These technologies are bundled into one customizable product.

Easy information retrieval

The web, feed, and file crawlers gather the documents that you specify according to your parameters. Documents are chunks of text, with or without markup tags, gathered from the Internet, feeds, and databases. These chunks of text can be treated by the document processors that parse, convert, categorize, extract concepts and facts, and so on. The documents can then be sent to the index or to another program such as SAS Sentiment Analysis Workbench. If indexed, the documents can be searched by your end users.
Build a custom information retrieval pipeline

Choose to build an information retrieval system that is customized to meet the needs of your organization. You can choose all, or some, of the following components:

Crawlers
  Choose the web, feed, or file crawlers to gather documents from the Web, feeds, and file systems, respectively.

Pipeline server
  Choose the document processors that parse, categorize, extract concepts, locate facts, convert documents into text, and so on. These processors can also hand the gathered documents to other applications such as SAS Sentiment Analysis Workbench.

Indexing server
  Choose whether, and how, input documents are indexed. End users can search indexed documents using a customized search window that runs on the query web server.

Query web server
  Specify how the matching documents are returned in the search window, the appearance of this window, and how end users can navigate the returns.

Query statistics server
  See the counts for the entered queries according to various time frames.

Easy component customization

Easy-to-use windows and wizards simplify the process of customizing the information retrieval components that you choose. These windows also provide log files, statistics, information about the processes involved, and data on documents in the pipeline.

1.2.2 Requirements and Processes Overview

As the administrator, use this chapter to learn about the hardware requirements for installing SAS Information Retrieval Studio and the related software. Also use this chapter to install and uninstall SAS Information Retrieval Studio and related applications.

1.2.3 Prerequisite System Requirements

Configure the machine where you install SAS Information Retrieval Studio according to the recommended system configuration:

CPU    2+ CPUs of 2 GHz or higher, each, are recommended
RAM    1 GB or higher is recommended, but this base number depends on the size of the installation that you load

Before you install SAS Information Retrieval Studio, make sure that you have the appropriate operating system and platform.

Table 1-1: Supported Operating Systems

Operating System        Platform
Windows (32-bit)        x86
Windows (64-bit)        x86-64
HP-UX (64-bit)          ia64
Linux (64-bit)          x86-64
IBM AIX (64-bit)        PPC
Sun Solaris (64-bit)    UltraSPARC
Sun Solaris (64-bit)    x86-64

A browser with Adobe Flash version 10.x is also required.
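The recommended configuration and Table 1-1 can be captured as data, so that a planned target machine can be checked before anything is installed. The following is a minimal sketch and not part of the product: the function names are hypothetical, the operating system and platform strings are copied from the table, and the CPU and RAM figures are supplied by the administrator because clock speed and memory cannot be read portably from the Python standard library.

```python
# Table 1-1 of this guide, transcribed as data (operating system -> platforms).
SUPPORTED_PLATFORMS = {
    "Windows (32-bit)": {"x86"},
    "Windows (64-bit)": {"x86-64"},
    "HP-UX (64-bit)": {"ia64"},
    "Linux (64-bit)": {"x86-64"},
    "IBM AIX (64-bit)": {"PPC"},
    "Sun Solaris (64-bit)": {"UltraSPARC", "x86-64"},
}

RECOMMENDED_CPUS = 2       # "2+ CPUs" in the recommended configuration
RECOMMENDED_RAM_GB = 1.0   # "1 GB or higher"


def is_supported(operating_system, platform_name):
    """Return True if the pair appears in Table 1-1 (hypothetical helper)."""
    return platform_name in SUPPORTED_PLATFORMS.get(operating_system, set())


def hardware_warnings(cpu_count, ram_gb):
    """Return a list of warnings; an empty list means the recommendation is met."""
    warnings = []
    if cpu_count < RECOMMENDED_CPUS:
        warnings.append(
            "%d CPU(s) found, %d or more recommended" % (cpu_count, RECOMMENDED_CPUS)
        )
    if ram_gb < RECOMMENDED_RAM_GB:
        warnings.append(
            "%.1f GB RAM found, %.1f GB or more recommended" % (ram_gb, RECOMMENDED_RAM_GB)
        )
    return warnings
```

For example, `is_supported("Linux (64-bit)", "x86-64")` is true, while `hardware_warnings(1, 0.5)` returns one warning for the CPU count and one for the memory. The 2 GHz clock-speed recommendation is not checked here.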
1.3 Installing the Prerequisite Software

1.3.1 Before You Install the Prerequisite Software

Several applications are required before you install SAS Information Retrieval Studio. Some of these programs might already be installed on your machine. If not, install them before you install SAS Information Retrieval Studio.

When you uninstall Teragram WebCrawler, SAS Search and Indexing, and SAS Information Retrieval Studio, do so in order. In other words, the first of these three applications to be installed is the first to be uninstalled, and so on.

Note: Be sure to install the correct version of the software for your machine. For example, if you run the 64-bit version of SAS Information Retrieval Studio, install the 64-bit version of Java.

1.3.2 Installing Flash

Flash is required when you are running your Web browser. You can download Flash from the following location: http://www.adobe.com/products/flashplayer/

1.3.3 Java

Java 1.5 or newer is required. You can download Java from the following location: http://www.java.com/en/download/manual.jsp

1.3.4 Installing Python

If Python is not installed, you can download the program. ActivePython 2.6 is preferred. Download Python from the following location: http://www.activestate.com/activepython
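The "Java 1.5 or newer" requirement can be checked against a version string such as the one printed by `java -version`. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the script is assumed to run under a current Python 3 interpreter, which is separate from the ActivePython 2.6 that the guide prefers for the product itself. It handles both the legacy numbering (1.5, 1.6) and the newer numbering that starts with the major release (9, 11, 17).

```python
import re
import shutil
import sys

MINIMUM_JAVA_MAJOR = 5  # "Java 1.5 or newer" in this guide


def java_major_version(version):
    """Extract the major release from a Java version string.

    Legacy strings start with "1." (1.5.0_22 is release 5); newer
    strings start with the major release itself (11.0.2 is release 11).
    """
    numbers = re.findall(r"\d+", version)
    if not numbers:
        raise ValueError("no version number found in %r" % version)
    first = int(numbers[0])
    if first == 1 and len(numbers) > 1:
        return int(numbers[1])
    return first


def java_meets_minimum(version):
    """Return True if the version string satisfies the Java 1.5 minimum."""
    return java_major_version(version) >= MINIMUM_JAVA_MAJOR


def prerequisite_report():
    """Report what this script can see: a java launcher on PATH and its own Python."""
    return {
        "java_on_path": shutil.which("java") is not None,
        "python_version": "%d.%d" % sys.version_info[:2],
    }
```

A call such as `java_meets_minimum("1.4.2")` returns false, which signals that Java should be upgraded before installation. Whether the 32-bit or 64-bit build is installed still has to be confirmed separately, as the note in Section 1.3.1 explains.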
1.3.5 Installing the Web Crawler

If you want to download documents hosted on Web servers, install and configure the web crawler. For more information, see the WebCrawler User's Guide. However, if you plan to use SAS Information Retrieval Studio to process texts that are located on a local disk or a Windows fileshare, the web crawler is unnecessary.

1.3.6 Installing SAS Search and Indexing

If you want to create a searchable index, install SAS Search and Indexing. For more information, see the SAS Search and Indexing: User and C and Java API Guide. However, if you plan to export the documents for further analyses or for another use, the SAS Search and Indexing solution is not necessary. For example, you might choose to export your documents into SAS Text Miner for further analyses, or into SAS Content Categorization Studio as training documents.

1.3.7 Installing Other SAS Programs

1.3.7.A Process Documents

Use the following programs to process documents that were collected by a crawler:

SAS Content Categorization Server
  Install SAS Content Categorization Server to identify categories, concepts, and facts from SAS Content Categorization Studio and SAS Contextual Extraction Studio. Make sure that the taxonomies that you want to apply to the documents input to SAS Information Retrieval Studio are uploaded to SAS Content Categorization Server. Run SAS Content Categorization Server before you configure the pipeline server. For more information, see the SAS Content Categorization Server: Administrator and Java Programmer's Guide.
  Note: To use SAS Content Categorization Server, you first create projects using SAS Content Categorization Studio and SAS Contextual Extraction Studio.

SAS Document Conversion
  Install SAS Document Conversion if you want to process documents such as Microsoft Word and PDF files. For more information, see the SAS Document Conversion: Developer's Guide. If you plan to process only XML and HTML documents, SAS Document Conversion is not necessary.

1.3.7.B Export Processed Documents

Use the following programs to process documents that were collected by a crawler and processed by the pipeline server:

SAS Content Categorization Studio
  Install SAS Content Categorization Studio, which can read collected and processed files. These files can be used for training and testing purposes.

SAS Sentiment Analysis Workbench
  Install SAS Sentiment Analysis Workbench and send the collected documents to this application for sentiment analysis. For more information, see the SAS Sentiment Analysis Workbench: Installation Guide.

SAS Text Miner
  Install SAS Text Miner to identify topics and themes. For more information, see the appropriate installation guide for SAS Text Miner.
1.4 Installing and Uninstalling SAS Information Retrieval Studio

1.4.1 Before You Install on Either Windows or UNIX

If you choose to install Teragram WebCrawler and SAS Search and Indexing, do so before you install SAS Information Retrieval Studio. The installation order determines the order of the uninstall process. In other words, the first program to be installed is the first program to be uninstalled, and so on.

1.4.2 Install on Windows

You can install one or more instances of SAS Information Retrieval Studio on your machine. You might choose to install more than one instance when you want to use more than one pipeline server. In this case, you can send the same set of documents to two different servers for different types of document processing operations. For more information, see SAS Information Retrieval Studio: Administrator's Guide.

To install the software on a Microsoft Windows system, complete these steps:

1. Double-click SAS_Information_Retrieval_Studio_Setup.exe. The installation wizard launches and the Welcome page appears.
2. Click Next and the Installation Name page appears.
3. Type the name of this installation into the blank field. For example, enter Installation1. The installation name enables you to perform multiple installations of SAS Information Retrieval Studio.
4. Click Next and the Choose Install Location page appears.
5. (Optional) Click Browse and the Browse For Folder dialog box appears.
   a. Select an installation folder.
   b. (Optional) Click Make New Folder and create a new folder for SAS Information Retrieval Studio.
   c. Click OK.
6. Click Next and the Choose Port Number page appears.
7. Leave the default selection, 9000, or type a new port number. Click Next and the Create Icons page appears.
8. Choose from the following selections:
   a. (Optional) Deselect On the desktop if you do not want a SAS Information Retrieval Studio icon to appear on your desktop after you install the program.
   b. (Optional) Deselect In the start menu if you do not want a SAS Information Retrieval Studio icon to appear in your start menu after you install the program.
   c. (Optional) Deselect For all users if you do not want a SAS Information Retrieval Studio icon to appear in the selected location for both administrators and regular users.
   d. (Optional) Select For the current user and a SAS Information Retrieval Studio icon appears in the selected location. For more information about this location, see steps a and b.
9. Click Install and the Installation Complete window appears.
10. (Optional) To see a list of the extracted files and the created folder, click Show details.
11. Click Next. The Completing the SAS Information Retrieval Studio Setup Wizard window appears.
12. Click Finish.
13. Click the SAS Information Retrieval Studio icon on your desktop to start SAS Information Retrieval Studio.

Note: Repeat this process until you have added all of the instances of SAS Information Retrieval Studio that you require.
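Before repeating the wizard for another instance, it can help to confirm that the port you plan to type in step 7 is not already in use. The guide does not describe such a check, so the following is only a sketch with hypothetical function names. It probes the loopback address only, and a service bound to a different network interface might not be detected.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # The bind failed, so another process is most likely using the port.
            return False
    return True


def first_free_port(start=9000, attempts=100):
    """Return the first free port at or above start (9000 is the wizard default)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError("no free port found in the range that was checked")
```

If `port_is_free(9000)` returns false because an earlier instance already uses the default, `first_free_port()` suggests the next candidate to enter on the Choose Port Number page.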
1.4.3 Install on UNIX

SAS Information Retrieval Studio is distributed on UNIX systems as a tar archive. To install the software, use the following UNIX commands:

    gzip -d installkit.tar.gz
    tar -xf installkit.tar

The -d switch on the gzip command decompresses the distribution file in preparation for the expansion of the tar archive. The switches on the tar command extract the contents from the specified tar file. The actual name of your tar file might vary from that shown in the example. Additional information about using the gzip and tar commands is available in the UNIX man pages.

To launch SAS Information Retrieval Studio, edit $INSTALLDIR/work/information-retrieval-studio.conf and start the server by completing these steps:

1. Specify the port number. For example, port=9000.
2. (Optional) Enter the paths to SAS Web Crawler and SAS Search and Indexing, if you installed them:

    pcrawler=/path/to/sas_web_crawler/bin/_pcrawler
    tix=/path/to/sas_search_indexing/bin/_tix

3. Start the server:

    $INSTALLDIR/start.py

4. Connect to the SAS Information Retrieval Studio administrative interface. For example, enter the following address in your browser:

    http://localhost:9000

1.4.4 Perform an Advanced Installation on UNIX

Use the advanced installation operation for UNIX when you want to run more than one instance of SAS Information Retrieval Studio on a single machine. When you run more than one instance, you can run duplicate operations without conflict. For example, install a second instance to define a pipeline server that sends documents to SAS Sentiment Analysis Workbench while the first pipeline server sends documents to an index.
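When several installations are maintained on one machine, the configuration edits from Section 1.4.3 can be scripted for each installation directory. The sketch below assumes that information-retrieval-studio.conf consists of plain key=value lines, as the port, pcrawler, and tix examples suggest. The guide does not document the full file format, so treat that as an assumption and back up the file first. The function name is hypothetical.

```python
from pathlib import Path


def update_conf(conf_path, settings):
    """Set key=value entries in a conf file and keep unrelated lines unchanged.

    Assumes one key=value pair per line, which is inferred from the
    examples in this guide and is not a documented format.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    remaining = dict(settings)
    updated = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in remaining:
            # Replace the existing value for a key that we were asked to set.
            updated.append("%s=%s" % (key, remaining.pop(key)))
        else:
            updated.append(line)
    # Append any keys that were not already present in the file.
    for key, value in remaining.items():
        updated.append("%s=%s" % (key, value))
    path.write_text("\n".join(updated) + "\n")
```

For a second installation, a call such as `update_conf("/path/to/second_install/work/information-retrieval-studio.conf", {"port": 9200})` would give that instance its own port; the directory shown is a placeholder.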
To perform a second installation of SAS Information Retrieval Studio on your machine, complete these steps:

1. Untar the installation kit tar file into a directory where you have not already installed SAS Information Retrieval Studio.
2. Follow the directions in Section 1.4.3 Install on UNIX.
3. Specify a new port number. For example, enter 9200.

1.4.5 Uninstall on Windows

Before you uninstall SAS Information Retrieval Studio, uninstall Teragram WebCrawler and SAS Search and Indexing. Uninstall the first program that you installed, and so on.

To uninstall SAS Information Retrieval Studio, complete these steps:

1. Select Control Panel --> Add or Remove Programs and select SAS Information Retrieval Studio.
2. Click Remove. The SAS Information Retrieval Studio Uninstall window appears.
3. (Optional) Click Show details to see a list of the deleted files and removed folders.
4. Click Close.

1.4.6 Uninstall on UNIX

Note: As with the Windows uninstall procedure, if you installed Teragram WebCrawler and SAS Search and Indexing, uninstall these programs first. In other words, the first of these three applications to be installed is the first to be uninstalled, and so on.

To uninstall SAS Information Retrieval Studio, complete these steps:

1. Run stop.py in the directory where you installed SAS Information Retrieval Studio.
2. Delete the directory where you installed SAS Information Retrieval Studio.
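The two UNIX uninstall steps can be combined in a small script. This is a sketch under stated assumptions, not a supplied tool: the function name is hypothetical, and the default interpreter is the one running the script, whereas your installation may expect stop.py to be run by the Python that was installed as a prerequisite (ActivePython 2.6 is preferred by this guide). Because the directory is deleted permanently, the function refuses to act on a directory that does not contain stop.py.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def uninstall(install_dir, interpreter=sys.executable):
    """Stop the server with stop.py, then delete the installation directory."""
    root = Path(install_dir).resolve()
    stop_script = root / "stop.py"
    if not stop_script.is_file():
        # Refuse to delete a directory that does not look like an installation.
        raise FileNotFoundError("no stop.py found in %s" % root)
    # Step 1: run stop.py from inside the installation directory.
    subprocess.run([interpreter, str(stop_script)], cwd=str(root), check=True)
    # Step 2: remove the installation directory and everything in it.
    shutil.rmtree(str(root))
```

Because `check=True` raises an error when stop.py exits with a failure, the directory is left in place if the server could not be stopped.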
Appendix A: Recommended Reading

The following books are recommended as companion guides:

- SAS Information Retrieval Studio: Administrator's Guide: Use this book to select the components of your custom information retrieval pipeline. Configure these components, which include the search window that end users use to query an index.
- SAS Information Retrieval Studio: User's Guide: Use the search window that an administrator customized to query the index built in SAS Information Retrieval Studio.
- SAS Sentiment Analysis Studio: User's Guide: Create a SAS Sentiment Analysis Studio project, test it, and upload it to SAS Sentiment Analysis Server.
- SAS Sentiment Analysis Server: Administrator's Guide: Automate the process of applying the rules that you define in SAS Sentiment Analysis Studio to your input documents.
- SAS Sentiment Analysis Workbench: Installation Guide: Install SAS Sentiment Analysis Workbench and prerequisite software.
- SAS Sentiment Analysis Workbench: Administrator's Guide: Set up SAS Sentiment Analysis Studio projects, add users, and specify the files to be used. These files include SAS Sentiment Analysis Studio and SAS Content Categorization Studio files.
- SAS Sentiment Analysis Workbench: User's Guide: Review and edit the automated analyses and create reports with graphs that illustrate these analyses.
- SAS Content Categorization: User's Guide: Create a SAS Content Categorization Studio project, test it, and upload it to SAS Content Categorization Server.
- SAS Content Categorization Studio: Quick Start Guide: Advanced users can learn how to quickly set up a SAS Content Categorization Studio project.
- SAS Content Categorization: Installation Guide: Install SAS Content Categorization Server.
- SAS Content Categorization Server: Administrator's Guide: Understand how SAS Content Categorization Server applies the .mco and .concepts files to input documents. Program this application using the Java language.
- SAS Contextual Extraction Studio: Administrator's Guide: Use this add-on application to SAS Content Categorization Studio to write complex concept definitions that can include multiple rule types within a single definition.
- SAS Contextual Extraction Studio: Installation Guide: Install SAS Contextual Extraction Studio.
- SAS Document Conversion: Developer's Guide: Use this C API for SAS Document Conversion to convert documents in formats such as Adobe PDF and Microsoft Office into text.
- Use the language book that applies to the language that you use to create your project. Each of the SAS world language books contains a comprehensive list of part-of-speech tags.
- SAS offers instructor-led training and self-paced e-learning courses to help you get started with the SAS add-in, learn how the SAS add-in works with the other products in the SAS Enterprise Intelligence Platform, and learn how to run stored processes in the SAS add-in. For more information about the courses available, see support.sas.com/training.

For a complete list of SAS publications, see the current SAS Publishing Catalog. To order the most current publications or to receive a free copy of the catalog, contact a SAS representative at

SAS Publishing Sales
SAS Campus Drive
Cary, NC 27513
Telephone: (800) 727-3228*
Fax: (919) 677-8166
E-mail: sasbook@sas.com
Web address: support.sas.com/pubs

* For other SAS Institute business, call (919) 677-8000.

Customers outside the United States should contact their local SAS office.