IBM Content Analytics with Enterprise Search Version 3.0. Expanding queries and influencing how documents are ranked in the results

Note
Before using this information and the product it supports, read the information in "Notices".

This edition applies to version 3, release 0, modification 0 of IBM Content Analytics with Enterprise Search (product number 5724-Z21) and to all subsequent releases and modifications until otherwise indicated in new editions.

Copyright IBM Corporation 2009, 2012. US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Configuring rules to tune queries and rank results
Configuring query expansion rules
Configuring document ranking filters
    Enabling custom analyzer support
    Creating and deploying a custom analyzer for document ranking filters
    Sample custom analyzer for document ranking filters
Configuring how results are grouped
Notices
    Trademarks

iv IBM Content Analytics with Enterprise Search: Tuning queries

Configuring rules to tune queries and rank results

When you configure an enterprise search collection, you can specify rules to alter queries and influence how documents are ranked. You can configure rules for expanding queries, filters for ranking documents, and rules for aggregating documents into groups that can be collapsed and prioritized in the search results.

Rule-based query expansion
    During query expansion, the original query is rewritten into multiple queries according to the rules that you specify. For example, terms in the query might be replaced by terms from a custom dictionary, or additional terms might be appended to the original query terms. You can also specify whether documents that match the expanded query are to be ranked higher or lower than documents that match the original query terms.

Document ranking filters
    You can adjust the precision of how results are ranked according to the relative importance of metadata in the documents. For example, a document in which the search term is found in the title field can be ranked higher in the results than a document in which the term is found in the abstract field.

Result aggregation rules
    You can configure rules to aggregate results into different groups, such as groups based on document type, and specify how the groups are to be prioritized in the results.


Configuring query expansion rules

You can configure rules to expand queries and influence how documents are ranked. When a user submits a query, the rules are automatically evaluated. If the query terms match a rule, the rule is applied before results are returned.

During query expansion, the original query is rewritten into multiple expanded queries (known as interpretations) according to the rules that you specify. For example, terms in the query might be replaced by terms from a custom dictionary, or additional terms might be appended to the original query terms. You can also specify whether documents that match the expanded query are to be ranked higher or lower than documents that match the original query terms.

Rules are applied in order according to the expansion type and the order in which they are listed within each expansion type. For example, rules that cause the original query to be replaced by the generated query are applied before rules that expand the original query by appending the generated query (Boolean OR). Both of these rule types are applied before rules that control whether results from the original query are ranked higher or lower than results from the generated query. Rules that cause results from the generated query to be ranked higher than the original query are applied before rules that cause results from the original query to be ranked higher than the generated query.

Restriction: If you create an expansion rule that references a field that was not yet added to the index, the rule will not be evaluated for search queries until the index is rebuilt.

To configure query expansion rules:

1. In the Search pane for an enterprise search collection, click Configure > Rules to tune queries and results.
2. In the Rule-Based Query Expansion area, select the Enable rule-based query expansion check box.
3. Click Edit Rules and click Add Rule.
   a. Configure how the rule is triggered.
      1) In the Condition list, select a comparison rule to specify when a match occurs, such as when a query equals, contains, or starts with a specified term. To trigger the rule for every query, select ANY.
      2) If you selected a condition other than ANY, click Add in the Pattern body area to define a term to match. You can specify whether the term to be evaluated is a keyword, a regular expression that uses Perl regular expression syntax, or from a search tuning dictionary.
         If you specify a keyword term, you can also specify a field in which to search for the keyword. The available fields are those fields that are returnable and enabled for fielded search. If you specify a dictionary, you can specify a particular word in the dictionary to match.
         If you define multiple pattern body entries, you can move them into the order in which they are to be evaluated.
         Restriction: Pattern bodies that include a field cannot be moved.
   b. Configure the action to apply if the rule is triggered (that is, when a user submits a query that matches the specified condition and pattern body).
      1) Select the type of expansion rule to apply.

         REPLACE
             The original query is replaced by the query that is generated when this rule is applied. These rules are always applied before any other types of rules.

         OR
             The query that is generated when rules are applied is appended to the original query with a Boolean OR operator. For document ranking purposes, the original query and generated query are treated identically. These rules are applied after the REPLACE rules are processed.

         NEW OVER ORIGINAL
             Documents that match the generated query are ranked higher in the results than documents that match the original query. These rules are applied after the OR rules are processed.
             To apply this expansion rule to all queries that were expanded by previously triggered rules of the same expansion type, select the Apply to all check box. To limit the number of results shown for the expanded query, select the Limited to check box and specify a value.

         ORIGINAL OVER NEW
             Documents that match the original query are ranked higher in the results than documents that match the generated query. These rules are applied after the NEW OVER ORIGINAL rules are processed.
             To apply this expansion rule to all queries that were expanded by previously triggered rules of the same expansion type, select the Apply to all check box. To limit the number of results shown for the expanded query, select the Limited to check box and specify a value.

      2) In the Expansion body area, click Add. For each expansion body entry, select an expansion type and define a term to use in the expanded query.
         If you select Rewrite term as the expansion type, select a pattern body that is to be rewritten by the expansion body. If the pattern body is based on a dictionary term, you can specify how the pattern body is to be rewritten.

         None
             Rewrite the query with the specified pattern matching.

         Expand
             Expand the aliases of the dictionary term and combine the queries by using a Boolean OR operator.

         Map
             Expand the first alias of the matching dictionary term.

         You can also specify a field in which to search for the term. If you define multiple expansion body entries, you can move them into the order in which they are to be evaluated.
      3) Optional: Associate the action with a document ranking filter or filter group by clicking Add in the Document Ranking Filters area. You can specify a filter in which to run the expanded query (by selecting the Include option), or specify to run the query in all filters other than the specified filter (by selecting the Exclude option). You can also run the query in all filters in a document ranking filter group, such as all filters that rank documents at the top of the search results.
4. After you define the expansion rules configuration, you must restart the search servers to apply the changes.
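
The order in which the expansion types are applied can be illustrated with a minimal Python sketch. It is not the product's implementation. The Rule class, the condition names, and the way a generated query is built (the matched keyword is replaced, or the expansion term is used alone when no keyword matched) are assumptions made for illustration only. As a further simplification, every condition is checked against the query exactly as the user submitted it.

```python
from dataclasses import dataclass

# Order in which the guide says the expansion types are applied.
TYPE_ORDER = ["REPLACE", "OR", "NEW OVER ORIGINAL", "ORIGINAL OVER NEW"]


@dataclass
class Rule:
    """Hypothetical stand-in for one expansion rule (not the product's data model)."""
    expansion_type: str  # one of TYPE_ORDER
    condition: str       # "ANY", "EQUALS", "CONTAINS", or "STARTS_WITH" (assumed names)
    pattern: str         # keyword to match; ignored when condition is "ANY"
    expansion: str       # term used to build the generated query


def is_triggered(rule, query):
    """Return True if the submitted query satisfies the rule's condition."""
    q, p = query.lower(), rule.pattern.lower()
    if rule.condition == "ANY":
        return True
    if rule.condition == "EQUALS":
        return q == p
    if rule.condition == "CONTAINS":
        return p in q.split()
    if rule.condition == "STARTS_WITH":
        return q.startswith(p)
    raise ValueError(f"unknown condition: {rule.condition}")


def generate(rule, query):
    """Build the generated query: rewrite the matched keyword, else use the expansion alone."""
    words = query.split()
    hits = [w.lower() == rule.pattern.lower() for w in words]
    if any(hits):
        return " ".join(rule.expansion if h else w for w, h in zip(words, hits))
    return rule.expansion


def expand(query, rules):
    """Return the interpretations of a query, best-ranked first."""
    # sorted() is stable, so rules keep their listed order within each type.
    ordered = sorted(rules, key=lambda r: TYPE_ORDER.index(r.expansion_type))
    base, higher, lower = query, [], []
    for rule in ordered:
        if not is_triggered(rule, query):
            continue
        generated = generate(rule, query)
        if rule.expansion_type == "REPLACE":
            base = generated                      # replaces the original query
        elif rule.expansion_type == "OR":
            base = f"({base}) OR ({generated})"   # ranked the same as the original
        elif rule.expansion_type == "NEW OVER ORIGINAL":
            higher.append(generated)              # ranked above the original
        else:
            lower.append(generated)               # ranked below the original
    return higher + [base] + lower
```

In this sketch, passing the query "laptop battery" with an ORIGINAL OVER NEW rule (laptop to computer), a NEW OVER ORIGINAL rule (battery to charger), and an OR rule (laptop to notebook), listed in that order, still applies the OR rule first and yields the interpretations "laptop charger", "(laptop battery) OR (notebook battery)", and "computer battery", from highest to lowest rank.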

Configuring document ranking filters

Configure document ranking filters to adjust the ranking of documents according to the relative importance of their metadata. For example, a document in which the search term is found in the title field can be ranked higher in the results than a document in which the term is found in the abstract field.

The IBM Content Analytics with Enterprise Search system includes two custom analyzers:

Coverage
    This analyzer extracts keywords from the text and generates a token that contains the combined keywords. For example, this analyzer might produce the token IBMUnitedStates from the text "You and IBM - United States".

N-gram
    This analyzer extracts keywords from the text and treats sequences of n keywords as a single token. For example, this analyzer might produce the tokens IBM, United, States, IBMUnited, UnitedStates, and IBMUnitedStates from the text "You and IBM - United States".

When you configure document ranking filters, you specify which documents are to be displayed at the top of the results (the Top Most filter group), high in the results (the Top filter group), or low in the results (the Bottom filter group). Any additional results are displayed after the results from the Bottom filter group.

For example, you want to ensure that documents with titles that exactly match the query text are ranked higher in the results. In that case, you add a filter that consists of the index field named title and the Coverage analyzer to the Top document ranking filter group. The Coverage analyzer tokenizes the text to generate a token in which all of the white space and punctuation are removed from the original text and the words from the text are combined. If the user searches for International Business Machines, documents with the title "International Business Machines" are assigned to the Top document ranking filter group and are displayed high in the results. Documents with the title "IBM" or "About International Business Machines Corporation" do not match the document ranking filter criteria and are displayed lower in the results.

You can implement your own custom analyzers in Java by extending the org.apache.lucene.analysis.Analyzer class and uploading the analyzers to the IBM Content Analytics with Enterprise Search system.

Important: Before you can upload custom analyzers or associate analyzers with fields in the administration console, you must enable the custom analyzer support. For instructions, see "Enabling custom analyzer support".

Restriction: If you create a document ranking filter that references a field that was not yet added to the index, the filter will not be evaluated for search queries until the index is rebuilt.

To configure document ranking filters:


1. If you want to define filters that use custom analyzers that you developed, upload the package that contains your custom analyzers. Open the System view, click the Parse tab, and click Configure custom analyzer packages.
2. Define document ranking filters by associating index fields that are enabled for fielded search with analyzers. In the Parse and Index pane for an enterprise search collection, click Configure > Custom analyzers for document ranking filters and click Associate Analyzer with Field.
3. Configure the document ranking filter groups. In the Search pane for the enterprise search collection, click Configure > Rules to tune queries and results.
4. In the Document Ranking Filters area, select the Enable document ranking filters check box.
5. Click Edit Document Ranking Filters and add document filters to the appropriate document ranking filter group.
   a. Click the tab for one of the document ranking filter groups, such as the Top Most group, and click Add filters. Each filter consists of an index field and the analyzer to use for parsing the text that is extracted to the specified field. The displayed list includes all fields that are enabled for fielded search and processed by the default IBM Content Analytics with Enterprise Search analyzer, as defined by the configuration of the field, such as exact match or case sensitive. The list also includes all filters that were associated with an analyzer.
   b. Select the filters to add to the filter group.
   c. If you want to configure custom scoring for a filter, edit the filter and specify a parametric field to use for scoring and ranking the results. When you associate a parametric field with the filter, the indexed value from the parametric field is multiplied by the original document matching score.
   d. To define the relative priority of filters in the group, move the filters into the order in which they are to be evaluated.
   e. Repeat steps 5a to 5d for the other filter groups.
6. After you define the document ranking filters configuration, you must restart the search servers to apply the changes.

Enabling custom analyzer support

Before you can upload custom analyzers or associate analyzers with fields in the administration console, you must enable the custom analyzer support.

To enable the custom analyzer support:

1. Stop the administration console session.
   - If you use the embedded web application server, enter the command esadmin admin stop.
   - If you use WebSphere Application Server, enter the command esadmin system stopall.
2. In the config.properties file for the administration console, change the value of the disable.customanalyzer property to false and save the file. The config.properties file is installed in the following locations:
   - If you use the embedded web application server:
     ES_INSTALL_ROOT/webapps/adminapp/ESAdmin/WEB-INF/config.properties
   - If you use WebSphere Application Server:

     ES_INSTALL_ROOT/installedApps/ESAdmin.ear/ESAdmin.war/WEB-INF/config.properties
3. Restart the administration console session.
   - If you use the embedded web application server, enter the command esadmin admin start.
   - If you use WebSphere Application Server, enter the command esadmin system startall.

Creating and deploying a custom analyzer for document ranking filters

You can create custom analyzers for use with document ranking filters in an enterprise search collection. Custom analyzers are used for parsing the text that is extracted to the associated field.

A sample analyzer is provided in the ES_INSTALL_ROOT/samples/customAnalyzer directory.

Important: Before you can upload custom analyzers or associate analyzers with fields in the administration console, you must enable the custom analyzer support. For instructions, see "Enabling custom analyzer support".

To create and deploy a custom analyzer:

1. Develop a Java program that extends the Apache Lucene org.apache.lucene.analysis.Analyzer class. This class specifies how to generate tokens when provided with a field name and text. Ensure that the custom analyzer is compatible with Apache Lucene 3.5 libraries. For more information about the org.apache.lucene.analysis.Analyzer class, see the Apache Lucene documentation.
2. Package your Java classes into one or more JAR files. Ensure that you include all Java classes and JAR files that are required for the custom analyzer. However, you do not need to include the lucene-core-3.5.0.jar file because it is installed with IBM Content Analytics with Enterprise Search.
3. Create the custom analyzer configuration file. This file specifies the Java class path and the path to the analyzer definition file. In a text editor, create a file with the name stg.xml. The file is an XML file, as shown in the following example.

       <?xml version="1.0" encoding="utf-8"?>
       <stg>
         <descriptor>config/analyzers_definition.xml</descriptor>
         <classpath>
           <pathelement path="customcode.jar"/>
           <pathelement path="dependinglibrary.jar"/>
         </classpath>
       </stg>

   The file contains the following XML elements:

   descriptor
       This required element specifies the path to the analyzer definition file that defines one or more custom analyzers. Each analyzer consists of two Java classes that extend the org.apache.lucene.analysis.Analyzer class, one for use at indexing time and one for use at run time. The indexing analyzer is used to tokenize documents when parsing and extracting text to the index, and the runtime analyzer is used to tokenize the search query. The analyzer definition file is an XML file, as shown in the following example:


           <definition>
             <field name="myfirstanalyzer">
               <indexinganalyzer impl="com.example.my.myindexinganalyzer"/>
               <runtimeanalyzer impl="com.example.my.myruntimeanalyzer"/>
             </field>
             <field name="mysecondanalyzer">
               <indexinganalyzer impl="com.example.my.myindexinganalyzer2"/>
               <runtimeanalyzer impl="com.example.my.myruntimeanalyzer2"/>
             </field>
           </definition>

       The XML file consists of the following elements:

       field
           This element specifies a set of indexing and runtime analyzers for each custom analyzer. The value of the name attribute is used as the display name for the analyzer in the administration console. The element must contain <indexinganalyzer> and <runtimeanalyzer> elements.

       indexinganalyzer
           This element specifies a class that extends the org.apache.lucene.analysis.Analyzer class to tokenize documents at indexing time. The impl attribute specifies the name of the class. Ensure that the class can be loaded from the specified class path.

       runtimeanalyzer
           This element specifies a class that extends the org.apache.lucene.analysis.Analyzer class to tokenize query text. The impl attribute specifies the name of the class. Ensure that the class can be loaded from the specified class path.

   classpath
       This element specifies the Java class path. It contains at least one <pathelement> element. There must be a separate <pathelement> element for each JAR file.

   pathelement
       This element specifies each entry in the Java class path. The path attribute specifies the path to the JAR file.

4. Add all required files to an archive file that has the .zip file extension. The archive file must contain the custom analyzer configuration file (stg.xml), the analyzer definition file, and all JAR files that contain the Java classes that are used by the analyzer. Save the stg.xml file at the top level of the archive file, as shown in the following example:

       ./stg.xml
       ./config/analyzers_definition.xml
       ./dependinglibrary.jar
       ./customcode.jar

5. In the administration console, deploy the custom analyzer.
   a. Upload the archive file that contains the custom analyzer. Open the System view, click the Parse tab, and click Configure custom analyzer packages. Click Add Package and browse to the archive file that contains the custom analyzer.
   b. Define document ranking filters to associate the custom analyzer with one or more index fields that are enabled for fielded search. In the Parse and Index pane for an enterprise search collection, click Configure > Custom analyzers for document ranking filters and click Associate Analyzer with Field.
   c. Configure the document ranking filter groups. In the Search pane for the enterprise search collection, click Configure > Rules to tune queries and results and click Edit Document Ranking Filters to add the document filters to a document ranking filter group. Ensure that you selected the Enable document ranking filters check box in the Document Ranking Filters area, and then restart the search servers to apply the changes.

Sample custom analyzer for document ranking filters

The personname sample analyzer shows how you can create custom analyzers for use with document ranking filters.

When documents are parsed, this custom analyzer detects the occurrence of person names that have nicknames. Then the analyzer inserts the nicknames in place of the original names when the text is extracted to the specified field so that users can search for documents by entering the nickname. For example, if the value of a field is "William Smith", the document will be returned if a user enters the search terms "Will Smith" or "Bill Smith". If you created a document ranking filter for this analyzer and added it to the Top document ranking filter group, documents that contain the nicknames in the specified field will be ranked higher in the results.

The sample is provided in the ES_INSTALL_ROOT/samples/customAnalyzer directory.

Important: Before you can upload custom analyzers or associate analyzers with fields in the administration console, you must enable the custom analyzer support. For instructions, see "Enabling custom analyzer support".

To use the sample analyzer:

1. Upload the package that contains the sample analyzer. In the administration console, open the System view, click the Parse tab, and click Configure custom analyzer packages.
   Click Add Package and browse to the ES_INSTALL_ROOT/samples/customAnalyzer/samplepackage.zip file.
2. Define a document ranking filter to associate an index field that is enabled for fielded search with the sample analyzer. In the Parse and Index pane for an enterprise search collection, click Configure > Custom analyzers for document ranking filters and click Associate Analyzer with Field. For example, associate the author field with the personname sample analyzer.
3. Configure the document ranking filter.
   a. In the Search pane for the enterprise search collection, click Configure > Rules to tune queries and results.
   b. In the Document Ranking Filters area, select the Enable document ranking filters check box.
   c. Click Edit Document Ranking Filters and add the document filter that you created in step 2 to a document ranking filter group. For example, click the tab for the Top document ranking filter group, click Add filters, and select the filter.
4. Restart the search servers to apply the changes.
5. In a web browser, open the enterprise search application and search for a nickname. If any documents in the collection contain the name of a person that corresponds to the specified nickname in the specified field, the documents will be ranked higher in the results.
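
The idea behind the personname sample can be sketched in a few lines of Python. This is only a conceptual illustration, not the shipped sample, which is a Java analyzer built on Apache Lucene. The nickname table, the function names, and the decision to keep the original name alongside its nicknames are assumptions made for this sketch.

```python
# Hypothetical nickname table; the table used by the shipped sample is not shown in this guide.
NICKNAMES = {
    "william": ["will", "bill"],
    "robert": ["rob", "bob"],
}


def index_tokens(field_text):
    """Indexing-time analysis: one set of alternative tokens per word position."""
    positions = []
    for word in field_text.lower().split():
        # Assumption: keep the original name and add every known nickname
        # at the same position, so any of them can match.
        positions.append({word, *NICKNAMES.get(word, [])})
    return positions


def query_tokens(query_text):
    """Run-time analysis: the query is only lowercased and split in this sketch."""
    return query_text.lower().split()


def matches(field_text, query_text):
    """Return True if each query token matches the token set at the same position."""
    indexed = index_tokens(field_text)
    query = query_tokens(query_text)
    return len(indexed) == len(query) and all(
        token in alternatives for token, alternatives in zip(query, indexed)
    )
```

With this table, a field value of "William Smith" matches the queries "Will Smith" and "Bill Smith" but not "Bob Smith". Separating index_tokens from query_tokens mirrors the split between the indexing analyzer and the runtime analyzer in the analyzer definition file.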

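The tokens that this chapter attributes to the Coverage and N-gram analyzers can be reproduced with a small Python sketch. It shows only the tokenization idea under stated assumptions: the stop word list and the function names are hypothetical, and the product's actual keyword extraction rules are not documented here.

```python
import re

# Assumed stop words; the product's real keyword extraction rules are not documented here.
STOP_WORDS = {"you", "and", "the", "a", "an", "of"}


def keywords(text):
    """Split on white space and punctuation, then drop the assumed stop words."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [w for w in words if w.lower() not in STOP_WORDS]


def coverage_token(text):
    """Coverage-style analysis: a single token that combines all keywords."""
    return "".join(keywords(text))


def ngram_tokens(text):
    """N-gram-style analysis: every run of n consecutive keywords as one token."""
    kws = keywords(text)
    return [
        "".join(kws[i:i + n])
        for n in range(1, len(kws) + 1)
        for i in range(len(kws) - n + 1)
    ]


def title_matches_exactly(title, query):
    """A title qualifies only when its combined token equals the query's combined token."""
    return coverage_token(title).lower() == coverage_token(query).lower()
```

For the text "You and IBM - United States", coverage_token returns IBMUnitedStates and ngram_tokens returns IBM, United, States, IBMUnited, UnitedStates, and IBMUnitedStates. The title_matches_exactly helper shows why a query for International Business Machines matches the title "International Business Machines" but not "About International Business Machines Corporation": the combined tokens differ.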

Configuring how results are grouped

You can configure rules to display results in different groups and specify the number of results to show in each group. You can also specify whether groups are to be prioritized in the results.

The groups are collapsed in the results, which helps ensure that users see diverse documents when they skim a page of results, as opposed to seeing entire pages dominated by only one group of results. When results are collapsed, the highest ranking result is typically displayed flush left. One or more lower ranking results are grouped and indented below the first result. A link is provided to enable users to view the full list of results in the group.

You can configure when and how results of a search query are to be grouped and prioritized by specifying rules. Each rule consists of a matching query pattern (that is, the type of query for which the rule is triggered) and a specific field value to use for grouping the results. You can also configure which groups are to be displayed higher in the results. For example, you can configure a rule so that when a user on your intranet site enters a query that contains the term software, results that have the value download for the category field are grouped.

Restriction: If you create a result aggregation rule that references a field that was not yet added to the index, the rule will not be evaluated for search queries until the index is rebuilt.

To configure how results are grouped:

1. In the Search pane for an enterprise search collection, click Configure > Rules to tune queries and results.
2. In the Result Aggregation area, select the Enable result aggregation check box.
3. Click Edit Aggregations and click Add Entry to configure a result aggregation rule.
   a. Configure how the rule is triggered.
      1) In the Condition list, select a comparison rule to specify when a match occurs, such as when a query equals, contains, or starts with a specified term. To trigger the rule for every query, select ANY.
      2) If you selected a condition other than ANY, click Add in the Pattern body area to define a term to match. You can specify whether the term to be evaluated is a keyword, a regular expression, or from a search tuning dictionary.
         If you specify a keyword term, you can also specify a field in which to search for the keyword. The available fields are those fields that are returnable and enabled for exact match fielded search. If you specify a dictionary, you can also specify a particular word in the dictionary to match.
         If you define multiple pattern body entries, you can move them into the order in which they are to be evaluated.
         Restriction: Pattern bodies that include a field cannot be moved.
   b. Configure how to group the results when the rule is triggered (that is, when a user submits a query that matches the specified condition and pattern).

1) In the Aggregation body area, click Add. For each aggregation body entry, specify a field value by which to group the results and the number of results to display in the group. For example, you can specify that documents that have the value download for the category field are to be grouped together in the results, and that only 5 of those documents are to be displayed in the group. Restriction: The field value in the aggregation body cannot contain an asterisk (*) character. If you define multiple aggregation body entries, you can move them into the order in which they are to be evaluated. 2) If you want the groups to be displayed higher in the results, select the Prioritize these content source groups higher check box. 4. After you define the aggregation configuration, you must restart the search servers to apply the changes. 12 IBM Content Analytics with Enterprise Search: Tuning queries
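To make the behavior of a result aggregation rule more concrete, the following Python sketch models the procedure above: a rule with a condition and pattern decides whether it is triggered by a query, and its aggregation body entries collapse matching results into groups. This is an illustration only and is not the product's API; rules are configured in the administration console. The class names, the condition names other than ANY, the whole-word interpretation of "contains", and the dictionary layout of a result are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AggregationEntry:
    """Hypothetical stand-in for one aggregation body entry."""
    field_name: str    # field to group on, e.g. "category"
    value: str         # field value that selects the group, e.g. "download"
    max_results: int   # number of results to display in the group


@dataclass
class AggregationRule:
    """Hypothetical stand-in for one result aggregation rule."""
    condition: str                   # "ANY", "EQUALS", "CONTAINS", "STARTS_WITH" (assumed names)
    pattern: str                     # keyword to match; ignored for ANY
    entries: List[AggregationEntry]
    prioritize: bool = False         # display groups higher in the results

    def matches(self, query: str) -> bool:
        q, p = query.strip().lower(), self.pattern.lower()
        if self.condition == "ANY":
            return True
        if self.condition == "EQUALS":
            return q == p
        if self.condition == "CONTAINS":
            return p in q.split()    # assumption: whole-word match
        if self.condition == "STARTS_WITH":
            return q.startswith(p)
        raise ValueError(f"unknown condition: {self.condition}")


def group_results(results: List[dict], rule: AggregationRule) -> List[dict]:
    """Collapse ranked results (best first) into display items."""
    groups: Dict[int, dict] = {}
    display: List[dict] = []
    for doc in results:
        # Entries are evaluated in their configured order; first match wins.
        idx: Optional[int] = next(
            (i for i, e in enumerate(rule.entries)
             if doc.get(e.field_name) == e.value), None)
        if idx is None:
            display.append({"lead": doc, "indented": [], "hidden": 0, "grouped": False})
        elif idx not in groups:
            # The highest ranking member leads the group (shown flush left).
            groups[idx] = {"lead": doc, "indented": [], "hidden": 0, "grouped": True}
            display.append(groups[idx])
        else:
            group = groups[idx]
            if 1 + len(group["indented"]) < rule.entries[idx].max_results:
                group["indented"].append(doc)   # shown indented below the lead
            else:
                group["hidden"] += 1            # reachable through the group link
    if rule.prioritize:
        # Stable sort: grouped items move up, relative order is otherwise kept.
        display.sort(key=lambda item: not item["grouped"])
    return display


# Example: group "download" documents for queries that contain "software".
rule = AggregationRule("CONTAINS", "software",
                       [AggregationEntry("category", "download", 2)],
                       prioritize=True)
results = [
    {"title": "A", "category": "news"},
    {"title": "B", "category": "download"},
    {"title": "C", "category": "download"},
    {"title": "D", "category": "faq"},
    {"title": "E", "category": "download"},
]
query = "free software tools"
page = group_results(results, rule) if rule.matches(query) else []
```

In this sketch, the group is anchored at the position of its highest ranking member and then moved to the top only when prioritization is enabled. How the product positions groups internally is not stated in this document, so treat that choice as one plausible reading.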

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information about the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
1623-14, Shimotsuruma, Yamato-shi
Kanagawa 242-8502 Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
J46A/G4
555 Bailey Avenue
San Jose, CA 95141-1003
U.S.A.

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows:

(your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. Copyright IBM Corp. _enter the year or years_.

Portions of this product are:
- Oracle Outside In Content Access, Copyright 1992, 2012, Oracle.
- IBM XSLT Processor Licensed Materials - Property of IBM Copyright IBM Corp., 1999-2012.

This product uses the FIPS 140-2 approved cryptographic provider(s); IBMJCEFIPS (certificate 376) and/or IBMJSSEFIPS (certificate 409) and/or IBM Crypto for C (ICC) (certificate 384) for cryptography. The certificates are listed on the NIST web site at http://csrc.nist.gov/cryptval/140-1/1401val2004.htm.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.

The following terms are trademarks or registered trademarks of other companies:

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other company, product, or service names might be trademarks or service marks of others.

