IBM Connections Search: Troubleshooting and Best Practices
5/14/2014
Greg Presayzen, Client Technical Professional
Mark McCarville, Advisory Software Engineer
IBM Collaboration Solutions
Powered by IBM SmartCloud Meetings

Who are we?
Greg Presayzen
- 7+ years of Connections experience, including Connections L2 Support
- Client Technical Professional for Social Collaboration and Exceptional Digital Experience in the Northeast US
- Based out of Boston, MA
Mark McCarville
- 7+ years of Connections experience
- Search Developer
- Based out of Dublin, Ireland

Agenda
- Architectural Overview of IBM Connections Search
- Catalog Index and Activity Stream Index
- Search Index Usage
- Search Troubleshooting
  - Server Status
  - Server Stats
  - Search Resources
- Q&A

IBM Connections Search
- The Search service provides a point for performing full text and tag searches across all the deployed IBM Connections applications.
- Search is a required application for all IBM Connections applications, and it must be running to prevent unexpected behaviors in the other applications.
- IBM Connections Search is based on multifaceted search technology and uses related people, related dates, related tags, and source application facets. This information enables users to drill down into specific facets to find the content that they want without having to page through large numbers of results.

IBM Connections Search Index
- The Search application uses a Lucene 3.0.3 index, supplemented by social facet information.
- The location of the Search index is mapped to an IBM WebSphere Application Server variable, SEARCH_INDEX_DIR.
- Search uses the WebSphere Application Server scheduling service for creating and updating the Search index.
- IBM Connections applications maintain delete and access-control update information for a maximum of 30 days.
- If indexing is not performed on an index for 30 days, that index is considered to be out-of-date and reindexing is necessary.
- The index must be deployed on each node running the Search enterprise application.
- Because of the new features introduced in IBM Connections 4.5, you cannot migrate an index from a previous version of the product.

The Indexing Process
The Search index is generated by retrieving information from each of the applications. Search indexing happens in several stages:
- Crawling: the process of accessing and reading content from each application in order to create entries for indexing.
- File content extraction: Search provides a document conversion service to extract the content of the files to be indexed.
- Indexing: during the indexing phase, the entries in the persisted seedlists are processed into Lucene documents, which are serialized into a database table that acts as an index cache.

The Indexing Process
- Index building: the deserialization and writing of the Lucene documents into the Search index.
- Post-processing: work that takes place on the new index entries to add additional metadata to the search results. This work includes bookmark rollup and the addition of file content to Files search results.

Indexing Types
Initial indexing
- Foreground indexing: The initial index is built using the default 15min-search-indexing-task. This index is used for searching and for further indexing. The database cache is not used.
- Background indexing: An index is built using the SearchService.startBackgroundIndex command. This index is not used for searching. The database cache is not used.
Incremental indexing
- Foreground indexing: The index is updated using the default 15min-search-indexing-task. This index is used for searching and for further indexing. The database cache is used.
- Background indexing: A background index can be updated using the SearchService.startBackgroundIndex command. This index is not used for searching. The database cache is not used.

Initial and Background Indexing
1. Crawl all pages of the seedlist and persist them to disk.
2. Extract the file content and persist it to disk.
3. Crawl a seedlist page from disk.
4. Index the seedlist entries into Lucene documents.
5. Write the documents to the Lucene index.
6. Repeat until all the persisted seedlist pages have been crawled.

Incremental Foreground Indexing
1. The node that has the scheduler lease crawls all the pages of the seedlist and persists them to disk.
2. Crawl a seedlist page from disk.
3. Index the seedlist entries into Lucene documents.
4. Serialize the Lucene documents into the database cache.
5. Send a JMS message to all Search nodes to alert them of the completion of the serialization.
6. Each node deserializes the Lucene documents into the Lucene index.

Bookmark Rollup
- Bookmark rollup refers to the process of aggregating the information for public bookmarks that point to the same URL.
- For example, if 1000 users create a public bookmark for the same URL, when someone searches for that URL, a single bookmark is returned instead of 1000 search results.
- The bookmark that is returned includes the information for all 1000 bookmarks rolled up into a single search result. All of the tags and people associated with each of the individual bookmarks are now associated with the one document.
- If two users bookmark the same internal document, for example, a wiki page, then the wiki page gets rolled up with the bookmark. The tags and people associated with the bookmark and the wiki page are combined into a single document.

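The rollup idea can be illustrated with a small sketch. This is not the Search post-processor itself; the dictionary layout and the field names (url, owner, tags) are assumptions made for illustration only. It groups bookmarks by URL and merges their tags and people into one result per URL.

```python
from collections import defaultdict

def roll_up_bookmarks(bookmarks):
    """Merge bookmarks that point to the same URL into one result per URL."""
    rolled = defaultdict(lambda: {"tags": set(), "people": set(), "count": 0})
    for bookmark in bookmarks:
        entry = rolled[bookmark["url"]]
        entry["tags"].update(bookmark["tags"])   # combine tags from every bookmark
        entry["people"].add(bookmark["owner"])   # combine the associated people
        entry["count"] += 1
    return dict(rolled)

# Hypothetical sample data: two users bookmark the same URL.
bookmarks = [
    {"url": "http://example.com/a", "owner": "amy", "tags": ["search", "lucene"]},
    {"url": "http://example.com/a", "owner": "bob", "tags": ["search", "index"]},
    {"url": "http://example.com/b", "owner": "amy", "tags": ["wiki"]},
]
results = roll_up_bookmarks(bookmarks)
print(len(results))  # 2
```

Three bookmarks collapse into two results, and the result for the shared URL carries the tags and people of both original bookmarks.
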
Communities Catalog Index
- The Communities catalog enables you to display content from IBM Lotus Quickr places in the Communities application.
- You need both a catalog index folder and a catalog shared replication folder when you add IBM Connections nodes to an existing catalog.
- The variable CATALOG_INDEX_DIR contains the directory path to the catalog index folder.
- The variable CATALOG_REPLICATION_DIR contains the directory path to the shared replication folder. All nodes need access to this shared folder.
- http://www-10.lotus.com/ldd/lcwiki.nsf/xpdocviewer.xsp?lookupname=ibm+connections+4.5+documentation#action=opendocument&res_title=configuring_communities_catalog_settings_ic45&content=pdcontent

Activity Stream Search
- The activity stream search service provides an indexing and search infrastructure that is bundled with the News application. This service provides search capabilities over the activity stream.
- The variable ACTIVITY_STREAM_SEARCH_INDEX_DIR contains the directory path to the activity stream search index.
- The variable ACTIVITY_STREAM_SEARCH_REPLICATION_DIR contains the directory path to the stream search replication folder. You need one shared replication folder for each server cluster.
- http://www-10.lotus.com/ldd/lcwiki.nsf/xpdocviewer.xsp?lookupname=ibm+connections+4.5+documentation#action=opendocument&res_title=activity_stream_search_ic45&content=pdcontent

Dictionaries: Searching over Multiple Languages
- Only the English language dictionary is enabled during installation.
- For non-English deployments, enabling multilingual support for Search is a mandatory post-installation step.
- Enabling multiple dictionaries ensures better quality search results when your user base is multilingual.
- Enabling additional dictionaries adds a performance cost at indexing time and increases the size of the index.

Organization Tag Cloud
- The organization tag cloud on the Profiles Directory page consists of the 50 most popular tags in the entire organization, based on the frequencies of tags.
- The number of tags in the organization tag cloud is not configurable.
- The organization tag cloud is built from the indexes, so there will be delays in displaying new popular tags.
- The organization tag cloud data is cached internally in the application.
- A delay of up to 30 minutes is not unusual for updating the display of new popular tags in the organization tag cloud.

Search API
- All content can be searched by using the Search APIs.
- Refer to the following info center topic for details: http://www-10.lotus.com/ldd/appdevwiki.nsf/xpdocviewer.xsp?lookupname=ibm+connections+4.5+api+documentation#action=opendocument&res_title=search_api_ic45&content=pdcontent

How does Profiles search work?
The following searches from the UI use the Profiles database:
[Screenshot: UI searches that use the Profiles database]

How does Profiles Name Search Logic work?
- Data in the names tables:
  - Database searches for names are performed against two tables: GIVEN_NAME and SURNAME.
  - The tables hold names and their aliases.
  - During data population using TDI or the Admin APIs, the values are converted to lowercase for these tables.
  - Name aliases are created through an internal program for most of the common English first names.

How does Profiles search work?
- All Profiles content is searchable:
  - User profiles: name, phone numbers, title, departments, city, and so on.
  - All extension attributes: simple, rich text, and XML extension attribute types.
  - Tags for users.
- All profile search queries are case-insensitive.
- A search containing only wildcard characters is not supported.

How does Profiles Name Search Logic work?
- User input of a single word, for example 'david': we would search for users whose:
  i). first name starts with david, or
  ii). last name is david
- User input with two words, for example 'david jones': we would search for users whose:
  i). first name is david and last name is jones, or
  ii). first name is david jones, or
  iii). last name is david jones

How does Profiles Name Search Logic work?
- Search query with a comma:
  - The query before the comma is treated as the user's last name.
  - The query after the comma is treated as the user's first name.

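The name search rules from the slides can be sketched as a small parser. This is only an illustration, not the Profiles implementation: the function name and the clause keys are hypothetical, lowercasing the input is an assumption based on the lowercase name tables, and the slides do not say what happens with three or more words, so the sketch returns no clauses in that case.

```python
def name_search_clauses(user_input):
    """Return alternative (OR-ed) name clauses for a Profiles-style name query."""
    text = user_input.strip().lower()  # assumption: compare against lowercase values
    if "," in text:
        # Before the comma: last name. After the comma: first name.
        last, first = (part.strip() for part in text.split(",", 1))
        return [{"last_name": last, "first_name": first}]
    words = text.split()
    if len(words) == 1:
        return [{"first_name_starts_with": words[0]}, {"last_name": words[0]}]
    if len(words) == 2:
        full = " ".join(words)
        return [
            {"first_name": words[0], "last_name": words[1]},
            {"first_name": full},
            {"last_name": full},
        ]
    return []  # behavior for other inputs is not described on the slides

print(name_search_clauses("Jones, David"))
# [{'last_name': 'jones', 'first_name': 'david'}]
```

Each dictionary in the returned list is one alternative; a user matches if any one of them matches.
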
Index Search Usage - Advanced Search
The following searches from the UI use the Search index:
[Screenshot: UI searches that use the Search index]

Index Search Usage - Advanced Search
- Wildcard searches
  - Search supports single and multiple character wildcard searches within single terms, but not in phrases.
  - You cannot use the question mark (?) or asterisk (*) wildcards as the first character of a search string.
- Single character wildcard searches
  - Use the question mark (?). The Search application looks for terms that match when the single character is replaced by another character.
  - Example: if you enter te?t as a search string, the results might include information containing the terms text and test.

Index Search Usage - Advanced Search
- Multiple character wildcard searches
  - To perform a multiple character wildcard search, use the asterisk (*). This type of search looks for zero or more characters.
  - Example: if you enter test* as a search string, the results might include information containing the terms test, tests, and tester.

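The wildcard semantics described on the two slides can be mimicked with regular expressions. This is a sketch of the matching rules only, not of how the index evaluates wildcard queries; the function names are made up for the example.

```python
import re

def wildcard_to_regex(term):
    """Translate a single search term with ? and * wildcards into a regex."""
    if not term or term[0] in "?*":
        raise ValueError("a wildcard cannot be the first character of a search string")
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".*")     # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matching_terms(pattern, terms):
    """Return the terms that match the wildcard pattern as a whole."""
    regex = wildcard_to_regex(pattern)
    return [term for term in terms if regex.fullmatch(term)]

print(matching_terms("te?t", ["text", "test", "tet", "toast"]))       # ['text', 'test']
print(matching_terms("test*", ["test", "tests", "tester", "attest"])) # ['test', 'tests', 'tester']
```

A leading wildcard such as *est raises an error in this sketch, mirroring the restriction that wildcards cannot start a search string.
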
Index Search Usage - Advanced Search
- Search operators
  - OR: search for content that contains either word.
  - AND: search for content where both terms exist anywhere in the text of a single document.
  - Plus sign (+): marks a word as required when combining search words. It applies only to the word immediately following it.
- Example: to search for information that must contain car and can contain motorcycle, enter the following query:
  +car motorcycle
- Example: to search for information that contains car but not motorcycle, enter one of the following queries:
  car NOT motorcycle
  car -motorcycle

Index Search Usage - Advanced Search
- Search operators
  - When your search term contains one of the nonalphanumeric characters listed here, put a backslash before the character:
    + - && ! ( ) { } [ ] ^ " ~ * ? : \
  - Example: to search for information that contains the text string cat + dog, enter one of the following queries:
    cat \+ dog
    "cat + dog"

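When queries are built programmatically, the escaping rule can be applied with a small helper. This is a minimal sketch, assuming that each special character from the slide's list is escaped individually (so && becomes \&\&); the helper name is hypothetical and is not part of any Connections API.

```python
# Characters from the slide's list; each one is escaped on its own.
SPECIAL_CHARS = set('+-&!(){}[]^"~*?:\\')

def escape_search_term(text):
    """Put a backslash in front of every special character in the text."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_search_term("cat + dog"))  # cat \+ dog
```

The output for cat + dog matches the escaped form of the example query.
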
Search Troubleshooting
What is a Connections Search issue?
- Runtime:
  - Expected search results are not returned
  - Inconsistent results
  - The index is not returning the latest content
- Install:
  - Initial index creation failed:
    CLFRW0075W: Failed to load the index at startup, it may not have been created yet

Search Troubleshooting
How do you quickly look at the health of the index?
- Enable the Server Status web page (it replaced the search validation tool in 4.0).
- The Server Status page highlights configuration issues along with an analysis of the latest Search logs (per node).
- It helps debug issues on a system and is your first stop in troubleshooting a Search issue.
- Server Status page demo

Search Troubleshooting - Search Metrics
- The server stats page provides metrics for Search.
- To view Search metrics, enter the following URL in your browser: http://servername.com:port/search/serverstats

Search Troubleshooting - Overview
- The index is automatically configured during installation.
- Search starts working after:
  - The Search application is started on all nodes.
  - The index has been built on one node.
  - INDEX.READY is created.
  - The index is copied to all other nodes.
- Each application has its own index folder.
- Each node has its own local index.

Search Troubleshooting - Crawling and Indexing
How do you verify crawling is taking place? Look for the following messages in the logs:
- CLFRW0297I: Search is starting to crawl the {0} component
- CLFRW0294I: Search has finished crawling the {0} component
How do you verify indexing is taking place? Look for the following messages in the logs:
- CLFRW0588I: Search is starting to index the {0} component.
- CLFRW0576I: Search has finished indexing the {0} component.

Search Troubleshooting - Crawling and Indexing
How do you know that indexing, crawling, and post-processing are complete? Look for the following messages in the logs:
[7/11/12 15:46:05:356 IST] 00000020 IndexBuilder I com.ibm.connections.search.process.incremental.indexbuilder postprocess CLFRW0580I: Search has finished StatusUpdatesPostProcessor post-processing of the index.
[7/11/12 15:46:05:351 IST] 00000020 IndexBuilder I com.ibm.connections.search.process.incremental.indexbuilder postprocess CLFRW0580I: Search has finished BookmarksPostProcessor post-processing of the index.
[7/11/12 15:46:05:355 IST] 00000020 IndexBuilder I com.ibm.connections.search.process.incremental.indexbuilder postprocess CLFRW0580I: Search has finished FilesPostProcessor post-processing of the index.

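Checking for these messages by hand gets tedious on a busy log, so a short script can tally them. The sketch below is an illustration only: the message IDs come from the slides, while the labels, the function name, and the abbreviated sample lines (with blogs substituted for the {0} placeholder) are assumptions.

```python
import re
from collections import Counter

# Message IDs taken from the slides; the short labels are this sketch's own.
MILESTONES = {
    "CLFRW0297I": "crawl started",
    "CLFRW0294I": "crawl finished",
    "CLFRW0588I": "index started",
    "CLFRW0576I": "index finished",
    "CLFRW0580I": "post-processing finished",
}
MESSAGE_ID = re.compile(r"\bCLFRW\d{4}[IWE]\b")

def summarize_search_log(lines):
    """Count how often each milestone message appears in the given log lines."""
    counts = Counter()
    for line in lines:
        for message_id in MESSAGE_ID.findall(line):
            label = MILESTONES.get(message_id)
            if label:
                counts[label] += 1
    return counts

# Abbreviated, hypothetical log lines (timestamps and class names omitted).
sample = [
    "CLFRW0297I: Search is starting to crawl the blogs component",
    "CLFRW0294I: Search has finished crawling the blogs component",
    "CLFRW0588I: Search is starting to index the blogs component.",
    "CLFRW0580I: Search has finished FilesPostProcessor post-processing of the index.",
]
summary = summarize_search_log(sample)
missing = [label for label in MILESTONES.values() if summary[label] == 0]
print(missing)  # ['index finished']
```

To scan a real log, pass an open file object instead of the sample list; any milestone reported as missing is a hint about which stage did not complete.
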
Search Troubleshooting - Validating Search Seedlists
How do we know the seedlist is working correctly?
- Look for the following message in the logs: CLFRW0262I: Seedlist validation successful.
- Or check the Server Status web page.
What is a seedlist?
- An XML representation of the index content.

Search Troubleshooting - Crawling Version of Index
How do we know if the index is consistent on all nodes?
- All nodes will have the same crawling version.
- The crawling version is recorded in a file name, for example CRAWLING_VERSION.1343218215356.
Does an out-of-sync index recover to a consistent state?
- Yes, this functionality is built in.
- Replacing an index on a node: http://www-10.lotus.com/ldd/lcwiki.nsf/xpdocviewer.xsp?lookupname=ibm+connections+4.5+documentation#action=opendocument&res_title=restoring_a_search_index_in_an_environment_with_multiple_nodes_ic40&content=pdcontent

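Comparing crawling versions across nodes can be scripted. The following is a sketch under stated assumptions: it assumes the CRAWLING_VERSION.<number> file sits directly in each node's index directory and that those directories are reachable from the machine running the script; the function names and node labels are hypothetical.

```python
from pathlib import Path

PREFIX = "CRAWLING_VERSION."

def crawling_version(index_dir):
    """Return the suffix of the CRAWLING_VERSION.<n> file in index_dir, or None."""
    for entry in Path(index_dir).iterdir():
        if entry.name.startswith(PREFIX):
            return entry.name[len(PREFIX):]
    return None

def nodes_in_sync(index_dirs):
    """Compare crawling versions; index_dirs maps a node label to its index directory."""
    versions = {node: crawling_version(path) for node, path in index_dirs.items()}
    found = set(versions.values())
    in_sync = None not in found and len(found) == 1
    return in_sync, versions
```

Calling nodes_in_sync({"node1": "/path/to/node1/index", "node2": "/path/to/node2/index"}) with your own paths returns a flag plus the version found on each node, which makes a lagging node easy to spot.
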
Search Troubleshooting - Common Issues
Why do I see "Failed to load the index at startup", and how do I fix it?
- From the UI: Failed to load the index at startup, it may not have been created yet
- From the SystemOut.log file: CLFRW0394E: Search indexing of services [profiles, dogear, communities, activities, blogs, forums, wikis, files, status_updates, calendar] to directory /opt/ibm//data/search/index has failed. You should examine the logs to determine the cause of the failure and correct it.
- Solution: Make an index available for users, and then fix the problem.

Search Troubleshooting - Common Issues: Failed to load the index at startup
1. Disable the current Search indexing task until the problem is resolved: SearchService.disableTask().
2. Review the Search SystemOut.log and determine which of the services are not being crawled or indexed. Look for "CLFRW0283E: Search has encountered a problem while crawling." or "CLFRW0027E: Error Indexing component {0} for search".
3. Create a one-off task to index all services except the problem service or services: SearchService.indexNow("blogs,communities,profiles,status_updates,files,wikis,forums,dogear"). This command updates the index and creates INDEX.READY.
4. Copy the index to all nodes in the deployment. Users can use this index to search for content.

Search Troubleshooting - Common Issues: Failed to load the index at startup
5. Determine why all the services were not crawled or indexed as expected. See the search/serverstatus web page.
6. Back up the current Search index.
7. Make a copy of the index backup on the local file system (not a network share). This copy is referred to as the "working copy". (When you work with a background index, you must perform all operations on a copy of the background index.)
8. Update the working copy of the index with the content from all services by using the command SearchService.startBackgroundIndex().
9. When the background index operation completes, roll out the working copy of the index across the deployment.
10. Re-enable the tasks that were disabled in Step 1.

Search Troubleshooting - Common Issues: Failed to load the index at startup
The steps are also listed in the info center. Reference: http://www-10.lotus.com/ldd/lcwiki.nsf/xpdocviewer.xsp?lookupname=ibm+connections+4.5+documentation#action=opendocument&res_title=troubleshooting_exceptions_when_creating_an_initial_search_index_ic45&content=pdcontent

Search Troubleshooting - Best Practices Summary
1. Analyze the results from the Search Server Status page to help isolate the problem area.
2. After the initial installation, confirm the health of the index on all nodes.
3. Remember to have a backup. Confirm that nightly backups are being taken (the backup folder should contain an index from the previous night), and regularly copy your backup Search index to another drive or NAS.
4. If you see an error, the Error messages page in the info center is a good first place to look: http://www-10.lotus.com/ldd/lcwiki.nsf/xpdocviewer.xsp?lookupname=ibm+connections+4.5+documentation#action=opendocument&res_title=search_error_messages_ic45&content=pdcontent
5. There is no need to re-create the index. Use the background index process instead to troubleshoot the index without incurring downtime.

Search Resources
- The Error messages page in the info center is a good first place to look if you have an error code from the logs: http://www-10.lotus.com/ldd/lcwiki.nsf/xpdocviewer.xsp?lookupname=ibm+connections+4.5+documentation#action=opendocument&res_title=Search_error_messages_ic45&content=pdcontent
- How to administer Search, in the info center: http://www-10.lotus.com/ldd/lcwiki.nsf/xpdocviewer.xsp?lookupname=ibm+connections+4.5+documentation#action=opendocument&res_title=Administering_Search_ic45&content=pdcontent
- Verifying Search Index section in the info center: http://www-10.lotus.com/ldd/lcwiki.nsf/xpdocviewer.xsp?lookupname=ibm+connections+4.5+documentation#action=opendocument&res_title=Verifying_Search_ic45&content=pdcontent

Q&A

- Press *1 on your telephone to ask a question.
- Visit our Support Technical Exchange page or our Facebook page for details on future events.
- To help shape the future of IBM software, take this quality survey and share your opinion of IBM software used within your organization: https://ibm.biz/bdxqb2
- IBM Collaboration Solutions Support page: http://www.facebook.com/ibmlotussupport
- IBM Collaboration Solutions Support: http://twitter.com/ibm_icssupport