Processing Troubleshooting Guide
March 5, 2018 - Version 9.5.411.4
For the most recent version of this document, visit our documentation website.

Table of Contents
1 Overview
2 Retrying an error
3 Monitoring worker activity
4 Sorting by thread count
5 Identifying a stuck job
5.1 Out-of-proc jobs
6 Identifying a job stuck in the queue
7 Addressing a stuck job
8 Stopping a worker
9 Terminating a single job
10 Clearing empty rows and columns
11 Getting a stack dump for a stuck job
11.1 Connection issues

Relativity Processing Troubleshooting Guide

1 Overview
This document provides insights and procedures for using the Relativity Processing Console (RPC) to troubleshoot processing issues in your Relativity environment.

2 Retrying an error
If you need to retry an error from the RPC, perform the following steps:
1. Open the matter inspector for the job that contains the errors.
2. Sort by Message to locate the documents in error.
3. Right-click any file that you want to retry. You can rediscover the file, regenerate images, or extract text, as appropriate for the nature of the error.
4. If you selected Regenerate Images/OCR, the Job Settings window opens. On the Image tab, check the box for Overwrite intermediate files and uncheck the box for Preserve existing pages if they are not already set that way. If the Document tab needs other changes, make them now, and then click OK to add the selected file to the queue as a regenerate images job.
Note: The Extract Text option does not first bring up the Job Settings window as the Regenerate Images/OCR option does. If you want to change the settings before extracting text, right-click the appropriate import job in the Data Stores window and select Settings. If the selected document has already been text extracted, nothing happens unless you check the Overwrite intermediate files setting on the Text tab.
If you chose to rediscover a file that was previously exported, you'll see a warning.
Click Yes or No, depending on your needs.

3 Monitoring worker activity
The worker activity pane allows you to gain insight into each task that each thread is performing. In the task pane, you can see which tasks are being performed on the worker while a job is running.
You can generally use the worker activity and task panes as starting points for identifying potentially problematic threads and jobs. The following sections describe how to use other areas of the RPC to troubleshoot issues with your processing jobs.

4 Sorting by thread count
It's often best to sort the task pane by the thread count when looking at a problematic job in the RPC. This keeps all active tasks at the top of the worker activity pane, which lets you more easily identify how long these tasks are taking and which jobs might be stuck.
To sort by thread count:
1. Select the worker whose threads you want to see.
2. Click the Thread column once to sort the list from the most threads to the least.
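The ordering that the Thread column sort produces can be sketched in a few lines of Python. This is only an illustration: the guide does not describe any way to export the task pane, so the TaskRow type, its field names, and the sample rows used with it are all hypothetical.

```python
from typing import List, NamedTuple


class TaskRow(NamedTuple):
    """One row of the task pane (hypothetical shape, for illustration only)."""
    task: str      # task label as it might appear in the pane
    threads: int   # value shown in the Thread column


def sort_by_thread_count(rows: List[TaskRow]) -> List[TaskRow]:
    """Return rows ordered from the most threads to the least.

    sorted() is stable, so rows with equal thread counts keep their
    original relative order.
    """
    return sorted(rows, key=lambda row: row.threads, reverse=True)
```

For example, calling sort_by_thread_count on rows with 0, 3, and 1 threads returns them in the order 3, 1, 0, which is the view that puts active tasks at the top.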
If you have a job that doesn't seem to be progressing, you can first view the job in the job activity pane to see when the last activity occurred for this job. Note that you're only able to see last activity on Inventory, Import, or Data Extract jobs. Publish jobs do not populate a last activity.

5 Identifying a stuck job
The following characteristics usually indicate a job that is stuck because a single document is encountering issues and is clogging a thread. Look for the following if you suspect that this is the case:
- There are jobs pending in the queue.
- The worker is still running.
- There is a significantly higher than normal CPU activity percentage.

5.1 Out-of-proc jobs
An out-of-proc job is a job that an external process is completing, specifically one that is outside of the Invariant worker process. Invariant generates an out-of-proc call when all threads are in use and are using up a lot of memory. For example, if there are 16 threads allocating 500 MB of memory, Invariant usually generates an out-of-proc call.
Note the following about out-of-proc calls:
- As a general rule, if the document that you want to process is processed by a third-party application like Excel, then it's not necessary for Invariant to spin up workerproc64.exe or workerproc.exe to tell Excel to open the file. This is because Excel is itself an external program.
- Out-of-proc calls are managed by either workerproc.exe or workerproc64.exe.

6 Identifying a job stuck in the queue
If you need to identify which jobs are still stuck in the queue, perform the following steps:
1. Open the Properties window in the RPC for the worker encountering issues.
2. Select Active Jobs in the Properties window.
3. There are two jobs present in the resulting window. One is the root job and the other is the child job being performed as a result of, in this case, the image generation. Differentiate between the root job and the child job by checking the JobID value against the RootJobID value. If they match, you're looking at the root job.
4. Click the Parameters ellipsis to open the parameters of the job. In the Members window, select the Invariant.Data.Matter parameter to open its properties. In the properties list for the Invariant.Data.Matter parameter, scroll down and locate the StoredAs property. This property tells you where the file is stored on the network share and where the native resides.
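The root-versus-child check in step 3 comes down to one comparison, sketched below in Python. JobID and RootJobID are the column names from the Active Jobs window; the ActiveJob type and the helper functions are hypothetical and exist only to make the rule concrete.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ActiveJob:
    """One row copied by hand from the Active Jobs window (hypothetical shape)."""
    job_id: int        # JobID column
    root_job_id: int   # RootJobID column


def is_root_job(job: ActiveJob) -> bool:
    """A job is the root job when its JobID matches its RootJobID."""
    return job.job_id == job.root_job_id


def split_root_and_children(
    jobs: List[ActiveJob],
) -> Tuple[List[ActiveJob], List[ActiveJob]]:
    """Separate root jobs from the child jobs spawned on their behalf."""
    roots = [job for job in jobs if is_root_job(job)]
    children = [job for job in jobs if not is_root_job(job)]
    return roots, children
```

With two rows such as ActiveJob(101, 101) and ActiveJob(102, 101), the first is the root job and the second is its child.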

7 Addressing a stuck job
Stopping and starting the worker in order to address a stuck job is often not the best option because it does nothing to address the fact that the file is corrupted. Before terminating the Invariant worker process, terminate the external process that is currently running (the out-of-proc call). Never delete a job in the RPC.
If you identify an out-of-proc call that is stuck, perform the following steps to fix it:
1. Remote into the worker on which the job is stuck, right-click, and select Start Task Manager.
2. Scan the Task Manager for the out-of-proc call that you believe is stuck. It is most likely the call that's using up the most memory. Note that a stuck out-of-proc call doesn't hold up the work of the entire worker, just the thread.
3. Debug the job by terminating the individual worker process. For example, if you discover that the Image Name workerproc.exe *32 is the stuck job, end that process. This way, Invariant can simply error out the document, put it in the error log, and allow you to proceed with the rest of the job. In this case, don't terminate the Invariant worker, because another worker will pick up that job and run into the same issue as the old one, potentially leading to an infinite loop.
4. Once the problem document receives its error as a result of you terminating the process, restart the worker and let it complete the rest of the files.

8 Stopping a worker
You may have to stop the worker in the RPC in order to identify threads that are stuck in your job. If you have multiple threads that are busy but only two of them are stuck, click the Stop button on the worker in the RPC. This is better than logging the worker off from the front end, which sends that worker offline entirely.
Stopping the worker allows all the other, functional threads to finish the work they were already doing, clean up after themselves, and disappear from the active thread list, because they won't take on any new work while the worker is stopped. Meanwhile, the threads that are stuck remain stuck, and you can recognize them as such. They remain in the worker item list as active even though the worker is stopped. In this case, stopping the worker helps you identify problematic threads and significantly reduces the scope of what you have to look at to diagnose the problem.
If you stop a job before you stop your workers, you will encounter issues, because stopping a job doesn't mean that the workers stop working on the jobs they've already picked up. Thus, always stop the workers before you stop a job in the RPC.

9 Terminating a single job
When you take the worker offline, the queue manager simply moves that task to another worker, at which point it can become stuck again, thus creating an endless loop. If you have a single stuck task, for example a single Excel file, you can terminate that one task instead of taking the worker offline.
First you need to isolate this file, as you may have many tasks running on a worker, given that up to 16 threads can be active at any given time. To isolate the stuck task, you can use the Stop button in the Worker Activity pane to prevent the worker from picking up any new tasks from the queue, or you can move the worker to a group that does not have any jobs currently assigned to it.
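Once the worker has stopped taking new work, the next task is to pick out the process to end. Section 7 notes that a stuck out-of-proc call is most likely the one using the most memory, and that out-of-proc calls run under workerproc.exe or workerproc64.exe. The Python sketch below applies that rule of thumb to rows copied by hand from Task Manager. The ProcessRow type, its field names, and the helper functions are hypothetical, and the result is only a candidate to inspect, not a confirmed diagnosis.

```python
from typing import List, NamedTuple, Optional

# Image names that the guide associates with out-of-proc calls.
OUT_OF_PROC_IMAGES = {"workerproc.exe", "workerproc64.exe"}


class ProcessRow(NamedTuple):
    """One Task Manager row (hypothetical shape, for illustration only)."""
    image_name: str   # Image Name column, for example "workerproc.exe *32"
    pid: int          # process ID
    memory_kb: int    # memory column, in KB


def normalize_image_name(image_name: str) -> str:
    """Drop a trailing bitness marker such as " *32" and lowercase the name."""
    return image_name.split(" *")[0].strip().lower()


def likely_stuck_call(rows: List[ProcessRow]) -> Optional[ProcessRow]:
    """Return the out-of-proc process using the most memory, or None."""
    candidates = [
        row for row in rows
        if normalize_image_name(row.image_name) in OUT_OF_PROC_IMAGES
    ]
    return max(candidates, key=lambda row: row.memory_kb, default=None)
```

Processes with other image names, such as Excel, are ignored by this sketch even though the guide also mentions ending a stuck Excel process; extending the set of names is a judgment call for your environment.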
To move the worker, perform the following steps:
1. Select the worker you want to move.
2. Select the down arrow on the Workgroup drop-down and select the group to which you want to move the worker.
3. Confirm that the worker has been moved to the correct workgroup.
You can now use the properties pane to view which file this thread is currently working on. From there, you can log in to the worker using the RDC functionality within the RPC to terminate that process. Terminating the single task automatically brings that job back with a new process ID and retries it.
To terminate a single task, perform the following steps:
1. Open Task Manager on the worker.
2. Identify the running process that you need to end, for example a stuck Excel file.
3. Right-click on the Image Name and select End Process.
4. Click End Process again on the confirmation window. Check the Memory and CPU values in the Task Manager and note that the job has started again, as indicated by the fluctuating values.
5. Repeat these steps to make the file error out.

10 Clearing empty rows and columns
If you have a long-running job, it may be due to the Excel Iterating Rows function, which is one of the more time-consuming functions that the Excel handler performs. An Excel file with hundreds of thousands of rows will often result in a stuck job.
To address this issue and speed up the job:
1. Take the worker offline in the middle of the job.
2. Scroll down to the Job Activity window at the bottom of the pane, right-click the job, and select Settings.
3. In the Job Settings window, uncheck the options to Clear empty rows and Clear empty columns. This eliminates the need for the Excel handler to perform the extra work of clearing those empty rows and columns, thus reducing the effort required for the iterating rows function.
4. Turn the worker back on and monitor the progress of the job to ensure that it is performing better.
5. The worker will pick the job up again, and it won't have to perform the Iterate Rows function.

11 Getting a stack dump for a stuck job
If you have any stuck jobs and you anticipate needing to terminate worker processes, your first course of action should be to get a stack dump for the stuck files. A stack dump file can be invaluable because it can tell you exactly where and when any problematic processes stalled. A stack dump file is useful any time you can confirm that there is a job in your environment that is either stuck completely or running extremely slowly.
To get a stack dump:
1. Stop the worker and allow all threads to finish the work they were performing.
2. Download the ProcDump executable from Windows Sysinternals at the following address: http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx.
3. Start the Task Manager in the RPC. All the files that appear in the Task Manager are the files that are stuck.
4. From this list, find the file in the Image Name column that you want to perform a stack dump on, and copy that file's process ID. Note that if you right-click the image name and select the Create Dump File option, you won't be able to use the results of that stack dump, because it creates a 64-bit file and most of the processes you need to troubleshoot are 32-bit. Therefore, it's recommended that you perform the remaining steps to get a properly formatted dump file.
5. Get the appropriate proc dump syntax from the Examples list on Windows Sysinternals.
6. Enter the syntax for the C drive as a command line, making sure to include the process ID for the file you selected from the Task Manager.
7. Press Enter to execute the stack dump and review the results. You can verify that the dump occurred by scrolling down the results and finding the name of the file you selected in the Task Manager. It will be listed as a .dmp file.
8. Create a zip file of the .dmp file and convert it to an FTP file.

11.1 Connection issues
You may encounter connection issues that cause jobs to hang in the RPC. For example, the message "An error occurred because a transaction is still pending on this connection" can occur on the queue manager when the system tries to access the connection. The error causes the queue manager to close the connection, which in turn causes an endless loop of recovery attempts.
If you're encountering issues in the RPC related to too many connections to the SQL Server, or workers being unable to connect to the SQL Server, you might have TCP offloading enabled for your environment. To resolve any such issues in the RPC, you'll need to disable TCP offloading on your worker manager server and workers. For instructions on how to do this, see the Environment Optimization guide.
Note: For more information on TCP offloading, see https://support.microsoft.com/en-us/kb/951037.
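Returning to the stack dump procedure in section 11, steps 5 through 8 can be sketched as a small Python helper. The guide does not say which ProcDump example to use, so the choice of the -ma switch (write a full dump) is an assumption, as are the function names and the dump path. The helper only builds the argument list and zips an existing dump file; it does not run ProcDump itself.

```python
import zipfile
from pathlib import Path
from typing import List, Optional, Union


def build_procdump_command(pid: int, dump_path: str) -> List[str]:
    """Build a ProcDump argument list for one process ID.

    Assumption: -ma (full dump) is the example you picked from the
    Sysinternals list; -accepteula suppresses the license prompt.
    """
    if pid <= 0:
        raise ValueError("pid must be a positive process ID")
    return ["procdump.exe", "-accepteula", "-ma", str(pid), dump_path]


def zip_dump(dump_file: Union[str, Path],
             zip_file: Optional[Union[str, Path]] = None) -> Path:
    """Compress a .dmp file into a zip archive next to it and return the path."""
    dump_file = Path(dump_file)
    target = Path(zip_file) if zip_file else dump_file.with_suffix(".zip")
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name so the archive has no directory structure.
        archive.write(dump_file, arcname=dump_file.name)
    return target
```

On the worker, the list returned by build_procdump_command could be passed to subprocess.run from the folder that contains procdump.exe, and zip_dump could then be called on the resulting .dmp file. Treat both as a starting point and check the switches against the current ProcDump documentation.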

Proprietary Rights
This documentation ("Documentation") and the software to which it relates ("Software") belongs to Relativity ODA LLC and/or Relativity's third party software vendors. Relativity grants written license agreements which contain restrictions. All parties accessing the Documentation or Software must: respect proprietary rights of Relativity and third parties; comply with your organization's license agreement, including but not limited to license restrictions on use, copying, modifications, reverse engineering, and derivative products; and refrain from any misuse or misappropriation of this Documentation or Software in whole or in part. The Software and Documentation is protected by the Copyright Act of 1976, as amended, and the Software code is protected by the Illinois Trade Secrets Act. Violations can involve substantial civil liabilities, exemplary damages, and criminal penalties, including fines and possible imprisonment.
© 2018 Relativity ODA LLC. All rights reserved. Relativity are registered trademarks of Relativity ODA LLC.