TROUBLESHOOTING PROCEDURE FOR SYSTEM STALL ISSUES


SA, MAG, IC, VA PLATFORMS
Published Date: July 2015

SUMMARY:

This document describes the procedure for monitoring a device and obtaining the logs required to troubleshoot system stalls. The procedure requires setting up a serial console monitor at the first occurrence of the issue and, preferably, leaving the monitor connected for as long as necessary (possibly only until the next freeze, once a good kernel trace has been taken) to obtain the critical information needed to root-cause the issue.

The stall described here is an SA, VA, MAG, or IC device becoming inaccessible via web and/or serial console, with total loss of access to the device by admin and end users. The device may still respond to pings.

PRE-REQUISITES:

Troubleshooting a system stall requires the following:

- A serial console monitor (common applications are HyperTerminal and PuTTY) connected to the stalled SA/IC, either directly with a null modem cable or through a console server using an IP/port combination. Set the kernel logging level, then leave the monitor connected to capture serial console output continuously (if extended monitoring is allowed).
- A serial console snapshot, if the console is still responsive. If a local SCP server is available, upload the snapshot to it so it can be obtained before rebooting. Otherwise, the snapshot can be downloaded later via the Web Admin UI after the system reboots.
- Additional information such as the time of the freeze, whether the box interfaces are ping-able, and whether the serial console is responsive. Very importantly, note the front panel LED indicators such as power, disk activity, hard disk, FIPS (if a FIPS unit), and network interfaces.

REQUIRED LOGS:

The following logs must be submitted to Pulse Secure when opening a case for a system stall or similar issue:

- Serial console output of everything displayed, saved as a text file.
- Serial or admin system snapshot, taken from the serial console or the Web Admin UI, respectively.
- Other process snapshots seen in the Web Admin UI snapshot page (if any). These are usually prefixed with the process name, e.g. Pulse Secure-state-watchdog-x-xxxxxxxx-xxxxxx, dscsd_xxxxxxxx_xxxxxx.
- SA/IC logs such as Events, Admin, and User access logs, taken as one zipped file, e.g. Pulse Securelogs[1].tar.gz.
- Cockpit graph screenshots, including all nodes in the cluster.

Note: If the device is in a cluster, always include SA/IC logs and a system snapshot from the other working node(s). These are often helpful in correlating events between the cluster nodes during analysis.
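The additional information listed under the pre-requisites is easy to lose during an outage, so it can help to record it in a fixed shape. The following is only a minimal sketch in Python; the class name, field names, and output layout are hypothetical and are not part of any Pulse Secure tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class StallNotes:
    """Field observations for one stall incident (all names are hypothetical)."""
    time_of_freeze: datetime
    interfaces_pingable: bool
    console_responsive: bool
    # LED name -> observed state, e.g. {"power": "solid green"}
    leds: dict = field(default_factory=dict)

    def to_text(self) -> str:
        """Render the observations as plain text for attaching to a case."""
        lines = [
            f"Time of freeze: {self.time_of_freeze.isoformat(sep=' ')}",
            f"Interfaces ping-able: {'yes' if self.interfaces_pingable else 'no'}",
            f"Serial console responsive: {'yes' if self.console_responsive else 'no'}",
            "Front panel LEDs:",
        ]
        for name, state in self.leds.items():
            lines.append(f"  {name}: {state}")
        return "\n".join(lines)
```

The rendered text can be saved alongside the serial console log so that the observations carry the same date and time context as the captured output.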

LOG COLLECTION PROCEDURE:

A. OBTAINING NECESSARY SERIAL CONSOLE LOGS:

IMPORTANT: Use a serial console application that can add timestamps to both the screen output and the saved log. One terminal application with this capability is an enhanced version of PuTTY, available from: http://www.extraputty.com/download.php

Procedure if using PuTTY:

- From the PuTTY main screen, go to Session, then Logging. Select the file to save output to, and set the other logging options so that the file is recorded and preserved as needed.
- Select the timestamp option "On terminal" and the logging option "All session output" so that all output includes timestamps, both in the terminal view and in the saved log. This allows Pulse Secure engineers to correlate console output with the device logs.
- Make sure the time on the IVE device (SA/MAG/VA) is in sync with the computer used to monitor the serial console.
- Select the location and log file to be used for logging console output.
- Go back to Session and enter the appropriate serial settings.

- Go to Session, select Serial, and then click Open to connect.
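If the terminal application in use cannot add timestamps itself, the same effect can be approximated by stamping each line as it is captured. The following is a minimal sketch in Python using only the standard library. The function name and the bracketed timestamp format are assumptions made for illustration, and the input here is a simulated byte stream rather than a real serial port. The timestamp is the monitoring computer's time when the line was read, which is why the device and computer clocks should be in sync.

```python
import io
from datetime import datetime


def timestamp_lines(source, sink, now=datetime.now):
    """Copy lines from source to sink, prefixing each with the local time.

    source: binary file-like object yielding console output
    sink:   text file-like object receiving the timestamped log
    now:    callable returning the current datetime (injectable for testing)
    Returns the number of lines written.
    """
    count = 0
    for raw in source:
        # Console output may contain stray bytes; replace rather than fail.
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = now().strftime("%Y-%m-%d %H:%M:%S")
        sink.write(f"[{stamp}] {text}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Simulated console output; a real capture would read from the serial port.
    demo = io.BytesIO(b"Loglevel set to 9\r\nsome kernel message\r\n")
    out = io.StringIO()
    timestamp_lines(demo, out)
    print(out.getvalue(), end="")
```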

1. Preparing kernel logging:

To obtain a good kernel trace, kernel logging level 9 must be enabled before the issue occurs. The first occurrence of a stall, however, will not have this logging level set. In that first stall condition, enable this log level, wait 10 to 15 minutes while watching for any errors or messages that appear, and then obtain the kernel stack trace. It is recommended to keep this kernel logging enabled and monitored after the first reboot, as this gives better information when the issue next occurs.

Setting Kernel Logging to Level 9

The CTL-Break>9 sequence sets the proper kernel logging level for troubleshooting halt or stall issues.

- Hold the CTL key, hit the Break key momentarily, release the CTL key, and hit the 9 key. The console should respond with "Loglevel set to 9". (If there is no response, try again a few times. If the console is still unresponsive, proceed to Step 2 below, "Getting the Kernel trace during lockup".)
- Leave the serial console in this state while capturing to a local text file. The kernel may occasionally output messages; these are the important messages needed to root-cause the issue.

2. Getting the Kernel trace during lockup

The CTL-Break>T sequence outputs a kernel trace/dump to the console.

- While the console is in any state AND the web UI is not accessible, hold the CTL key, hit the Break key momentarily, release the CTL key, and then hit the T key (lower or upper case).
- Do not execute this command on a fully operational system, as this could disconnect user sessions.

The console should output a long list of entries similar to the following screenshot:

[Screenshot: sample kernel trace output on the serial console]
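The waiting period between enabling level 9 logging and taking the kernel trace can be worked out from the saved console log. The following is a minimal sketch in Python that assumes each saved line starts with a bracketed timestamp of the form [YYYY-MM-DD HH:MM:SS]. That format and the function name are assumptions for illustration, and the actual format depends on the terminal application used.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: "[2015-07-01 12:00:05] message text"
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")

# Lower bound of the 10 to 15 minute observation period in the procedure.
MIN_WAIT = timedelta(minutes=10)


def earliest_trace_time(log_lines):
    """Return the earliest time to request the kernel trace, or None.

    Looks for the first "Loglevel set to 9" confirmation and adds the
    minimum observation period. Returns None if level 9 was never confirmed.
    """
    for line in log_lines:
        m = STAMP.match(line.rstrip("\r\n"))
        if m and "Loglevel set to 9" in m.group(2):
            enabled = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            return enabled + MIN_WAIT
    return None
```

A result of None corresponds to the case where CTL-Break>9 produced no response, in which case the procedure goes straight to the kernel trace.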

B. OBTAINING NECESSARY SYSTEM SNAPSHOT WHEN CONSOLE MENU IS STILL UP:

1. Getting a system snapshot from the serial console if the console menu is accessible:

- Take a system snapshot from the serial console when the menu is still accessible while the Web UI is down.
- Select console menu option 7, "System Maintenance", then menu option 1, "System snapshot". The console should respond with "Taking system snapshot".
- You can either transfer the snapshot via SCP or leave it on the system for downloading from the Admin UI later, by answering y or n at the prompt:

[Screenshot: serial console snapshot prompt]

C. OBTAINING LOGS AFTER THE REBOOT:

1. After the reboot, log back in to the Admin UI and obtain the SA/IC logs and all the snapshots:

- Download the SA/IC logs. Go to Log/Monitoring > Events > Log and click Save All Logs. This saves the events, user, and admin logs as a single zipped file.
- Download all snapshots. Go to Maintenance > Troubleshooting > System Snapshot, then download the snapshot that was taken from the serial console and any other snapshot (process or watchdog) automatically generated by the system (if any).

D. COMPLETE LOG SUBMISSION:

1. Gather the following from the above steps:

- Serial console output log in text format
- Serial snapshot, system snapshot, and any other process snapshots, including any watchdog snapshots
- SA/IC logs
- Cockpit dashboard graph screenshots
- Date and time of occurrence and status of pings
- Front panel LED indicators for all hardware, most importantly the hard drive status (activity and error status; also note the drive activity indicator next to the power indicator)
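Before opening the case, the gathered files can be checked against the list above and packed into a single archive. The following is a minimal sketch in Python. The function name and the filename patterns are illustrative assumptions, not Pulse Secure naming rules, and should be adjusted to however the files were actually named when saved.

```python
import tarfile
from pathlib import Path

# Category -> filename pattern. These patterns are illustrative assumptions.
REQUIRED = {
    "serial console output": "console*.txt",
    "system snapshot": "*snapshot*",
    "device logs": "*logs*.tar.gz",
    "cockpit graph screenshots": "cockpit*.png",
    "incident notes": "notes*.txt",
}


def bundle_case_files(folder, archive_path):
    """Archive every file in folder and return the categories with no match."""
    folder = Path(folder)
    files = sorted(p for p in folder.iterdir() if p.is_file())
    # A category is missing when no collected file matches its pattern.
    missing = [name for name, pattern in REQUIRED.items()
               if not any(p.match(pattern) for p in files)]
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for p in files:
            # Store files flat, without the local directory structure.
            tar.add(str(p), arcname=p.name)
    return missing
```

An empty return value means every category had at least one matching file. It does not confirm that the files themselves are complete, so the contents should still be reviewed, particularly for clustered devices where logs from the other nodes are also needed.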

SERIAL CONSOLE COMMANDS FOR TROUBLESHOOTING STALL ISSUES:

CTL-Break>9 sets the console kernel logging to level 9 detail (the console should output "Loglevel set to 9").

- During a perceived hung state of the device, useful kernel and system driver events may still be collected via the serial console with this level of logging. Enable this log level and wait 10 to 15 minutes to observe the console output, then proceed with taking the kernel trace using the CTL-Break>T command. This waiting period is very important.
- To monitor and obtain logs for the next failure event, enable level 9 logging with console timestamps enabled and output being recorded to a local file. On the next event, proceed with collecting the kernel trace using the CTL-Break>T command.
- Kernel logging level 9 has a very small effect on the performance of the device, so it can be left enabled over a period of monitoring. At this level of logging, and during a halt condition, the various system drivers can output important and useful information. If a large volume of output appears every minute while the device is still working, it may cause some performance impact, so please observe and advise Pulse Secure support.

IMPORTANT NOTES:

- In some kernel conditions CTL-Break>9 may not work and the serial console may appear unresponsive. Try several times, and then use CTL-Break>T.
- CTL-Break>0 or rebooting the unit resets the kernel logging to its default setting.

CTL-Break>T outputs the kernel stack and memory dump to the serial console for kernel-level analysis by Pulse Secure. ONLY execute this command in a halted state, as it will affect users.

"d" (quotes excluded) outputs the recent kernel messages and some information on the state of the file system. This can be used to track issues related to a file system mounted read-only or a file system running out of space. It may only work in certain conditions, i.e. when prompted with "Do you want to reboot? (y/n)".

Menu Option 7 then 1: System Snapshot (if the serial console is responsive and the menu is available) takes a snapshot of the system in its present condition. The snapshot can be transferred to a local SCP server or left on the SA/IC for downloading after the reboot.

NOTE: For any questions about this document, please contact Pulse Secure support.