PCI / PCIe Error Recovery Product Note. HP-UX 11i v3

HP Part Number: 5900-0584
Published: September 2010

Legal Notices

Copyright 2003-2010 Hewlett-Packard Development Company, L.P.

Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing here should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

UNIX is a registered trademark of The Open Group. PostScript is a trademark of Adobe Systems Incorporated. Intel and Itanium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Table of Contents

1 PCI / PCIe Error Recovery Product Note
    Confirm PCI Error Recovery is Supported
    Using ioscan to identify PCI Error Recovery Capability
        Example:
    Tunable Kernel Parameters
    Error Messages for PCI Error Recovery
    Automatic Recovery from a PCI Error
    Manual Recovery from a PCI Error
    PCI Error Recovery Documentation
    Terms and Definitions

List of Tables

1-1 Utility Subsystem FW Revision Level: 15.22
1-2 Error Recovery Attributes
1-3 Events Generated on Legacy Platforms due to PCI / PCIe Errors
1-4 Events Generated on HP Superdome 2 Platform due to PCIe Errors

1 PCI / PCIe Error Recovery Product Note

The PCI / PCIe Error Recovery feature provides the ability to detect, isolate, and automatically recover from a PCI / PCIe error, avoiding a system crash. PCI Error Recovery is included with the HP-UX 11i v3 operating system, and it is enabled by default.

NOTE: PCI / PCIe Error Recovery is not supported on all platforms. To determine if PCI / PCIe Error Recovery is supported on your system, see the PCI Error Recovery Support Matrix, available at http://www.hp.com/go/hpux-networking-docs in the PCI Error Recovery section.

With the PCI / PCIe Error Recovery feature enabled, if an error occurs on a PCI bus containing an I/O card that supports PCI Error Recovery:

  - The PCI bus is quarantined to isolate the system from further I/O and prevent the error from damaging the system.
  - The PCI Error Recovery feature attempts to recover from the error and reinitialize the bus so I/O can resume.
  - If an error occurs during the automated error recovery process, the bus and I/O card remain quiesced.
  - If the bus contains a card that supports online addition, replacement, or deletion (OL*) and the card is in a hotpluggable slot, you can use the olrad command (or the attention button) to manually recover from the error by replacing the card.

    For information on OL* operations, see the Interface Card OL* Support Guide, available at: http://www.hp.com/go/hpux-core-docs

    To determine if OL* is supported, see the I/O card documentation or support matrix available at http://www.hp.com/go/hpux-iocards-docs

If the PCI Error Recovery feature is disabled and an error occurs on a PCI bus, a Machine Check Abort (MCA) or a High Priority Machine Check (HPMC) will occur, and the system will crash.

NOTE: PCI / PCIe Error Recovery is enabled by default. If you use HP Serviceguard, HP recommends that the PCI Error Recovery feature be enabled only if your storage devices are configured with multiple paths and you have not disabled HP-UX native multipathing. If PCI Error Recovery is enabled, but your storage devices are configured with only a single path, HP Serviceguard might not detect when connectivity is lost. If HP Serviceguard does not detect loss of connectivity, it does not cause a failover. For instructions on using the pci_eh_enable tunable to disable PCI Error Recovery, see Tunable Kernel Parameters.

If a PCI error occurs on an I/O card very early in the boot process or during an OL* online addition operation, the I/O card will not be claimed and the software state of the I/O card will be marked as UNUSABLE in the ioscan(1M) output. To recover I/O cards that are in the UNUSABLE state, a system reboot is required.

Confirm PCI Error Recovery is Supported

1. To confirm PCI Error Recovery (ER) is supported with your configuration and system firmware version, see PCI Error Recovery Support Matrix, HP-UX 11i v3 at: http://docs.hp.com/en/ha.html

   NOTE: PCI-express ER functionality can be enabled on legacy platforms only if the patch set PHKL_37099, PHKL_37329, PHKL_37330, PHKL_37331, PHKL_37648, PHKL_37405, and PHKL_37510 is installed on the HP-UX 11i v3 OS. PCI-express ER functionality can be enabled on the HP Superdome 2 platform only if patches PHKL_41301 and PHKL_41302 are installed on the HP-UX 11i v3 OS. This feature can be turned off by setting the static kernel tunable pci_eh_enable to zero. Setting this tunable to zero disables the ER feature for both PCI-X and PCI-express I/O slots. The value of the tunable is persistent across reboots.

2. To confirm the system firmware version installed on your system, or any cell in your system, use the sysrev command from the management processor Command Menu (CM) prompt:

       MP:CM> sysrev

   NOTE: The sysrev command is supported only on legacy platforms. Use the machinfo command on HP Superdome 2 to confirm the system firmware version.

   The sysrev command output on Superdome systems is different from the sysrev command output on the other systems that support PCI Error Recovery. The sysrev command output on Superdome systems lists the system firmware version under the SYS FW heading, as illustrated in the following example:

       MP:CM> sysrev

   Table 1-1 Utility Subsystem FW Revision Level: 15.22

       Cabinet #0 Cabinet #1 Cab #8 Cab #9
       SYS FW PDHC SYS FW PDHC
       Cell (slot 0) 3.64 3.82
       Cell (slot 1) 3.82 3.66
       Cell (slot 2) 3.88 15.14 3.66
       Cell (slot 3) 3.82 3.50 15.10
       Cell (slot 4) 3.82 3.86 15.14
       Cell (slot 5) 3.64 3.82
       Cell (slot 6) 3.82 3.84 15.10
       Cell (slot 7) 3.88 15.14 3.82
       MP 15.22
       ED 3.13
       CLU 15.2 15.2 15.2 15.2
       PM
       CIO (bay 0, chassis 1)
       CIO (bay 0, chassis 3)
       CIO (bay 1, chassis 1)
       CIO (bay 1, chassis 3)

   On the mid-range systems that support PCI Error Recovery, the system firmware version is listed under the Pri SFW heading, as illustrated in this example:

       MP:CM> sysrev

       Cabinet firmware revision report

       PROGRAMMABLE HARDWARE :

       System Backplane :  GPM      FM       OSP
                           -------  -------  -------
                           1.002    1.002    1.002

       PCI-X Backplane :   LPM      HS
                           -------  -------
                           2.000    1.000

       Core IO :           Master   Slave
                           -------- -------
                           2.010    2.010

                           LPM      PDHC
                           -------  -------
       Cell 0 :            1.002    1.010
       Cell 1 :            1.002    1.010
       Cell 2 :            1.002    1.010
       Cell 3 :            1.002    1.010

       FIRMWARE:

       Core IO
         Master  : A.007.008   Event Dict. : 0.009
         Slave   : A.007.008   Event Dict. : 0.009
       Cell 0
         PDHC    : A.003.027
         Pri SFW : 23.001 (PA)
         Sec SFW : 23.001 (PA)
       Cell 1
         PDHC    : A.003.027
         Pri SFW : 23.001 (PA)
         Sec SFW : 23.001 (PA)
       Cell 2
         PDHC    : A.003.027
         Pri SFW : 23.001 (PA)
         Sec SFW : 23.001 (PA)
       Cell 3
         PDHC    : A.003.027
         Pri SFW : 23.001 (PA)
         Sec SFW : 23.001

   NOTE: The sysrev command output on some systems includes extra zeros in the system firmware version number. These zeros can be ignored. For example, 3.88 and 3.088 on Integrity systems are the same firmware version, and 23.1 and 23.001 on HP 9000 systems are the same firmware version.

3. The system firmware is the main component of the firmware recipe required to support PCI Error Recovery. If you do not have the minimum system firmware version (or a later version) listed in the PCI Error Recovery Support Matrix (http://www.hp.com/go/hpux-networking-docs in the HP-UX 11i v3 Networking Software category), you do not have a firmware recipe installed on your system that supports PCI Error Recovery.

   Go to the Business Support Center Web site at http://www.hp.com/go/bizsupport for the latest HP-UX 11i firmware updates. The IT Resource Center (ITRC) Web site at http://itrc.hp.com also provides a link to the Business Support Center. The system firmware files, installation instructions, and release notes with detailed firmware version information can be obtained by selecting Download Drivers and Software. This provides a searchable database for various products, or you can follow the Server link to select the latest firmware download for your specific server product. Be sure to read the Release Notes for the firmware to ensure a successful update.

4. To confirm that PCI Error Recovery is supported with your I/O card driver, see PCI Error Recovery Support Matrix, HP-UX 11i v3 at: http://www.hp.com/go/hpux-iocards-docs in the HP-UX 11i v3 I/O Cards section.
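The firmware note above says that extra zeros in a version number can be ignored (3.88 equals 3.088, and 23.1 equals 23.001). A minimal Python sketch of that comparison follows. The function names are hypothetical, the "Pri SFW" pattern assumes the mid-range sysrev layout shown in this note, and the minimum version used at the end is purely illustrative; take the real minimum from the PCI Error Recovery Support Matrix.

```python
import re


def parse_fw_version(text):
    """Turn a numeric dotted version such as '3.088' into a tuple of ints.

    Converting each component to int drops the extra zeros, so '3.088'
    and '3.88' both become (3, 88). Non-numeric versions such as the
    PDHC value 'A.003.027' are not handled by this sketch.
    """
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at or above the minimum."""
    return parse_fw_version(installed) >= parse_fw_version(minimum)


def primary_sfw_versions(sysrev_output):
    """Collect every 'Pri SFW' version found in captured sysrev output.

    Assumption: the output follows the mid-range layout shown above.
    """
    return re.findall(r"Pri SFW\s*:\s*(\d+(?:\.\d+)*)", sysrev_output)


# Illustrative only: "23.1" is a placeholder, not a documented minimum.
sample = "Cell 0 PDHC : A.003.027 Pri SFW : 23.001 (PA) Sec SFW : 23.001 (PA)"
for version in primary_sfw_versions(sample):
    print(version, meets_minimum(version, "23.1"))
```

Comparing tuples of integers rather than raw strings avoids treating 3.088 as lower than 3.88.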

Using ioscan to identify PCI Error Recovery Capability

The command ioscan -P error_recovery can be used to determine if the Local Bus Adapters (LBAs) in a system support the PCI Error Recovery feature. The capability of an LBA is in turn determined by the hardware platform capability and the driver controlling the PCI adapter in the slot under that LBA.

The possible values for the error_recovery attribute, as displayed by the ioscan command, are:

  Supported - PCI error recovery is supported on the LBA and the I/O adapter under that LBA. This value is set only if both the platform and all the interface card driver instances under the given LBA support PCI error recovery functionality.

  Unsupported - PCI error recovery is not supported on the LBA and the I/O adapter under that LBA. This could be because either the platform or one of the interface card driver instances under the given LBA does not support PCI error recovery functionality.

NOTE: The error_recovery attribute displays properties for the following:

  - LBAs for PCI/PCI-X slots on PA-RISC and Integrity systems
  - gh2p for PCIe slots on legacy systems
  - PCItoPCI bridge on the HP Superdome 2 platform

Example:

To look at the error recovery capability of all Local Bus Adapters (LBAs) in a legacy system:

    # ioscan -P error_recovery -d lba

Table 1-2 Error Recovery Attributes

    Class  I   H/W Path  Error_Recovery
    ba     0   0/0/0     Supported
    ba     1   0/0/1     Supported
    ba     10  0/0/8     Supported
    ba     11  0/0/9     Supported
    ba     13  0/0/10    Supported
    ba     14  0/0/12    Supported

This implies that PCI error recovery is supported for I/O adapters located under LBAs such as 0/0/0 and 0/0/1.
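To check many LBAs at once, the ioscan output can be captured and parsed. The following Python sketch assumes the output has the four-column layout of Table 1-2 (Class, I, H/W Path, Error_Recovery); the function names are hypothetical, and actual ioscan output may include separator lines or differ between releases.

```python
def parse_error_recovery(ioscan_output):
    """Map each hardware path to its Error_Recovery value.

    Assumption: data rows have exactly four whitespace-separated fields
    (class, instance, hardware path, status), as in Table 1-2. The header
    row and any other line with a different field count are skipped.
    """
    result = {}
    for line in ioscan_output.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "Class":
            continue
        _cls, _instance, hw_path, status = fields
        result[hw_path] = status
    return result


def unsupported_paths(ioscan_output):
    """Return the hardware paths whose status is not 'Supported'."""
    parsed = parse_error_recovery(ioscan_output)
    return [path for path, status in parsed.items() if status != "Supported"]


# Rows taken from Table 1-2.
SAMPLE = """\
Class I H/W Path Error_Recovery
ba 0 0/0/0 Supported
ba 1 0/0/1 Supported
ba 10 0/0/8 Supported
"""

print(parse_error_recovery(SAMPLE))
print(unsupported_paths(SAMPLE))
```

On a live system the text would come from running ioscan -P error_recovery -d lba and capturing its standard output.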

Tunable Kernel Parameters

There are two PCI Error Recovery tunables that you can configure:

pci_eh_enable
    This tunable is used to enable or disable the PCI Error Recovery feature. On HP-UX 11i v3, PCI Error Recovery is enabled by default. pci_eh_enable is not a dynamic tunable; a reboot is required for changes to take effect. For more information, see the pci_eh_enable(5) manpage.

pci_error_tolerance_time
    This tunable determines whether an automatic PCI error recovery will occur on an I/O slot, based on the time interval between two PCI errors. If two PCI errors occur on a PCI slot within the time interval specified by the pci_error_tolerance_time tunable, the card in the I/O slot is suspended and a manual PCI error recovery operation is required to restore the card. For more information, see the pci_error_tolerance_time(5) manpage.

Error Messages for PCI Error Recovery

All drivers that support PCI Error Recovery generate error messages for specific PCI Error Recovery events.

Mass storage drivers post error messages to the diaglog. If a mass storage driver generates verbose error messages, they can be accessed in the Support Tools Manager (STM) diagnostic logs.

NOTE: Support Tools Manager (STM) is supported only on legacy platforms.

Networking drivers post error messages to the console and to the syslog. If a networking driver generates verbose error messages, they are posted to nettl.

Messages are generated in the following cases:

  - When a PCI error is detected
  - After a successful PCI error recovery
  - When PCI error recovery fails

These messages are posted to the console and to the syslog.

PCI errors supported by the product are mapped to specific events. Table 1-3 Events Generated on Legacy Platforms due to PCI / PCIe Errors and Table 1-4 Events Generated on HP Superdome 2 Platform due to PCIe Errors list the events generated due to the PCI / PCIe errors supported on all platforms.

To view the list of critical events generated on the system, use the following command:

    evweb eventviewer -L

To obtain detailed information about an event generated on the system, use the following command:

    evweb eventviewer -E -n <EvArchNo>

Table 1-3 Events Generated on Legacy Platforms due to PCI / PCIe Errors

    Event ID  Summary
    100107    An Uncorrectable Error was reported by PCI express bus for which recovery is in progress.
    100104    A corrected platform error was reported by PCI bus
    100160    A recovered platform or I/O error was detected
    100161    An unrecoverable platform or I/O error was detected

Table 1-4 Events Generated on HP Superdome 2 Platform due to PCIe Errors

    Error ID  Summary
    100143    Link Timeout to PCIe Device
    100144    Malformed Transaction Layer Packet (TLP) Error
    100145    Gross PCIe Link Failure
    100146    PCIe Link Failure - Packet marked as Poisoned
    100147    Surprise Down
    100148    Software Caused PCIe Fatal / Non-Fatal Error
    100149    Device Caused PCIe Fatal / Non-Fatal Error
    100159    Device signaled an error and the First Error Pointer is not valid
    100160    A recovered platform or I/O error was detected
    100161    An unrecoverable platform or I/O error was detected
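When reviewing event viewer output, it can help to translate event IDs into the summaries listed for legacy platforms. The Python sketch below is a plain lookup built from the Table 1-3 entries only; the function names are hypothetical, and it does not call evweb or assume anything about the evweb output format.

```python
# Event IDs and summaries for legacy platforms, as listed in Table 1-3.
LEGACY_EVENTS = {
    100107: "An Uncorrectable Error was reported by PCI express bus "
            "for which recovery is in progress.",
    100104: "A corrected platform error was reported by PCI bus",
    100160: "A recovered platform or I/O error was detected",
    100161: "An unrecoverable platform or I/O error was detected",
}


def describe_event(event_id):
    """Return the Table 1-3 summary, or a placeholder for unlisted IDs."""
    return LEGACY_EVENTS.get(event_id, "event ID not listed in Table 1-3")


def split_known_unknown(event_ids):
    """Separate IDs that appear in Table 1-3 from those that do not."""
    known = [eid for eid in event_ids if eid in LEGACY_EVENTS]
    unknown = [eid for eid in event_ids if eid not in LEGACY_EVENTS]
    return known, unknown


for eid in (100107, 100161, 999999):
    print(eid, "-", describe_event(eid))
```

The HP Superdome 2 event IDs could be added to a second dictionary in the same way.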

Automatic Recovery from a PCI Error

With the PCI Error Recovery feature enabled, if an error occurs on a PCI bus containing an I/O card that supports PCI Error Recovery, the following sequence of events occurs during automatic error recovery:

1. The PCI bus is isolated from further I/O.
2. The I/O devices are quiesced.
3. The error is cleared.
4. The bus is reset.
5. The devices are resumed.

The following example illustrates what you can expect if automatic recovery from a PCI error occurs:

1. Automatic recovery from a PCI error occurs on a PCI bus containing a LAN card at hardware path 0/0/0, which is associated with the iether driver.

2. Error and recovery messages are displayed on the console as follows:

       PCI Error reported at Hardware Path 0/0/0
       Hardware path 0/0/0 Successfully recovered from PCI Error

3. The olrad -q command output will be normal after a PCI Error recovery. See path 0/0/0/1 in the following example:

       # olrad -q

4. The ioscan -fnh command output will be normal after the PCI Error recovery, for example:

       # ioscan -fnh 0/0/0

NOTE: If the devices on a bus that supports PCI error recovery encounter further errors within the time interval specified by the pci_error_tolerance_time tunable following automatic error recovery, they will remain quiesced. If this happens and the devices are in hotpluggable slots, you can recover manually by using the olrad command or the attention button to do an online replacement. You can also recover manually by using the olrad command to do an online deletion. For more information on the pci_error_tolerance_time tunable, see Tunable Kernel Parameters.
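The effect of pci_error_tolerance_time described in the note above can be modeled as a simple decision. This Python sketch is an illustration of the documented rule only, not the kernel's implementation: the function name is hypothetical, times are assumed to be in minutes, and treating an error that arrives exactly at the interval boundary as "within" the interval is an assumption.

```python
def recovery_mode(previous_error_minute, current_error_minute, tolerance_minutes):
    """Decide whether a new PCI error leads to automatic or manual recovery.

    previous_error_minute: time of the last error on this slot, or None
        if the slot has had no earlier error.
    current_error_minute: time of the new error.
    tolerance_minutes: stand-in for the pci_error_tolerance_time value.

    Assumptions: all times share one clock measured in minutes, and an
    error exactly at the boundary counts as inside the interval.
    """
    if previous_error_minute is None:
        return "automatic"
    elapsed = current_error_minute - previous_error_minute
    if elapsed <= tolerance_minutes:
        # Second error inside the tolerance interval: the card stays
        # suspended and a manual recovery operation is needed.
        return "manual"
    return "automatic"


print(recovery_mode(None, 10.0, 1))   # first error on the slot
print(recovery_mode(10.0, 10.5, 1))   # second error 0.5 minutes later
print(recovery_mode(10.0, 12.0, 1))   # second error 2 minutes later
```

For the authoritative units and default of the tunable, see the pci_error_tolerance_time(5) manpage.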

Manual Recovery from a PCI Error

After a successful automatic PCI error recovery, if another PCI Error is detected within the time interval specified by the pci_error_tolerance_time tunable, the card in the I/O slot will be suspended. A manual PCI Error Recovery operation is required to restore the card.

The following error messages are examples of what will be displayed on the console if a PCI error is detected within the time interval specified by the pci_error_tolerance_time tunable following an automatic PCI error recovery:

    PCI Error reported at Hardware Path 0/0/0
    Multiple PCI Errors reported at Hardware path 0/0/0 within pci error tolerance time limit of 1 minutes. Refer to pci_error_tolerance_time(5) man page for details.
    Automatic PCI Error Recovery Operation failed at Hardware path 49/0/1/2/0. Path may be recovered using a Manual Error Recovery operation on supported platforms. Refer to olrad(1m) man page for details.

A successful attempt at manual recovery will restore the card. A failed attempt at manual recovery will confirm that there is a persistent error condition.

To recover from the PCI error manually, follow these steps:

1. Execute the olrad -q command to confirm that the card is suspended. In the following example, the device in slot 0-0-1-0, path 0/0/0/1, is suspended:

       # olrad -q

   NOTE: The olrad capability is supported only on legacy systems and is not supported during the first release of the HP Superdome 2 platform.

2. Execute the ioscan -fnh command on the driver associated with the card in the suspended slot to confirm the card is in the error state, for example:

       # ioscan -fnh 0/0/0

3. Execute the olrad -r command to power off the slot, for example:

       # olrad -r 0-0-1-0
       Activity : Start of Prepare Replace
       Target slot : 0-0-1-0
       Critical Resource Analysis(CRA) in progress...
       [ NOTE: The CRA may take a few minutes to complete on large configurations. It is recommended not to disrupt this operation. ]
       CRA REPORT SUMMARY:
       CRA returned WARNING.
       Detailed CRA report is available in /var/adm/cra.log file.
       CRA output : resources in use on affected device(s)
       Target slot : 0-0-1-0
       Activity : End of Prepare Replace
       Target slot : 0-0-1-0
       Activity : Target slot powered off, drivers suspended, OK to replace the card
       Target slot : 0-0-1-0

4. Execute the olrad -q command to confirm the power is off. In the following example, power for the device in slot 0-0-1-0, path 0/0/0/1, is off:

       # olrad -q

5. Execute the olrad -R command to resume the card, for example:

       # olrad -R 0-0-1-0
       Activity : Start of Post Replace
       Target slot : 0-0-1-0
       Hardware path 0/0/0 Successfully recovered from PCI Error
       Activity : End of Post Replace
       Target slot : 0-0-1-0
       Activity : Target slot powered on, drivers resumed, OK to start using the card
       Target slot : 0-0-1-0

6. Execute the olrad -q command to confirm that the card has been resumed. In the following example, the device in slot 0-0-1-0, path 0/0/0/1, is not suspended, as indicated by a No in the Susp column:

       # olrad -q

7. After the card has been resumed, a recovery message will be displayed on the console, for example:

       Hardware path 0/0/0 Successfully recovered from PCI Error

8. If the olrad -R command does not succeed, you have a persistent PCI error condition. There is a high probability that the I/O card is defective. A failure message will be displayed on the console, for example:

       Automatic PCI Error Recovery Operation failed at Hardware path 0/0/0. Path may be recovered using a Manual Error Recovery operation. Refer to olrad(1m) man page for details

   The olrad -q command will show the slot as being suspended (Susp Mode = Yes) and the power will be off (Pwr = Off). At this point you have two options:

     - You can perform an OL* online replacement operation to replace the I/O card with an I/O card that has the same HP Manufacturing Part Number and the same (or later) release version number.
     - You can perform an OL* online deletion operation to delete the card and the driver instance associated with that card. After a successful online deletion, the slot is available to be used with another I/O card. If you want to perform an online addition operation to add a different I/O card to the slot, see the Interface Card OL* Support Guide.

NOTE: Manual PCI Error Recovery operation to restore the card is supported only on legacy platforms, and is not supported on the HP Superdome 2 platform. Manual PCI Error Recovery operation will be supported on the HP Superdome 2 platform when the OL* capability is enabled on the new platform.

OL* online replacement operations can be performed with the pdweb GUI, with the olrad command, or with a hardware attention button procedure. OL* online deletion operations can be performed with the pdweb GUI or with the olrad command.
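The console messages quoted in this note follow a small number of fixed shapes, so a log scan can show which hardware paths still need manual recovery. The Python sketch below matches only the message texts quoted in this note; the function names are hypothetical, and the assumption that syslog lines contain these messages verbatim has not been verified against a live system.

```python
import re

# Patterns derived from the example console messages in this note.
# Hardware paths are captured as digits and slashes, which leaves out
# the trailing period in "... failed at Hardware path 49/0/1/2/0."
PATTERNS = [
    ("reported", re.compile(r"PCI Error reported at Hardware Path ([\d/]+)", re.I)),
    ("multiple", re.compile(r"Multiple PCI Errors reported at Hardware path ([\d/]+)", re.I)),
    ("recovered", re.compile(r"Hardware path ([\d/]+) Successfully recovered from PCI Error", re.I)),
    ("failed", re.compile(r"PCI Error Recovery Operation failed at Hardware path ([\d/]+)", re.I)),
]


def classify(line):
    """Return (kind, hardware_path) for a recognized message, else None."""
    for kind, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return kind, match.group(1)
    return None


def latest_state(lines):
    """Return the most recent message kind seen for each hardware path."""
    state = {}
    for line in lines:
        hit = classify(line)
        if hit:
            kind, path = hit
            state[path] = kind
    return state


messages = [
    "PCI Error reported at Hardware Path 0/0/0",
    "Hardware path 0/0/0 Successfully recovered from PCI Error",
    "Automatic PCI Error Recovery Operation failed at Hardware path 49/0/1/2/0. "
    "Path may be recovered using a Manual Error Recovery operation on supported platforms.",
]
print(latest_state(messages))
```

Paths whose latest state is "failed" or "multiple" are the candidates for the manual olrad procedure described above.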

PCI Error Recovery Documentation

The documentation that supports this release of the PCI Error Recovery feature consists of:

  - PCI Error Recovery Support Matrix, available at http://www.hp.com/go/hpux-networking-docs in the HP-UX 11i v3 Networking Software category.
  - Interface Card OL* Support Guide, available at http://www.hp.com/go/hpux-networking-docs in the HP-UX 11i v3 Networking Software category.
  - Patch Management User Guide for HP-UX 11.x Systems, available at http://www.hp.com/go/hpux-core-docs in the HP-UX 11i v2 category.
  - olrad(1M): after installing the PCI Error Recovery feature, enter man olrad from the command line to view the olrad manpage, which includes PCI Error Recovery information.
  - pci_eh_enable(5): used to disable PCI Error Recovery, if it is installed on your system.
  - pci_error_tolerance_time(5): used to adjust the error threshold for the manual recovery option.

Terms and Definitions

HPMC
    High Priority Machine Check. Highest priority interruption on PA-RISC based systems.

MCA
    Machine Check Abort. Highest priority interruption on Itanium based systems.

Post Replace Operation
    By issuing the olrad -R slot_id command after an I/O card is replaced, slot power is turned on, suspended drivers are resumed, driver scripts (post_replace) for the slot (slot_id) and affected slots (if any) are run, and the attention LED for the slot (slot_id) is set to OFF.