IBM Software Group
Considerations for using TSM in a SAN
Sept 2002, Revision 5
Steve Strutt, Tivoli Software, IBM UK
steve_strutt@uk.ibm.com
August 2003

Agenda
- SAN Exploitation - LAN-Free backup
  - Performance characteristics
- Requirements
  - LAN, Hardware, Software, Device, dependencies
- SAN considerations
  - Device fail-over
  - HBA considerations
  - SAN design considerations
  - Device addressing considerations
- Going Live
  - Testing, Diagnosing Problems
  - Hints and Tips
- Question and Answer

LAN-free Backup
[Diagram: client DATA flows over the SAN to an FC device (DISK or TAPE) rather than over the LAN; variants shown: direct to tape, disk pool staging]
Advantages:
- client data can be local or SAN-attached
- transparent to application/database
- takes backup traffic off the LAN
- reduces CP cycles on backup server (no I/O)
- faster speed (usually)
- only one backup server needs administration
Disadvantages:
- still requires CP cycles on client for backup I/O
- careful scheduling to avoid tape drive contention (or exploit disk pooling)

Performance characteristics
- LAN-Free is not necessarily faster
  - Only the network is eliminated as a bottleneck
  - There could be other bottlenecks: tape drives, disk subsystem
- Data types
  - Good performance for large files and databases
  - Small files: performance limited by file system and TSM architecture
- LAN-Free to tape
  - Potentially better performance for large files, as the bottleneck becomes the file system or tape device
  - Small files cause tape drives to stop-start more, and drives drop out of streaming mode
- LAN-Free to disk
  - Ideal for small files, no stop-start overhead

Customer Performance Figures
- Large UK High Street Retailer
  - TDP for SQL Server on ESS to 3584 LTO
    - Backup 61.4 GB/hour (17 MB/s) to single drive
    - Restore 44.2 GB/hour (12.3 MB/s) from single drive
  - NT filesystem on ESS disk to 3584 LTO, small files
    - Backup 10 GB/hour to LTO
    - Restore 6.2 GB/hour from LTO
    - Could be slower than LAN if tape drives do more stop/start operations
- Large UK Bank
  - TDP for Exchange to 3583 LTO
    - Backup 52 GB/hour (14.4 MB/s) to single drive
    - Restore 51 GB/hour (14.2 MB/s) from single drive

Agenda - Requirements
- Hardware
  - LAN
  - Library support for LAN-Free
  - SAN device support
- Software
  - Evolving TSM support for LAN-Free
  - TSM code dependencies
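
The MB/s values quoted in the customer performance figures follow from the GB/hour values by simple arithmetic. The sketch below reproduces that conversion, assuming decimal units (1 GB = 1000 MB), which is the assumption that matches the quoted numbers; the function name is illustrative only.

```python
def gb_per_hour_to_mb_per_s(gb_per_hour: float) -> float:
    """Convert a throughput in GB/hour to MB/s.

    Assumes decimal units: 1 GB = 1000 MB, 1 hour = 3600 s.
    """
    return gb_per_hour * 1000 / 3600


# GB/hour figures taken from the customer slides.
figures = {
    "SQL Server backup": 61.4,
    "SQL Server restore": 44.2,
    "Exchange backup": 52.0,
    "Exchange restore": 51.0,
    "NT small-file backup": 10.0,
    "NT small-file restore": 6.2,
}

for label, rate in figures.items():
    print(f"{label}: {rate} GB/hour = {gb_per_hour_to_mb_per_s(rate):.1f} MB/s")
```

The same conversion applied to the small-file figures shows how far below tape streaming rates those workloads run.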

Requirements - LAN
- LAN-Free still requires the LAN for metadata
- For large files and databases
  - Minimal usage
- Small files
  - Maybe the same, if not more, metadata on the LAN than data on the SAN if files are very small
- LAN performance and loading still important

SAN Device support
- Initially hardware configurations were certified by Tivoli
  - Many different combinations, not all could be tested
  - Common configurations now tested
- The Tivoli view is that hardware is transparent to TSM
  - If the hardware vendors have validated the device's use in a SAN and TSM supports the device, then it is supported in a SAN configuration with Tivoli Storage Manager

Library support for LAN-Free
- Native TSM LAN-Free support for:
  - 3494 libraries
  - SCSI libraries, controlled via SCSI control path
    - SCSI connect
    - Fibre Channel connect
- NO native TSM LAN-Free support for STK and ADIC AML libraries
  - Require Gresham EDT on every TSM server and Storage Agent
  - STK: ACSLS, Library Station (s390)
  - ADIC: DAS for AML/2, AML/J

TSM evolving support for LAN-Free
- LAN-Free supports
  - BA Client, file level
  - BA Client, volume level/image backup
    - Prior to 5.1.5, the tape was rewound between each volume
  - No support for Backup Set restore
- NT/W2K
  - Supported from 4.1.0 for NT/W2K server and NT/W2K TDPs
  - 4.2.1 for Backup Archive Client
- Sun Solaris and AIX
  - Supported from 4.2.0, TDPs and Backup Archive Client
- HP-UX
  - Supported from 5.1.0, full TSM device driver support
- Linux (x86, zlinux and zos)
  - Supported from 5.2.0

TSM code dependencies
- TSM 4.2 and 5.1
  - TSM Server and Storage Agent code MUST be at the same PTF and patch level
  - Restricts ability to roll out new code
- TSM 5.2
  - TSM Server and Storage Agent code only dependent at version and release level
  - Independent of PTF level
  - Easier to deploy and install maintenance

Agenda - SAN considerations
- High Availability considerations
  - SANs offer high availability
  - Tape high availability options
  - Fail-over and tape
- HBA and tape setup
  - HBA Sharing
- SAN design considerations
  - Attaching tape drives to SANs
  - SAN Operational considerations
- Device addressing considerations
  - Managing device addressing
  - Persistent Naming
  - Static Device Naming
  - Mapping device names to devices
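
The rule on the TSM code dependencies slide can be written as a small check. This is a minimal sketch, not an IBM tool: it assumes levels are written as four dot-separated numbers (version.release.modification.fix, as in 4.2.1.11) and treats any release the slide does not mention as requiring an exact match.

```python
def levels_compatible(server_level: str, agent_level: str) -> bool:
    """Return True if a TSM server and Storage Agent level may be paired.

    Assumption: levels look like "5.1.5.0" (version.release.mod.fix).
    """
    server = tuple(int(part) for part in server_level.split("."))
    agent = tuple(int(part) for part in agent_level.split("."))

    # Version and release must always match.
    if server[:2] != agent[:2]:
        return False

    # TSM 5.2: only version and release matter, PTF level is independent.
    if server[:2] == (5, 2):
        return True

    # TSM 4.2 and 5.1 (and, conservatively, anything else): exact match.
    return server == agent


print(levels_compatible("5.2.0.0", "5.2.1.3"))
print(levels_compatible("5.1.5.0", "5.1.5.2"))
```

A check like this could be run before rolling out maintenance to a Storage Agent ahead of its server.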

SANs offer high availability
- Clustering
- Redundancy
- Multiple paths
- Works well for disk
  - Arrays designed to have two access paths
  - Device drivers exploit multiple paths
    - Failover of paths
  - Clustering software fails over applications and disk storage
  - Effective TSM support for fail-over
    - DB and Log Mirrors, DB page shadowing
    - Support for MSCS and HACMP
- Does not work well for tape
  - Tape drives usually single connection
  - SCSI-controlled tape libraries usually have a single control point
  - Drivers only exploit single path
  - Failover issues

Tape high availability options
- Good news for AIX
- 3590
  - Dual SCSI/Fibre ports
  - AIX ATAPE driver can fail over to second path
  - If the primary path, say rmt0, is not available, the driver will use other paths/device names via an alternative HBA/device port to access the same device
  - This is described in the 3590 manual, IBM TotalStorage Tape Installation and User Guide
    ftp://ftp.software.ibm.com/storage/devdrvr/doc/
- 358x LTO libraries
  - New option to have two library controller LUNs on different paths
    - Use two different HBAs/switches
    - Only supported with AIX ATAPE driver
  - Still single path to drives
    - Divide drives between two HBAs and switches
    - Still access library, reduced number of drives

Fail-over and Tape
- SCSI Reserve/Release used to serialize access to drives
  - Maintains integrity if device definitions are wrong
- Outstanding SCSI Reserves remain if the server using the drives goes down
- Drives only released if:
  - Failed server restarted
  - Drive powered off and on
  - SCSI Reset issued
    - Potential impact on other users
- SCSI-attached tape drives can be reset using SCSI Reset
- Fibre-attached tape drives do not honor SCSI Resets

Supported SCSI Fail-over configurations
- TSM 4.2+ with MSCS on W2K
  - Devices registered in wizard, SCSI reset on fail-over
- TSM 5.1.5 with HACMP on AIX
  - Devices defined in startserver script, issues verifydevice to reset device
[Diagram: TSM Server in cluster; IP network to TSM clients; shared SCSI bus; shared disk; shared tape]

Supported Fibre Failover configurations
- TSM 5.2.0 with HACMP on AIX
  - Devices defined in startserver script, issues verifyfcdevice to reset device
[Diagram: TSM Server in HACMP cluster]

Fail-over and LAN-Free and Tape Library Sharing
- Outstanding SCSI Reserves remain if storage agents using drives go down
- Drives become unavailable to other LAN-Free/Tape Sharing users
- Only released if:
  - Failed server restarted
  - Drive powered off and on
    - Potential drive mapping issues
  - FC Reset issued
    - Potential impact on other users
- Potentially need more tape drives
- Some horrible messages if TSM server tries to dismount tapes belonging to failed storage agents

Failure scenario
Server running Storage Agent goes down (hardware failure, Fibre loss) while using a tape device:

ANR8925W Drive DRIVE0 in library ATLP1000 has not been confirmed for use by server UKSAN1_SA for over 1200 seconds. Drive will be reclaimed for use by others.
ANR8336I Verifying label of DLT volume 00157D in drive DRIVE0 (MT6.1.0.1).
ANR8311E An I/O error occurred while accessing drive DRIVE0 (MT6.1.0.1) for SETMODE operation, errno = 1.
ANR8355E I/O error reading label for volume 00157D in drive DRIVE0 (MT6.1.0.1).
ANR8311E An I/O error occurred while accessing drive DRIVE0 (MT6.1.0.1) for OFFL operation, errno = 1.
ANR8469E Dismount of DLT volume 00157D from drive DRIVE0 (MT6.1.0.1) in library ATLP1000 failed.
ANR9999D mmsscsi.c(1647): ThreadId<48> Volume may still be in the drive DRIVE0 (MT6.1.0.1).
ANR8446I Manual intervention required for library ATLP1000.

- TSM makes drive unavailable

Supporting LAN-Free fail-over
- No TSM support for fail-over of Storage Agent
- Supporting Storage Agent fail-over
  - Configure two storage agents, one on each server
    - Different Storage Agent names
  - No dependency between TSM clients and Storage Agent
    - BA client and TDP fail over, Storage Agent does not
    - BA client and TDP automatically use new SA on fail-over server
- No device issues if LAN-Free client machine fails over when not using tape devices
- Devices lost if LAN-Free client machine fails over when using tape drives
  - Require more tape drives
  - Procedures to recover lost drives
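
The ANR messages in the failure scenario carry a severity in the last letter of the message ID (I, W, E, D), which makes the activity log easy to filter when looking for lost drives. The following is a minimal sketch of such a filter; the function name and severity labels are illustrative, and the sample lines are copied from the slide.

```python
import re

# ANR message number followed by a one-letter severity suffix.
MESSAGE = re.compile(r"^(ANR\d{4})([A-Z])\s+(.*)$")
SEVERITY = {"I": "information", "W": "warning", "E": "error", "D": "diagnostic"}


def parse_message(line):
    """Split a log line into (message id, severity, text); None if no match."""
    match = MESSAGE.match(line.strip())
    if match is None:
        return None
    number, suffix, text = match.groups()
    return number + suffix, SEVERITY.get(suffix, "unknown"), text


LOG = [
    "ANR8925W Drive DRIVE0 in library ATLP1000 has not been confirmed for use "
    "by server UKSAN1_SA for over 1200 seconds. Drive will be reclaimed for "
    "use by others.",
    "ANR8311E An I/O error occurred while accessing drive DRIVE0 (MT6.1.0.1) "
    "for SETMODE operation, errno = 1.",
    "ANR8469E Dismount of DLT volume 00157D from drive DRIVE0 (MT6.1.0.1) in "
    "library ATLP1000 failed.",
    "ANR8446I Manual intervention required for library ATLP1000.",
]

parsed = [parse_message(line) for line in LOG]
errors = [entry for entry in parsed if entry and entry[1] == "error"]

for message_id, severity, text in errors:
    print(message_id, severity, text)
```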

HBA and tape setup considerations
- Check configuration recommendations of tape hardware vendor
- IBM recommendations
  - NT/W2K: Max Scatter-Gather must be set to 65 or greater
    - Unable to write to new tapes on Storage Agent
    - W2K creates tapes which cannot be read
    - TSM will check for this in 4.2.1.11 and higher levels; issues message, unable to use drive
  - W2K and LTO: use 5.0.2.4 or higher level of Ultrium LTO driver

HBA Sharing
- Sharing of disk and tape on the same HBA not always supported by hardware vendors
  - Disk OK
  - Access to tape drives lost under high workload conditions: drives time out, go offline
- IBM supports disk and 3590 on AIX with 6227/8 adapter under moderate workloads
- IBM-SSG does not recommend sharing disk and tape in other configurations, i.e. LTO
  - Some evidence that it is OK in low-workload environments, such as the previous NT/SQL Server LAN-Free environment
- Possible solution: new generation of adapters being certified to support dual use without restrictions

SAN design
- Most SANs designed for disk access
  - Data flow is optimized for hosts <==> disks
  - Stovepipe design, separate SAN islands
- Tape backup requires flow across SAN
  - May need additional Inter-Switch Links
  - More ports because of HBA sharing issues
[Diagram: TAPE, Disk Array]

Attaching tape drives to SANs
- Most tape drives are Arbitrated Loop devices
  - Note: some directors do not support direct attachment of FC-AL devices
- Brocade switches support Public and Private Loop devices in fabric
- InRange: OK
- McData directors do not support Arbitrated Loop devices
  - Require use of Sphereon 4300 or similar departmental switch
  - Or SCSI attached via Fibre-to-SCSI Gateway/Router
[Diagram: ED5000, 4300 etc., TAPE drives]

SAN Operational considerations
- Using fibre channel arbitrated loops (hubs)
  - During boot-up sequence, LIP will interrupt tape operations for shared devices
  - Rebooting servers can cause tape failures
  - When devices and servers are on the same loop, devices should not be rebooted while tape is in use by a TSM server or Storage Agent
- Power-up sequence
  - SAN, tape devices, then TSM Server and Storage Agents
- Scheduling to avoid drive contention
  - Storage Agents must wait if no drives available
  - Use MountWait parameter to avoid backup failures
- Managing device access
  - Device Class, Mount Limit
  - Device PATHs

Device Addressing Considerations
- All device addresses are defined centrally on TSM server for LAN-Free
  - Library, Device and Path statements
- Each host knows devices by different device names
- Changes in device names or SCSI addresses can cause failures
  - Requirement to manage device addressing
- All definitions must point to same physical device
- Question: how to map all device names to the same physical device?

[Diagram: FC devices TAPE WWN1 and TAPE WWN2 shared by three hosts]

Host                    Device Name     TSM Definition
W2K TSM Server          lb1.0.1.3       Library Lib1 lb1.0.1.3
                        //./tape0       Drive Drive0 //./tape0
                        //./tape1       Drive Drive1 //./tape1
AIX Storage Agent       /dev/rmt0       Path Drive0 /dev/rmt0
                        /dev/rmt1       Path Drive1 /dev/rmt1
Solaris Storage Agent   /dev/rmt/0st    Path Drive0 /dev/rmt/0st
                        /dev/rmt/1st    Path Drive1 /dev/rmt/1st

Managing Device Addressing in SANs - 1
- Host device names and SCSI addresses can change
  - Devices added or removed
  - Devices failing
- A number of address mappings occur
  - WWN to SCSI address
  - SCSI address to device address
  - Gateway/Router: device SCSI address to LUN
- Described in Redpiece: Managing device addressing of SAN attached tape for use with Tivoli Storage Manager, REDP-0150-00
[Diagram: mapping chain across host, SAN and tape drives: SCSI ID to LUN (Gateway/Router), device WWN to SCSI ID (HBA), SCSI ID to OS device name (OS device driver), OS device name to TSM device name (TSM)]

Managing Device Addressing in SANs - 2
- Changes are more likely to happen if there are large numbers of devices and hosts in the SAN
  - New devices being added or removed
- Solution
  - Use HBA Persistent Naming
    - Fixes SCSI address to device WWN
  - Static device name mapping
    - Device names remain unchanged
    - Fixed device name to SCSI address mapping
  - TSM 5.2
    - Automatic device tracking

HBA persistent naming support
- HBA maintains a fixed WWN to SCSI address relationship
- Support matrix

Platform          Emulex                                Qlogic
AIX               Not applicable (use 6227/8 adapter)   Not applicable (use 6227/8 adapter)
Windows NT/W2K    Yes                                   Yes (from 8.1.3 with SANblade Manager)
Solaris           Yes                                   Yes

Persistent Naming with Emulex on Windows

Qlogic HBA on Windows
- Qlogic from 8.1.3 onwards with SANblade Manager/SANsurfer
- Download management utility from Qlogic website

Static device naming with Microsoft Windows
- TSM device driver uses a naming convention which does not change as devices are added or removed
  - mtx.y.z.n
- LTO device driver uses default Windows device naming
  - \\.\tape0
  - Can change if new devices added or removed
  - With W2K can change in-flight if devices added or removed from SAN
- Recommendation to use mtx.y.z.n
  - Use information from TSM Device Information screen to determine mtx.y.z.n device addresses
  - In TSM 5.1, the mtx.y.z.n name is shown

Static device naming on AIX and Solaris
- AIX
  - Device drivers automatically track devices using serial numbers
  - This mapping is static
- Solaris
  - Static device naming convention
  - Uses symbolic link to map device name to SCSI address
    ls -l /dev/rmt/*
    lrwxrwxrwx 1 root other 45 Jan 3 14:22 /dev/rmt/0mt -> ../../devices/pci@1f,0/pci@1/scsi@2/mt@5,1:mt
  - Requires HBA Persistent Naming to be configured
    - HBA configuration file

TSM 5.2 Automatic device tracking
- TSM tracks devices by serial number
  - Specify S/N when defining devices
  - Automatic discovery of S/N when device defined
- At the start of each operation, TSM server and SA check that the device is the one they expect:
  - Windows: initiates a search for the device, changes the mapping to point to the new device, and then continues the operation
  - UNIX: issues a message and fails the operation on that device
- Avoids need for persistent binding on Windows
  - Qlogic complex to set up; extra management utilities
  - Persistent binding options not tested by hardware vendors!
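
The serial-number check described for TSM 5.2 can be illustrated in a few lines. This is only a sketch of the behaviour as the slide describes it, not TSM's implementation; the function, the platform strings and the serial numbers are all hypothetical.

```python
def resolve_device(expected_serial, device_name, serial_by_device, platform):
    """Return the device name to use for an operation.

    serial_by_device: {device name: serial number} as currently seen by the
    host (hypothetical input; a real system would query the devices).
    """
    if serial_by_device.get(device_name) == expected_serial:
        return device_name  # the device is the one we expect

    if platform == "windows":
        # Windows behaviour per the slide: search for the device and remap.
        for name, serial in serial_by_device.items():
            if serial == expected_serial:
                return name
        raise LookupError(f"no device with serial {expected_serial} found")

    # UNIX behaviour per the slide: report and fail the operation.
    raise RuntimeError(f"{device_name} is not serial {expected_serial}")


# Hypothetical state after a reconfiguration swapped two device names.
seen = {"mt0.1.0.2": "SN-B", "mt0.2.0.2": "SN-A"}
print(resolve_device("SN-A", "mt0.1.0.2", seen, "windows"))
```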

Mapping device names to devices - 1
- In a SAN, all definitions for a shared device must point to the same physical device
- Each host usually has a different device name for the device, depending on the hardware and configuration
- The only common information available on each platform is the device WWN and serial number

[Diagram: FC devices TAPE WWN1 and TAPE WWN2 shared by three hosts]

Host                    Device Name     TSM Definition
W2K TSM Server          lb1.0.1.3       Library Lib1 lb1.0.1.3
                        //./tape0       Drive Drive0 //./tape0
                        //./tape1       Drive Drive1 //./tape1
AIX Storage Agent       /dev/rmt0       Path Drive0 /dev/rmt0
                        /dev/rmt1       Path Drive1 /dev/rmt1
Solaris Storage Agent   /dev/rmt/0st    Path Drive0 /dev/rmt/0st
                        /dev/rmt/1st    Path Drive1 /dev/rmt/1st

Mapping device names to devices - 2
1. Define drives on the TSM server and determine the relationship between TSM device name, host device name and device WWN/serial number. Also determine the element number of the device in the library.
2. For each (Storage Agent) host, determine the WWN/serial number and device name for each tape device.
3. Use WWNs/serial numbers to relate device names on each host to the TSM device name.
4. Define drives on the TSM server using PATH statements.
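
Step 3 above is a join on serial number (or WWN). The sketch below shows that join for one TSM server and two Storage Agent hosts; the host names and device names echo the slides, while the serial numbers and the function name are hypothetical.

```python
def build_drive_table(tsm_drives, host_devices):
    """Relate each TSM drive name to the device name seen on every host.

    tsm_drives:   {TSM drive name: serial number}
    host_devices: {host: {host device name: serial number}}
    Returns {TSM drive name: {host: device name or None}}.
    """
    table = {}
    for drive, serial in tsm_drives.items():
        row = {}
        for host, devices in host_devices.items():
            matches = [name for name, sn in devices.items() if sn == serial]
            row[host] = matches[0] if matches else None
        table[drive] = row
    return table


# Hypothetical serial numbers; note the Solaris host sees the drives
# in the opposite order, which is why names alone cannot be trusted.
tsm_drives = {"Drive0": "SN-A", "Drive1": "SN-B"}
host_devices = {
    "AIX Storage Agent": {"/dev/rmt0": "SN-A", "/dev/rmt1": "SN-B"},
    "Solaris Storage Agent": {"/dev/rmt/0st": "SN-B", "/dev/rmt/1st": "SN-A"},
}

for drive, row in build_drive_table(tsm_drives, host_devices).items():
    print(drive, row)
```

Each row of the result supplies the device name to use in the PATH statement for that host and drive.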

Mapping device names to devices - 3
Worksheet, one row per TSM device name (Drive0, Drive1, ...), with columns:
- TSM Device Name
- Drive WWN/Serial No.
- Library Element Number
- Host Device Names: TSM Server, Storage Agent1, Storage Agent2, Storage Agent3

Mapping device names to devices - Windows 1
- TSM V5.1
  - TSM Utilities, Device Information
  - Shows a device's WWN and serial number as well as device address (mtx.y.z.n)
  - Also allows central discovery of Storage Agent device mappings
- Requires SNIA HBA SAN Mgmt API to be installed

Mapping device names to devices - Windows 2
- TSM Server, Device Information display with Qlogic SNIA SAN Mgmt API installed
- Device information can also be displayed from this screen for Storage Agents in the same Windows domain

Mapping device names to devices - 5
- AIX
  - lsattr -El mtx/rmtx shows the device's WWN
  - lscfg -vl mtx/rmtx shows the device's serial number
- Solaris
  - Relate device name to WWN using SCSI and LUN addresses
  - ls -l shows device name and SCSI/LUN mapping
  - dmesg output shows SCSI target address to WWN mapping

    ls -l /dev/rmt/*
    lrwxrwxrwx 1 root other 45 Jan 3 14:22 /dev/rmt/0mt -> ../../devices/pci@1f,0/pci@1/scsi@2/mt@5,1:mt

    dmesg (/var/adm/messages)
    ...
    qla2200-hba0-scsi-target-id-5-fibre-channel-name="100000e00201d0d7";
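
On Solaris the two outputs above are joined on the SCSI target ID. A minimal sketch of that join follows, using the sample lines from the slide. It assumes the `mt@5,1` component encodes target and LUN (read here as hexadecimal) and that the dmesg target ID is decimal; both are assumptions to verify against the HBA driver in use, and the function name is illustrative.

```python
import re

# Device link: /dev/rmt/<name> -> .../<driver>@<target>,<lun>:<minor>
LINK = re.compile(r"(/dev/rmt/\S+)\s*->\s*\S*/[a-z]+@([0-9a-f]+),([0-9a-f]+):")
# HBA binding line: ...scsi-target-id-<n>-fibre-channel-name="<wwn>"
BINDING = re.compile(r'scsi-target-id-(\d+)-fibre-channel-name="([0-9a-fA-F]{16})"')


def map_devices_to_wwn(ls_lines, dmesg_lines):
    """Return {device name: (WWN, LUN)} by joining on SCSI target ID."""
    wwn_by_target = {}
    for line in dmesg_lines:
        match = BINDING.search(line)
        if match:
            wwn_by_target[int(match.group(1))] = match.group(2)

    result = {}
    for line in ls_lines:
        match = LINK.search(line)
        if match:
            name = match.group(1)
            target = int(match.group(2), 16)  # assumed hexadecimal
            lun = int(match.group(3), 16)
            if target in wwn_by_target:
                result[name] = (wwn_by_target[target], lun)
    return result


ls_output = [
    "lrwxrwxrwx 1 root other 45 Jan 3 14:22 /dev/rmt/0mt "
    "->../../devices/pci@1f,0/pci@1/scsi@2/mt@5,1:mt",
]
dmesg_output = [
    'qla2200-hba0-scsi-target-id-5-fibre-channel-name="100000e00201d0d7";',
]

print(map_devices_to_wwn(ls_output, dmesg_output))
```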

Agenda - Going Live
- Testing, testing
- Diagnosing Problems
  - Storage agent messages
  - TSM device utilities

Testing
- Test after every hardware change
  - Changes can introduce errors which can cause failure to recover data
- Proving data integrity
  - TSM 5.1 includes end-to-end CRC checking
  - Use only during testing

Testing
- Check tape hardware works reliably with TSM server in LAN configuration first
- Check TDPs and B/A Client work on LAN first
- Test each drive with each Storage Agent to check they are properly defined and accessible
  - Use BA Client

ANR8779E (Session: 7, Origin: UKSAN4_SA) Unable to open drive /dev/mt1, error number=2.

- error number=2: invalid device specified
- error number=16: device busy (SCSI Reserved to another system)

Diagnosing problems
- Storage Agent can be run in foreground session, to see all messages
- All Storage Agent messages should be logged centrally in the server Activity Log
- Can issue commands from TSM server console
  - storage_agent1: QUERY SESSION
- mttest and lbtest utilities
  - Provided in utilities or devices directory
  - Test operation of tape devices or library operations
  - Show device serial numbers
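
The two error numbers called out for ANR8779E during drive testing are the standard errno values ENOENT (2) and EBUSY (16). The sketch below pulls the number out of the message and attaches the slide's interpretation; the helper name and hint wording are illustrative only.

```python
import errno
import re

# Interpretations taken from the testing slide.
HINTS = {
    errno.ENOENT: "invalid device specified (check the PATH device name)",
    errno.EBUSY: "device busy (SCSI Reserved to another system)",
}


def explain_open_failure(message):
    """Return (error number, hint) for an ANR8779E message, or None."""
    match = re.search(r"error number\s*=\s*(\d+)", message)
    if match is None:
        return None
    number = int(match.group(1))
    return number, HINTS.get(number, "no hint for this error number")


sample = ("ANR8779E (Session: 7, Origin: UKSAN4_SA) Unable to open drive "
          "/dev/mt1, error number=2.")
print(explain_open_failure(sample))
```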

Question and Answer