Storage Protocol Analyzers: Not Just for R & D Anymore. Brandy Barton, Medusa Labs - Finisar

SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature under the following conditions:
- Any slide or slides used must be reproduced without modification.
- The SNIA must be acknowledged as source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee.

Abstract
Storage Protocol Analyzers: Not Just for R&D Anymore
In recent years, a growing number of data centers have begun to deploy traditional R&D analysis tools, such as protocol analyzers, in their production SAN environments. The reasons cited for this adoption include protecting the ROI of the SAN, decreasing downtime, reducing the risk of outages, and preventing lost or corrupt data. This session provides an overview of protocol analyzer tools and how to use them.

Agenda Topics
- General SAN Tool Comparison
- How Protocol Analyzers Work
- Analyzer Placement
- Key Fibre Channel Concepts for Troubleshooting
- Key SAN Issues Found by Protocol Analyzers

SAN Lifecycle
Phases: Purchase SAN, Proof, Operate, Alert, Upgrade
Activities: Characterize; Configure; Error insertion; Load balance; Validate interoperability; Verify design; Stabilize; Obtain baseline; Monitor traffic; Gather statistics; Compare & report; SLA management; Manage growth; Budget; Ensure availability; Link to business results; Support what-ifs; Record; Alarms; Thresholds; Analyze; Replay; Analyze; Isolate; Root cause; Plan; Test; Validate; Re-baseline; Performance delta; Risk abatement; Repair or reconfigure

SAN Tool Matrix
Lifecycle phases (columns): Proof, Operate, Alert, Analyze
Load Generators: x
SAN Testers: x
B.E.R.T.: x
Switch Monitoring: x
Storage Monitoring: x x
HBA Monitoring: x
Network Monitoring: x x
Emulators: x x
Protocol Analyzers: x x x

Protocol Analyzer Tools
- Proprietary implementations
- The I/O Traces, Tools & Analysis group within SNIA is working on standardization
- Capture I/O traces
- Detailed protocol-level data
- No issue with heterogeneous configurations
- Do not see what is inside a server bus (e.g. HBA to OS, storage controller to drive)

Why Analyzers Are Making Their Way to the Data Center
- Problem hardware and software must be pinpointed before changes are approved.
- SAN admins must prove they know where the problem is coming from.
- SAN error collection on intermediary hardware and software does not always show where the source of the problem resides.
- Switch trunking makes delivery issues more difficult to analyze.

Analyzer Architecture
An example of how an analyzer might work:
- Choose position: in-line or tap/sniff
- Choose probing method: analog bypass (repeating) or retiming
[Diagram labels: SFP, SFP, Performance Monitor, Trigger & Filter Comparators, Capture Memory, Capture Memory, Time Tag, To other Analyzer]

Connect to Any Topology (FC & iSCSI)
[Diagram labels: analyzers shown on point-to-point, fabric, loop, and SAN/LAN bridge connections between initiators and targets]

Ways Into the Link
- In-line, between the devices whose traffic you want to capture, or
- Outside the direct link, using an optical splitter ("sniffing") or a hub of optical splitters.

In-Line Analyzer Placement
[Diagram labels: I, Analyzer, Switch, Analyzer, T, Fabric, T n, I 1, Analyzer, GTA, T 2, T 1, Direct Attach]
Key: I = Initiator (Host), T = Target (Storage)

Using Optical Splitters
[Diagram labels: Switch, Optical Splitters, Host, Analyzer]
This analyzer is placed to capture traffic between the host and the switch by sniffing the link through a set of optical splitters. Optical splitters are installed while the network is down, but once they are in place you can add and remove analysis tools as needed.

Hub of Optical Splitters
[Diagram labels: Switch / Devices, Analyzer]

Which Link Do I Look At?
If you have only two ports, placing the analyzer where it will capture the right data is a challenge, since you can only capture from one link at a time.
- You will see frames for SCSI I/O.
- You will not see frames and credits on the host-to-switch (or switch-to-storage) link if that's not the one you are plugged into.
- Accurately troubleshooting some issues may require moving the analyzer and taking more than one trace.

Task: Troubleshooting I/O
Sometimes finding the root cause of an issue requires a more focused look at the I/O flowing on the SAN. This type of troubleshooting requires some familiarity with the protocol specification standards.
Data needed to provide analysis:
- I/O capture (to see the flow of commands and their fulfillment)
- Error

Protocol Constructs
The fundamental protocol constructs in Fibre Channel are:
- Exchanges
- Sequences
- Frames
[Diagram: an Exchange contains one or more Sequences; each Sequence contains one or more Frames]

Exchange
- The Upper Level Storage Protocol defines a conversation in order to exchange data: Command / Data / Status phases (SCSI).
- In Fibre Channel this conversation is called an Exchange.
- An Exchange consists of a set of related Sequences.
- Exchanges tend to be bi-directional.
- The Exchange is identified by the OX_ID in Fibre Channel.
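The Exchange / Sequence / Frame hierarchy can be sketched as a grouping step over captured frames. This is a minimal illustration, not any analyzer's actual data model: the `Frame` record and its field names are hypothetical, and keying an exchange on the pair of port addresses plus the OX_ID (rather than the OX_ID alone) is an assumption made here so that two initiators reusing the same OX_ID value are not merged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical record for one captured frame (names are illustrative)."""
    timestamp: float  # seconds since the start of the capture
    s_id: int         # source port address
    d_id: int         # destination port address
    ox_id: int        # originator exchange ID
    seq_id: int       # sequence ID chosen by the sender
    info: str         # e.g. "CMD", "DATA", "STATUS"


def group_by_exchange(frames):
    """Return {exchange_key: {sequence_key: [frames in time order]}}."""
    exchanges = defaultdict(lambda: defaultdict(list))
    for f in sorted(frames, key=lambda fr: fr.timestamp):
        # An exchange is bi-directional, so the address pair is unordered.
        exchange_key = (frozenset((f.s_id, f.d_id)), f.ox_id)
        # A sequence flows in one direction, so it is keyed by its sender.
        sequence_key = (f.s_id, f.seq_id)
        exchanges[exchange_key][sequence_key].append(f)
    return exchanges


if __name__ == "__main__":
    host, disk = 0x010200, 0x010300
    trace = [
        Frame(0.000, host, disk, 0xABCD, 0, "CMD"),
        Frame(0.001, disk, host, 0xABCD, 1, "DATA"),
        Frame(0.002, disk, host, 0xABCD, 1, "DATA"),
        Frame(0.003, disk, host, 0xABCD, 2, "STATUS"),
    ]
    for key, sequences in group_by_exchange(trace).items():
        print(hex(key[1]), {k: len(v) for k, v in sequences.items()})
```

Grouping this way makes it easy to ask, per exchange, which phases were seen and which are missing.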

Frame Structure
Frame format:

Field        Bytes
Idles        24*
SOF          4
Frame HDR    24
Data Field   0-2112
CRC          4
EOF          4
Idles

* 6 Idle words (24 bytes, 4 bytes each) are required by the transmitter; 2 Idle words (8 bytes) are guaranteed to the receiver.
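As a worked example of the field sizes above, the largest frame from SOF through EOF is 4 + 24 + 2112 + 4 + 4 = 2148 bytes. The sketch below also splits a 24-byte frame header into its fields. The header layout it uses (R_CTL/D_ID, CS_CTL/S_ID, TYPE/F_CTL, SEQ_ID/DF_CTL/SEQ_CNT, OX_ID/RX_ID, Parameter) comes from the Fibre Channel framing standard rather than from this presentation, so treat it as an assumption to check against the specification; the function names are illustrative.

```python
import struct

# Field sizes in bytes, as listed in the frame format table.
SOF_LEN, HEADER_LEN, MAX_DATA_LEN, CRC_LEN, EOF_LEN = 4, 24, 2112, 4, 4


def max_frame_bytes() -> int:
    """Largest frame from SOF through EOF, excluding inter-frame idles."""
    return SOF_LEN + HEADER_LEN + MAX_DATA_LEN + CRC_LEN + EOF_LEN


def parse_header(header: bytes) -> dict:
    """Split a 24-byte frame header into fields (big-endian, six words)."""
    if len(header) != HEADER_LEN:
        raise ValueError(f"expected {HEADER_LEN} header bytes, got {len(header)}")
    w0, w1, w2, seq_id, df_ctl, seq_cnt, ox_id, rx_id, parameter = struct.unpack(
        ">IIIBBHHHI", header
    )
    return {
        "r_ctl": w0 >> 24, "d_id": w0 & 0xFFFFFF,    # word 0
        "cs_ctl": w1 >> 24, "s_id": w1 & 0xFFFFFF,   # word 1
        "type": w2 >> 24, "f_ctl": w2 & 0xFFFFFF,    # word 2
        "seq_id": seq_id, "df_ctl": df_ctl, "seq_cnt": seq_cnt,  # word 3
        "ox_id": ox_id, "rx_id": rx_id,              # word 4
        "parameter": parameter,                      # word 5
    }


if __name__ == "__main__":
    print(max_frame_bytes())
    sample = struct.pack(
        ">IIIBBHHHI",
        (0x06 << 24) | 0x010300, 0x010200, (0x08 << 24) | 0x290000,
        0, 0, 0, 0xABCD, 0xFFFF, 0,
    )
    print(hex(parse_header(sample)["ox_id"]))
```

The OX_ID extracted here is the value an analyzer trace uses to tie frames back to their exchange.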

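SCSI Read Exchange
[Diagram: Initiator, Fabric, Target; one Exchange, OX_ID = 0xABCD]
- Initiator to Target: CMD (one Sequence, one Frame)
- Target to Initiator: Data (one Sequence, multiple Frames)
- Target to Initiator: Status (one Sequence, one Frame)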

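SCSI Write Exchange/Task
[Diagram: Initiator, Fabric, Target; one Exchange, OX_ID = 0xABCD]
- Initiator to Target: CMD (Sequence)
- Target to Initiator: Transfer Ready (Sequence)
- Initiator to Target: Data (Sequence)
- Target to Initiator: Status (Sequence)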

Upper Layer Protocol = SCSI
- More than 90% of Fibre Channel implementations transport SCSI as the Upper Layer Protocol (ULP).
- A complete operation consists of the successful delivery of the SCSI Command (1 FC frame), the SCSI Data if applicable to the command (1 to many FC frames), and the SCSI Status (1 FC frame).
- Failure to deliver any part of the SCSI operation will result in a SCSI Timeout.
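The completeness rule above (Command, optional Data, then Status) suggests a simple first pass over a trace: list every exchange that shows a Command but never a Status. This is a minimal sketch; the `(ox_id, phase)` event tuples and the phase names are hypothetical stand-ins for whatever a given analyzer exports.

```python
def incomplete_exchanges(events):
    """Return sorted OX_IDs that have a CMD phase but no STATUS phase.

    events: iterable of (ox_id, phase) tuples, where phase is one of
    "CMD", "XFER_RDY", "DATA", "STATUS" (illustrative names).
    """
    phases_seen = {}
    for ox_id, phase in events:
        phases_seen.setdefault(ox_id, set()).add(phase)
    return sorted(
        ox_id
        for ox_id, phases in phases_seen.items()
        if "CMD" in phases and "STATUS" not in phases
    )


if __name__ == "__main__":
    trace = [
        (0x0001, "CMD"), (0x0001, "DATA"), (0x0001, "STATUS"),  # complete read
        (0x028D, "CMD"), (0x028D, "DATA"),                      # status never arrived
    ]
    print([hex(ox_id) for ox_id in incomplete_exchanges(trace)])
```

An exchange flagged here is only a candidate: the Status may simply fall outside the capture window, so the result should be checked against the trace boundaries.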

Class 3 Datagram
Most FC implementations run Class 3 configurations:
- Fibre Channel provides no means of recovery for dropped or corrupted frames.
- Recovery is handled by SCSI.
- Timeouts take 60 to 120 seconds to occur, depending on the OS on the host system.
- Protocol analyzers are the only tool available to pinpoint the actual failed I/O.

The Issue: SCSI Timeouts in FC

What Happens When SCSI Fails & Times Out?
Host (initiator) fails to deliver to the storage (target):
- Target may return a Check Condition.
- Initiator will normally retry the command (operation).
Target fails to deliver to the host:
- Initiator may send an ABTS (Abort Sequence), which terminates the operation.
- Initiator will normally retry the operation.
(Note: some hosts do not send ABTS; they simply move to the retry process.)

Capturing Usable Information
In order to have enough data to analyze, all traces should adhere to the following:
- Capture both frame and non-frame events.
- Capture the header plus a minimum of 64 bytes of payload.
- Capture valid and relevant data from both directions of the link (i.e. full duplex).
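The three capture requirements can be expressed as a pre-flight check. With a 24-byte frame header, "header plus 64 bytes of payload" means keeping at least 24 + 64 = 88 bytes of each frame. The `CaptureSettings` record below is hypothetical; real analyzers expose these options under their own names.

```python
from dataclasses import dataclass

FC_HEADER_BYTES = 24    # Fibre Channel frame header size
MIN_PAYLOAD_BYTES = 64  # minimum payload to keep, per the guidance above


@dataclass
class CaptureSettings:
    """Hypothetical capture configuration (field names are illustrative)."""
    capture_frames: bool
    capture_non_frame_events: bool  # e.g. primitive signals such as R_RDY
    bytes_kept_per_frame: int       # header + payload retained per frame
    directions_captured: int        # 1 = one direction, 2 = full duplex


def capture_problems(settings: CaptureSettings) -> list:
    """Return human-readable problems; an empty list means the settings pass."""
    problems = []
    if not (settings.capture_frames and settings.capture_non_frame_events):
        problems.append("capture both frame and non-frame events")
    minimum = FC_HEADER_BYTES + MIN_PAYLOAD_BYTES
    if settings.bytes_kept_per_frame < minimum:
        problems.append(f"keep at least {minimum} bytes per frame")
    if settings.directions_captured < 2:
        problems.append("capture both directions of the link (full duplex)")
    return problems


if __name__ == "__main__":
    print(capture_problems(CaptureSettings(True, False, 32, 1)))
```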

Locating an Error
The Status came out of the target OK, but it came out of the switch with a CRC error.
[Trace screenshot callouts: Destination & Source Device Addresses; Affected Exchange OX_ID = 0x028D]

CRC Error
- Indicates corruption in the FC frame.
- Class 3 FC will drop this frame.
- The Status of this SCSI I/O (the contents of this frame) will not be delivered.
- The SCSI initiator times out after about 60 seconds.

Determining the Fallout of the CRC Error
[Trace screenshot callouts: SCSI Timeout; a little over 57 seconds later, ABTS]
Note: the association is made from the OX_ID, 0x028D.
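The association described above, matching a CRC error to the later ABTS through the shared OX_ID, can be sketched as follows. The `(timestamp, kind, ox_id)` event tuples and the kind names are hypothetical, and the timestamps in the usage example are made up to resemble the "a little over 57 seconds" gap; they are not taken from the trace in the slides.

```python
def correlate_crc_to_abts(events):
    """Pair each CRC error with the first later ABTS on the same OX_ID.

    events: iterable of (timestamp_seconds, kind, ox_id) tuples, where kind
    is a string such as "CRC_ERROR" or "ABTS" (illustrative names).
    Returns a list of (ox_id, seconds_between_error_and_abts).
    """
    first_crc_time = {}
    matches = []
    for timestamp, kind, ox_id in sorted(events):
        if kind == "CRC_ERROR":
            # Remember only the first CRC error seen on this exchange.
            first_crc_time.setdefault(ox_id, timestamp)
        elif kind == "ABTS" and ox_id in first_crc_time:
            matches.append((ox_id, timestamp - first_crc_time.pop(ox_id)))
    return matches


if __name__ == "__main__":
    trace = [
        (10.0, "CRC_ERROR", 0x028D),
        (12.0, "STATUS", 0x0300),
        (67.5, "ABTS", 0x028D),
    ]
    for ox_id, delay in correlate_crc_to_abts(trace):
        print(f"OX_ID {ox_id:#06x}: ABTS {delay:.1f} s after CRC error")
```

A delay close to the host's SCSI timeout value supports the conclusion that the dropped frame, not the target, caused the abort.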

Check Condition: Cause of SCSI Timeout Condition
1. Data offsets overlap (Parameter field).
2. SCSI target cites Check Condition for Relative Offset Mismatch and aborts the command.
3. Initiator sends ABTS. Target accepts it.

What Do I Do Now?
Congratulations, you have just identified the source of a single I/O failure!
- Take additional captures to watch for increased errors on the link on which the CRC error occurred.
- Plan and justify a course of action (e.g., replace a cable, replace an SFP, call your vendor).
- Check out the SNIA Tutorial: SAN Performance Testing.

Q&A / Feedback
Please send any questions or comments on this presentation to SNIA: trackstorage@snia.org
Many thanks to the following individuals for their contributions to this tutorial:
- SNIA Education Committee
- Steve Klotz
- Jim Marrone
- Elaine Silber
- Rob Peglar