MODULE 1: INTRODUCTION TO INFORMATION STORAGE | DATA CENTER ENVIRONMENT | DATA PROTECTION - RAID | INTELLIGENT STORAGE SYSTEM


CONTENTS
1.1 Information Storage
  1.1.1 Data
  1.1.2 Types of Data
  1.1.3 Big Data
  1.1.4 Data Science
  1.1.5 Information
  1.1.6 Storage
1.2 Evolution of Storage Architecture
  1.2.1 Server Centric Storage Architecture
  1.2.2 Information Centric Architecture
1.3 Data Center Infrastructure
  1.3.1 Core Elements of a Data Center
  1.3.2 Key Characteristics for Data Center
  1.3.3 Managing a Data Center
1.4 Virtualization and Cloud Computing
  1.4.1 Virtualization
  1.4.2 Cloud Computing
1.5 Data Center Environment
  1.5.1 Application
  1.5.2 DBMS
  1.5.3 Host
    Operating System (OS), Memory Virtualization, Device Driver, Logical Volume Manager (LVM), File System, Compute Virtualization
  1.5.4 Connectivity
    Physical Components, Interface Protocols (IDE/ATA, SCSI, Fibre Channel, IP)
  1.5.5 Storage
1.6 RAID Implementation Methods
  1.6.1 Software RAID
  1.6.2 Hardware RAID
1.7 RAID Array Components
1.8 RAID Techniques
  1.8.1 Striping
  1.8.2 Mirroring
  1.8.3 Parity
1.9 RAID Levels
  1.9.1 RAID-0
  1.9.2 RAID-1
  1.9.3 Nested-RAID
  1.9.4 RAID-3
  1.9.5 RAID-4
  1.9.6 RAID-5
  1.9.7 RAID-6
1.10 RAID Impact on Disk Performance
  Application IOPS and RAID Implementations
1.11 RAID Comparison
1.12 Components of an Intelligent Storage System
  Front End, Cache (Structure of Cache, Read Operation with Cache, Write Operation with Cache, Cache Implementation, Cache Management, Cache Data Protection), Back End, Physical Disk
1.13 Storage Provisioning
  1.13.1 Traditional Storage Provisioning: Logical Unit (LUN)
  1.13.2 Virtual Storage Provisioning
1.14 Types of Intelligent Storage System
  High End Storage System, Midrange Storage System


MODULE 1: INTRODUCTION TO INFORMATION STORAGE

1.1 Information Storage
Companies use data to derive information that is critical to their day-to-day operations.
Storage is a repository that is used to store and retrieve digital-data.

1.1.1 Data
Data is a collection of raw facts from which conclusions may be drawn.
Example:
  Handwritten-letters
  Printed book
  Photograph
  Movie on video-tape
Data can be generated using a computer and stored in strings of 0s and 1s (Figure 1-1). Data in 0s/1s form is called digital-data.
Digital-data is accessible by the user only after it is processed by a computer.
Figure 1-1: Digital data
The factors contributing to the growth of digital-data are:
1) Increase in Data Processing Capabilities
   Modern computers provide a significant increase in data-processing capabilities. This allows conversion of various types of data (like a book, photo or video) into digital-formats.
2) Lower Cost of Digital Storage
   With the advancement in technology, the cost of storage-devices has decreased. This cost-benefit has increased the rate at which data is being generated and stored.
3) Affordable and Faster Communication Technology
   Nowadays, the rate of sharing digital-data is much faster than traditional approaches (e.g. postal). For example,
   i) A handwritten-letter may take a week to reach its destination.
   ii) On the other hand, an e-mail message may take a few seconds to reach its destination.
4) Increase of Smart Devices and Applications
   Smartphones, tablets and smart applications have contributed to the generation of digital-content.

1.1.2 Types of Data
Data can be classified as structured or unstructured based on how it is stored & managed (Figure 1-2).
Structured data is organized in table format (rows and columns). Therefore, applications can query and retrieve the data efficiently. Structured data is stored using a DBMS.
Unstructured data cannot be organized in table format. Therefore, applications find it difficult to query and retrieve the data. For example, customer contacts may be stored in various forms such as
  Sticky notes
  E-mail messages
  Business cards
Figure 1-2: Types of data

1.1.3 Big Data
It refers to data-sets whose sizes are beyond the capability of commonly used software tools. (The software tools are used to store, manage, and process data within acceptable time limits.)
Big-data includes both structured- and unstructured-data.
The data is generated by different sources such as
  business applications
  web pages
  videos
  images
  e-mails
  social media
These data-sets require real-time capture or updates for analysis, predictive modeling and decision-making.
Significant opportunities exist to extract value from big data, across the data value chain:
1) Devices that collect data from multiple locations and also generate new data about this data.
2) Data collectors who gather data from devices and users.
3) Data-aggregators that compile the collected data to extract meaningful information.
4) Data users & buyers who benefit from the info collected & aggregated by others in the data value chain.

1.1.4 Data Science
Data Science is a discipline which enables companies to derive business-value from big-data.
Data Science represents the synthesis of various existing disciplines such as
  statistics
  math
  data visualization and
  computer science.
Several industries and markets currently looking to employ data science techniques include
  scientific research
  healthcare
  public administration
  fraud detection
  social media
  banks
The storage architecture should
  be simple, efficient, and inexpensive to manage.
  provide access to multiple platforms and data sources simultaneously.

1.1.5 Information
Information vs. Data:
i) Information is the intelligence and knowledge derived from data.
ii) Data does not fulfill any purpose for companies unless it is presented in a meaningful form.
Companies need to analyze data for it to be of value.
Effective data analysis extends its benefits to existing companies. Also, effective data analysis creates the potential for new business opportunities.
Example: Job portal.
  Job seekers post their resumes on websites like Naukri.com, LinkedIn.com, Shine.com.
  These websites collect & post resumes on centrally accessible locations for prospective employers.
  In addition, employers post available positions on these websites.
  Job-matching software matches keywords from resumes to keywords in job postings.
  In this way, the search engine uses data & turns it into information for employers & job seekers.

1.1.6 Storage
Data created by companies must be stored so that it is easily accessible for further processing.
In a computing-environment, devices used for storing data are called storage-devices.
Example:
  Memory in a cell phone or digital camera
  DVDs, CD-ROMs and hard-disks in computers.

1.2 Evolution of Storage Architecture
Figure 1-3: Evolution of storage architectures

1.2.1 Server Centric Storage Architecture
In earlier days, companies had a data-center consisting of
1) Centralized computers (mainframes) and
2) Information storage-devices (such as tape reels and disk packs).
Each department had its own servers and storage because of the following reasons (Figure 1-3):
  evolution of open-systems
  affordability of open-systems and
  easy deployment of open-systems.
Disadvantages:
1) The storage was internal to the server. Hence, the storage cannot be shared with any other servers.
2) Each server had a limited storage-capacity.
3) Any administrative tasks resulted in unavailability of information. (The administrative tasks can be maintenance of the server or increasing storage-capacity.)
4) The creation of departmental servers resulted in unprotected, unmanaged, fragmented islands of information and increased capital and operating expenses.
To overcome these challenges, storage evolved from server-centric architecture to information-centric architecture.

1.2.2 Information Centric Architecture
Storage is managed centrally and independent of servers.
Storage is allocated to the servers on-demand from a shared-pool. (A shared-pool refers to a group of disks. The shared-pool is used by multiple servers.)
When a new server is deployed, storage-capacity is assigned from the shared-pool.
The capacity of the shared-pool can be increased dynamically by adding more disks without interrupting normal-operations.
Advantages:
1) Information management is easier and cost-effective.
2) Storage technology even today continues to evolve. This enables companies to consolidate & leverage their data to achieve the highest return on information assets.

1.3 Data Center Infrastructure
A data-center provides centralized data-processing capabilities to companies.

1.3.1 Core Elements of a Data Center
Five core-elements of a data-center:
1) Application
   An application is a program that provides the logic for computing-operations. For example: Order-processing-application.
   Here, an Order-processing-application can be placed on a database. Then, the database can use OS-services to perform R/W-operations on storage.
2) Database
   DBMS is a structured way to store data in logically organized tables that are interrelated.
   Advantages:
   1) Helps to optimize the storage and retrieval of data.
   2) Controls the creation, maintenance and use of a database.
3) Server and OS
   A computing-platform that runs 1) applications and 2) databases.
4) Network
   A data-path that facilitates communication 1) between clients and servers or 2) between servers and storage.
5) Storage Array
   A device that stores data permanently for future-use.
Example: Figure 1-4 shows an Order-processing application.
Step 1: A customer places an order through the AUI on the client-computer.
Step 2: The client accesses the DBMS located on the server to provide order-related information. (Order-related information includes customer-name, address, payment-method & product-ordered.)
Step 3: The DBMS uses the server to write this data to the disks in the storage.
Step 4: The storage-network provides the communication-link between server and storage and transports the write-command from server to storage.
Step 5: After receiving the write-command, the storage saves the data on disks.
(AUI --> Application User Interface; Disk --> hard disk)
Figure 1-4: Example of an order processing-application

1.3.2 Key Characteristics for Data Center
Figure 1-5: Key characteristics of data center elements
1) Availability
   In a data-center, all core-elements must be designed to ensure availability (Figure 1-5).
   If the users cannot access the data in time, then it will have a negative impact on the company. (For example, if an Amazon server goes down for even 5 min, it incurs a huge loss in millions.)
2) Security
   To prevent unauthorized-access to data,
   1) Good policies & procedures must be used.
   2) Proper integration of core-elements must be established.
   Security-mechanisms must enable servers to access only their allocated-resources on the storage.
3) Scalability
   It must be possible to allocate additional resources on-demand w/o interrupting normal-operations. (The additional resources include CPU-power and storage.)
   Business growth often requires deploying more servers, new applications and additional databases.
   The storage-solution should be able to grow with the company.
4) Performance
   All core-elements must be able to provide optimal-performance and service all processing-requests at high speed.
   The data-center must be able to support performance-requirements.
5) Data Integrity
   Data integrity ensures that data is written to disk exactly as it was received. For example: Parity-bit or ECC (error correction code).
   Data-corruption may affect the operations of the company.
6) Storage Capacity
   The data-center must have sufficient resources to store and process a large amount of data efficiently.
   When the capacity-requirement increases, the data-center must be able to provide additional capacity without interrupting normal-operations.
   Capacity must be managed by reallocation of existing-resources rather than by adding new resources.
7) Manageability
   A data-center must perform all operations and activities in the most efficient manner.
   Manageability is achieved through automation, i.e. reduction of human-intervention in common tasks.

1.3.3 Managing a Data Center
Managing a data-center involves many tasks. Key management-tasks are:
1) Monitoring 2) Reporting and 3) Provisioning.
1) Monitoring is a process of continuous collection of information and review of the entire storage infrastructure (called the Information Storage System).
   The following parameters are monitored:
   i) Security ii) Performance iii) Accessibility and iv) Capacity.
2) Reporting is done periodically on performance, capacity and utilization of the resources.
   Reporting tasks help to establish business-justifications and chargeback of costs associated with operations of the data-center.
3) Provisioning is the process of providing h/w, s/w & other resources needed to run a data-center.
   Main tasks are: i) Capacity Planning and ii) Resource Planning.
   i) Capacity Planning
      It ensures that future needs of both user & application will be addressed in the most cost-effective way.
   ii) Resource Planning
      It is the process of evaluating & identifying required resources such as
        Personnel (employees)
        Facility (site or plant) and
        Technology (Artificial Intelligence, Deep Learning).

1.4 Virtualization and Cloud Computing

1.4.1 Virtualization
Virtualization is a technique of abstracting & making a physical-resource appear as a logical-resource. (The resource includes compute, storage and network.)
Virtualization has existed in the IT-industry for several years in different forms.
Form-1
Virtualization enables pooling of resources and providing an aggregated view of the resource capabilities.
1) Storage-virtualization enables pooling of multiple small storage-devices (say ten thousand 10-GB disks) and providing a single large storage-entity (10000*10 = 100000 GB = 100 TB).
2) Compute-virtualization enables pooling of multiple low-power servers (say one thousand 2.5-GHz servers) and providing a single high-power entity (1000*2.5 = 2500 GHz = 2.5 THz).
Form-2
Virtualization also enables centralized management of pooled-resources.
Virtual-resources can be created from the pooled-resources. For example,
  a virtual-disk of a given capacity (say 10 GB) can be created from a storage-pool (100 TB)
  a virtual-server with specific power (2.5 GHz) can be created from a compute-pool (2.5 THz)
Advantages:
1) Improves utilization of resources (like storage, CPU cycles).
2) Scalable: Storage-capacity can be added from pooled-resources w/o interrupting normal-operations.
3) Companies save the costs associated with acquisition of new resources.
4) Fewer resources means less space and energy (i.e. electricity).

1.4.2 Cloud Computing
Cloud-computing enables companies to use IT-resources as a service over the network. Usage is metered, for example, by:
  CPU hours used
  Amount of data-transferred
  Gigabytes of data-stored
Advantages:
1) Provides a highly scalable and flexible computing-environment.
2) Provides resources on-demand to the hosts.
3) Users can scale up or scale down the demand of resources with minimal management-effort.
4) Enables self-service requesting through a fully automated request-fulfillment process.
5) Enables consumption-based metering. Therefore, consumers pay only for the resources they use. (For example: Jio provides an Rs 11 plan for 400 MB.)
6) Usually built upon virtualized data-centers, which provide resource-pooling.

MODULE 1 (CONT.): DATA CENTER ENVIRONMENT

1.5 Data Center Environment
The data flows from an application to storage through various components collectively referred to as a data-center environment.
The five main components in this environment are
1) Application 2) DBMS 3) Host 4) Connectivity and 5) Storage.
These entities, along with their physical and logical-components, facilitate data-access.

1.5.1 Application
An application is a program that provides the logic for computing-operations.
It provides an interface between user and host. (R/W --> read/write)
The application sends requests to the OS to perform R/W-operations on the storage.
Applications can be placed on the database. Then, the database can use OS-services to perform R/W-operations on the storage.
Applications can be classified as follows:
  business applications
  infrastructure management applications
  data protection applications
  security applications.
Some examples of the applications are:
  enterprise resource planning (ERP)
  backup
  antivirus
Characteristics of I/Os generated by the application influence the overall performance of the storage-device.
Common I/O characteristics are:
  Read-intensive vs. write-intensive
  Sequential vs. random
  I/O size

1.5.2 DBMS
DBMS is a structured way to store data in logically organized tables that are inter-related.
The DBMS processes an application's request for data and instructs the OS to transfer the appropriate data from the storage.
Advantages:
1) Helps to optimize the storage and retrieval of data.
2) Controls the creation, maintenance and use of a database.

1.5.3 Host
Host is a client- or server-computer that runs applications. (hosts --> compute-systems)
Users store and retrieve data through applications.
Hosts can be physical- or virtual-machines.
Examples of hosts include
  desktop computers
  servers
  laptops
  smartphones.

A host consists of 1) CPU 2) Memory 3) I/O devices 4) Software.
The software includes i) OS ii) Device-drivers iii) Logical volume manager (LVM) iv) File-system.
The software can be installed individually or may be part of the OS.

Operating System (OS)
An OS is a program that acts as an intermediary between application and hardware-components.
The OS controls all aspects of the computing-environment.
Data-access is one of the main services provided by the OS to the application.
Tasks of OS:
1) Monitor and respond to user actions and the environment.
2) Organize and control hardware-components.
3) Manage the allocation of hardware-resources (simply, the resources).
4) Provide security for the access and usage of all managed resources.
5) Perform storage-management tasks.
6) Manage components such as file-system, LVM & device drivers.

Memory Virtualization
Memory-virtualization is used to virtualize the physical-memory (RAM) of a host. (VM --> virtual-memory)
It creates a VM with an address-space larger than the physical-memory space present in the computer.
The virtual-memory consists of the address-space of the physical-memory and part of the address-space of the disk-storage.
The entity that manages the virtual-memory is known as the virtual-memory manager (VMM).
The VMM manages the virtual-to-physical-memory mapping and fetches data from the disk-storage.
The space used by the VMM on the disk is known as a swap-space.
A swap-space is a portion of the disk that appears like physical-memory to the OS.
The memory is divided into contiguous blocks of fixed-size pages.
Paging
Paging moves inactive-pages onto the swap-file and brings them back into the physical-memory when required. (An illustrative sketch of paging appears below, after the Device Driver subsection.)
Advantage:
1) Enables efficient use of the available physical-memory among different applications.
   Normally, the OS moves the least used pages into the swap-file. Thus, sufficient RAM is provided for processes that are more active.
Disadvantage:
1) Access to swap-file pages is slower than physical-memory pages. This is because swap-file pages are allocated on the disk, which is slower than physical-memory.

Device Driver
It is special software that permits the OS & a hardware-component to interact with each other. (The hardware-component includes a printer, a mouse and a hard-drive.)
A device-driver enables the OS to recognize the device and use a standard interface to access and control devices.
Device-drivers are hardware-dependent and OS-specific.
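Below is a minimal Python sketch of the paging mechanism described under Memory Virtualization above. It is a toy model under stated assumptions, not OS code: the class name, the dictionary-based RAM and swap structures, and the LRU eviction policy are illustrative choices.

    # Toy model of demand paging (illustrative only; names are hypothetical).
    from collections import OrderedDict

    class ToyVMM:
        def __init__(self, num_frames):
            self.num_frames = num_frames     # frames of physical-memory (RAM)
            self.ram = OrderedDict()         # page -> data, ordered by last use
            self.swap = {}                   # inactive pages moved to the swap-file

        def access(self, page, default=b""):
            if page in self.ram:             # page is resident: fast path
                self.ram.move_to_end(page)
                return self.ram[page]
            data = self.swap.pop(page, default)   # page fault: fetch from swap (slow)
            if len(self.ram) >= self.num_frames:  # RAM full: evict least-used page
                victim, victim_data = self.ram.popitem(last=False)
                self.swap[victim] = victim_data   # move inactive page to swap-space
            self.ram[page] = data
            return data

For example, with num_frames = 2, accessing pages 1, 2, 3 in order moves page 1 to swap; accessing page 1 again triggers a page fault and evicts page 2.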

Logical Volume Manager (LVM)
LVM is software that runs on the host and manages the logical- and physical-storage.
It is an intermediate-layer between file-system and disk.
Advantages:
1) Provides optimized storage-access.
2) Simplifies storage-management.
3) Hides details about the disk and the location of data on the disk.
4) Enables admins to change the storage-allocation without interrupting normal-operations.
5) Enables dynamic-extension of storage-capacity of the file-system.
The main components of LVM are:
1) Physical-volumes 2) Volume-groups and 3) Logical-volumes.
1) Physical-Volume (PV): refers to a disk connected to the host.
   A unique PVID is assigned to each PV when it is initialized for use. (PVID --> Physical-Volume IDentifier)
   Each PV is divided into equal-sized data-blocks called physical-extents.
2) Volume-Group (VG): refers to a group of one or more PVs.
   PVs can be added or removed from a volume-group dynamically.
   PVs cannot be shared between different volume-groups.
   The volume-group is handled as a single unit by the LVM.
3) Logical-Volume (LV): refers to a partition within a volume-group.
   Logical-volume vs. Volume-group:
   i) An LV can be thought of as a disk-partition.
   ii) A volume-group can be thought of as a disk.
   The size of an LV is based on a multiple of the physical-extents.
   The LV appears as a physical-device to the OS.
   An LV is made up of non-contiguous physical-extents and may span over multiple PVs.
   A file-system is created on an LV. These LVs are then assigned to the application.
   An LV can also be mirrored to improve data-availability.
Figure 1-6: Disk partitioning and concatenation
The LVM can perform partitioning and concatenation (Figure 1-6).
1) Partitioning
   A larger-capacity disk is partitioned into smaller-capacity virtual-disks.
   Disk-partitioning is used to improve the utilization of disks.
2) Concatenation
   Several smaller-capacity disks are aggregated to form a larger-capacity virtual-disk.
   The larger-capacity virtual-disk is presented to the host as one big logical-volume.
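The LV-to-PV mapping described above can be sketched in a few lines of Python. This is a simplified model with hypothetical names (LogicalVolume, locate) and an assumed 4-MB extent size; a real LVM keeps this mapping in on-disk metadata.

    # Sketch: an LV is an ordered list of (PV name, physical-extent index) pairs,
    # so its extents may be non-contiguous and may span several disks.
    EXTENT_SIZE = 4 * 1024 * 1024        # assume 4-MB physical-extents

    class LogicalVolume:
        def __init__(self, extent_map):
            self.extent_map = extent_map  # i-th logical extent -> (pv, pe_index)

        def locate(self, byte_offset):
            # Translate an LV byte offset to (PV, byte offset on that PV).
            le = byte_offset // EXTENT_SIZE
            pv, pe = self.extent_map[le]
            return pv, pe * EXTENT_SIZE + byte_offset % EXTENT_SIZE

    # Concatenation: the two extents of one LV live on different disks.
    lv = LogicalVolume([("pv0", 7), ("pv1", 0)])
    print(lv.locate(5 * 1024 * 1024))    # -> ('pv1', 1048576)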

File System
A file is a collection of related-records stored as a unit with a name (say employee.lst).
A file-system is a structured way of storing and organizing data in the form of files.
File-systems enable easy access to data-files residing within a
  disk-drive
  disk-partition or
  logical-volume.
A file-system needs host-based software-routines (API) that control access to files.
It provides users with the functionality to create, modify, delete and access files.
A file-system organizes data in a structured hierarchical manner via the use of directories (i.e. folders).
A directory refers to a container used for storing pointers to multiple files.
All file-systems maintain a pointer-map to the directories and files.
Some common file-systems are:
  FAT 32 (File Allocation Table) for Microsoft Windows
  NT File-system (NTFS) for Microsoft Windows
  UNIX File-system (UFS) for UNIX
  Extended File-system (EXT2/3) for Linux
Figure 1-7: Process of mapping user files to disk storage
Figure 1-7 shows the process of mapping user-files to the disk-storage with an LVM:
1) Files are created and managed by users and applications.
2) These files reside in the file-system.
3) The files are mapped to file-system blocks.
4) The file-system blocks are mapped to logical-extents.
5) The logical-extents are mapped to disk physical-extents by the OS or LVM.
6) Finally, these physical-extents are mapped to the disk-storage.

Compute Virtualization
Compute-virtualization is a technique of masking (or abstracting) the physical-hardware from the OS.
It can be used to create portable virtual-computers called virtual-machines (VMs).
A VM appears like a host to the OS, with its own CPU, memory and disk (Figure 1-8).
However, all VMs share the same underlying hardware in an isolated manner.
Compute-virtualization is done by a virtualization-layer called the hypervisor.
The hypervisor
  resides between the hardware and VMs.
  provides resources such as CPU, memory and disk to all VMs.
Within a server, a large no. of VMs can be created based on the hardware-capabilities of the server.
Advantages:
1) Allows multiple OSs and applications to run concurrently on a single computer.
2) Improves server-utilization.
3) Provides server-consolidation. Because of server-consolidation, companies can run their data-center with fewer servers.
   Advantages of server-consolidation:
   i) Cuts down the cost of buying new servers.
   ii) Reduces operational-cost.
   iii) Saves floor- and rack-space used for the data-center.
4) A VM can be created in less time when compared to setting up an actual server.
5) A VM can be restarted or upgraded without interrupting normal-operations.
6) A VM can be moved from one computer to another w/o interrupting normal-operations.
Figure 1-8: Server virtualization

1.5.4 Connectivity
Connectivity refers to the interconnection between hosts and peripheral-devices such as storage-devices.
Components of connectivity are classified as:
1) Physical-Components and 2) Interface Protocols.

Physical Components
Physical-components refers to the hardware-components used for connection between host & storage.
The three components of connectivity are (Figure 1-9):
1) Host interface device 2) Port and 3) Cable.
1) Host Interface Device is used to connect a host to other hosts and storage-devices.
   Example: HBA (host bus adapter), NIC (network interface card).
   HBA is an ASIC board that performs I/O-operations between host and storage. (ASIC --> application-specific integrated circuit)
   Advantage:
   1) HBA relieves the CPU from additional I/O-processing workload.
   A host typically contains multiple HBAs.
2) Port refers to a physical connecting-point to which a device can be attached.
   An HBA may contain one or more ports to connect the host to the storage-device.
3) Cable is used to connect hosts to internal/external devices using copper-wire or optical-fiber.
Figure 1-9: Physical components of connectivity

Interface Protocol
An interface-protocol enables communication between host and storage.
Protocols are implemented using interface-devices (or controllers) at both source and destination.
The popular protocols are:
1) IDE/ATA (Integrated Device Electronics/Advanced Technology Attachment)
2) SCSI (Small Computer System Interface)
3) FC (Fibre Channel) and
4) IP (Internet Protocol)

IDE/ATA
It is a standard interface for connecting storage-devices inside PCs (Personal Computers). (The storage-devices can be disk-drives or CD-ROM drives.)
It supports parallel-transmission. Therefore, it is also known as Parallel ATA (PATA).
It includes a wide variety of standards:
1) Ultra DMA/133 ATA supports a throughput of 133 MB/s.
2) In a master-slave configuration, ATA supports 2 storage-devices per connector.
3) Serial-ATA (SATA) supports single-bit serial-transmission.
4) SATA version 3.0 supports a data-transfer rate up to 6 Gbps.

SCSI
It has emerged as a preferred protocol in high-end computers.
SCSI supports parallel-transmission and, compared to ATA, provides improved performance, scalability, and compatibility.
Disadvantage:
1) Due to high cost, SCSI is not used commonly in PCs.
It includes a wide variety of standards:
1) SCSI supports up to 16 devices on a single bus.
2) SCSI provides data-transfer rates up to 640 MB/s (for the Ultra-640 version).
3) SAS (Serial Attached SCSI) is a point-to-point serial protocol.
4) SAS version 2.0 supports a data-transfer rate up to 6 Gbps.

Fibre Channel
It is a widely used protocol for high-speed communication to the storage-device.
Advantages:
1) Supports gigabit network speed.
2) Supports multiple protocols and topologies.
It includes a wide variety of standards:
1) It supports serial data-transmission that operates over copper-wire and optical-fiber.
2) The FC version 16FC supports a data-transfer rate up to 16 Gbps.

IP
It is a protocol used for communicating data across a packet-switched network.
It has been traditionally used for host-to-host traffic.
Because of new technologies, the IP network has become a feasible solution for host-to-storage communication.
Advantages:
1) Reduced cost & maturity.
2) Enables companies to use their existing IP-based network.
Common examples of protocols that use IP for host-to-storage communication:
1) iSCSI and 2) FCIP

1.5.5 Storage
A storage-device uses magnetic-, optical-, or solid-state-media.
1) Disks, tapes and diskettes use magnetic-media for storage.
2) CD/DVD uses optical-media for storage.
3) Flash drives use solid-state-media for storage.
1) Tapes are a popular storage-device used for backup because of low cost.
   Disadvantages:
   i) Data is stored on the tape linearly along the length of the tape.
      Search and retrieval of data is done sequentially. As a result, random data-access is slow and time-consuming.
      Hence, tape is not suitable for applications that require real-time access to data.
   ii) In a shared environment, data on tape cannot be accessed by multiple applications simultaneously.
      Hence, a tape can be used by only one application at a time.
   iii) On a tape-drive, the R/W-head touches the tape-surface. Hence, the tape degrades or wears out after repeated use.
   iv) More overhead is associated with managing the tape-media because of the storage and retrieval requirements of data from the tape.
2) Optical-disks are popular in small, single-user computing-environments.
   An optical-disk is used
     to store data like photos and videos as a backup-medium on PCs. (Example: CD-RW, Blu-ray disc and DVD.)
     as a distribution medium for single applications such as games.
     as a means of transferring small amounts of data from one computer to another.
   Advantages:
   1) Provides the capability to write once and read many (WORM). For example: CD-ROM.
   2) Optical-disks, to some degree, guarantee that the content has not been altered.
   Disadvantage:
   1) An optical-disk has limited capacity and speed. Hence, it is not used as a business storage-solution.
   A collection of optical-discs in an array is called a jukebox. The jukebox is used as a fixed-content storage-solution.
3) Disk-drives are used for storing and accessing data for performance-intensive, online applications.
   Advantages:
   1) Disks support rapid access to random data-locations. Thus, data can be accessed quickly by a large no. of simultaneous applications.
   2) Disks have a large capacity.
   3) Disk-storage is configured with multiple disks to provide increased capacity and enhanced performance.
4) Flash drives use semiconductor media. (Flash drive --> pen drive)
   Advantages:
   1) Provides high performance and
   2) Provides low power-consumption.

MODULE 1 (CONT.): DATA PROTECTION - RAID

1.6 RAID Implementation Methods
RAID stands for Redundant Array of Independent Disks.
RAID is a way of combining several independent small disks into a single large-size storage.
It appears to the OS as a single large-size disk.
It is used to increase the performance and availability of data-storage.
There are two types of RAID implementation: 1) hardware and 2) software.
A RAID-controller is specialized hardware which performs all RAID-calculations and presents disk-volumes to the host.
Key functions of RAID-controllers:
1) Management and control of disk-aggregations.
2) Translation of I/O-requests between logical-disks and physical-disks.
3) Data-regeneration in case of disk-failures.

1.6.1 Software-RAID
It uses host-based software to provide RAID functions.
It is implemented at the OS-level.
It does not use a dedicated hardware-controller to manage the storage-device.
Advantage:
1) Provides cost- and simplicity-benefits when compared to hardware-RAID.
Disadvantages:
1) Decreased Performance: Software-RAID affects overall system-performance. This is due to the additional CPU-cycles required to perform RAID-calculations.
2) Supported Features: Software-RAID does not support all RAID-levels.
3) OS Compatibility: Software-RAID is tied to the host-OS. Hence, upgrades to software-RAID (or the OS) should be validated for compatibility.

1.6.2 Hardware-RAID
It is implemented either on the host or on the storage-device.
It uses a dedicated hardware-controller to manage the storage-device.
1) Internal-Controller
   A dedicated controller is installed on a host. Disks are connected to the controller.
   The controller interacts with the disks using a PCI-bus.
   Manufacturers integrate the controllers on motherboards.
   Advantage:
   1) Reduces the overall cost of the system.
   Disadvantage:
   1) Does not provide the flexibility required for high-end storage-devices.
2) External-Controller
   The external-controller is an array-based hardware-RAID.
   It acts as an interface between host and disks.
   It presents storage-volumes to the host, and the host manages the volumes using the supported protocol.

1.7 RAID Array Components
A RAID-array is a large container that holds (Figure 1-10):
1) RAID-controller (or simply the controller)
2) A number of disks
3) Supporting hardware and software.
Figure 1-10: Components of RAID array
A logical-array is a subset of disks grouped to form logical-associations.
Logical-arrays are also known as a RAID-set (or simply the set).
A logical-array consists of logical-volumes (LVs).
The OS recognizes the LVs as if they are physical-disks managed by the controller.

1.8 RAID Techniques
RAID-levels are defined based on the following 3 techniques:
1) Striping (used to improve performance of storage)
2) Mirroring (used to improve data-availability) and
3) Parity (used to provide data-protection)
The above techniques determine
  performance of the storage-device (i.e. better performance --> least response-time)
  data-availability
  data-protection
Some RAID-arrays use a combination of the above 3 techniques. For example:
  Striping with mirroring
  Striping with parity

1.8.1 Striping
Striping is used to improve the performance of a storage-device.
It is a technique of splitting and distributing data across multiple disks.
Main purpose: To use the disks in parallel.
It can be bit-wise, byte-wise or block-wise.
A RAID-set is a group of disks.
Figure 1-11: Striped RAID set

In each disk, a predefined number of strips are defined.
A strip refers to a group of continuously-addressable-blocks in a disk.
A stripe refers to a set of aligned-strips that spans all the disks (Figure 1-11).
Strip-size refers to the maximum amount-of-data that can be accessed from a single disk. In other words, strip-size defines the number of blocks in a strip.
In a stripe, all strips have the same number of blocks.
Stripe-width refers to the number of strips in a stripe.
Striped-RAID does not protect data. To protect data, parity or mirroring must be used.
Advantage:
1) As the number of disks increases, the performance also increases. This is because more data can be accessed simultaneously.
(Analogy for striping: if one man is asked to write A-Z, the time taken will be more compared to 2 men writing A-Z, because one man writes A-M and the other writes N-Z at the same time, which speeds up the process.)
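The strip/stripe bookkeeping above boils down to simple address arithmetic. A hedged sketch in Python (the function name and block-based units are assumptions for illustration, not vendor code):

    # Map a logical block number onto a striped RAID-set.
    def locate_block(block, strip_size_blocks, stripe_width):
        blocks_per_stripe = strip_size_blocks * stripe_width
        stripe = block // blocks_per_stripe         # which full stripe
        within = block % blocks_per_stripe          # position inside the stripe
        disk = within // strip_size_blocks          # which strip -> which disk
        offset = stripe * strip_size_blocks + within % strip_size_blocks
        return disk, offset                         # (disk index, block on that disk)

    # Example: 4 disks, 2 blocks per strip.
    # Blocks 0,1 -> disk 0; 2,3 -> disk 1; ...; block 8 wraps back to disk 0.
    print([locate_block(b, 2, 4) for b in range(9)])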

1.8.2 Mirroring
Mirroring is used to improve data-availability (or data-redundancy).
All the data is written to 2 disks simultaneously. Hence, we have 2 copies of the data.
Advantages:
1) Reliable: Provides protection against single disk-failure.
   In case of failure of one disk, the data can be accessed on the surviving-disk (Figure 1-12).
   Thus, the controller can still continue to service the host's requests from the surviving-disk.
   When the failed-disk is replaced with a new-disk, the controller copies the data from the surviving-disk to the new-disk.
   The disk-replacement activity is transparent to the host.
2) Increases read-performance because each read-request can be serviced by both disks.
Disadvantages:
1) Decreases write-performance because each write-request must perform 2 write-operations on the disks.
2) Duplication of data. Thus, the amount of storage-capacity needed is twice the amount of data being stored. (E.g. To store 100 GB of data, 200 GB of disk is needed.)
3) Considered expensive; preferred for mission-critical applications (like military applications).
Mirroring is not a substitute for data-backup.
Mirroring vs. Backup:
1) Mirroring constantly captures changes in the data.
2) On the other hand, backup captures point-in-time images of data.
Figure 1-12: Mirrored disks in an array

1.8.3 Parity
Parity is used to provide data-protection in case of a disk-failure.
An additional disk is added to the stripe-width to hold parity.
In case of disk-failure, parity can be used for reconstruction of the missing-data.
Parity is a technique that ensures protection of data without maintaining duplicate-data.
Parity-information can be stored on a separate, dedicated-disk or distributed across all the disks.
For example (Figure 1-13): Consider a RAID-implementation with 5 disks (5*100 GB = 500 GB).
1) The first four disks contain the data (4*100 = 400 GB).
2) The fifth disk stores the parity-information (1*100 = 100 GB).
Parity vs. Mirroring:
i) Parity requires 25% extra disk-space (i.e. a 500-GB disk for 400 GB of data).
ii) Mirroring requires 100% extra disk-space (i.e. an 800-GB disk for 400 GB of data).
The controller is responsible for the calculation of parity. The parity-value can be calculated by
  P = D1 + D2 + D3 + D4
where D1 to D4 are the striped-data across the set of five disks.
Now, if one of the disks fails (say D1), the missing value can be calculated by
  D1 = P - (D2 + D3 + D4)
Figure 1-13: Parity RAID
Advantages:
1) Compared to mirroring, parity reduces the cost associated with data-protection.
2) Compared to mirroring, parity consumes less disk-space. In the previous example,
   i) Parity requires 25% extra disk-space (i.e. a 500-GB disk for 400 GB of data).
   ii) Mirroring requires 100% extra disk-space (i.e. an 800-GB disk for 400 GB of data).
Disadvantage:
1) Decreases performance of the storage-device.
   Parity-information is generated from data on the disk. Therefore, parity must be re-calculated whenever there is a change in data. This re-calculation is time-consuming and hence decreases performance.
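The notes illustrate parity with arithmetic sums; actual RAID controllers typically use XOR, which supports the same reconstruction (lost data = parity XOR surviving strips). A small Python sketch, with illustrative names and byte values:

    # XOR parity across data strips, and rebuild of a failed strip.
    from functools import reduce
    from operator import xor

    def parity(strips):
        # Column-wise XOR across equal-length byte strips.
        return bytes(reduce(xor, column) for column in zip(*strips))

    d1, d2, d3, d4 = b"\x0a\x10", b"\x33\x04", b"\x55\xff", b"\x0f\x00"
    p = parity([d1, d2, d3, d4])          # P = D1 ^ D2 ^ D3 ^ D4

    # Disk 1 fails: recover its strip from the parity plus the survivors.
    rebuilt = parity([p, d2, d3, d4])     # D1 = P ^ D2 ^ D3 ^ D4
    assert rebuilt == d1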

1.9 RAID Levels
Table 1-1: RAID Levels

1.9.1 RAID-0
RAID-0 is based on the striping-technique (Figure 1-14).
Striping is used to improve the performance of a storage-device. It is a technique of splitting and distributing data across multiple disks.
Main purpose: To use the disks in parallel. Therefore, it utilizes the full storage-capacity of the storage-device.
Read operation: To read data, all the strips are combined together by the controller.
Advantages:
1) Used in applications that need high I/O-throughput. (Throughput --> efficiency)
2) As the number of disks increases, the performance also increases. This is because more data can be accessed simultaneously.
Disadvantage:
1) Does not provide data-protection and data-availability in case of disk-failure.
Figure 1-14: RAID-0

1.9.2 RAID-1
RAID-1 is based on the mirroring-technique.
Mirroring is used to improve data-availability (or data-redundancy).
Write operation: The data is stored on 2 different disks. Hence, we have 2 copies of the data.
Advantages:
1) Reliable: Provides protection against single disk-failure.
   In case of failure of one disk, the data can be accessed on the surviving-disk (Figure 1-15).
   Thus, the controller can still continue to service the host's requests from the surviving-disk.
   When the failed-disk is replaced with a new-disk, the controller copies the data from the surviving-disk to the new-disk.
   The disk-replacement activity is transparent to the host.
2) Increases read-performance because each read-request can be serviced by both disks.
Disadvantages:
1) Decreases write-performance because each write-request must perform 2 write-operations on the disks.
2) Duplication of data. Thus, the amount of storage-capacity needed is twice the amount of data being stored. (E.g. To store 100 GB of data, 200 GB of disk is required.)
3) Considered expensive; preferred for mission-critical applications (like military applications).
Figure 1-15: RAID-1

1.9.3 Nested-RAID
Figure 1-16: Nested-RAID
Most data-centers require both data-availability & performance from their storage-devices (Figure 1-16).
RAID-01 and RAID-10 combine the performance-benefit of RAID-0 and the availability-benefit of RAID-1.
(RAID-10 is also known as RAID-1+0.)

Nested-RAID uses mirroring- and striping-techniques.
It requires an even number of disks. Minimum no. of disks = 4.
Some applications of RAID-10:
1) High transaction rate OLTP (Online Transaction Processing)
2) Large messaging installations
3) Database applications that require high I/O-throughput, random-access and high-availability.
A common misunderstanding is that RAID-10 and RAID-01 are the same. But they are totally different.
1) RAID-10
   RAID-10 is also called a striped-mirror.
   The basic element of RAID-10 is a mirrored-pair.
   1) Firstly, the data is mirrored and
   2) Then, both copies of the data are striped across multiple disks.
2) RAID-01
   RAID-01 is also called a mirrored-stripe.
   The basic element of RAID-01 is a stripe.
   1) Firstly, data is striped across multiple disks and
   2) Then, the entire stripe is mirrored.
Advantage of the rebuild-operation:
1) Provides protection against single disk-failure.
   In case of failure of one disk, the data can be accessed on the surviving-disk.
   Thus, the controller can still continue to service the host's requests from the surviving-disk.
   When the failed-disk is replaced with a new-disk, the controller copies the data from the surviving-disk to the new-disk.
Disadvantages of the rebuild-operation:
1) Increased and unnecessary load on the surviving-disks.
2) More vulnerable to a second disk-failure.

1.9.4 RAID-3
RAID-3 uses both striping & parity techniques.
1) Striping is used to improve the performance of a storage-device.
2) Parity is used to provide data-protection in case of disk-failure.
Parity-information is stored on a separate, dedicated-disk.
Data is striped across all disks except the parity-disk in the array.
In case of disk-failure, parity can be used for reconstruction of the missing-data.
For example (Figure 1-17): Consider a RAID-implementation with 5 disks (5*100 GB = 500 GB).
1) The first 4 disks contain the data (4*100 = 400 GB).
2) The fifth disk stores the parity-information (1*100 = 100 GB).
Therefore, parity requires 25% extra disk-space (i.e. a 500-GB disk for 400 GB of data).
Advantages:
1) Striping is done at the bit-level. Thus, RAID-3 provides good bandwidth for the transfer of large volumes of data.
2) Suitable for video-streaming applications that involve large sequential data-access.
Disadvantages:
1) Always reads & writes complete stripes of data across all disks, because the disks operate in parallel.
2) There are no partial writes that update one out of many strips in a stripe.
Figure 1-17: RAID-3

1.9.5 RAID-4
Similar to RAID-3, RAID-4 uses both striping & parity techniques.
1) Striping is used to improve the performance of a storage-device.
2) Parity is used to provide data-protection in case of disk-failure.
Parity-information is stored on a separate, dedicated-disk.
Data is striped across all disks except the parity-disk.
In case of disk-failure, parity can be used for reconstruction of the missing-data.
Advantages:
1) Striping is done at the block-level. Hence, data-elements can be accessed independently, i.e. a specific data-element can be read from a single disk without reading an entire stripe.
2) Provides good read-throughput and reasonable write-throughput.

1.9.6 RAID-5
Problem: In RAID-3 and RAID-4, parity is written to a dedicated-disk. If the parity-disk fails, we lose the parity-protection for all the data.
Solution: To overcome this problem, RAID-5 is proposed. In RAID-5, the parity-information is distributed evenly among all the disks.
RAID-5 is similar to RAID-4 because it uses striping and the drives (strips) are independently accessible.
Advantages:
1) Preferred for messaging & media-serving applications.
2) Preferred for RDBMS implementations in which database-admins can optimize data-access.
Figure 1-18: RAID-5

1.9.7 RAID-6
RAID-6 is similar to RAID-5 except that it has a second parity-element to enable survival in case of 2 disk-failures (Figure 1-19).
Therefore, a RAID-6 implementation requires at least 4 disks.
Similar to RAID-5, parity is distributed across all disks.
Disadvantages (compared to RAID-5):
1) The write-penalty is more. Therefore, RAID-5 writes perform better than RAID-6.
2) The rebuild-operation may take a longer time. This is due to the presence of 2 parity-sets.
Figure 1-19: RAID-6

1.10 RAID Impact on Disk Performance
When choosing a RAID-type, it is important to consider the impact on disk-performance.
In both mirrored- and parity-RAIDs, each write-operation translates into more I/O-overhead for the disks. This is called the write-penalty.
Figure 1-20 illustrates a single write-operation on RAID-5 that contains a group of five disks.
1) Four disks are used for data and
2) One disk is used for parity.
The parity (Ep) can be calculated by:
  Ep = E1 + E2 + E3 + E4
where E1 to E4 are the striped-data across the set of five disks.
Whenever the controller performs a write-operation, parity must be computed by reading the old-parity (Ep old) & the old-data (E4 old) from the disk. This results in 2 read-operations.
The new parity (Ep new) can be calculated by:
  Ep new = Ep old - E4 old + E4 new
After computing the new parity, the controller completes the write-operation by writing the new-data and new-parity onto the disks. This results in 2 write-operations.
Therefore, the controller performs 2 disk-reads and 2 disk-writes for each write-operation. Thus, in RAID-5, the write-penalty = 4.
Figure 1-20: Write penalty in RAID-5
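The read-modify-write sequence can be sketched as below, assuming a hypothetical disks list holding one strip per disk and using the bitwise (XOR) form of the update formula:

    # RAID-5 small write: 2 reads + 2 writes -> write-penalty of 4.
    def raid5_small_write(disks, data_disk, parity_disk, new_data):
        old_data = disks[data_disk]        # read 1: old data strip
        old_parity = disks[parity_disk]    # read 2: old parity strip
        # Ep_new = Ep_old - E_old + E_new, done bitwise as XOR.
        new_parity = bytes(p ^ o ^ n for p, o, n in
                           zip(old_parity, old_data, new_data))
        disks[data_disk] = new_data        # write 1: new data
        disks[parity_disk] = new_parity    # write 2: new parity

    disks = [b"\x0a", b"\x33", b"\x55", b"\x0f", b"\x63"]   # 4 data strips + parity
    raid5_small_write(disks, data_disk=0, parity_disk=4, new_data=b"\xff")
    assert disks[4] == b"\x96"   # equals the XOR of the four (updated) data strips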

Application IOPS and RAID Implementations
Input/Output operations Per Second (IOPS) refers to the number of reads and writes performed per second.
When deciding the no. of disks for an application, it is important to consider the impact of RAID based on IOPS.
The total disk-load depends on
1) Type of RAID-implementation (RAID-0, RAID-1 or RAID-5) and
2) Ratio of reads compared to writes from the host.
The following example illustrates the method of computing the disk-load for different types of RAID.
Consider an application that generates 5,200 IOPS, with 60% of them being reads.
Case 1: RAID-5
The disk-load is calculated as follows:
  Disk-load = 0.6 * 5,200 + 4 * (0.4 * 5,200)   [because the write-penalty for RAID-5 is 4]
            = 3,120 + 4 * 2,080
            = 3,120 + 8,320
            = 11,440 IOPS
Case 2: RAID-1
The disk-load is calculated as follows:
  Disk-load = 0.6 * 5,200 + 2 * (0.4 * 5,200)   [because every write results in 2 writes to the disks]
            = 3,120 + 2 * 2,080
            = 3,120 + 4,160
            = 7,280 IOPS
The disk-load determines the number of disks required for the application.
If a disk has a maximum of 180 IOPS for the application, then the number of disks required is as follows:
  RAID-5: 11,440 / 180 = 64 disks
  RAID-1: 7,280 / 180 = 42 disks (approximated to the nearest even number)
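This calculation generalizes to a small helper. Below is a sketch reproducing the worked numbers above; the write-penalty of 6 for RAID-6 (with its two parity sets) is an added assumption, standard but not stated in these notes:

    import math

    WRITE_PENALTY = {"RAID-1": 2, "RAID-5": 4, "RAID-6": 6}

    def disk_load(app_iops, read_fraction, raid_level):
        reads = read_fraction * app_iops
        writes = (1 - read_fraction) * app_iops
        return reads + WRITE_PENALTY[raid_level] * writes

    def disks_needed(app_iops, read_fraction, raid_level, iops_per_disk=180):
        n = math.ceil(disk_load(app_iops, read_fraction, raid_level) / iops_per_disk)
        if raid_level == "RAID-1" and n % 2:
            n += 1                    # mirrored pairs need an even disk count
        return n

    print(disk_load(5200, 0.6, "RAID-5"), disks_needed(5200, 0.6, "RAID-5"))  # 11440.0 64
    print(disk_load(5200, 0.6, "RAID-1"), disks_needed(5200, 0.6, "RAID-1"))  # 7280.0 42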

1.11 RAID Comparison
Table 1.2: Comparison of different RAID Types

MODULE 1 (CONT.): INTELLIGENT STORAGE SYSTEM

1.12 Components of an Intelligent Storage System (ISS)
An ISS is a feature-rich RAID-array that provides highly optimized I/O-processing capabilities.
To improve performance, the storage-device provides
  a large amount of cache
  multiple paths (storage-system --> storage-device)
It handles the management, allocation, and utilization of storage-capacity.
A storage-device consists of 4 components (Figure 1-21):
1) Front-end 2) Cache 3) Back-end and 4) Physical-disk (or simply the disk).
An R/W-request is used for reading and writing of data from the disk.
1) Firstly, a read-request is placed at the host.
2) Then, the read-request is passed to the front-end, then to cache and then to the back-end.
3) Finally, the read-request is passed to the disk.
A read-request can be serviced directly from cache if the requested-data is available in cache.
Figure 1-21: Components of an intelligent storage system

Front End
The front-end provides the interface between host and storage.
It consists of 2 components: 1) front-end ports and 2) front-end controllers.
1) Front-End Port
   A front-end port is used to connect the host to the storage.
   Each port has processing-logic that executes the appropriate transport-protocol for storage-connections. (Transport-protocols include SCSI, FC, iSCSI and FCoE.)
   Extra ports are provided to improve availability.
2) Front-End Controller
   The front-end controller receives and processes I/O-requests from the host and communicates with cache.
   When cache receives the write-data, the controller sends an acknowledgment back to the host.
   The controller optimizes I/O-processing by using command-queuing algorithms.

Cache
Cache is semiconductor-memory.
Advantages:
1) Data is placed temporarily in cache to reduce the time required to service I/O-requests from the host.
   For example: Reading data from cache takes less time when compared to reading data directly from disk.
   (Analogy: Travelling from Chikmagalur to Hassan takes less time when compared to travelling from Dharwad to Hassan. Thus, we have Host = Hassan, Cache = Chikmagalur, Disk = Dharwad.)
2) Performance is improved by separating hosts from the mechanical-delays associated with disks.
   Rotating-disks are the slowest components of a storage. This is because of seek-time & rotational-latency.

Structure of Cache
A cache is partitioned into a number of pages.
A page is the smallest unit of cache-memory which can be allocated (say 1 KB).
The size of a page is determined based on the application's I/O-size.
Figure 1-22: Structure of cache
Cache consists of 2 main components (Figure 1-22):
1) Data Store
   The data-store is used to hold the data transferred between host and disk.
2) Tag RAM
   The tag-RAM is used to track the location of the data in the data-store and on the disk.
   It indicates where data is found in cache and where the data belongs on the disk.
   It also consists of i) a dirty-bit flag and ii) last-access time.
   i) The dirty-bit flag indicates whether the data in cache has been committed to the disk or not, i.e.
      1 --> committed (means data copied successfully from cache to disk)
      0 --> not committed
   ii) The last-access time is used to identify cached-info that has not been accessed for a long time.
      Thus, such data can be removed from cache and the memory can be de-allocated.

Read Operation with Cache
When the host issues a read-request, the controller checks whether the requested-data is available in cache.
A read-operation can be implemented in 3 ways: 1) Read-Hit 2) Read-Miss & 3) Read-Ahead (Pre-Fetch).
1) Read-Hit
   Here is how it works:
   1) A read-request is sent from the host to cache. If the requested-data is available in cache, it is called a read-hit.
   2) Then, the data is immediately sent from cache to host (Figure 1-23[a]).
   Advantage:
   1) Provides better response-time. This is because the read-operations are separated from the mechanical-delays of the disk.
2) Read-Miss
   Here is how it works:
   1) A read-request is sent from the host to cache. If the requested-data is not available in cache, it is called a read-miss.
   2) Then, the read-request is forwarded from the cache to the disk. Now, the requested-data is read from the disk (Figure 1-23[b]). For this, the back-end controller selects the appropriate disk and retrieves the requested-data from the disk.
   3) Then, the data is sent from disk to cache.
   4) Finally, the data is forwarded from cache to host.
   Disadvantage:
   1) Provides longer response-time. This is because of the disk-operations.
Figure 1-23: Read hit and read miss

3) Pre-Fetch (or Read-Ahead)
   A pre-fetch algorithm can be used when read-requests are sequential.
   Here is how it works:
   1) In advance, a contiguous set of data-blocks is read from the disk and placed into cache.
   2) When the host subsequently requests these blocks, the data is immediately sent from cache to host.
   Advantage:
   1) Provides better response-time.
   The size of the prefetch-data can be i) fixed or ii) variable.
   i) Fixed Pre-Fetch
      The storage-device pre-fetches a fixed amount of data (say 1*10 KB = 10 KB).
      It is most suitable when I/O-sizes are uniform.
   ii) Variable Pre-Fetch
      The storage-device pre-fetches an amount of data in multiples of the size of the host-request (say 4*10 KB = 40 KB).
Read-Hit-Ratio
Read-performance is measured in terms of the read-hit-ratio (or simply hit-ratio):
  hit-ratio = (number of read-hits) / (number of read-requests)
A higher hit-ratio means better read-performance.

Write Operation with Cache
Write-operation (Figure 1-24):
Writing data to cache provides better performance when compared to writing data directly to disk. In other words, writing data to cache takes less time when compared to writing data directly to disk.
Advantage: Sequential write-operations allow optimization. This is because many smaller write-operations can be combined to provide a larger data-transfer to disk via cache.
Figure 1-24: Write-back Cache and Write-through Cache
A write-operation can be implemented in 2 ways: 1) Write-back Cache & 2) Write-through Cache. (A sketch contrasting the two follows below.)
1) Write-Back Cache
   Here is how it works:
   1) Firstly, data is placed in the cache.
   2) Then, immediately an acknowledgment is sent from cache to host.
   3) Later, after some time, the data is forwarded from cache to disk.
   4) Finally, an acknowledgment is sent from disk to cache.
   Advantage:
   1) Provides better response-time. This is because the write-operations are separated from the mechanical-delays of the disk.
   Disadvantage:
   1) In case of cache-failure, there may be risk-of-loss of uncommitted-data.
2) Write-Through Cache
   Here is how it works:
   1) Firstly, data is placed in the cache.
   2) Then, immediately the data is forwarded from cache to disk.
   3) Then, an acknowledgment is sent from disk to cache.
   4) Finally, the acknowledgment is forwarded from cache to host.
   Advantage:
   1) Risk-of-loss is low. This is because data is copied from cache to disk as soon as it arrives.
   Disadvantage:
   1) Provides longer response-time. This is because of the disk-operations.
Write-Aside Size
Write-aside-size refers to the maximum size of an I/O-request that can be handled by the cache.
If the size of an I/O-request exceeds the write-aside-size, then the data is written directly to disk, bypassing the cache.
Advantage: Suitable for applications where cache-capacity is limited and cache is used for small random-requests.
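The difference between the two policies is only in when the acknowledgment is sent relative to the disk write. A minimal sketch (class and attribute names are illustrative, not array firmware):

    # Write-back vs. write-through cache (toy model).
    class Cache:
        def __init__(self, disk, write_back=True):
            self.disk = disk              # backing store: dict of page -> data
            self.pages = {}               # cached pages
            self.dirty = set()            # pages not yet committed to disk
            self.write_back = write_back

        def write(self, page, data):
            self.pages[page] = data
            if self.write_back:
                self.dirty.add(page)      # commit later; risk if cache fails
            else:
                self.disk[page] = data    # write-through: commit before ack
            return "ack"                  # acknowledgment returned to the host

        def flush(self):
            for page in list(self.dirty): # commit dirty pages to disk
                self.disk[page] = self.pages[page]
            self.dirty.clear()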

Cache Implementation
Cache can be implemented as either 1) dedicated-cache or 2) global-cache.
1) In dedicated-cache, separate sets of memory-locations are reserved for read- and write-operations.
2) In global-cache, the same set of memory-locations can be used for both read- and write-operations.
Advantages of global-cache:
1) Global-cache is more efficient when compared to dedicated-cache. This is because only one global set of memory-locations has to be managed.
2) The user can specify the percentage of cache-capacity used for read- and write-operations. (For example: 70% for read and 30% for write.)

Cache Management
Cache is a finite and expensive resource that needs proper management.
When all cache-pages are filled, some pages have to be freed up to accommodate new data.
Two cache-management algorithms are:
1) Least Recently Used (LRU)
   Working principle: Replace the page that has not been used for the longest period of time.
   Based on the assumption: data which hasn't been accessed for a while will not be requested by the host.
2) Most Recently Used (MRU)
   Working principle: Replace the page that has been accessed most recently.
   Based on the assumption: recently accessed data may not be required for a while.
As the cache fills, the storage-device must take action to flush dirty-pages to manage availability.
A dirty-page refers to data written into the cache but not yet written to the disk.
Flushing is the process of committing data from cache to disk.
Based on the access-rate and -pattern of I/O, watermarks are set in cache to manage the flushing process.
Watermarks can be set to either a high or low level of cache-utilization:
1) High watermark (HWM): The point at which the storage-device starts high-speed flushing of cache-data.
2) Low watermark (LWM): The point at which the storage-device stops high-speed flushing & returns to idle flush behavior.
Figure 1-25: Types of flushing
The cache-utilization level drives the mode of flushing to be used (Figure 1-25):
1) Idle Flushing occurs at a modest rate when the utilization level is between the high and low watermarks.
2) High Watermark Flushing occurs when the cache-utilization level hits the high watermark.
   The storage-device dedicates some additional resources to flushing, but this type of flushing has minimal impact on the host.
3) Forced Flushing occurs in the event of a large I/O-burst when the cache reaches 100% of its capacity.
   The dirty-pages are forcibly flushed to disk, which affects the response-time.
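A sketch combining LRU replacement with watermark-driven flushing follows. All names and the 60%/80% watermark values are assumptions; for brevity, flushing here both commits a dirty page and frees it, which real arrays treat as separate steps:

    from collections import OrderedDict

    class ManagedCache:
        def __init__(self, capacity, lwm=0.6, hwm=0.8):
            self.capacity, self.lwm, self.hwm = capacity, lwm, hwm
            self.pages = OrderedDict()    # page -> data; order tracks recency
            self.disk = {}

        def utilization(self):
            return len(self.pages) / self.capacity

        def write(self, page, data):
            self.pages[page] = data       # dirty page (write-back)
            self.pages.move_to_end(page)
            if self.utilization() >= self.hwm:   # HWM (or forced, at 100%) flushing
                self.flush(target=self.lwm)      # flush down to the low watermark

        def flush(self, target):
            # Commit least-recently-used pages until utilization <= target.
            for page in list(self.pages):
                if self.utilization() <= target:
                    break
                self.disk[page] = self.pages.pop(page)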

Cache Data Protection
Cache is volatile-memory, so a cache-failure will cause the loss of data not yet committed to the disk.
This problem can be solved in various ways:
1) Powering the memory with a battery until AC power is restored or
2) Using battery-power to write the cached-information to the disk.
This problem can also be solved using the following 2 techniques:
1) Cache Mirroring 2) Cache Vaulting
1) Cache Mirroring
   i) Write Operation
      Each write to cache is held in 2 different memory-locations on 2 independent memory-cards.
      In case of a cache-failure, the data will still be safe in the surviving memory-card. Hence, the data can be committed to the disk.
   ii) Read Operation
      Data is read into the cache from the disk.
      In case of a cache-failure, the data will still be safe on the disk. Hence, the data can be read again from the disk.
   Advantage:
   1) As only write-operations are mirrored, this method results in better utilization of the available cache.
   Disadvantage:
   1) The problem of cache-coherency is introduced. (Cache-coherency means data in the 2 different cache-locations must be identical at all times.)
2) Cache Vaulting
   It is the process of dumping the contents of cache onto a dedicated disk during a power-failure.
   The disks used to dump the contents of cache are called vault-disks.
   When power is restored, data from the vault-disks is written back to the cache and then written to the intended disks.

Back End
The back-end
  provides an interface between cache and disk.
  controls data-transfer between cache and disk.
Write operation: From cache, data is sent to the back-end and then forwarded to the destination-disk.
It consists of 2 components: 1) back-end ports and 2) back-end controllers.
1) Back-End Ports
   Back-end ports are used to connect the disks to the cache.
2) Back-End Controllers
   Back-end controllers are used to route data to and from cache via the internal data-bus.
   The controller communicates with the disks when performing read- and write-operations and provides small temporary data-storage.
   The controllers provide
     error-detection and -correction (e.g. parity)
     RAID-functionality.
Dual Controller
To improve availability, a storage-device can be configured with dual-controllers with multiple ports.
In case of a port-failure, the controller provides an alternative path to the disks.
Advantage:
1) Dual-controllers also facilitate load-balancing.
Dual-Ported Disk
Availability can be further improved if the disks are also dual-ported.
In this case, each disk-port can be connected to a separate controller.

43 Physical Disk
A disk is used to store data persistently for future use.
Disks are connected to the back-end using SCSI or FC.
Modern storage-devices provide support for different types of disks with different speeds.
Different types of disks are: FC, SATA, SAS, and flash drives (solid-state drives).
They also support the use of a combination of flash, FC, or SATA disks within the same array. 1-43

44 1.13 Storage Provisioning
It is the process of assigning storage-capacity to hosts based on the performance-requirements of the hosts.
It can be implemented in two ways: 1) traditional and 2) virtual.
Traditional Storage Provisioning
Logical Unit (LUN)
The available capacity of a RAID-set is partitioned into volumes known as logical-units (LUNs).
The logical-units are assigned to hosts based on their storage-requirements.
For example (Figure 1-26), LUNs 0 and 1 are used by hosts 1 and 2 for accessing data.
LUNs are spread across all the disks that belong to the RAID-set.
Each logical-unit is assigned a unique ID called a logical-unit number (LUN#).
Advantages:
1) LUNs hide the organization and composition of the RAID-set from the hosts.
2) The use of LUNs improves disk-utilization. For example,
i) Without LUNs, a host requiring only 200 GB would be allocated an entire 1 TB disk.
ii) With LUNs, only the required 200 GB is allocated to the host. This allows the remaining 800 GB to be allocated to other hosts.
Figure 1-26: Logical-unit number
LUN Expansion: MetaLUN
MetaLUN is a method to expand logical-units that require additional capacity or performance.
A MetaLUN can be created by combining two or more logical-units (LUNs).
It consists of i) a base-LUN and ii) one or more component-LUNs.
It can be either concatenated or striped (Figure 1-27).
1) Concatenated MetaLUN
The expansion adds additional capacity to the base-LUN.
The component-LUNs need not have the same capacity as the base-LUN.
All LUNs must be either protected (parity or mirrored) or unprotected (RAID 0). For example, a RAID-5 LUN can be concatenated with a RAID-1 LUN, but not with a RAID-0 LUN.
Advantage: 1) The expansion is quick.
Disadvantage: 1) Does not provide any performance-benefit.
2) Striped MetaLUN
The expansion restripes the data across the base-LUN and component-LUNs.
All LUNs must have the same capacity and the same RAID-level.
Advantage: 1) Expansion provides improved performance due to the increased number of disks being striped. 1-44
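The disk-utilization example above can be expressed as a short sketch of carving LUNs out of a RAID-set's usable capacity. The class and method names are illustrative, not from any real array's management interface (1 TB is treated as 1000 GB for simplicity).

class RaidSet:
    def __init__(self, usable_gb):
        self.usable_gb = usable_gb
        self.luns = {}                 # LUN# -> size in GB
        self.next_lun_number = 0

    def create_lun(self, size_gb):
        allocated = sum(self.luns.values())
        if allocated + size_gb > self.usable_gb:
            raise ValueError("not enough free capacity in the RAID-set")
        lun_number = self.next_lun_number      # unique logical-unit number (LUN#)
        self.luns[lun_number] = size_gb
        self.next_lun_number += 1
        return lun_number

rs = RaidSet(usable_gb=1000)           # a 1 TB RAID-set
lun0 = rs.create_lun(200)              # the host gets only the 200 GB it needs
free = rs.usable_gb - sum(rs.luns.values())
print(free)                            # 800 GB remain for other hosts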

45 Figure 1-27: LUN Expansion
Advantages of traditional storage-provisioning:
1) Suitable for applications that require predictable performance.
2) Provides full control for precise data-placement.
3) Allows admins to create logical-units on different RAID-groups if there is any workload-contention.
Virtual Storage Provisioning
Virtual-provisioning uses virtualization technology to provision storage for applications.
Logical-units created using virtual-provisioning are called thin-LUNs, to distinguish them from traditional LUNs.
Physical storage need not be completely allocated to a host when a thin-LUN is created.
Storage is allocated to the host on demand from a shared-pool. A shared-pool refers to a group of disks.
A shared-pool can be homogeneous (containing a single drive type) or heterogeneous (containing mixed drive types, such as flash, FC, SAS, and SATA drives).
Advantages:
1) Suitable for applications where space-consumption is difficult to forecast.
2) Improves utilization of storage-space.
3) Simplifies storage-management.
4) Enables oversubscription. Here, more capacity is presented to the hosts than is actually available on the storage-array.
5) Scalable: Both the shared-pool and thin-LUNs can be expanded as the storage-requirements of the hosts grow.
6) Sharing: Multiple shared-pools can be created within a storage-array, and a shared-pool may be shared by multiple thin-LUNs.
7) Companies save the costs associated with acquisition of new storage. 1-45
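A minimal sketch of thin-LUN behavior, assuming a hypothetical extent-based pool: the LUN reports its full size to the host, but physical extents are drawn from the shared-pool only on first write. All names and the 1 GB extent size are illustrative.

EXTENT_GB = 1   # hypothetical allocation granularity

class SharedPool:
    def __init__(self, physical_gb):
        self.free_gb = physical_gb

    def allocate(self, gb):
        if self.free_gb < gb:
            raise RuntimeError("shared-pool exhausted")
        self.free_gb -= gb

class ThinLUN:
    def __init__(self, reported_gb, pool):
        self.reported_gb = reported_gb   # capacity presented to the host
        self.pool = pool
        self.allocated = set()           # extents actually backed by disk

    def write(self, extent):
        # Storage is allocated on demand, only on the first write to an extent.
        if extent not in self.allocated:
            self.pool.allocate(EXTENT_GB)
            self.allocated.add(extent)

pool = SharedPool(physical_gb=500)
# Oversubscription: 3 x 300 GB presented, only 500 GB physically available.
luns = [ThinLUN(300, pool) for _ in range(3)]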

46 1.14 Types of Intelligent Storage System
Storage-devices can be classified into 2 types:
1) High-end storage-system
2) Midrange storage-system
Figure 1-28: Active-active configuration
Figure 1-29: Active-passive configuration
High End Storage System
A high-end storage-device is also known as an active-active array.
It is suitable for large companies for centralizing corporate-data. (e.g. Big Bazaar)
An active-active array implies that the host can transfer data to its logical-units using any of the available paths (Figure 1-28).
It provides the following capabilities:
1) A large number of controllers and large cache.
2) Multiple front-end ports to serve a large number of hosts.
3) Multiple back-end controllers to perform disk-operations optimally. The controllers include FC and SCSI RAID.
4) Large storage-capacity.
5) Large cache-capacity to service host requests optimally.
6) Mirroring techniques to improve data-availability.
7) Interoperability: Connectivity to mainframe-computers and open-systems hosts.
8) Scalability to support the following requirements: increased connectivity, increased performance, and increased storage-capacity.
9) Ability to handle large amounts of concurrent requests from a number of servers and applications.
10) Support for array-based local- and remote-replication.
11) Suitable for mission-critical applications (e.g. military applications).
Midrange Storage System
A midrange storage-device is also referred to as an active-passive array.
It is suitable for small- and medium-sized companies for centralizing corporate-data.
It is designed with 2 controllers. Each controller contains host-interfaces, cache, RAID-controllers, and interfaces to disks. 1-46

47 It provides the following capabilities:
1) Smaller storage-capacity.
2) Smaller cache-capacity to service host requests.
3) Fewer front-end ports to serve a small number of hosts.
4) Ensures high availability and high performance for applications with predictable workloads.
5) Supports array-based local- and remote-replication.
6) Provides optimal storage-solutions at a lower cost.
In an active-passive array,
1) A host can transfer data to a logical-unit only through the paths owned by the controller that owns the LUN. These paths are called active-paths.
2) The other paths are passive with respect to this logical-unit. These paths are called passive-paths.
As shown in Figure 1-29, the host can transfer data to its LUN only through the path to controller-A. This is because controller-A is the owner of this LUN. The path to controller-B remains passive, and no data-transfer to this LUN is performed through it. 1-47
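The ownership rule of an active-passive array can be sketched as a simple path-selection function, following Figure 1-29. The controller names, LUN names, and path strings are all invented for illustration.

lun_owner = {"LUN0": "controller-A", "LUN1": "controller-B"}   # LUN ownership

def choose_path(lun, available_paths):
    # I/O to a LUN is sent only through the controller that owns it;
    # all other paths remain passive for this LUN.
    for path in available_paths:
        if path.endswith(lun_owner[lun]):
            return path
    raise RuntimeError("no active path: owning controller unreachable")

paths = ["host-port-1 -> controller-A", "host-port-2 -> controller-B"]
print(choose_path("LUN0", paths))      # host-port-1 -> controller-A
print(choose_path("LUN1", paths))      # host-port-2 -> controller-B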

48 MODULE 2: STORAGE AREA NETWORKS 2.1 Business Needs and Technology Challenges 2.2 SAN 2.3 Fibre-Channel: Overview 2.4 Components of SAN Node-Ports Cabling Connector Interconnect-Devices Storage-Arrays Management-Software 2.5 FC Connectivity Point-to-Point FC-AL (Fibre-Channel Arbitrated Loop) FC-SW (Fibre-Channel Switched-Fabric) FC-SW Transmission 2.6 Fibre-Channel Ports 2.7 Fibre-Channel Architecture Fibre-Channel Protocol Stack Fibre-Channel Addressing FC-Address WWN (World Wide Name) FC Frame Structure and Organization of FC Data Flow-Control Classes of Service 2.8 Fabric Services 2.9 Zoning Types of Zoning 2.10 Fibre-Channel Login Types 2.11 FC Topologies Core-Edge Fabric Benefits and Limitations of Core-Edge Fabric Mesh Topology 2-1

49 MODULE 2: STORAGE AREA NETWORKS
2.1 Business Needs and Technology Challenges
Companies are experiencing an explosive growth in information. This information needs to be stored, protected, optimized, and managed efficiently.
Challenging task for data-center managers: Providing a low-cost, high-performance information-management solution (ISM).
An ISM solution must provide the following functions:
1) Just-in-time information to users
Information must be available to users when they need it.
The following key challenges must be addressed:
explosive growth in online-storage
creation of new servers and applications
spread of mission-critical data throughout the company and
demand for 24x7 data-availability.
2) Integration of information infrastructure with business-processes
Storage-infrastructure must be integrated with business-processes without compromising on security.
3) Flexible and resilient storage architecture
Storage-infrastructure must provide flexibility that aligns with changing business-requirements.
Storage should scale without compromising the performance-requirements of the applications. At the same time, the total cost of managing information must be low.
Direct-attached storage (DAS) is often referred to as a stovepiped storage environment.
Problem with DAS:
1) Hosts own the storage. Hence, it is difficult to manage and share resources on these separated storage-devices.
Solution:
1) Efforts to organize this dispersed data led to the emergence of the storage area network (SAN). 2-2

50 2.2 SAN
A SAN is a high-speed, dedicated network of servers and shared storage-devices (Figure 2-1).
A SAN
enables storage consolidation (servers --> hosts)
enables storage to be shared across multiple servers and
enables companies to connect geographically dispersed servers and storage.
Advantages:
1) Improves the utilization of storage-resources compared to DAS architecture.
2) Reduces the total amount of storage an organization needs to purchase and manage.
3) Storage-management becomes centralized and less complex.
4) Reduces the cost of managing information.
5) Provides effective maintenance and protection of data.
6) Meets storage demands efficiently with better economies of scale.
Common SAN deployments are 1) Fibre Channel (FC) SAN and 2) IP SAN.
1) Fibre Channel SAN uses the Fibre Channel protocol for the transport of data, commands, and status information between servers and storage-devices.
2) IP SAN uses IP-based protocols for communication.
Figure 2-1: SAN implementation
2.3 Fibre-Channel: Overview
The FC architecture forms the fundamental construct of the SAN-infrastructure.
Fibre-channel is a high-speed network technology that runs on i) high-speed optical-fiber cable and ii) serial copper cable.
Normally, optical-fiber cable is preferred for front-end SAN connectivity, and serial copper cable is preferred for back-end disk connectivity.
Advantages:
1) Developed to increase speeds of data-transmission between servers and storage-devices.
2) The credit-based flow-control mechanism delivers data as fast as the destination buffer is able to receive it, without dropping frames.
3) Has very little transmission overhead.
4) Highly scalable: A single FC network can accommodate approximately 15 million devices. 2-3

51 2.4 Components of SAN
A SAN consists of 5 basic components:
1) Node-ports
2) Cabling
3) Interconnecting-devices (such as FC-switches or hubs)
4) Storage-arrays and
5) Management-software.
Node-Ports
Nodes refer to devices such as hosts, storage-arrays, and tape-libraries (Figure 2-2).
Each node is a source or destination of information.
Each node has ports to provide a physical-interface for communicating with other nodes.
The ports are integral components of an HBA and the storage front-end controllers.
A port operates in full-duplex mode with i) a transmit (Tx) link and ii) a receive (Rx) link.
Figure 2-2: Nodes, ports, and links
Cabling
For cabling, both optical-cable and copper-wire are used.
1) Optical-fiber is used for long distances.
2) Copper is used for shorter distances. This is because copper provides a better SNR for distances up to 30 meters. (SNR --> signal-to-noise ratio)
Optical-cables carry data in the form of light.
Two types of optical-cables: 1) multi-mode and 2) single-mode.
Figure 2-3: Multi-mode fiber and single-mode fiber
1) Multi-Mode Fiber (MMF)
The cable carries multiple beams of light projected at different angles simultaneously onto the core of the cable.
Based on the bandwidth, MMF is classified as OM1 (62.5 μm), OM2 (50 μm), and laser-optimized OM3 (50 μm). 2-4

52 Advantage:
1) Used within data centers for shorter distances.
Disadvantages:
1) Modal-Dispersion
Multiple light beams traveling inside the cable tend to disperse and collide (Figure 2-3 (a)).
This collision weakens the signal strength. This process is known as modal-dispersion.
2) Attenuation
An MMF cable is typically used only for short distances because of signal degradation (attenuation) caused by modal-dispersion.
2) Single-Mode Fiber (SMF)
The cable carries a single ray of light projected at the center of the core (Figure 2-3 (b)).
The cables are available in core diameters of 7 to 11 microns. The most common size is 9 microns.
Advantages:
1) The small core and the single light wave limit modal-dispersion.
2) Provides minimum signal-attenuation over maximum distance (up to 10 km).
3) Used for longer distances. The distance depends on i) the power of the laser at the transmitter and ii) the sensitivity of the receiver.
Figure 2-4: SC, LC, and ST connectors
Connector
A connector is attached at the end of a cable to enable swift connection and disconnection of the cable to and from a port.
Three commonly used connectors (Figure 2-4):
1) Standard Connector (SC)
An SC is used for data-transmission speeds up to 1 Gbps.
2) Lucent Connector (LC)
An LC is used for data-transmission speeds up to 4 Gbps.
3) Straight Tip (ST)
An ST is used with fiber patch-panels. 2-5

53 2.4.3 Interconnect-Devices
Three commonly used interconnect-devices are hubs, switches, and directors.
i) Hub
A hub is used as the interconnect-device in FC-AL implementations.
It is used to connect nodes in a star-topology.
All the nodes must share the bandwidth because data travels through all the connection-points.
ii) Switch
A switch is more intelligent than a hub. It is used to directly route data from one physical-port to another.
Advantages:
1) Lower cost than a director.
2) High performance.
3) Each node has a dedicated communication path. This results in bandwidth aggregation.
Switches are available with a fixed port-count or with a modular design.
In a modular switch, the port-count is increased by installing additional port-cards in open slots.
iii) Director
A director is a high-end switch with a higher port-count and better fault-tolerance capabilities.
It is larger than a switch and is deployed for data-center implementations.
In a modular director, the port-count is increased by installing additional line-cards in the director's chassis.
High-end directors and switches contain redundant components to provide high availability.
Both directors and switches have management-ports for connectivity to management-servers.
Storage-Arrays
The fundamental purpose of a SAN is to provide host-access to storage-resources.
Modern storage-arrays are used for storage-consolidation and -centralization.
A storage-array provides
high availability and redundancy
improved performance
business continuity and
multiple host connectivity.
Management-Software
Management-software manages the interfaces between 1) hosts, 2) interconnect-devices, and 3) storage-arrays.
It provides a view of the SAN environment and enables management of various resources from one central console.
It provides key functions such as
1) mapping of storage-devices, switches, and servers
2) monitoring and generating alerts for discovered devices, and
3) logical partitioning of the SAN, called zoning. 2-6

54 2.5 FC Connectivity
The FC architecture supports 3 basic interconnectivity options:
1) Point-to-point
2) Arbitrated loop (FC-AL) and
3) Fabric-connect (FC-SW).
Point-to-Point
Two devices are connected directly to each other (Figure 2-5).
Advantage:
1) Provides a dedicated-connection for data-transmission between nodes.
Disadvantages:
1) Provides limited connectivity, since only 2 devices can communicate with each other at a given time.
2) Not scalable: Cannot be scaled to accommodate a large number of network-devices.
Standard DAS uses point-to-point connectivity.
Figure 2-5: Point-to-point topology 2-7

55 2.5.2 FC-AL (Fibre-Channel Arbitrated Loop)
Devices are attached to a shared loop (Figure 2-6).
FC-AL has the characteristics of 1) a token-ring topology and 2) a physical star-topology.
Each device competes with the other devices to perform I/O operations.
All devices must compete to gain control of the loop. At any time, only one device can perform I/O operations on the loop.
Two implementations:
1) FC-AL can be implemented without any interconnecting-devices, i.e. devices are directly connected to one another in a ring through cables.
2) FC-AL can be implemented using hubs, where the arbitrated loop is physically connected in a star-topology.
Figure 2-6: Fibre Channel arbitrated loop
Disadvantages:
1) Low Performance
Since all devices share the bandwidth of the loop, only one device can perform an I/O operation at a time. Hence, other devices have to wait to perform their I/O operations.
2) Addressing
Since 8-bit addressing is used, only up to 127 devices can be supported on a loop.
3) Not Scalable
Adding a device results in loop re-initialization. This causes a momentary pause in loop traffic. 2-8

56 2.5.3 FC-SW (Fibre-Channel Switched-Fabric)
FC-SW is also referred to as fabric-connect.
A fabric is a logical-space in which all nodes communicate with one another in a network (Figure 2-7).
The logical-space can be created with a switch or a network-of-switches.
In a fabric,
i) each switch contains a unique domain-identifier, which is part of the FC-address.
ii) each port has a unique 24-bit FC-address for communication.
iii) an ISL refers to a link used to connect any two switches. (ISL --> Inter-Switch-Link)
ISLs are used to transfer i) data between host and storage and ii) fabric-management-information between the 2 switches.
ISLs enable switches to be connected together to form a single, larger fabric.
Advantages:
1) Provides dedicated bandwidth for data-transmission between nodes.
2) Scalable: New devices can be added without interrupting normal operations.
Figure 2-7: Fibre Channel switched fabric
A fabric can have many tiers (Figure 2-8). (tiers --> levels)
When the number of tiers increases, the distance traveled by a message to reach each switch also increases. As the distance increases, the time taken to propagate the message also increases.
The message may include a fabric-reconfiguration event, such as the addition of a new switch, or a zone-set propagation event.
Figure 2-8 illustrates two-tier and three-tier fabric architectures.
Figure 2-8: Tiered structure of FC-SW topology 2-9

57 FC-SW Transmission
FC-SW uses switches that are intelligent devices.
A switch can route data traffic between nodes directly through its ports.
The fabric routes the frames between source and destination.
For example (Figure 2-9):
If node-B wants to communicate with node-D, then
i) node-B must first log in to the fabric and
ii) then node-B can transmit data via the FC-SW.
This link is considered a dedicated-connection between the initiator (node-B) and the target (node-D).
Figure 2-9: Data transmission in FC-SW topology 2-10

58 2.6 Fibre-Channel Ports
Ports are the basic building blocks of an FC-network.
Ports in a switched fabric can be one of the following types (Figure 2-10):
1) N_Port (node-port)
An N_port is an end-point in the fabric. It is used to connect to a switch in a fabric.
It can be a host port (HBA) or a storage-array port.
2) E_Port (expansion port)
An E_port is used to set up a connection between two FC-switches.
The E_port on an FC-switch is connected to the E_port of another FC-switch.
3) F_Port (fabric port)
An F_port is used to connect to a node in the fabric. It cannot participate in FC-AL.
4) G_Port (generic port)
A G_port can operate as an E_port or an F_port.
It can determine its functionality automatically during initialization.
Figure 2-10: Fibre channel ports 2-11

59 2.7 Fibre-Channel Architecture
The FC architecture represents true network integration with standard interconnecting-devices.
FC is used for making connections in a SAN.
Channel technologies provide high levels of performance with low protocol overheads. Such performance is due to the static nature of channels and the high level of hardware and software integration provided by the channel technologies.
FCP is the implementation of serial SCSI-3 over an FC-network. (FCP --> Fibre-Channel Protocol)
All external and remote storage-devices attached to the SAN appear as local devices to the host OS.
The key advantages of FCP are as follows:
1) Sustained transmission bandwidth over long distances.
2) Support for a larger number of addressable devices over a network. Theoretically, FC can support over 15 million device-addresses on a network.
3) Exhibits the characteristics of channel transport and provides speeds up to 8.5 Gb/s (8 GFC).
Fibre-Channel Protocol Stack
A communication-protocol can be understood by viewing it as a structure of independent layers.
FCP defines the protocol in 5 layers (Figure 2-11):
1) FC-0 Physical-Interface
2) FC-1 Transmission Protocol
3) FC-2 Transport Layer
4) FC-3 (the FC-3 layer is not implemented)
5) FC-4 Upper Layer Protocol.
In a layered model, the peer-layers on each node talk to each other through defined protocols.
Figure 2-11: Fibre channel protocol stack
FC-4 Upper Layer Protocol
FC-4 is the uppermost layer in the stack. (ULP --> Upper Layer Protocol)
This layer defines the application interfaces and how a ULP is mapped to the lower FC-layers.
The FC standard defines several protocols that can operate on the FC-4 layer. For example:
SCSI
HIPPI Framing Protocol
Enterprise Systems Connection (ESCON)
ATM
IP 2-12

60 FC-2 Transport Layer
This layer contains
1) the payload
2) the addresses of the source and destination ports and
3) link-control-information.
This layer provides the FC-addressing structure and the organization of data (frames, sequences, and exchanges).
This layer also defines
fabric-services
classes of service (1, 2 or 3)
flow-control and
routing.
FC-1 Transmission Protocol
This layer defines how data is encoded prior to transmission and decoded upon receipt.
Here is how encoding and decoding are done:
1) At the transmitter-node,
i) the FC-1 layer encodes an 8-bit character into a 10-bit transmission character.
ii) Then, this 10-bit character is transmitted to the receiver-node.
2) At the receiver-node,
i) the 10-bit character is passed to the FC-1 layer.
ii) Then, the FC-1 layer decodes the 10-bit character into the original 8-bit character.
This layer also defines the transmission words, such as
i) FC frame delimiters, which identify the start and end of a frame, and
ii) primitive signals, which indicate events at a transmitting port.
This layer also performs link initialization and error recovery.
FC-0 Physical-Interface
FC-0 is the lowest layer in the stack.
This layer defines
the physical-interface
the transmission medium used (e.g. copper-wire or optical-fiber) and
the transmission of raw bits.
The specification includes i) cables, ii) connectors (such as SC, LC) and iii) optical and electrical parameters for different data rates.
The transmission can use both electrical and optical media. 2-13

61 2.7.2 Fibre-Channel Addressing
FC-Address
An FC-address is dynamically assigned when a port logs on to the fabric.
The various fields in an FC-address are (Figure 2-12):
1) Domain ID
This field contains the domain ID of the switch.
A domain ID is a unique number provided to each switch in the fabric.
Out of the possible 256 domain IDs,
i) 239 are available for use;
ii) the remaining 17 addresses are reserved for fabric-management services.
For example,
FFFFFC is reserved for the name-server and
FFFFFE is reserved for the fabric login service.
2) Area ID
This field is used to identify a group of ports used for connecting nodes.
An example of a group of ports with a common area ID is a port-card on the switch.
3) Port ID
This field is used to identify the port within the group.
The maximum possible number of node-ports in a fabric is calculated as: 239 domains x 256 areas x 256 ports = 15,663,104.
Figure 2-12: 24-bit FC address of N_port
WWN (World Wide Name)
A WWN is a 64-bit unique identifier assigned to each device in the FC-network.
Two types of WWNs are used (Figure 2-13):
i) WWNN (World Wide Node Name) and
ii) WWPN (World Wide Port Name).
FC-address vs. WWN
i) An FC-address is a dynamic name for each device on an FC-network.
ii) A WWN is a static name for each device on an FC-network.
A WWN is similar to the MAC-address used in an IP-network. (MAC --> Media Access Control)
A WWN is burned into the hardware or assigned through software.
Normally, WWNs are used for identifying storage-devices and HBAs.
The name-server is used to store the mapping of WWNs to the FC-addresses of nodes.
Figure 2-13 illustrates the WWN structure for an array and the HBA.
Figure 2-13: World Wide Names 2-14
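The 24-bit address layout above (8 bits each of domain ID, area ID, and port ID) can be checked with a few lines of Python; this is a sketch of the bit-packing only, with made-up field values.

def make_fc_address(domain_id, area_id, port_id):
    # 8 bits of domain ID | 8 bits of area ID | 8 bits of port ID
    return (domain_id << 16) | (area_id << 8) | port_id

def split_fc_address(addr):
    return (addr >> 16) & 0xFF, (addr >> 8) & 0xFF, addr & 0xFF

addr = make_fc_address(domain_id=0x01, area_id=0x0A, port_id=0xEF)
print(f"{addr:06X}")              # 010AEF
print(split_fc_address(addr))     # (1, 10, 239)

# Maximum node-ports in a fabric, as computed above:
print(239 * 256 * 256)            # 15663104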

62 2.7.3 FC Frame
An FC frame consists of five parts (Figure 2-14):
1) SOF (Start of frame)
2) Frame-header
3) Data field
4) CRC and
5) EOF (End of frame).
1) SOF (Start of frame)
This field acts as a delimiter.
In addition, this field acts as a flag that indicates whether the frame is the first frame in a sequence.
2) Frame-Header
This field contains addressing information for the frame.
Length of frame-header: 24 bytes.
This field includes the following information:
Source ID (S_ID) and Destination ID (D_ID)
Sequence ID (SEQ_ID) and Sequence Count (SEQ_CNT)
Originating Exchange ID (OX_ID) and Responder Exchange ID (RX_ID) and
some other control-fields.
3) Data field
This field is used to carry the payload.
4) CRC
This field carries a checksum used for error-detection.
5) EOF (End of frame)
This field also acts as a delimiter.
Figure 2-14: FC frame
The frame-header also defines the following fields:
i) Routing Control (R_CTL)
This field indicates whether the frame is a link-control-frame or a data-frame.
Link-control-frames are non-data-frames that do not carry any payload.
Data-frame vs. Control-frame:
Data-frames carry the payload and thus perform the data-transmission.
On the other hand, control-frames are used for setup and messaging.
ii) Class Specific Control (CS_CTL)
This field specifies link-speeds for class-1 and class-4 data-transmission.
iii) TYPE
This field serves 2 purposes:
i) If the frame is a data-frame, this field describes the ULP to be carried on the frame. The ULP can be SCSI, IP or ATM. For example, if the frame is a data-frame and TYPE=8, then SCSI will be carried on the frame.
ii) If the frame is a control-frame, this field is used to signal an event such as fabric-busy.
iv) Data Field Control (DF_CTL)
This field indicates the presence of any optional-headers at the beginning of the data-payload.
It is a mechanism to extend header-information into the payload.
v) Frame Control (F_CTL)
This field contains control-information related to the frame-content.
For example, one of the bits in this field indicates whether this is the first sequence of the exchange. 2-15
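As a sketch, the 24-byte frame-header can be packed from the fields named above; the field widths follow the standard FC-2 header layout (three 24-bit address/control fields, several 8- and 16-bit fields, and a 4-byte parameter field), and all the values used here are made up.

import struct

def pack_fc_header(r_ctl, d_id, cs_ctl, s_id, type_, f_ctl,
                   seq_id, df_ctl, seq_cnt, ox_id, rx_id, parameter):
    def u24(value):                          # helper: 24-bit big-endian field
        return value.to_bytes(3, "big")
    return (bytes([r_ctl]) + u24(d_id) +
            bytes([cs_ctl]) + u24(s_id) +
            bytes([type_]) + u24(f_ctl) +
            bytes([seq_id, df_ctl]) +
            struct.pack(">HHHI", seq_cnt, ox_id, rx_id, parameter))

hdr = pack_fc_header(r_ctl=0x06, d_id=0x010AEF, cs_ctl=0, s_id=0x010200,
                     type_=0x08,             # TYPE=8: SCSI carried on the frame
                     f_ctl=0, seq_id=1, df_ctl=0, seq_cnt=0,
                     ox_id=0x1234, rx_id=0xFFFF, parameter=0)
assert len(hdr) == 24                        # length of frame-header: 24 bytes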

63 Structure and Organization of FC Data
In an FC-network, data transport is analogous to a conversation between two people, wherein
a frame represents a word,
a sequence represents a sentence, and
an exchange represents a conversation.
1) Exchange
An exchange enables two node-ports to identify and manage a set of information-units.
Each ULP has its protocol-specific information that must be sent to another port. This protocol-specific information is called an information-unit.
Each information-unit maps to a sequence. An exchange consists of one or more sequences.
2) Sequence
A sequence refers to a contiguous set of frames that are sent from one port to another.
It corresponds to an information-unit, as defined by the ULP.
3) Frame
A frame is the fundamental unit of data-transfer at Layer 2.
Each frame can contain up to 2112 bytes of payload.
Flow-Control
Flow-control defines the pace of the flow of data-frames during data-transmission.
Two flow-control mechanisms: 1) BB_Credit and 2) EE_Credit.
1) BB_Credit (buffer-to-buffer credit)
It is used for hardware-based flow-control.
It controls the maximum number of frames that can be present over the link at any time.
BB_Credit management may take place between any two ports.
The transmitting-port maintains a count of free receiver-buffers and continues to send frames if the count is greater than 0.
The receiver provides frame acknowledgment through the Receiver Ready (R_RDY) primitive.
2) EE_Credit (end-to-end credit)
The function of EE_Credit is similar to that of BB_Credit.
When an initiator and a target establish themselves as nodes communicating with each other, they exchange the EE_Credit parameters (as part of Port Login).
The EE_Credit mechanism affects the flow-control for class 1 and class 2 traffic only.
Classes of Service
Three different classes of service are defined to meet the requirements of different applications.
The table below shows the three classes of service and their features (Table 2-1).
Table 2-1: FC Class of Services
Class 1: dedicated connection; acknowledged frame-delivery; flow-control using EE_Credit.
Class 2: non-dedicated connection; acknowledged frame-delivery; flow-control using EE_Credit and BB_Credit.
Class 3: non-dedicated connection; unacknowledged frame-delivery (datagram); flow-control using BB_Credit.
Class F is another class of service.
Class F is intended for use by the switches communicating through ISLs.
Class F is similar to Class 2, because Class F provides notification of non-delivery of frames.
The other defined classes 4, 5, and 6 are used for specific applications. 2-16
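The BB_Credit mechanism above can be simulated in a few lines: the transmitter keeps a count of free receiver-buffers, sends frames only while the count is greater than 0, and regains one credit per R_RDY. This is an illustrative model, not a protocol implementation.

class Transmitter:
    def __init__(self, bb_credit):
        self.credits = bb_credit      # negotiated number of receiver-buffers

    def can_send(self):
        return self.credits > 0       # send frames only if the count is > 0

    def send_frame(self):
        assert self.can_send()
        self.credits -= 1             # one receiver-buffer is now in use

    def receive_r_rdy(self):
        self.credits += 1             # R_RDY primitive returns one credit

tx = Transmitter(bb_credit=4)
sent = 0
while tx.can_send():                  # burst until the credits are exhausted
    tx.send_frame()
    sent += 1
print(sent)                           # 4 -- must now wait for R_RDY
tx.receive_r_rdy()                    # one credit returned; sending can resume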

64 2.8 Fabric Services
FC switches provide a common set of services:
1) Fabric Login Server
2) Name Server
3) Fabric Controller and
4) Management Server.
1) Fabric Login Server
It is used during the initial part of a node's fabric-login process.
It is located at the predefined address FFFFFE.
2) Name Server
It is responsible for name-registration and management of node-ports.
It is located at the predefined address FFFFFC.
Each switch exchanges its name-server information with the other switches in the fabric to maintain a synchronized, distributed name-service.
3) Fabric Controller
It is responsible for managing and distributing Registered State Change Notifications (RSCNs) to the registered node-ports.
It also generates Switch Registered State Change Notifications (SW-RSCNs) to every other domain (switch) in the fabric. The SW-RSCNs keep the name-server up-to-date on all switches in the fabric.
It is located at the predefined address FFFFFD.
4) Management Server
It enables the FC SAN management software to retrieve information and administer the fabric.
It is located at the predefined address FFFFFA. 2-17

65 2.9 Zoning
Zoning is an FC-switch function (Figure 2-15).
Zoning enables nodes within the fabric to be logically segmented into groups, so that the nodes within a group can communicate with each other.
A name-server contains the FC-address and world-wide-name of all devices in the network. A device can be a host or a storage-array.
1) When a device logs onto a fabric, it is registered with the name-server.
2) When a port logs onto the fabric, it goes through a device-discovery process.
The zoning function controls the discovery-process by allowing only the members in the same zone to establish these link-level services.
A zoning process can be defined by the hierarchy of members, zones, and zone-sets (Figure 2-16).
1) A member refers to a node within the SAN or a port within the switch.
2) A zone refers to a set of members that have access to one another.
3) A zone-set refers to a set of zones. These zones can be activated or deactivated as a single entity in a fabric.
Only one zone-set per fabric can be active at a time. Zone-sets are also referred to as zone-configurations.
Figure 2-15: Zoning
Figure 2-16: Members, zones, and zone sets 2-18
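The member/zone/zone-set hierarchy can be sketched as nested sets: two members may establish link-level services only if the active zone-set contains a zone holding both. The zone and member names here are invented.

zone_sets = {
    "zoneset-prod": {
        "zone-1": {"host1_wwpn", "array_port1_wwpn"},
        "zone-2": {"host2_wwpn", "array_port2_wwpn"},
    }
}
active_zone_set = "zoneset-prod"   # only one zone-set per fabric is active

def can_communicate(member_a, member_b):
    zones = zone_sets[active_zone_set].values()
    return any(member_a in zone and member_b in zone for zone in zones)

print(can_communicate("host1_wwpn", "array_port1_wwpn"))   # True
print(can_communicate("host1_wwpn", "array_port2_wwpn"))   # False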

66 2.9.1 Types of Zoning
Zoning can be classified into 3 types (Figure 2-17):
1) Port Zoning
Port zoning is also called hard zoning.
It uses the FC-addresses of the physical-ports to define zones.
Access to data is determined by the switch-port to which a node is connected.
The FC-address is dynamically assigned when the port logs onto the fabric. Therefore, any change in the fabric-configuration affects zoning.
Advantage: 1) This method is secure.
Disadvantage: 1) The zoning-configuration information has to be updated in case of fabric-reconfiguration.
2) WWN Zoning
WWN zoning is also called soft zoning.
It uses World Wide Names to define zones.
Advantage: 1) Flexibility: It allows the SAN to be recabled without reconfiguring the zone information. This is possible because the WWN is static to the node-port.
3) Mixed Zoning
It combines the qualities of both WWN zoning and port zoning.
Mixed zoning enables a specific port to be tied to the WWN of a node.
Figure 2-17 shows the three types of zoning on an FC-network.
Figure 2-17: Types of zoning 2-19

67 2.10 Fibre-Channel Login Types
Fabric-services define 3 login-types:
1) Fabric Login (FLOGI)
Fabric login is performed between an N_port and an F_port.
To log on to the fabric, a node sends a FLOGI frame to the login service at the FC-address FFFFFE. The node also sends its WWNN and WWPN parameters. (WWNN --> World Wide Node Name, WWPN --> World Wide Port Name)
Then, the switch accepts the login and returns an Accept (ACC) frame with the assigned FC-address for the node.
Finally, the N_port registers itself with the local name-server on the switch. The registered data includes the WWNN, WWPN, and assigned FC-address.
After the N_port has logged in, it can query the name-server database for information about all other logged-in ports.
2) Port Login (PLOGI)
Port login is performed between two N_ports to establish a session.
i) First, the initiator N_port sends a PLOGI frame to the target N_port.
ii) Then, the target N_port returns an ACC frame to the initiator N_port.
iii) Finally, the N_ports exchange service-parameters relevant to the session.
3) Process Login (PRLI)
Process login is also performed between two N_ports.
This login is related to the FC-4 ULPs, such as SCSI.
The N_ports exchange SCSI-3-related service-parameters, such as the FC-4 type in use and whether the port acts as a SCSI initiator or a SCSI target. 2-20
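The FLOGI sequence above can be walked through with a toy model: the node sends a FLOGI to the well-known address FFFFFE, the switch answers with an ACC carrying the assigned FC-address, and the N_port then registers with the name-server. All identifiers and the address-assignment policy are made up for illustration.

import itertools

FABRIC_LOGIN_SERVER = 0xFFFFFE       # well-known address the FLOGI is sent to
name_server = {}                     # WWPN -> assigned FC-address
_port_ids = itertools.count()        # next free port ID on this switch

def flogi(switch_domain, wwnn, wwpn):
    # The switch accepts the login and returns (via an ACC frame) a 24-bit
    # FC-address: domain ID | area ID (0 here) | port ID. The WWNN is sent
    # with the FLOGI but not used further in this sketch.
    fc_address = (switch_domain << 16) | next(_port_ids)
    name_server[wwpn] = fc_address   # the N_port registers with the name-server
    return fc_address

addr = flogi(switch_domain=0x01,
             wwnn="50:06:01:60:00:00:00:01",
             wwpn="50:06:01:60:00:00:00:02")
print(f"{addr:06X}")                 # 010000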

68 2.11 FC Topologies
Fabric-design follows standard topologies to connect devices.
Core-edge fabric is one of the popular topology designs.
Core-Edge Fabric
There are two types of switch-tiers in this fabric:
1) Edge-Tier
The edge-tier consists of switches.
Advantage: Offers an inexpensive approach to adding more hosts in a fabric.
Each switch at the edge-tier is attached to a switch at the core-tier through ISLs.
The edge-tier switches are not connected to each other.
2) Core-Tier
The core-tier consists of directors that ensure high fabric-availability.
In addition, typically all traffic must either traverse this tier or terminate at this tier.
All storage-devices are connected to the core-tier, enabling host-to-storage traffic to traverse only one ISL.
Hosts that require high performance may be connected directly to the core-tier and consequently avoid ISL delays.
Advantages:
1) This topology increases connectivity within the SAN while conserving overall port-utilization.
2) If expansion is required, an additional edge-switch can be connected to the core.
3) The core of the fabric can also be extended by adding more switches or directors at the core-tier.
Figure 2-18: Single core topology
This topology can have different variations.
1) In a single-core topology (Figure 2-18),
i) all hosts are connected to the edge-tier and
ii) the storage is connected to the core-tier.
2) In a dual-core topology (Figure 2-19), expansion can be done to include more core-switches.
However, to maintain the topology, it is essential that new ISLs are created to connect each edge-switch to the new core-switch that is added. 2-21

69 Figure 2-19: Dual-core topology
Benefits and Limitations of Core-Edge Fabric
Benefits
1) This fabric provides one-hop storage-access to all storage in the system.
Each tier's switches are used for either storage or hosts. Thus,
i) it is easy to identify which resources are approaching their capacity and
ii) it is easy to develop a set of rules for scaling.
2) A well-defined, easily reproducible building-block approach makes rolling out new fabrics easier.
Core-edge fabrics can be scaled to larger environments by
linking core-switches
adding more core-switches, or
adding more edge-switches.
3) This method can be used to extend the existing simple core-edge model or to expand the fabric into a compound core-edge model.
Limitations
1) The fabric may lead to some performance-related problems, because scaling the topology involves increasing the number of ISLs in the fabric.
As the number of edge-switches increases, the domain-count in the fabric increases.
2) A common best practice is to keep the number of host-to-storage hops unchanged, at one hop, in a core-edge fabric.
Hop-count refers to the total number of devices the data has to traverse from its source to its destination.
Generally, a large hop-count means greater transmission-delay.
3) As the number of core-switches increases, it becomes difficult to maintain ISLs from each core-switch to each edge-switch.
When this happens, the fabric-design can be changed to a compound core-edge design. 2-22


More information

Tape pictures. CSE 30341: Operating Systems Principles

Tape pictures. CSE 30341: Operating Systems Principles Tape pictures 4/11/07 CSE 30341: Operating Systems Principles page 1 Tape Drives The basic operations for a tape drive differ from those of a disk drive. locate positions the tape to a specific logical

More information

Input/Output. Today. Next. Principles of I/O hardware & software I/O software layers Disks. Protection & Security

Input/Output. Today. Next. Principles of I/O hardware & software I/O software layers Disks. Protection & Security Input/Output Today Principles of I/O hardware & software I/O software layers Disks Next Protection & Security Operating Systems and I/O Two key operating system goals Control I/O devices Provide a simple,

More information

Specifying Storage Servers for IP security applications

Specifying Storage Servers for IP security applications Specifying Storage Servers for IP security applications The migration of security systems from analogue to digital IP based solutions has created a large demand for storage servers high performance PCs

More information

Chapter 11: File System Implementation. Objectives

Chapter 11: File System Implementation. Objectives Chapter 11: File System Implementation Objectives To describe the details of implementing local file systems and directory structures To describe the implementation of remote file systems To discuss block

More information

UNIT 4 Device Management

UNIT 4 Device Management UNIT 4 Device Management (A) Device Function. (B) Device Characteristic. (C) Disk space Management. (D) Allocation and Disk scheduling Methods. [4.1] Device Management Functions The management of I/O devices

More information

SSD in the Enterprise August 1, 2014

SSD in the Enterprise August 1, 2014 CorData White Paper SSD in the Enterprise August 1, 2014 No spin is in. It s commonly reported that data is growing at a staggering rate of 30% annually. That means in just ten years, 75 Terabytes grows

More information

Model Answer. Multiple Choice FORM. Questions Answer Key D D D D D C C C C C. Questions

Model Answer. Multiple Choice FORM. Questions Answer Key D D D D D C C C C C. Questions Benha University Final Exam (May 2017) Class: 4 th Year Students (CS) Subject: Elective Course (2) Faculty of Computers & Informatics Date: 28/05/2017 Time: 3 hours Examinar : Dr. Diaa Salama Instructions

More information

V. Mass Storage Systems

V. Mass Storage Systems TDIU25: Operating Systems V. Mass Storage Systems SGG9: chapter 12 o Mass storage: Hard disks, structure, scheduling, RAID Copyright Notice: The lecture notes are mainly based on modifications of the slides

More information

HP Supporting the HP ProLiant Storage Server Product Family.

HP Supporting the HP ProLiant Storage Server Product Family. HP HP0-698 Supporting the HP ProLiant Storage Server Product Family https://killexams.com/pass4sure/exam-detail/hp0-698 QUESTION: 1 What does Volume Shadow Copy provide?. A. backup to disks B. LUN duplication

More information

Virtualizing SQL Server 2008 Using EMC VNX Series and VMware vsphere 4.1. Reference Architecture

Virtualizing SQL Server 2008 Using EMC VNX Series and VMware vsphere 4.1. Reference Architecture Virtualizing SQL Server 2008 Using EMC VNX Series and VMware vsphere 4.1 Copyright 2011, 2012 EMC Corporation. All rights reserved. Published March, 2012 EMC believes the information in this publication

More information

SATA RAID For The Enterprise? Presented at the THIC Meeting at the Sony Auditorium, 3300 Zanker Rd, San Jose CA April 19-20,2005

SATA RAID For The Enterprise? Presented at the THIC Meeting at the Sony Auditorium, 3300 Zanker Rd, San Jose CA April 19-20,2005 Logo of Your organization SATA RAID For The Enterprise? Scott K. Cleland, Director of Marketing AMCC 455 West Maude Ave., Sunnyvale, CA 94085-3517 Phone:+1-408-523-1079 FAX: +1-408-523-1001 E-mail: scleland@amcc.com

More information

Technical Brief. NVIDIA Storage Technology Confidently Store Your Digital Assets

Technical Brief. NVIDIA Storage Technology Confidently Store Your Digital Assets Technical Brief NVIDIA Storage Technology Confidently Store Your Digital Assets Confidently Store Your Digital Assets The massive growth in broadband connections is fast enabling consumers to turn to legal

More information

Session: Hardware Topic: Disks. Daniel Chang. COP 3502 Introduction to Computer Science. Lecture. Copyright August 2004, Daniel Chang

Session: Hardware Topic: Disks. Daniel Chang. COP 3502 Introduction to Computer Science. Lecture. Copyright August 2004, Daniel Chang Lecture Session: Hardware Topic: Disks Daniel Chang Basic Components CPU I/O Devices RAM Operating System Disks Considered I/O devices Used to hold data and programs before they are loaded to memory and

More information

Module 13: Secondary-Storage

Module 13: Secondary-Storage Module 13: Secondary-Storage Disk Structure Disk Scheduling Disk Management Swap-Space Management Disk Reliability Stable-Storage Implementation Tertiary Storage Devices Operating System Issues Performance

More information

Virtual Security Server

Virtual Security Server Data Sheet VSS Virtual Security Server Security clients anytime, anywhere, any device CENTRALIZED CLIENT MANAGEMENT UP TO 50% LESS BANDWIDTH UP TO 80 VIDEO STREAMS MOBILE ACCESS INTEGRATED SECURITY SYSTEMS

More information

Module 2 Storage Network Architecture

Module 2 Storage Network Architecture Module 2 Storage Network Architecture 1. SCSI 2. FC Protocol Stack 3. SAN:FC SAN 4. IP Storage 5. Infiniband and Virtual Interfaces FIBRE CHANNEL SAN 1. First consider the three FC topologies pointto-point,

More information

(Advanced) Computer Organization & Architechture. Prof. Dr. Hasan Hüseyin BALIK (4 th Week)

(Advanced) Computer Organization & Architechture. Prof. Dr. Hasan Hüseyin BALIK (4 th Week) + (Advanced) Computer Organization & Architechture Prof. Dr. Hasan Hüseyin BALIK (4 th Week) + Outline 2. The computer system 2.1 A Top-Level View of Computer Function and Interconnection 2.2 Cache Memory

More information

OPERATING SYSTEM. PREPARED BY : DHAVAL R. PATEL Page 1. Q.1 Explain Memory

OPERATING SYSTEM. PREPARED BY : DHAVAL R. PATEL Page 1. Q.1 Explain Memory Q.1 Explain Memory Data Storage in storage device like CD, HDD, DVD, Pen drive etc, is called memory. The device which storage data is called storage device. E.g. hard disk, floppy etc. There are two types

More information

OPERATING SYSTEMS II DPL. ING. CIPRIAN PUNGILĂ, PHD.

OPERATING SYSTEMS II DPL. ING. CIPRIAN PUNGILĂ, PHD. OPERATING SYSTEMS II DPL. ING. CIPRIAN PUNGILĂ, PHD. File System Implementation FILES. DIRECTORIES (FOLDERS). FILE SYSTEM PROTECTION. B I B L I O G R A P H Y 1. S I L B E R S C H AT Z, G A L V I N, A N

More information

Cloud Optimized Performance: I/O-Intensive Workloads Using Flash-Based Storage

Cloud Optimized Performance: I/O-Intensive Workloads Using Flash-Based Storage Cloud Optimized Performance: I/O-Intensive Workloads Using Flash-Based Storage Version 1.0 Brocade continues to innovate by delivering the industry s first 16 Gbps switches for low latency and high transaction

More information

PeerStorage Arrays Unequalled Storage Solutions

PeerStorage Arrays Unequalled Storage Solutions Simplifying Networked Storage PeerStorage Arrays Unequalled Storage Solutions John Joseph, VP of Marketing EqualLogic,, 9 Townsend West, Nashua NH 03063 Phone: +1-603 603-249-7772, FAX: +1-603 603-579-6910

More information

Chapter 12: File System Implementation

Chapter 12: File System Implementation Chapter 12: File System Implementation Silberschatz, Galvin and Gagne 2013 Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods

More information

General Items: Reading Materials: Miscellaneous: Lecture 9 / Chapter 7 COSC1300/ITSC 1401/BCIS /19/2004 ? H ? T

General Items: Reading Materials: Miscellaneous: Lecture 9 / Chapter 7 COSC1300/ITSC 1401/BCIS /19/2004 ? H ? T General Items:? H Reading Materials:? T Miscellaneous: F.Farahmand 1 / 11 File: lec8chap7f04.doc Electronic Storage - The medium on which we can keep data, instructions, and information - Examples: Floppy

More information

CS3600 SYSTEMS AND NETWORKS

CS3600 SYSTEMS AND NETWORKS CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 9: Mass Storage Structure Prof. Alan Mislove (amislove@ccs.neu.edu) Moving-head Disk Mechanism 2 Overview of Mass Storage Structure Magnetic

More information

Chapter 12: File System Implementation

Chapter 12: File System Implementation Chapter 12: File System Implementation Silberschatz, Galvin and Gagne 2013 Chapter 12: File System Implementation File-System Structure File-System Implementation Allocation Methods Free-Space Management

More information

Chapter 14: Mass-Storage Systems

Chapter 14: Mass-Storage Systems Chapter 14: Mass-Storage Systems Disk Structure Disk Scheduling Disk Management Swap-Space Management RAID Structure Disk Attachment Stable-Storage Implementation Tertiary Storage Devices Operating System

More information

Solutions for iseries

Solutions for iseries Innovative solutions for Intel server integration Integrated IBM Solutions for iseries xseries The IBM _` iseries family of servers, including the newest member, IBM _` i5, offers two solutions that provide

More information

Chapter 9: Peripheral Devices. By: Derek Hildreth Chad Davis

Chapter 9: Peripheral Devices. By: Derek Hildreth Chad Davis Chapter 9: Peripheral Devices By: Derek Hildreth Chad Davis Brigham Young University - Idaho CompE 324 Brother Fisher Introduction When discussing this chapter, it has been assumed that the reader has

More information

Chapter-6. SUBJECT:- Operating System TOPICS:- I/O Management. Created by : - Sanjay Patel

Chapter-6. SUBJECT:- Operating System TOPICS:- I/O Management. Created by : - Sanjay Patel Chapter-6 SUBJECT:- Operating System TOPICS:- I/O Management Created by : - Sanjay Patel Disk Scheduling Algorithm 1) First-In-First-Out (FIFO) 2) Shortest Service Time First (SSTF) 3) SCAN 4) Circular-SCAN

More information

Computer Architecture 计算机体系结构. Lecture 6. Data Storage and I/O 第六讲 数据存储和输入输出. Chao Li, PhD. 李超博士

Computer Architecture 计算机体系结构. Lecture 6. Data Storage and I/O 第六讲 数据存储和输入输出. Chao Li, PhD. 李超博士 Computer Architecture 计算机体系结构 Lecture 6. Data Storage and I/O 第六讲 数据存储和输入输出 Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2018 Review Memory hierarchy Cache and virtual memory Locality principle Miss cache, victim

More information