The File Systems Evolution
Christian Bandulet, Sun Microsystems
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature under the following conditions:
- Any slide or slides used must be reproduced without modification.
- The SNIA must be acknowledged as source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee. Neither the Author nor the Presenter is an attorney and nothing in this presentation is intended to be nor should be construed as legal advice or opinion. If you need legal advice or legal opinion please contact an attorney. The information presented herein represents the Author's personal opinion and current understanding of the issues involved. The Author, the Presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
Abstract: The File Systems Evolution
File systems impose structure on the address space of one or more physical or virtual devices. Starting with local file systems, additional file systems appeared over time, focusing on specialized requirements such as data sharing, remote file access, distributed file access, parallel file access, HPC, archiving, security, etc. Due to the dramatic growth of unstructured data, files as the basic units for data containers are morphing into file objects, providing more semantics and feature-rich capabilities for content processing. This presentation will categorize and explain the basic principles of currently available file systems (e.g. local FS, shared FS, SAN FS, clustered FS, network FS, WAFS, distributed FS, parallel FS, object FS, ...). It will also explain technologies like NAS aggregation, NAS clustering, scalable NFS, global namespace, parallel NFS, storage grids and cloud storage. All of these file system categories are complementary. They will be enhanced in parallel with additional value-added functionality. New file system architectures will be developed and some of them will be blended in the future.
Check Out Other Tutorials
- Check out SNIA Tutorial: DFS Over CIFS
- Check out SNIA Tutorial: Storage Tiering for File & NAS Systems
- Check out SNIA Tutorial: NAS and iSCSI Technology Overview
- Check out SNIA Tutorial: Find and Select the Right File Storage for Your
- Check out SNIA Tutorial: Scaling NFS Through pNFS
Agenda
- File System Basics
- File Systems Taxonomy
- Local FS
- Shared FS/Global FS (SAN FS, Cluster FS)
- Network FS
- Scalable NAS / Scalable NFS
- Wide Area FS
- Distributed FS
- File Virtualization
- Distributed Parallel FS
- NAS Cluster / NAS Grid
- FS Future Developments
File System & Operating System
Layers of the I/O stack, top to bottom:
- Userspace: user applications and libraries (ls, mv, rm, cp, ...)
- System calls: open(), close(), read(), write(), ioctl(), mmap(), ...
- Kernelspace: VFS and file systems, process management (scheduler, IPC), memory management (mmap()), metadata cache and segmap cache (the caches can be bypassed: Direct I/O), volume manager, device drivers (buffers, DMA), machine-dependent code
- Hardware
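The system-call boundary named on this slide can be exercised directly from a script. The following is a minimal sketch, not part of the original tutorial: it writes a payload with write(), reads it back with read(), and then reads it a second time through mmap(). The function name `syscall_roundtrip` is hypothetical, and the payload is assumed to be non-empty because an empty file cannot be mapped.

```python
import mmap
import os
import tempfile


def syscall_roundtrip(payload):
    """Write payload via write(), then read it back via read() and mmap().

    Assumes payload is non-empty (a zero-length file cannot be mapped).
    """
    fd, path = tempfile.mkstemp()            # creates and open()s a temp file
    try:
        os.write(fd, payload)                # write() system call
        os.lseek(fd, 0, os.SEEK_SET)         # rewind the file offset
        via_read = os.read(fd, len(payload)) # read() system call
        # mmap() maps the file into the address space of the process
        with mmap.mmap(fd, len(payload), access=mmap.ACCESS_READ) as mapped:
            via_mmap = mapped[:]
    finally:
        os.close(fd)                         # close() system call
        os.remove(path)
    return via_read, via_mmap
```

Both return values should equal the payload; the two paths differ only in how the kernel hands the data to the process.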
The File Systems Evolution
File systems evolved over time. Starting with local file systems, additional file systems appeared over time, focusing on specialized requirements such as data sharing, remote file access, distributed file access, parallel file access, HPC, archiving, etc.
Timeline (left to right): Local FS, Shared FS, SAN FS, Cluster FS, Network FS, Wide Area FS, Distributed FS, Object FS, Parallel FS, ?
Note: The picture above does not reflect the exact sequence in which the file system types appeared. Some of them actually appeared in parallel. It is also not the intention to indicate that a new file system replaces its predecessors. Instead, they target complementary objectives.
File Systems Taxonomy
File Systems
- Top level: Local FS, Shared FS, Network FS
- Second level: SAN FS, Cluster FS, WAFS, Distributed FS, Distributed Parallel FS
Local FS
The file system is co-located with the application server.
Local FS (cont'd)
Several servers, each with its own local FS: islands of storage (no data sharing).
Shared Device vs. Shared Data
- Shared Device: A physical device (disk array) is shared by more than one client. Each client has exclusive access to a dedicated LUN.
- Shared Data: A physical device (disk array) is shared by more than one client. Clients access the same LUN in parallel.
Traditional File System - Inode
When a file system is created, data structures that contain information about files are created. Each file has an inode and is identified by an inode number (often referred to as an "i-number" or "inode") in the file system where it resides.
Diagram: the host's inode holds ten direct pointers (direct 0 to direct 9) plus a single indirect, a double indirect, and a triple indirect pointer, which reference the data blocks (numbered 0 to 19 in the figure).
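To make the pointer layout concrete, here is a small sketch that reports which kind of inode pointer reaches a given logical block of a file. It assumes the ten direct pointers shown on the slide; the number of pointers per indirect block (1024 here, e.g. 4 KiB blocks with 4-byte pointers) is an illustrative assumption and varies between file systems. The name `pointer_level` is hypothetical.

```python
DIRECT_POINTERS = 10  # direct 0 .. direct 9, as drawn on the slide


def pointer_level(block_index, ptrs_per_block=1024):
    """Return which inode pointer type addresses the given logical block.

    ptrs_per_block is an assumed value; real file systems derive it from
    the block size and the on-disk pointer width.
    """
    if block_index < 0:
        raise ValueError("block index must be non-negative")
    if block_index < DIRECT_POINTERS:
        return "direct"
    remaining = block_index - DIRECT_POINTERS
    levels = ("single indirect", "double indirect", "triple indirect")
    for depth, name in enumerate(levels, start=1):
        span = ptrs_per_block ** depth   # blocks reachable at this depth
        if remaining < span:
            return name
        remaining -= span
    raise ValueError("block index exceeds the triple indirect range")
```

With these assumptions, blocks 0 to 9 are reached directly, the next 1024 through the single indirect block, and so on; small files therefore never touch an indirect block.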
Traditional File System - Inode
The inode also contains file attributes:
- File owner
- File type
- Permissions
- Last access
- Size
- Number of links
- ...
Diagram: same inode layout as on the previous slide (direct 0 to direct 9, single, double, and triple indirect pointers referencing the data blocks).
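On a POSIX-style system these attributes can be read back with stat(). The sketch below is illustrative only: the helper names `describe` and `demo` are hypothetical, and it assumes a file system that supports hard links. It creates a file, adds a second name for it, and shows that both names share one inode whose link count is 2.

```python
import os
import stat
import tempfile


def describe(path):
    """Collect the inode attributes listed on the slide for one path."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "inode": st.st_ino,
        "type": kind,
        "permissions": stat.filemode(st.st_mode),
        "owner_uid": st.st_uid,
        "size": st.st_size,
        "links": st.st_nlink,
        "last_access": st.st_atime,
    }


def demo():
    """Create a file plus a hard link and describe both names."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "data.bin")
        alias = os.path.join(tmp, "alias.bin")
        with open(original, "wb") as f:
            f.write(b"12345")
        os.link(original, alias)  # second directory entry, same inode
        return describe(original), describe(alias)
```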
Shared FS Scale-Out
Horizontal scaling: additional nodes attach to the SAN and access the same shared data.
Shared FS/Global FS Data Access
- Separate metadata server (MDS)
- Separation between logical and physical placement
- File access is a three-step transaction (steps 1, 2, 3 in the figure, between client and MDS)
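The slide does not spell out the three steps, so the following sketch rests on an assumption about them: (1) the client asks the MDS where a file lives, (2) the MDS answers with the physical block locations, (3) the client reads those blocks directly from shared storage. All class and function names are hypothetical, and the in-memory dictionaries stand in for a real MDS and a real SAN.

```python
class MetadataServer:
    """Maps logical file names to physical block IDs (assumed MDS role)."""

    def __init__(self):
        self._layouts = {}

    def register(self, name, block_ids):
        self._layouts[name] = list(block_ids)

    def lookup(self, name):
        return list(self._layouts[name])


class SharedStorage:
    """Stand-in for the shared LUN that every client can reach."""

    def __init__(self):
        self._blocks = {}

    def write_block(self, block_id, data):
        self._blocks[block_id] = data

    def read_block(self, block_id):
        return self._blocks[block_id]


def read_file(name, mds, storage):
    block_ids = mds.lookup(name)  # steps 1 and 2: request and layout reply
    # step 3: data path goes straight to storage, bypassing the MDS
    return b"".join(storage.read_block(b) for b in block_ids)
```

The point of the separation is visible in `read_file`: the MDS handles only the small metadata exchange, while bulk data never passes through it.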
Shared FS / Global FS: SAN FS
Diagram: client nodes (e.g. Web) running SAN FS plus an active and a passive MDS, connected by a system area network; all nodes reach the shared data over the storage network.
- MDS is not part of each node (i.e. master/slave, asymmetric)
- Heterogeneous with unlimited number of nodes
- Unlimited distance between nodes
Shared FS / Global FS: Cluster FS
Diagram: cluster nodes (e.g. Web), each running Cluster FS with its own active MDS, connected by a system area network; all nodes reach the shared data over the storage network.
- MDS is part of each client (cluster) node (i.e. peer-to-peer, symmetric)
- Homogeneous with limited number of nodes
- Limited distance between (cluster) nodes
Network File System - aka Proxy FS
A network file system is any file system that supports sharing of files over a computer network protocol between a client and a server.
Diagram: several clients access the server's local FS through the network FS over a network protocol (e.g. NFS, CIFS, AFP, WebDAV, FTP, HTTP, ...).
Network Protocol (NFS)
Diagram: NFS clients reach an NFS server over a computer network protocol; the server's storage is attached through a SAN.
NFSv4 Single-Server Namespace
Pseudo FS, aka Shared Name Space
- NFSv4 client view: one tree, / containing /a, /b, /c
- NFSv4 server view: /a, /b, and /c are separate file systems
- Transparent file system transitions
The NFSv4 spec (RFC 3530) defines how a server maintains a pseudo-filesystem namespace linking the filesystems it shares, so that clients can navigate to them from the server root. Many clients rely on this "single-server namespace" to be able to access all file systems on the server transparently.
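The idea of a pseudo file system that glues exported file systems under one root can be sketched in a few lines. This is a toy model under stated assumptions, not the NFSv4 wire protocol: the class `PseudoFS`, the export names `fs_a`, `fs_b`, `fs_c`, and the longest-prefix rule are all illustrative.

```python
import posixpath


class PseudoFS:
    """Toy server-side namespace: one root, several exported file systems."""

    def __init__(self, exports):
        # exports maps a mount point such as "/a" to a file system label
        self.exports = {posixpath.normpath(m): fs for m, fs in exports.items()}

    def list_root(self):
        """What a client sees when it lists the server root."""
        return sorted({m.lstrip("/").split("/")[0] for m in self.exports})

    def resolve(self, path):
        """Return (file system, path inside it) for a client-visible path."""
        path = posixpath.normpath(path)
        best = None
        for mount in self.exports:
            if path == mount or path.startswith(mount + "/"):
                if best is None or len(mount) > len(best):
                    best = mount      # longest matching export wins
        if best is None:
            return ("pseudo-fs", path)  # still inside the glue namespace
        return (self.exports[best], path[len(best):] or "/")
```

A client walking from / to /a/x crosses from the pseudo file system into `fs_a` without having to mount anything separately, which is the transparent transition the slide refers to.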
Scalable NAS (NFS & Shared FS)
Export NFS from a shared FS.
- Shared FS with shared data: nodes run the shared FS (MDS optional) against the same data.
- Scalable NFS with shared FS: NFS clients connect over IP to several NFS servers, each running the shared FS (MDS optional) on top of the shared data.
Network FS Stack
- Client stack (repeated per client): NFS/CIFS client, RPC/XDR, TCP/IP, Ethernet NIC
- Server stack: Ethernet NIC, TCP/IP, RPC/XDR, NFS/CIFS, volume manager, SCSI driver, SCSI HBA, SAN, SCSI port, data
- Clients and server are connected over a LAN.
Network FS in a Distributed World
- Consolidating file and storage resources into the data center eases management, administration, cost, and compliance
- Global file sharing and collaboration
- Remote office consolidation and optimization
- Most application and file access protocols perform poorly over the WAN
Diagram: the same client and server stacks as on the previous slide, now connected over a WAN instead of a LAN.
Wide Area FS
Wide Area File Services, aka Wide Area File System (actually not a FS!)
- Protocol-specific optimization: HTTP, NFS, CIFS, WebDAV, FTP, TCP/IP, ...
- Application-specific optimization: email, document management, SQL, ...
- Intelligent caching: read-ahead, deferred write, coherency, ...
- Data compression: file-aware differencing, data aggregation, I/O clustering, dictionary-based compression (de-duplication), cross-protocol data reduction, ...
Diagram: clients on a remote LAN talk to a local WAFS engine, which communicates over the WAN with a second WAFS engine on the data center LAN in front of the NFS/CIFS server and its SAN storage.
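Dictionary-based compression (de-duplication) between two WAFS engines can be illustrated with a simplified sketch. It assumes fixed-size chunks and a SHA-256 digest as the dictionary key; real products typically use more elaborate chunking and framing, and the function names `encode` and `decode` are hypothetical. Each side keeps its own dictionary, and a chunk that both sides already know crosses the WAN only as a short reference.

```python
import hashlib


def encode(data, known, chunk_size=4096):
    """Sender side: replace chunks the peer already holds with references."""
    messages = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest in known:
            messages.append(("ref", digest))    # only the digest is sent
        else:
            known[digest] = chunk
            messages.append(("data", chunk))    # first sighting: send bytes
    return messages


def decode(messages, known):
    """Receiver side: rebuild the byte stream, learning new chunks as it goes."""
    parts = []
    for kind, payload in messages:
        if kind == "data":
            known[hashlib.sha256(payload).digest()] = payload
            parts.append(payload)
        else:
            parts.append(known[payload])
    return b"".join(parts)
```

The sender and receiver dictionaries stay in step because both learn a chunk at the moment it is first transmitted.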
Distributed File System (DFS)
A distributed file system is a network file system whose clients, servers, and storage devices are dispersed among the machines of a distributed system or intranet ( Parallel FS).
Diagram: the client reaches the servers over a network protocol. Client view: a single FS, / containing /a, /b, /c, while /a, /b, and /c reside on different servers.
DFS Logical Data Access Path
Using Ethernet as a networking protocol between nodes, a DFS allows a single file system to span all nodes in the DFS cluster, effectively creating a unified Global Namespace for all files.
Diagram: a read of /home/a/b/c/foo.exe is resolved in five numbered steps across nodes I to VI of the single FS, walking /, /home, /home/a, /home/a/b, and /home/a/b/c to reach /home/a/b/c/foo.exe.
NFSv4.1 Multi-Server Name Space
- NFSv4 client view: a single FS, / containing /a, /b, /c
- Server view: /a lives on NFS server A, /b on NFS server B, /c on NFS server C, each with its own SAN storage
- The fs_locations attribute enables referrals, replicas, clones, and migration
NFSv4.1 supports attributes that allow a namespace to extend beyond the boundaries of a single server through location attributes. A server can inform a client that the data it seeks lives at another location; this is called a referral, and referrals can be used to construct a Global Namespace.
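How a client follows referrals to stitch several servers into one namespace can be sketched as follows. This is a toy model, not the NFSv4.1 protocol: the `Server` class, the `("referral", target)` reply, and the hop limit are illustrative assumptions standing in for the location attributes described above.

```python
class Server:
    """Toy file server that either holds a path or refers the client on."""

    def __init__(self, name, files=None, referrals=None):
        self.name = name
        self.files = files or {}          # path -> contents
        self.referrals = referrals or {}  # path prefix -> other server name

    def lookup(self, path):
        for prefix, target in self.referrals.items():
            if path == prefix or path.startswith(prefix + "/"):
                return ("referral", target)   # "the data lives elsewhere"
        if path in self.files:
            return ("data", self.files[path])
        raise FileNotFoundError(path)


def read(path, start, servers, max_hops=4):
    """Follow referrals from the starting server until data is found."""
    server = servers[start]
    for _ in range(max_hops):
        kind, value = server.lookup(path)
        if kind == "data":
            return server.name, value
        server = servers[value]               # retry at the referred server
    raise RuntimeError("too many referrals")
```

From the client's point of view every path starts at server A, yet the bytes may come from B or C; that is the sense in which referrals build a global namespace.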
Network Attached Storage (NAS)
Diagram: several NAS appliances on an IP network, each with its own data on its own SAN.
- Storage islands
- Separated namespaces
- Multiple mount points
File Virtualization
NAS router: in-band solution, aka NAS aggregation.
Diagram: a NAS router on the IP network sits in front of several NAS appliances (each with its own data and SAN) and presents a Global Namespace.
FS Virtualization: NFSv4.1 pNFS
- In-band NAS: NFSv4 clients reach the data over IP through the NAS appliance, which sits in front of the SAN.
- Out-of-band NAS: NFSv4.1 clients with pNFS talk over IP to a NAS appliance with NFSv4.1 pNFS extensions and access the data on the SAN directly.
- Storage protocols: SCSI (FCP, iSCSI, SRP, SAS), NFSv4.1, OSD
- The data path is decoupled from the control and metadata path.
FS Virtualization: NFSv4.1 pNFS
- Clients: NFSv4.1 clients with pNFS, and NFSv4 clients without pNFS
- Storage protocol: File (NFSv4.1), Block (iSCSI, FCP, SRP, SAS), Object (OSD)
- NFSv4.1 + pNFS between the clients and the NAS appliance with NFSv4.1 pNFS extensions; a control protocol between the appliance and the storage devices
- The NFS MDS acts as a proxy for clients that are not pNFS-enabled
- Data placement across storage devices: one-to-one, stripe, mirror, concatenation
- The metadata server (MDS) creates the Global Namespace
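The "stripe" placement named above can be shown with a generic round-robin calculation. This is plain striping arithmetic offered as an illustration, not the actual pNFS layout encoding; the function name `locate` and the stripe unit and device count in the test values are assumptions.

```python
def locate(offset, stripe_unit, num_devices):
    """Map a file byte offset to (device index, byte offset on that device).

    Assumes simple round-robin striping: stripe unit 0 goes to device 0,
    unit 1 to device 1, and so on, wrapping around after the last device.
    """
    if offset < 0 or stripe_unit <= 0 or num_devices <= 0:
        raise ValueError("offset must be >= 0, sizes must be positive")
    unit_no = offset // stripe_unit            # which stripe unit of the file
    device = unit_no % num_devices             # round-robin device choice
    row = unit_no // num_devices               # full stripes before this one
    return device, row * stripe_unit + offset % stripe_unit
```

Because consecutive stripe units land on different devices, a client holding such a layout can issue reads to several storage devices in parallel.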
Distributed File System (DFS)
Files are distributed across file servers.
Diagram: the client reaches the servers over a network protocol. Client view: a single FS, / containing /a, /b, /c, while /a, /b, and /c reside on different servers.
Distributed & Parallel File System
- File segments are distributed across storage nodes
- Parallel I/Os over the SAN (networking protocol)
- Aggregation of storage nodes: RAIN + RAID (aka Network RAID)
- Global Namespace
Two-Node NAS Cluster (Failover)
Diagram: two NAS appliances on the IP network, each with its own data, linked by a cluster interconnect.
NAS Scale-Out Problem Statement
Horizontal scaling across many NAS appliances without data replication or creating islands of data.
NAS Cluster / NAS Grid
Diagram: several NAS appliances behind a virtual IP, each holding part of the data.
- Single data image
- Global Namespace
Cloud Storage/Computing (SaaS)
Diagram: many compute nodes on top of a horizontally scaling row of File nodes (File, File, File, ...).
File Systems & Metadata
The inode holds the block pointers (direct 0 to direct 9, single, double, and triple indirect) and the file attributes: file owner, file type, permissions, last access, size, number of links, ...
Example of a metadata search on the host: find . -name *.exe
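The find example above can be mirrored in a short script, which also makes visible that a name-based search has to walk the whole directory tree. This is a sketch for illustration; the helper names `find_by_name` and `demo` and the sample file names are hypothetical.

```python
import fnmatch
import os
import tempfile


def find_by_name(root, pattern):
    """Rough equivalent of: find <root> -name <pattern> (files only)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):  # visits every directory
        for name in fnmatch.filter(filenames, pattern):
            full = os.path.join(dirpath, name)
            matches.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(matches)


def demo():
    """Build a small tree and search it for *.exe files."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "bin", "tools"))
        for rel in ("setup.exe", "readme.txt", "bin/tools/foo.exe"):
            with open(os.path.join(root, *rel.split("/")), "w") as f:
                f.write("x")
        return find_by_name(root, "*.exe")
```

The only metadata such a search can use is what the inode and directory entries hold, which is the limitation the following slides on file objects address.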
Files Are Morphing Into File Objects
Diagram: the inode with its data blocks gives way to objects, each addressed by a name and an object ID (OID).
Files Are Morphing Into Objects...
An object bundles data, metadata, and object attributes, and is addressed by name and OID.
Example attribute query: select * where customer_id < 17 and location = Frankfurt, Germany
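The attribute query on this slide can be mimicked with an in-memory sketch in which each object carries an OID, its data, and a free-form attribute dictionary. The `StoredObject` class and the `select` helper are hypothetical and only illustrate the idea that rich per-object metadata makes content-based selection possible without walking a directory tree.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    oid: int
    data: bytes
    attributes: dict = field(default_factory=dict)


def select(objects, predicate):
    """Return the objects whose attributes satisfy the predicate."""
    return [obj for obj in objects if predicate(obj.attributes)]


def slide_query(attrs):
    """customer_id < 17 and location = Frankfurt, Germany"""
    return (
        attrs.get("customer_id", float("inf")) < 17
        and attrs.get("location") == "Frankfurt, Germany"
    )
```

Objects lacking a `customer_id` attribute are simply excluded by the default in `slide_query`; a real object store would define its own rules for missing attributes.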
Aggregation of Storage Nodes (RAIN)
Data placement in a storage grid.
Diagram: each object (Object 1, Object 2, Object 3) is spread across a row of storage nodes; the legend distinguishes nodes holding data from nodes holding parity.
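One simple way to realize data plus parity across nodes is single XOR parity, sketched below. The slide does not state which redundancy scheme is used, so this is an assumption chosen for brevity: an object is cut into equal segments (zero-padded at the end), one parity segment is computed, and any single lost segment can be rebuilt from the survivors. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data, data_nodes):
    """Cut data into data_nodes equal segments and compute one parity segment."""
    seg_len = -(-len(data) // data_nodes)  # ceiling division
    segments = [
        data[i * seg_len:(i + 1) * seg_len].ljust(seg_len, b"\0")
        for i in range(data_nodes)
    ]
    parity = reduce(xor_bytes, segments)
    return segments, parity


def rebuild(segments, parity, lost):
    """Recover the segment at index `lost` from the others plus parity."""
    survivors = [s for i, s in enumerate(segments) if i != lost]
    return reduce(xor_bytes, survivors, parity)
```

Placing each segment and the parity on a different storage node means the object survives the loss of any one node, at the cost of one extra segment of capacity.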
Data Serving Hierarchy: 3 Levels of Abstraction
Applications may interface with the storage subsystem in any one of three layers:
- Block: highest performance and very little metadata
- File: medium performance and some metadata
- Object: medium performance and rich metadata
Diagram: Object maps many-to-one onto File, and File maps many-to-one onto Block, on top of the data platform.
Q&A / Feedback
Please send any questions or comments on this presentation to SNIA: trackfilemgmt@snia.org
Many thanks to the following individuals for their contributions to this tutorial. - SNIA Education Committee
Christian Bandulet, Sun Microsystems