Distributed File Systems, Part IV
Daniel A. Menascé

Hierarchical Mass Storage Systems: Outline
- On-line data requirements
- Mass storage system concepts
- Mass storage system architectures
- Example systems
- Performance of mass storage systems
On-line data requirements
Thomas Jefferson National Accelerator Facility:
- will collect over one terabyte of raw accelerator data per day
- when post-processing is included, 500 TB of raw and formatted data will be generated per year
- a total of one petabyte (1,000 TB) will be stored by the year 2000

  FY   CPU (MIPS)   Disk (GB)   Near-line tape (TB)
  96   2K           100         5
  97   10K          500         150
  98   20K          1000        300
  99   30K          2000        1200

On-line data requirements
DKRZ, Hamburg, Germany: climate research support and complex climate simulations

                                          97    98    99    2000
  CPU performance (GFlops)                20    20    35    40
  Data generation rate (GB/day)           150   150   300   400
  Required data archival capacity (TB)    80    120   200   300
  Required peak transfer rate (MB/s)      60    60    120   160
On-line data requirements
Goddard Space Flight Center Distributed Active Archive Center, Greenbelt, MD:
- supports Earth Observing System (EOS) science datasets
- about 600 TB ordered per month in 1995
- peak of 250 TB ordered per hour in 1995
- users tend to reference files just created or files created a long time ago (over 3 months)
- close to 4,000 tape mounts per week
- about 50 files transferred per tape mount

On-line data requirements
NASA's Center for Computational Sciences, Greenbelt, MD:
- supports space and Earth researchers
- 6 StorageTek silos with 28.8 TB and one IBM 3494 robotic tape library with an additional 24 TB
- about 1 TB retrieved per week
- about 700 TB of robotic storage will be needed by the year 2000
[Figures: NASA's Center for Computational Sciences — Total Terabytes Stored; Workload Intensity]
Hierarchical Mass Storage Systems
[Figure: storage hierarchy pyramid — RAM, magnetic disks, robotically mounted tapes, off-line tapes; access time increases and cost per MB decreases going down the hierarchy]

Hierarchical Mass Storage Systems
How do we obtain disk-like access times at a cost per MB comparable to that of magnetic tape?
- disk caches
- automatic file migration between the disk cache and the tape subsystem (a sketch of such a policy follows)
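To make the migration idea concrete, here is a minimal sketch in Python of a disk cache in front of a tape library that evicts the least-recently-used files back to tape when space runs out. The names (DiskCache, get, _make_room) are hypothetical and not taken from any particular MSS; real systems use richer policies (high/low water marks, file age and size thresholds) rather than pure LRU.

```python
from collections import OrderedDict

class DiskCache:
    """Toy disk cache in front of a tape library: when the cache fills up,
    the least-recently-used files are migrated back to tape."""

    def __init__(self, capacity_mb, tape):
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self.files = OrderedDict()       # name -> size, kept in LRU order
        self.tape = tape                 # backing tape library (a dict here)

    def get(self, name, size_mb):
        if name in self.files:           # cache hit: refresh LRU position
            self.files.move_to_end(name)
            return "hit"
        self._make_room(size_mb)         # cache miss: stage the file in
        self.files[name] = size_mb
        self.used_mb += size_mb
        return "miss (staged from tape)"

    def _make_room(self, size_mb):
        # Migrate the oldest (least recently used) files until the new
        # file fits; migrated files remain available on tape.
        while self.used_mb + size_mb > self.capacity_mb and self.files:
            victim, victim_size = self.files.popitem(last=False)
            self.tape[victim] = victim_size
            self.used_mb -= victim_size

cache = DiskCache(capacity_mb=100, tape={})
print(cache.get("F9", 40))               # miss (staged from tape)
print(cache.get("F9", 40))               # hit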
Mass Storage Systems: Disk Cache
[Figure: disk cache holding files F1–F6 in front of a robotic tape library (tape drive and robot) storing files F1,...,F100]

Mass Storage Systems: Cache Miss
[Figure: a request for file F9 misses in the disk cache; the robot mounts the tape and F9 is staged into the cache]
Mass Storage Systems: Migration Between Levels
[Figure: file F9 being migrated from the disk cache back to tape by the robot]
- Files unused for a long time are automatically migrated to tape.

Mass Storage System Architectures
Host attached:
- all peripherals are attached to the file server
- all data transfers between disks and tapes have to pass through the file server's (host) main memory
Network attached:
- peripherals are connected directly to the network
- data transfers between disk and tape do not use the file server's main memory
Host-Attached MSSs
[Figure: clients connected to a file server (MSS host) with local disks, a disk cache, and a robotic tape server attached to it]

Host-Attached, Device-Based Mass Storage System
Cray / Convex-UniTree Mass Storage System:
- 330 GB of disk (formatted)
- 8 StorageTek 3490 freestanding cartridge drives
- StorageTek ACS with one 9310 Powderhorn silo and 8 cartridge drives (3490)
- Cray C98: 6 CPUs, 1 gigaflop per processor, 256 megawords of central memory, 512 megawords of SSD
Network-Attached MSSs
[Figure: clients, an HA-disk file server, an NA disk server, and an NA tape server connected by a storage access control network, a high-speed data network (e.g., HIPPI), and a storage unit control network]

Transfer Protocols: Device-to-Device Transfer
Network-Attached MSSs
[Figure: the same network-attached configuration annotated with numbered protocol steps: step 1 on the storage access control network, steps 2–4 and 7 on the storage unit control network, and the device-to-device data transfer (steps 5* and 6*) on the high-speed data network]

Network-Attached MSSs: Features
- Separation of control and data paths
- Scalability: host memory is no longer the bottleneck
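The sketch below contrasts the two data paths. The message and variable names are invented for illustration and do not correspond to any real MSS protocol; the point is only that in the network-attached case the file server exchanges small control messages while the bulk data moves directly between the tape subsystem and the disk server.

```python
# Minimal simulation of the two transfer styles. The names are illustrative;
# real protocols (e.g., IEEE MSS reference model data movers) differ in
# detail but follow the same separation of control and data.

def host_attached_transfer(tape, disk_cache, host_memory, file_id):
    """Host-attached: the data is staged through the host's main memory."""
    host_memory[file_id] = tape[file_id]             # tape -> host memory
    disk_cache[file_id] = host_memory.pop(file_id)   # host memory -> disk

def network_attached_transfer(tape, disk_cache, control_log, file_id):
    """Network-attached: the file server only issues control messages;
    the data moves device-to-device and never enters host memory."""
    control_log.append(("mount_and_locate", file_id))   # control path
    disk_cache[file_id] = tape[file_id]                 # direct data path
    control_log.append(("update_catalog", file_id))     # control path

tape = {"F9": b"...file contents..."}
disk_cache, host_memory, control_log = {}, {}, []
network_attached_transfer(tape, disk_cache, control_log, "F9")
```

In the host-attached version the host memory (and its bus) is visited twice per byte moved, which is exactly why it becomes the bottleneck as load grows.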
Robotic Tape Library
[Figure: robotic tape library showing the robot, cartridge tape drives, tape cartridges, a tape cartridge about to be mounted, and the robot track]

Examples of Devices for MSSs
StorageTek Powderhorn (robotic tape library):
- 6,000-cartridge capacity
- 1–4 tape cartridge drives
- 2–16 robotic arms
- up to 350 tape exchanges per hour
- separation of control and data paths
Examples of Devices for MSSs
Sony DMS-B1000 (robotic tape library):
- 1,104 DTF tapes (12 GB per tape)
- up to 4 tape drives
- maximum data capacity of 13.2 TB
- access time < 6 sec
- separation of control and data paths
- tape drive: 300 MB/sec search speed, 12 MB/sec transfer rate, 40 sec rewind time (see the retrieval-time estimate after the next slide)

File Systems for MSSs
AMASS (EMASS):
- UNIX file system interface
- direct access to automated tape libraries
UniTree (UniTree Software, Inc.):
- based on the IEEE Mass Storage Reference Model
- NFS and FTP interfaces
- client/server architecture
- multiple robot/media support
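As a back-of-the-envelope check on the Sony figures above, the time to retrieve one file can be estimated from the quoted access, search, transfer, and rewind numbers. The 1 GB file size and the assumption that the file starts halfway into a 12 GB tape are illustrative choices, not values from the slides.

```python
# Rough retrieval time for a 1 GB file assumed to start 6 GB into a tape.
access_s   = 6                # robot access time (quoted upper bound)
search_s   = 6_000 / 300      # seek 6 GB of tape at 300 MB/s search speed
transfer_s = 1_000 / 12       # read 1 GB at the 12 MB/s transfer rate
rewind_s   = 40               # quoted rewind time

total_s = access_s + search_s + transfer_s + rewind_s
print(f"about {total_s:.0f} seconds")   # roughly 2.5 minutes end to end
```

The sequential transfer dominates, which is why keeping frequently referenced files in the disk cache matters so much for response time.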
Mass Storage System Example: UniTree Central File Manager
- Robotically mounted tape system (24,000 tapes)
- Off-line tape library
- Magnetic disk file cache (155 GB)
- Automatic migration between levels
- Compliance with the IEEE Mass Storage System Reference Model

UniTree I/O Architecture
[Figure: Convex C3830 host connected through IDC/TLI units to four tape silos]
Convex UniTree Diagram
[Figure: Convex host, control unit, Sun workstation, and tape silo]

Workload Characterization
- k-means clustering was performed on the file sizes of the requests.
- A larger k gives a better fit but more classes in the model.
- A tightness measure was used: the average distance d_j from the points of cluster j to its centroid, combined into a weighted average d over the k clusters:

  d_j = \frac{1}{s_j} \sum_{i=1}^{s_j} d_{ij}, \qquad d = \frac{1}{p} \sum_{j=1}^{k} s_j \, d_j

  where d_{ij} is the distance from point i of cluster j to the cluster's centroid, s_j is the number of points in cluster j, and p is the total number of points.

  Class   File size (MB)   Frequency of occurrence
  Get-1   1.2              33.8%
  Get-2   19.6             9.9%
  Get-3   78.9             4.2%
  Get-4   220.6            1.4%
  Put-1   1.7              42.3%
  Put-2   34.8             3.3%
  Put-3   77.7             3.9%
  Put-4   144.1            1.2%
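A minimal sketch of the clustering step, assuming one-dimensional k-means on file sizes and the tightness measure defined above. The sample file sizes below are made up for illustration; they are not the measured workload.

```python
import numpy as np

def kmeans_tightness(sizes_mb, k, iters=100, seed=0):
    """1-D k-means on file sizes plus the weighted intra-cluster distance
    d = (1/p) * sum_j s_j * d_j used as the tightness measure."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sizes_mb, dtype=float)
    centroids = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        centroids = np.array([x[labels == j].mean() if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    d_j = np.array([np.abs(x[labels == j] - centroids[j]).mean()
                    if np.any(labels == j) else 0.0 for j in range(k)])
    s_j = np.bincount(labels, minlength=k)
    d = float((s_j * d_j).sum() / len(x))
    return np.sort(centroids), d

# Made-up sample of GET file sizes (MB); try several k and compare tightness.
sample = [1.0, 1.3, 1.5, 18.0, 20.5, 21.0, 75.0, 80.0, 210.0, 230.0]
for k in (2, 3, 4):
    centroids, d = kmeans_tightness(sample, k)
    print(k, centroids.round(1), round(d, 2))
```

As k grows, the tightness d shrinks but the model gains classes; picking the smallest k with acceptable tightness is the trade-off the slide refers to.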
Host-Attached MSSs
[Figure: the host-attached configuration modeled here — clients, file server (MSS host) with disks, disk cache, and robotic tape server]

Queuing Network Model
[Figure: closed queuing network model of the host-attached MSS]
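A common way to evaluate such a model is exact Mean Value Analysis (MVA) of a closed queuing network. The sketch below assumes a single workload class and illustrative service demands (seconds per request) at the file-server CPU, the disk cache, and the tape subsystem; these numbers are placeholders, not the demands measured in the study.

```python
def mva(demands, think_time_s, n_clients):
    """Exact single-class MVA for a closed queuing network.
    demands: dict of station -> service demand (seconds per request)."""
    queue = {st: 0.0 for st in demands}
    resp = thr = 0.0
    for n in range(1, n_clients + 1):
        # Residence time at each queueing station (arrival theorem)
        resid = {st: d * (1.0 + queue[st]) for st, d in demands.items()}
        resp = sum(resid.values())
        thr = n / (resp + think_time_s)                 # interactive throughput law
        queue = {st: thr * resid[st] for st in resid}   # Little's law per station
    return resp, thr

# Illustrative host-attached MSS: data passes through the host, so the
# server CPU demand includes both the tape->host and host->disk copies.
demands = {"server_cpu": 0.4, "disk_cache": 0.8, "tape_subsystem": 2.0}
resp, thr = mva(demands, think_time_s=30.0, n_clients=50)
print(f"response time ~ {resp:.1f} s, throughput ~ {thr:.2f} req/s")
```

Running the same model with a reduced server-CPU demand (the network-attached case, where data bypasses host memory) gives the kind of HA vs. NA comparison shown later in this section.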
Workload Intensity Increase: Results
[Figure: model results as the workload intensity increases]

Client and Server Compression
[Figure: results with client-side compression vs. server-side compression]
File Abstraction

Network-Attached MSSs
[Figure: the network-attached configuration again — storage access control network, high-speed data network (e.g., HIPPI), HA-disk file server, NA disk server, NA tape server, storage unit control network]
Queuing Network Model
[Figure: queuing network model of the network-attached MSS]

HA-Based vs. NA-Based MSSs
[Figure: performance comparison of host-attached and network-attached MSSs]
Trends in Distributed File Systems
New hardware:
- cheap main memory: file systems kept in main memory, with backups on videotape or optical disks
- extremely fast fiber-optic networks: may make client caching unnecessary
Scalability:
- from 100 to 1,000 to 10,000 nodes!
- the use of broadcast messages should be reduced
- resource usage and algorithms should not grow linearly with the number of users

Trends in Distributed File Systems
Wide-area networking:
- Present: backbone at 45 Mbps and access bandwidth at 19.2 Kbps
- Future: backbone at a few Gbps and access bandwidth at 56 Kbps or higher with cable modems
Mobile users:
- increase in disconnected operation mode
- files will be cached for longer periods (hours or days) at the client laptop
Trends in Distributed File Systems
Fault tolerance:
- down time is not well tolerated by users in general
- as distributed systems become more widespread, provisions for higher availability have to be incorporated into the design
Multimedia:
- new applications such as video-on-demand and audio files pose different demands on the design of a file system