Shared File System Requirements for SAS Grid Manager
Table Talk #1546
Ben Smith / Brian Porter

About the Presenters
- Main Presenter: Ben Smith, Technical Solutions Architect, IBM, smithbe1@us.ibm.com
- Brian Porter, Technical Solutions Architect, IBM, bporter1@us.ibm.com
- Harry Seifert, Executive IT Specialist, IBM, seifert@us.ibm.com
- Qingda Wang, Sr Architect, IBM, qwang@ca.ibm.com

Recent Thoughts On Filesystems with SGM
- A Shared Filesystem is Required

SAS IO Characteristics
Predominantly Large Sequential Block IO
- SAS Foundation has large block sizes of 64K, 128K, or 256K.
- SAS tends to perform large sequential Reads and Writes (80:20 to 60:40).
- SAS does not pre-allocate storage when initializing or when performing Writes to a file.
- Reading and writing of data is done via the operating system's (OS) file cache.
- A large number of temporary files can be created in SASWORK during long-running SAS jobs.

SAS Grid Manager Overview
[Diagram: SAS Grid Manager architecture across the Data Tier, Server Tier, Metadata Tier, Web Tier, and Client Tier. Labeled components: Analytic Data Warehouse / Marts, Enterprise Data Warehouse, Shared File System, SAS Grid Manager Control Server, three SAS Grid Manager Grid Nodes, Clustered Metadata, Clustered Web Application Server(s), SAS Environment Mgr, SAS Analyst's Desktops, SAS Web Clients, PWS, SAS Display Manager, SASGSUB.]

Tuning Considerations in Addition to the File System(s)
For the SAS Client Server / SAS IO Server:
- O/S choice: Linux, AIX, Windows, Unix
- CPU and Memory Optimization
- HBA/FC Multipathing Tuning/Optimization
- Storage Fabric Interconnect Optimization (SAN, NAS, or direct attached; Fibre Channel, Ethernet, etc.)
- Storage RAID and Storage Disk/SSD/Flash Tunables

The Area to Focus On for the Shared FS
- Must be fast enough to utilize the CPU resources
  - e.g., 3 nodes with 16 cores each = 3 x 16 x 100 MB/sec = 4.8 GB/sec
- Have all nodes able to see SASDATA and possibly SASWORK
  - SASWORK for HA with checkpoint/restart
  - Or if local resources are not as fast
[Diagram: SAS Grid Manager Control Server and three SAS Grid Manager Grid Nodes in the Server Tier, above the Data Tier.]

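The sizing arithmetic on this slide can be sketched as a small helper. This is a minimal illustration, not a SAS tool: the function name `required_throughput_mb_s` is hypothetical, and the 100 MB/sec per-core figure is simply the value used on the slide.

```python
def required_throughput_mb_s(nodes: int, cores_per_node: int,
                             mb_s_per_core: int = 100) -> int:
    """Aggregate IO throughput (MB/sec) the shared file system must sustain
    so that every core on every grid node can be kept busy.

    mb_s_per_core defaults to the 100 MB/sec per core used on the slide.
    """
    return nodes * cores_per_node * mb_s_per_core


# The slide's example: 3 grid nodes with 16 cores each.
total = required_throughput_mb_s(nodes=3, cores_per_node=16)
print(f"{total} MB/sec = {total / 1000:.1f} GB/sec")  # 4800 MB/sec = 4.8 GB/sec
```

Re-running the helper with a different node count or core count shows how quickly the target grows as a grid scales out.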
SAS GRID COMPUTING DATA IN A SHARED FILESYSTEM
The files all need to be accessed from any of the SAS Grid nodes:
- SASDATA
- LSF Config files
- LSF Binaries
- Deployed jobs
- SASGSUB work directory
- SASUSER directories
Provide High Availability
- Permanent SAS files: these include all SAS files (programs, catalogs, data sets, indexes, and so on) that need to persist between SAS sessions and can be shared between SAS users.
- SAS deployment and configuration files: these include all SAS binaries, configuration files, logs from SAS servers, and SAS server repositories.
- SAS WORK: if the system uses SAS Check Point and Label Restart technology.

Shared Filesystem General Characteristics
It is highly recommended that ALL SAS Grid Manager deployments utilize a shared filesystem.
A Shared Filesystem should provide the following for best SAS performance:
- Transparency for access, location, concurrency, replication, etc.
- File system data retention in a local file cache (in memory)
- Efficient handling of file system metadata
- Physical Resources: coordination of data with multiple host systems
- Workloads: Large Sequential Block IO is the dominant storage pattern
- Can be SAN or NAS or Shared Nothing

Possible Shared FS Topologies
- Local SASWORK vs Shared SASWORK

Shared Filesystems that perform well with SAS
- IBM Spectrum Scale (aka GPFS)
- Red Hat GFS2
- Veritas InfoScale
- Quantum StorNext
- Intel Enterprise Edition For Lustre

Shared Filesystems that have issues with SAS performance
- Red Hat Gluster (per Red Hat)
- Red Hat CEPH (per Red Hat)
- Oracle CFS
- Parallel NFS

Non-Shared Filesystems That Are Suitable For SASWORK
If your workload employs heavy sequential READ and WRITE loads:
- AIX: JFS2
- Linux RHEL: XFS
- Windows: NTFS

Why not use NFS?
The issue is NFS metadata cache coherency, which causes the cached file system metadata to be dumped very frequently. NFS does this every time a read or write lock is placed on a file or the file's attributes, such as size, change. Dumping the cached metadata drastically interrupts large sequential writes and affects the ability to process the data, because the file system is constantly re-reading over the network and updating the cached file system metadata.
Sometimes NFS works OK for very small configurations:
- SASDATA: OK
- SASWORK: Not so much
Strongly discourage use of NFS with SASWORK when performance is a concern.
NFS cache for file and directory attributes:
- Use ACTIME= for better response
- The default setting of 1 minute is problematic for other servers (nodes) in the system
- File modifications may not be visible to other systems until an NFS commit is executed
- Read/write/share locks may invalidate data and cause the cache to be refreshed

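To see which attribute-cache timeouts an NFS client is actually using, the mount options can be inspected. The sketch below rests on two assumptions that are not stated on the slide: that the client is Linux, and that the slide's ACTIME= corresponds to the Linux `actimeo=` mount option. The defaults in the table are those documented in the Linux nfs(5) man page; the function name `attr_cache_settings` is hypothetical.

```python
# Attribute-cache defaults in seconds, per the Linux nfs(5) man page.
# Assumption: a Linux NFS client; other platforms use different names.
DEFAULTS = {"acregmin": 3, "acregmax": 60, "acdirmin": 30, "acdirmax": 60}


def attr_cache_settings(options: str) -> dict:
    """Return the effective attribute-cache timeouts for a comma-separated
    mount option string such as the fourth field of /proc/mounts."""
    settings = dict(DEFAULTS)
    for opt in options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "noac":
            # noac disables attribute caching entirely.
            settings = {name: 0 for name in settings}
        elif key == "actimeo" and value.isdigit():
            # actimeo sets all four timeouts to the same value.
            settings = {name: int(value) for name in settings}
        elif key in settings and value.isdigit():
            settings[key] = int(value)
    return settings


print(attr_cache_settings("rw,hard,vers=3"))     # defaults: up to 60 s staleness
print(attr_cache_settings("rw,hard,actimeo=0"))  # all timeouts set to 0
```

With the defaults, another grid node may keep using cached attributes for up to 60 seconds, which matches the 1 minute figure on the slide. Shorter timeouts improve coherency at the cost of more attribute traffic to the NFS server.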
CIFS
Common Internet File System (CIFS) is the native shared file system provided with Windows operating systems. With recent patches, CIFS can be used for workloads with moderate levels of concurrency, and it works best for workloads that get limited benefit from the local file cache. The recommended configuration is to place SAS WORK directories on a non-CIFS file system and use CIFS to manage shared, permanent files (both SAS data files and reports/output).
With the release of the Windows 2008 operating system, many improvements were made to address performance issues and connectivity via 10-Gigabit Ethernet (GbE). These greatly improve the throughput and responsiveness of the CIFS file system. As a result of changes made to both SAS Foundation 9.3 software and the Windows Server 2008 R2 operating system, CIFS is functionally stable.
However, workload results showed that there was relatively poor retention of data in the local file cache. Workloads that reuse data from the local file cache will not perform nearly as well with CIFS as with a local file system. The workload configuration had three systems running the Windows Server 2008 R2 operating system: one acting as a file server and two as clients, all connected via 10 GbE.

Reference Papers
- Shared Filesystems: Determining the Best Choice For your Distributed SAS Foundation Applications, Paper SAS569-2017
- A Survey of Shared Filesystems, 22 Oct 2014: support.sas.com/resources/papers/proceedings13/484-2013.pdf
- When to use NFS with SAS (blog): blogs.sas.com/content/sgf/2015/01/07/when-to-use-nfs-with-sas/
- SAS Grid Manager IO: support.sas.com/resources/papers/proceedings14/1559-2014.pdf