Shared File System Requirements for SAS Grid Manager Table Talk #1546 Ben Smith / Brian Porter

About the Presenters
Main presenter: Ben Smith, Technical Solutions Architect, IBM, smithbe1@us.ibm.com
Brian Porter, Technical Solutions Architect, IBM, bporter1@us.ibm.com
Harry Seifert, Executive IT Specialist, IBM, seifert@us.ibm.com
Qingda Wang, Sr. Architect, IBM, qwang@ca.ibm.com

Recent Thoughts on Filesystems with SAS Grid Manager (SGM)
A shared filesystem is required.

SAS IO Characteristics
- Predominantly large sequential block IO: SAS Foundation uses large block sizes of 64K, 128K, or 256K.
- SAS tends to perform large sequential reads and writes (read:write ratios of roughly 80:20 to 60:40).
- SAS does not pre-allocate storage when initializing or when performing writes to a file.
- Reading and writing of data is done via the operating system's (OS) file cache.
- A large number of temporary files can be created in SASWORK during long-running SAS jobs.
A minimal sketch of this access pattern follows.
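To make the access pattern concrete, here is an illustrative Python sketch (not SAS code) that writes and then reads back a file in 128 KB sequential blocks through the OS file cache. The block size, block count, and /tmp path are assumptions chosen to mirror the bullet points above.

```python
# Illustrative only: mimic SAS-style large sequential block I/O with 128 KB
# blocks. Sizes and the scratch path are assumptions, not SAS internals.
import os
import time

BLOCK_SIZE = 128 * 1024          # one of the typical SAS block sizes (64K/128K/256K)
NUM_BLOCKS = 4096                # 512 MB total
PATH = "/tmp/saswork_sim.dat"    # hypothetical SASWORK-style temp file

buf = os.urandom(BLOCK_SIZE)

start = time.time()
with open(PATH, "wb") as f:
    for _ in range(NUM_BLOCKS):  # sequential writes with no pre-allocation,
        f.write(buf)             # like SAS growing a file as it writes
elapsed = time.time() - start
mb = BLOCK_SIZE * NUM_BLOCKS / 1e6
print(f"wrote {mb:.0f} MB sequentially in {elapsed:.2f}s ({mb/elapsed:.0f} MB/s)")

start = time.time()
with open(PATH, "rb") as f:
    while f.read(BLOCK_SIZE):    # sequential read-back through the OS file cache
        pass
elapsed = time.time() - start
print(f"read {mb:.0f} MB sequentially in {elapsed:.2f}s ({mb/elapsed:.0f} MB/s)")
os.remove(PATH)
```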

SAS Grid Manager Overview
[Architecture diagram] Client Tier: SAS analyst desktops, SAS web clients, SAS Display Manager, SASGSUB, SAS Environment Manager, PWS. Web Tier: clustered web application server(s). Metadata Tier: clustered metadata servers. Server Tier: SAS Grid Manager control server and grid nodes. Data Tier: shared file system, enterprise data warehouse, and analytic data warehouse / marts.

Tuning Considerations in Addition to the File System(s)
For the SAS client server / SAS IO server:
- OS choice: Linux, AIX, Windows, UNIX
- CPU and memory optimization
- HBA/FC multipathing tuning/optimization
- Storage fabric interconnect optimization (SAN, NAS, or direct attached; Fibre Channel, Ethernet, etc.)
- Storage RAID and storage disk/SSD/flash tunables

The Area to Focus on for the Shared FS
- Must be fast enough to keep the CPU resources utilized, e.g., 3 nodes with 16 cores each = 3 x 16 x 100 MB/sec = 4.8 GB/sec (see the sizing sketch below).
- All nodes must be able to see SASDATA, and possibly SASWORK:
  - SASWORK, for HA with checkpoint/restart
  - or if local resources are not as fast
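The arithmetic above generalizes to a simple rule of thumb of about 100 MB/sec of sustained I/O per core. The following Python helper just encodes that multiplication; the node and core counts are example inputs, not a recommendation.

```python
# Back-of-the-envelope sizing for shared file system bandwidth, using the
# slide's rule of thumb of ~100 MB/s of sustained I/O per core.
def required_bandwidth_gb_per_sec(nodes, cores_per_node, mb_per_sec_per_core=100):
    return nodes * cores_per_node * mb_per_sec_per_core / 1000.0

# The slide's example: 3 grid nodes with 16 cores each.
print(required_bandwidth_gb_per_sec(3, 16))   # -> 4.8 (GB/s)
```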

SAS Grid Computing: Data in a Shared Filesystem
The following all need to be accessible from any of the SAS Grid nodes:
- SASDATA
- LSF config files
- LSF binaries
- Deployed jobs
- SASGSUB work directory
- SASUSER directories
To provide high availability:
- Permanent SAS files: all SAS files (programs, catalogs, data sets, indexes, and so on) that need to persist between SAS sessions and can be shared between SAS users.
- SAS deployment and configuration files: all SAS binaries, configuration files, logs from SAS servers, and SAS server repositories.
- SASWORK, if the system uses SAS checkpoint and label restart technology.
A pre-flight path check sketch follows this list.
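As a rough illustration of the "visible from every node" requirement, here is a hypothetical pre-flight check a grid node could run. Every path in the list is a placeholder; a real deployment would substitute its own SASDATA, LSF, SASGSUB, and SASUSER locations.

```python
# Minimal sketch: confirm the shared directories this slide lists are
# reachable and writable from this host. All paths are hypothetical.
import os

SHARED_PATHS = [
    "/sasdata",              # permanent SAS files
    "/opt/sas/lsf/conf",     # LSF config files
    "/opt/sas/lsf",          # LSF binaries
    "/sasdeploy/jobs",       # deployed jobs
    "/sasgsub/work",         # SASGSUB work directory
    "/sasusers",             # SASUSER directories
]

for path in SHARED_PATHS:
    if not os.path.isdir(path):
        print(f"MISSING: {path} is not visible on this node")
    elif not os.access(path, os.R_OK | os.W_OK):
        print(f"NO ACCESS: {path} is visible but not readable/writable")
    else:
        print(f"OK: {path}")
```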

Shared Filesystem General Characteristics
It is highly recommended that ALL SAS Grid Manager deployments utilize a shared filesystem.
A shared filesystem should provide the following for best SAS performance:
- Transparency for access, location, concurrency, replication, etc.
- File system data retention in a local file cache (in memory); see the sketch below
- Efficient handling of file system metadata
- Coordination of data across multiple host systems (physical resources)
Workloads: large sequential block IO is the dominant storage pattern. The shared filesystem can be SAN, NAS, or shared-nothing.
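The value of local file cache retention can be sketched with a simple re-read timing experiment in Python. The path and file size are arbitrary assumptions, and the page cache caveats are noted in the comments.

```python
# Illustrative sketch of local file cache retention: a re-read of a file that
# is resident in the OS page cache runs far faster than a read from storage.
import os
import time

PATH = "/tmp/cache_demo.dat"    # hypothetical scratch location
CHUNK = 1024 * 1024
TOTAL_CHUNKS = 512              # 512 MB total

with open(PATH, "wb") as f:
    chunk = os.urandom(CHUNK)
    for _ in range(TOTAL_CHUNKS):
        f.write(chunk)

def timed_read():
    start = time.time()
    with open(PATH, "rb") as f:
        while f.read(CHUNK):
            pass
    return time.time() - start

# Note: immediately after the write the file is usually already resident in
# the page cache, so both reads may be fast. To see a true cold first read on
# Linux, drop caches beforehand (as root): echo 3 > /proc/sys/vm/drop_caches
first, second = timed_read(), timed_read()
print(f"first read: {first:.2f}s, re-read: {second:.2f}s")
os.remove(PATH)
```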

Possible Shared FS Topologies
Local SASWORK vs. shared SASWORK

Shared Filesystems That Perform Well with SAS
- IBM Spectrum Scale (aka GPFS)
- Red Hat GFS2
- Veritas InfoScale
- Quantum StorNext
- Intel Enterprise Edition for Lustre

Shared Filesystems That Have Issues with SAS Performance
- Red Hat Gluster (per Red Hat)
- Red Hat Ceph (per Red Hat)
- Oracle CFS
- Parallel NFS

Non-Shared Filesystems That Are Suitable for SASWORK
If your workload employs heavy sequential read and write loads:
- AIX: JFS2
- Linux (RHEL): XFS
- Windows: NTFS

Why Not Use NFS?
The issue is NFS metadata cache coherency, which causes the cached file system metadata to be dumped very frequently. NFS does this every time a read or write lock is placed on a file, or when the file's attributes (such as size) change. Dumping the cached metadata drastically interrupts large sequential writes and hurts the ability to process the data, because the file system is constantly re-reading via the network and updating the cached file system metadata.
That said, NFS sometimes works acceptably for very small configurations:
- SASDATA: OK
- SASWORK: not so much; we strongly discourage use of NFS for SASWORK when performance is a concern
NFS cache for file and directory attributes:
- Use the actimeo= mount option for better response; the default setting of 1 minute is problematic for the other servers (nodes) in the system
- File modifications may not be visible to other systems until an NFS commit is executed
- Read/write/share locks may invalidate data and cause the cache to be refreshed
A small mount-check sketch follows.
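As a rough illustration, a Linux-only Python sketch like the following could flag a SASWORK directory that sits on an NFS mount. The /saswork path is a hypothetical example, and the /proc/mounts parsing is deliberately simplistic.

```python
# Hedged helper sketch: scan /proc/mounts (Linux-specific) and warn when a
# given directory, e.g. SASWORK, sits on an NFS mount, since the slide
# strongly discourages NFS for SASWORK.
import os

def nfs_mount_for(path):
    """Return (mountpoint, fstype) if path lives on an NFS mount, else None."""
    path = os.path.realpath(path)
    best = None
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _dev, mnt, fstype = line.split()[:3]
            is_parent = path == mnt or path.startswith(mnt.rstrip("/") + "/")
            if is_parent and fstype.startswith("nfs"):
                # keep the longest (most specific) matching mountpoint
                if best is None or len(mnt) > len(best[0]):
                    best = (mnt, fstype)
    return best

hit = nfs_mount_for("/saswork")   # hypothetical SASWORK location
if hit:
    print(f"WARNING: /saswork is on {hit[1]} mount {hit[0]}; "
          "NFS is strongly discouraged for SASWORK")
```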

CIFS
Common Internet File System (CIFS) is the native shared file system provided with Windows operating systems. With recent patches, CIFS can be used for workloads with moderate levels of concurrency, and it works best for workloads that get limited benefit from the local file cache.
The recommended configuration is to place SASWORK directories on a non-CIFS file system and use CIFS to manage shared, permanent files (both SAS data files and reports/output).
With the release of the Windows Server 2008 operating system, many improvements were made to address performance issues and connectivity via 10-Gigabit Ethernet (GbE). These greatly improve the throughput and responsiveness of the CIFS file system. As a result of changes made both to SAS Foundation 9.3 software and the Windows Server 2008 R2 operating system, CIFS is functionally stable.
However, workload results showed relatively poor retention of data in the local file cache. Workloads that reuse data from the local file cache will not perform nearly as well with CIFS as with a local file system. The workload configuration had three systems running the Windows Server 2008 R2 operating system, one acting as a file server and two as clients, all connected via 10 GbE.

Reference Papers
- Shared File Systems: Determining the Best Choice for Your Distributed SAS Foundation Applications, Paper SAS569-2017
- A Survey of Shared File Systems, 22 Oct 2014, support.sas.com/resources/papers/proceedings13/484-2013.pdf
- When to Use NFS with SAS (blog), blogs.sas.com/content/sgf/2015/01/07/when-to-use-nfs-with-sas/
- SAS Grid Manager IO, support.sas.com/resources/papers/proceedings14/1559-2014.pdf

Don't Forget to Provide Feedback!
1. Go to the Agenda icon in the conference app.
2. Find this session title and select it.
3. On the session's page, scroll down to Surveys and select the name of the survey.
4. Complete the survey and click Finish.