Lessons learned implementing a multithreaded SMB2 server in OneFS

Similar documents
Multiprotocol Locking and Lock Failover in OneFS Aravind Srinivasan EMC, Isilon Storage Division

IBM Spectrum NAS. Easy-to-manage software-defined file storage for the enterprise. Overview. Highlights

SYMMETRIC STORAGE VIRTUALISATION IN THE NETWORK

Distributed Filesystem

High Performance Solid State Storage Under Linux

Video Surveillance EMC Storage with LENSEC Perspective VMS

Parallel Processing SIMD, Vector and GPU s cont.

Filesystem Performance on FreeBSD

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Surveillance Dell EMC Storage with IndigoVision Control Center

NFS in Userspace: Goals and Challenges

Google File System. By Dinesh Amatya

Milestone Solution Partner IT Infrastructure Components Certification Report

Portland State University ECE 588/688. Graphics Processors

The Google File System

DELL EMC ISILON F800 AND H600 I/O PERFORMANCE

BUYING SERVER HARDWARE FOR A SCALABLE VIRTUAL INFRASTRUCTURE

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

The Google File System

Hardware & System Requirements

Surveillance Dell EMC Storage with Aimetis Symphony

System that permanently stores data Usually layered on top of a lower-level physical storage medium Divided into logical units called files


PAC094 Performance Tips for New Features in Workstation 5. Anne Holler Irfan Ahmad Aravind Pavuluri

Benchmark Performance Results for Pervasive PSQL v11. A Pervasive PSQL White Paper September 2010

A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing

Scalable Shared Databases for SQL Server 2005

Google File System. Arun Sundaram Operating Systems

NUMA replicated pagecache for Linux

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide

The Google File System

The Google File System

Surveillance Dell EMC Storage with Milestone XProtect Corporate

vsan 6.6 Performance Improvements First Published On: Last Updated On:

Dynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle

Two hours. No special instructions. UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date. Time

Video Surveillance EMC Storage with LenSec Perspective VMS

Surveillance Dell EMC Storage with ISS SecurOS

Database Services at CERN with Oracle 10g RAC and ASM on Commodity HW

Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1

High-performance message striping over reliable transport protocols

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 420, York College. November 21, 2006

OS-caused Long JVM Pauses - Deep Dive and Solutions

Samba in a cross protocol environment

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

Linux multi-core scalability

Isilon Performance. Name

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

COMP520-12C Final Report. NomadFS A block migrating distributed file system

Isilon OneFS and IsilonSD Edge. Technical Specifications Guide

Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters

Analysis-driven Engineering of Comparison-based Sorting Algorithms on GPUs

Surveillance Dell EMC Storage with Infinova 2217 Security Management System

Benchmarking computers for seismic processing and imaging

Background: I/O Concurrency

MultiLanes: Providing Virtualized Storage for OS-level Virtualization on Many Cores

Surveillance Dell EMC Storage with Verint Nextiva

CSE 124: Networked Services Lecture-16

Red Hat Gluster Storage performance. Manoj Pillai and Ben England Performance Engineering June 25, 2015

The Google File System

Surveillance Dell EMC Storage with S-1 Video Management Software (SVMS)

ECE519 Advanced Operating Systems

ECE 574 Cluster Computing Lecture 15

Experience the GRID Today with Oracle9i RAC

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Data storage on Triton: an introduction

CSE 124: Networked Services Fall 2009 Lecture-19

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

RAIDIX Data Storage Solution. Clustered Data Storage Based on the RAIDIX Software and GPFS File System

The Google File System (GFS)

Falcon: Scaling IO Performance in Multi-SSD Volumes. The George Washington University

Multiprocessor Systems. COMP s1

SMB 2.1 & SMB 3 Protocol features, Status, Architecture, Implementation. Gordon Ross Nexenta Systems, Inc.

GFS: The Google File System

CSE 124: Networked Services Lecture-17

White Paper. Major Performance Tuning Considerations for Weblogic Server

Choosing Hardware and Operating Systems for MySQL. Apr 15, 2009 O'Reilly MySQL Conference and Expo Santa Clara,CA by Peter Zaitsev, Percona Inc

Distributed System. Gang Wu. Spring,2018

CA485 Ray Walshe Google File System

Recovering Disk Storage Metrics from low level Trace events

pnfs support for ONTAP Unstriped file systems (WIP) Pranoop Erasani Connectathon Feb 22, 2010

COMMON INTERNET FILE SYSTEM PROXY

Dell Reference Configuration for Large Oracle Database Deployments on Dell EqualLogic Storage

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Lessons learned from Lustre file system operation

CLOUD-SCALE FILE SYSTEMS

Flash-Conscious Cache Population for Enterprise Database Workloads

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

Condusiv s V-locity VM Accelerates Exchange 2010 over 60% on Virtual Machines without Additional Hardware

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

An Overview of Fujitsu s Lustre Based File System

Surveillance Dell EMC Storage with Milestone XProtect Corporate

Dell EMC CIFS-ECS Tool

Readings. Storage Hierarchy III: I/O System. I/O (Disk) Performance. I/O Device Characteristics. often boring, but still quite important

HPMMAP: Lightweight Memory Management for Commodity Operating Systems. University of Pittsburgh

Multiprocessor Systems. Chapter 8, 8.1

SONAS Best Practices and options for CIFS Scalability

Background: Operating Systems

Transcription:

Lessons learned implementing a multithreaded SMB2 server in OneFS Aravind Velamur Srinivasan Isilon Systems, Inc 2011 Storage Developer Conference. Insert Your Company Name. All Rights Reserved.

Outline Overview OneFS Overview Multi-threaded SMB Server Overview SMB2 Credit Algorithm SMB2.1 Leases Performance optimizations SMB2 vs SMB1 in OneFS 2

Overview SMB2 protocol has its own advantages over SMB1 Less Chatty than SMB1 Inherent performance improvements An SMB2 server poses some challenges to the implementation: Credit Algorithm New Lease types in SMB2.1 Performance bottlenecks and optimizations in a multithreaded SMB server This talk will examine these challenges and lessons learned in OneFS 3

OneFS Overview 4

Isilon OneFS Cluster NAS file server Scalable Add more storage in 5 mins Reliable 8x mirror / +4 parity Striped across nodes Single volume file system 3 to 144 nodes Fully symmetric peers No metadata servers Commodity hardware CPU, Mem, Disks 5

Isilon OneFS File System Concurrent access to all files with all protocols SMB1/SMB2 NFSv3/NFSv4 SSH HTTP/FTP 6

OneFS Summary OneFS is Isilon's sixth-generation operating system that provides the intelligence behind all Isilon scaleout storage systems. It combines the three layers of traditional storage architectures file system, volume manager and RAID into one unified software layer, creating a single intelligent file system that spans all nodes within a cluster. 7

OneFS Summary OneFS works exclusively with the Isilon scale-out storage system, referred to as a cluster. Isilon's OneFS enables: Independent or linear scalability of performance and capacity A single point of management for large and rapidly growing repositories of data Mission-critical reliability and high availability with state-of-the-art data protection 8

Multithreaded SMB Server 9

LWIOD Architecture Single Process Multi-threaded New & Persistent Connections LWIOD Protocol T T T VFS T T T Blocking Syscalls Kernel 10

Multithreaded SMB Server Pros Pipelined Network I/O written in parallel Parallel syscalls: network I/O not blocked by file system Idle connections consume very little resources New connections limited by same threadpool as all other operations Steven s talk at 2010 SDC: http://www.snia.org/sites/default/files2/sdc_archives/2010_presentations/monday/stevendanne man_comparison_between_samba-rev.pdf 11

SMB2 Credit Algorithm 12

SMB2 Credit Algorithm SMB2 offers a server-side credit mechanism to throttle greedy clients Server Credit Algorithm, if not implemented properly, can result in weird behavior from the clients 13

SMB2 Credit Algorithm SMB2 Credits allows the client to send requests in parallel Allows the client to build a pipeline of requests instead of waiting for a response before sending the next request. Especially relevant when using a high latency network. 14

SMB2 Credit Algorithm Pitfalls The server should gradually scale the credits starting from a low value. If the server grants all available credits at once to the client, the client sometimes uses those credits just for one operation without performing any other operation 15

SMB2 Credit Algorithm Pitfalls Example A server which grants all available credits to the client at once (leaving just 1 credit) Try to copy a huge file (around 1 GB) from a Windows 7 client. 16

SMB2 Credit Algorithm Pitfalls Example (Contd) Try to create a file using the explorer window, while the copy is in progress The explorer window just hangs until the Write operation completes successfully The client uses all the credits just for the Write operation without using it for other things 17

SMB2 Credit Algorithm Lessons Learned Server side credit algorithm can affect the client behavior in some scenarios Match windows server behavior as much as possible Ramp up the credits starting from a smaller value up to a certain limit Can be further improved by using statistics of the existing connection as well as the load on the server 18

SMB2 Credit Algorithm Windows Behavior Windows servers gradually ramp up the credits starting from a value of 16, allowing the client to accumulate up to 128 outstanding credits Never grants more than 33 credits per round (using a 1GE link). 19

SMB2.1 Leases in OneFS 20

SMB2.1 Leases in OneFS SMB2.1 Leases provide more fine grained and flexible caching for clients and allow upgrades in addition to the oplock breaks we have in the legacy oplocks. Extending the oplock system to support leases is straightforward for most of the scenarios with a few exceptions 21

SMB2.1 Leases Lessons Learned The unique key used to identify the lock context has to be changed to accommodate the 128-bit client key generated for leases The interaction between the legacy oplocks and the new lease types are not completely straightforward 22

SMB2.1 Leases Interaction between Read, RH and Level2 Oplocks The interaction between Read lease, RH lease and Level2oplock is complicated Although Level2 and Read types look similar, they have a different relationship with RH lease Multiple Read and RH oplocks can coexist on the same stream Multiple Read and Level2 oplocks can coexist but Level2 and RH cannot coexist 23

SMB2.1 Leases Interaction between Read, RH and Level2 Oplocks 24

SMB2.1 Leases Interaction between Read, RH and Level2 Oplocks 25

SMB2.1 Leases Interaction between Read, RH and Level2 Oplocks 26

SMB2.1 Leases Interaction between Read, RH and Level2 Oplocks 27

SMB2.1 Leases Level2 and RH If we already have a RH oplock and a request for a Level2 oplock comes in for the same stream, the request is not granted. On the other hand, if we already have a Level2 oplock and a request for a RH oplock comes in for the same stream, a Read oplock is granted instead of RH Verified using SMB2-LEASE torture test against Windows server 28

Multithreaded SMB Server Improvements 29

Multithreaded Server Bottlenecks Thread synchronization Thread context-switch overhead 30

Thread Synchronization Pitfalls Lock Order Reversal Especially between Connection and Session objects Starvation due to excess synchronization Example locking the connection socket unnecessarily 31

Thread Context Switch The overhead of a context switch between threads is extremely high Internal testing revealed that a context switch can take upto 20us Try to minimize context switching as much as possible by performing direct execution of work instead of handing off to another thread 32

SMB2 vs SMB1 in OneFS 33

SMB2 vs SMB1 SMB2 offers inherent performance benefits over SMB1 We ran IOZone to measure the performance of SMB1 vs SMB2 in OneFS. 34

IOZone IOZone is a filesystem benchmark tool. The benchmark generates and measures a variety of file operations. The important metrics for IOZone are the single and multi-client numbers for the following operations: write random-read (rread) random-write (rwrite) read 35

SMB2 vs SMB1 - IOZone Each SKU in the result table has two entries: one for a single thread, and one for peak numbers. Each entry specifies the number of threads used to drive the specified load. For Single lines, these will always be 1. For Peak lines the number of threads will vary depending on how many threads it took to drive that particular test to peak. IOZone block size used for these tests were 512 KB and the results are in MBps 36

IOZone Test Results SMB1 vs SMB2 SMB1 Results Nodes Disk S/P Write/ Threads Rread/ Threads Rwrite/ Threads Read/T 3 24 Single 141.56/1 17.70/1 108.16/1 195.66/1 3 24 1:1 402.91/3 43.95/3 213.80/3 551.95/3 3 24 Peak 893.46/57 245.10/57 730.74/57 1312.91/21 SMB2 Results Nodes Disk S/P Write/ Threads Rread/ Threads Rwrite/ Threads Read/T 3 24 Single 227.13/1 27.80/1 134.06/1 313.90/1 3 24 1:1 640.08/3 57.97/3 266.22/3 917.12/3 3 24 Peak 1051.75/30 256.50/30 791.93/51 1513.79/12 Note: The results are not official numbers but rather experimental numbers to show the relative performance benefit of SMB2 over SMB1 37

SMB2 vs SMB1 - Conclusion SMB2 offers significant performance benefits over SMB1 SMB2 clients lot better than SMB1 clients Our test results further highlight the fact that SMB2 performs better than SMB1, independent of the underlying operating system. 38

Questions? Contact Aravind Srinivasan asrinivasan@isilon.com 39