Before We Start... 1

Similar documents
Provisioning with SUSE Enterprise Storage. Nyers Gábor Trainer &

SUSE Linux Enterprise Kernel Back to the Future

How To Make Databases on SUSE Linux Enterprise Server Highly Available Mike Friesenegger

Managing Linux Servers Comparing SUSE Manager and ZENworks Configuration Management

Using Linux Containers as a Virtualization Option

Best practices with SUSE Linux Enterprise Server Starter System and extentions Ihno Krumreich

SUSE Linux Enterprise High Availability Extension

SUSE in the Enterprise

Essentials. Johannes Meixner. about Disaster Recovery (abbreviated DR) with Relax-and-Recover (abbreviated ReaR)

SUSE OpenStack Cloud. Enabling your SoftwareDefined Data Center. SUSE Expert Days. Nyers Gábor Trainer &

Linux and z Systems in the Datacenter Berthold Gunreben

Open Enterprise & Open Community

SaltStack and SUSE Systems and Configuration Management that Scales and is Easy to Extend

SDS Heterogeneous OS Access. Technical Strategist

Linux High Availability on IBM z Systems

Build with SUSE Studio, Deploy with SUSE Linux Enterprise Point of Service and Manage with SUSE Manager Case Study

SUSE Manager Roadmap OS Lifecycle Management from the Datacenter to the Cloud

Alternatives to Solaris Containers and ZFS for Linux on System z

Protect your server with SELinux on SUSE Linux Enterprise Server 11 SP Sander van Vugt

Docker Networking In OpenStack What you need to know now. Fawad Khaliq

SUSE An introduction...

Expert Days SUSE Enterprise Storage

Define Your Future with SUSE

Introduction to Software Defined Infrastructure SUSE Linux Enterprise 15

<Insert Picture Here> Btrfs Filesystem

Cloud in a box. Fully automated installation of SUSE Openstack Cloud 5 on Dell VRTX. Lars Everbrand. Software Developer

Novell SLES 10/Xen. Roadmap Presentation. Clyde R. Griffin Manager, Xen Virtualization Novell, Inc. cgriffin at novell.com.

Saving Real Storage with xip2fs and DCSS. Ihno Krumreich Project Manager for SLES on System z

Exploring History with Hawk

Open Source Storage. Ric Wheeler Architect & Senior Manager April 30, 2012

Unleash the Power of Ceph Across the Data Center

<Insert Picture Here> Filesystem Features and Performance

SUSE Manager in Large Scale 17220

Welcome to SUSE Expert Days 2017 Service Delivery with DevOps

A Carrier-Grade Cloud Phone System

Virtualization at Scale in SUSE Linux Enterprise Server

SUSE Manager and Salt

From GIT to a custom OS image in a few click OS image made easy

Samba and Ceph. Release the Kraken! David Disseldorp

SICOOB. The Second Largest Linux on IBM System z Implementation in the World. Thiago Sobral. Claudio Kitayama

Building a Secure and Compliant Cloud Infrastructure. Ben Goodman Principal Strategist, Identity, Compliance and Security Novell, Inc.

Linux File Systems: Challenges and Futures Ric Wheeler Red Hat

Exploring the High Availability Storage Infrastructure. Tutorial 323 Brainshare Jo De Baer Technology Specialist Novell -

Using Crowbar to Deploy Your OpenStack Cloud. Adam Spiers Vincent Untz John H Terpstra

Samba HA Cluster on SLES 9

Veritas InfoScale Enterprise for Oracle Real Application Clusters (RAC)

Three Steps Toward Zero Downtime. Guide. Solution Guide Server.

The Btrfs Filesystem. Chris Mason

Novell Infiniband and XEN

Red Hat Enterprise 7 Beta File Systems

Data Sheet: Storage Management Veritas Storage Foundation for Oracle RAC from Symantec Manageability and availability for Oracle RAC databases

The Btrfs Filesystem. Chris Mason

BOV89296 SUSE Best Practices Sharing Expertise, Experience and Knowledge. Christoph Wickert Technical Writer SUSE /

The Journey of an I/O request through the Block Layer

The CephFS Gateways Samba and NFS-Ganesha. David Disseldorp Supriti Singh

Ben Walker Data Center Group Intel Corporation

Veritas Storage Foundation for Oracle RAC from Symantec

An Introduction to GPFS

Zdeněk Kubala Senior QA

SUSE. High Performance Computing. Eduardo Diaz. Alberto Esteban. PreSales SUSE Linux Enterprise

Linux Filesystems and Storage Chris Mason Fusion-io

ext4: the next generation of the ext3 file system

High Availability for Highly Reliable Systems

Too Many Metas A high level look at building a metadata desktop. Joe Shaw

openqa making QA interesting since 2013 Ondrej Holecek /aaannz/

Veritas Storage Foundation from Symantec

CA485 Ray Walshe Google File System

SUSE Linux Enterprise 11

Btrfs Current Status and Future Prospects

Enterprise Filesystems

SUSE Linux Enterprise Server

What's New in SUSE LINUX Enterprise Server 9

Current Topics in OS Research. So, what s hot?

CS5460: Operating Systems Lecture 20: File System Reliability

Example Implementations of File Systems

VERITAS Storage Foundation 4.0 TM for Databases

Hands-on with Btrfs. Course ATT1800 Version Lecture Manual September 6,2012

SUSE. High Performance Computing. Kai Dupke. Meike Chabowski. Senior Product Manager SUSE Linux Enterprise

Secure Authentication

System Administration. Storage Systems

New HPE 3PAR StoreServ 8000 and series Optimized for Flash

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

Filesystems in Linux. A brief overview and comparison of today's competing FSes. Please save the yelling of obscenities for Q&A.

Pavel Anni Oracle Solaris 11 Feature Map. Slide 2

Linux on zseries Journaling File Systems

Oracle Linux 5 & 6 Advanced Administration

Choosing Hardware and Operating Systems for MySQL. Apr 15, 2009 O'Reilly MySQL Conference and Expo Santa Clara,CA by Peter Zaitsev, Percona Inc

IO110: Open Enterprise Server 2. Hardware you can hit with a hammer, software you can only curse at...

Block Device Driver. Pradipta De

A GPFS Primer October 2005

CS370 Operating Systems

MODERN FILESYSTEM PERFORMANCE IN LOCAL MULTI-DISK STORAGE SPACE CONFIGURATION

Distributed File Systems II

BTREE FILE SYSTEM (BTRFS)

Understanding Oracle RAC ( ) Internals: The Cache Fusion Edition

Scaling a Highly Available Global SUSE Manager Deployment at Rackspace to Manage Multiple Linux Platforms

The Leading Parallel Cluster File System

FUT20719 SUSE Linux Enterprise Server for System z. Building enterprise IT with SUSE on IBM Mainframes

SUSE Linux Enterprise Server 11 SP4

Buffer Management for XFS in Linux. William J. Earl SGI

Transcription:

1 Before We Start...

Isn't One Type of Car Enough? Functionality Efficiency Performance Emotions Reliability? MgE Functionality Efficiency Performance Emotions C. Nocke 2

Isn't One Type of filesystem Enough? Functionality Efficiency Performance Reliability Charged with Emotions MgE C. Nocke 3

How To Choose a Filesystem Matthias G. Eckermann Senior Product Manager mge@suse.com Jeff Mahoney Kernel Developer Team Lead: File Systems and Storage jeffm@suse.com

Agenda Local Linux Filesystems Tuning on the filesystem layer Tuning on the block layer A few more words on btrfs Copy on Write and what to do with it Conversion to btrfs 5

Major Linux (local) Filesystems Feature ext 2/3 reiserfs xfs ext4 btrfs Data/Metadata Journaling / / / / CoW Journal internal/external / / / / CoW Offline extend/shrink / / / / / Online extend/shrink / / / / / Inode-Allocation-Map table u.b*-tree B+-tree table B-tree Sparse Files Tail Packing Defrag ExtAttr / ACLs / / / / / Quotas Subvol. max. Filesystemsize 16 TiB 16 TiB 8 EiB 1 EiB 16 EiB max. Filesize 2 TiB 1 EiB 8 EiB 1 EiB 16 EiB 6

Example Filesizes

Dataset #1 Documents & Mails

Filesystem Space Efficiency Dataset#1 Test parameters Partition size: 24626 MiB, SSD Data type: E-Mails and Documents 9

Filesystem Space Efficiency Dataset#1 Data typology Filesizes and Count 180000 170425 Documents & Mails up to 1k 160000 1k- 2k 2k- 4k 140000 120000 110488 4k- 8k 8k- 16k 16k- 32k 32k- 64k 100000 64k-128k 128k-256k 80000 256k-512k 512k- 1M 60000 40000 36502 26373 1M- 2M 2M- 4M 4M- 8M 8M- 16M 20000 0 5367 10152 6810 3911 3429 Count 3335 2504 1702 894 504 148 28 above 16M 10

Filesystem Space Efficiency Dataset#1 Result 12000 Space available on partition (df -m) Documents & Mails 10000 10233 8000 6000 6065 6147 7824 8088 7100 ext3 ext4 xfs reiserfs btrfs 4000 4650 btrfs (compress=lzo) btrfs (compress=zlib) 2000 0 Available 11

Dataset #2 ISOs, RPMs, various

Filesystem Space Efficiency Dataset#2 Test parameters Partition size: 303884 MiB, HDD Data types ISOs (~100 GiB) RPMs(~50 GiB) VMs (~25 GiB) Pictures and Sounds (~20 GiB) Third party software including TeXLive (~16 GiB) /usr/src (including Kernel sources) (~16 GiB) Tested on xfs and btrfs only 13

Filesystem Space Efficiency Dataset#2 Data typology Filesizes and Count ISOs, RPMs, various 120000 107058 up to 1k 100000 1k- 2k 2k- 4k 4k- 8k 80000 8k- 16k 16k- 32k 32k- 64k 60000 64k-128k 128k-256k 256k-512k 40000 37626 35290 34460 29339 26503 24673 512k- 1M 1M- 2M 2M- 4M 4M- 8M 20000 11484 12455 10317 6791 5721 3532 2770 3657 1754 8M- 16M above 16M 0 Count 14

Filesystem Space Efficiency Dataset#2 Result Space available on partition (df -m) 70000 60000 62668 64815 ISOs, RPMs, various 50000 40000 30000 ext3 ext4 xfs reiserfs btrfs btrfs (compress=lzo) btrfs (compress=zlib) 20000 10000 0 Available 15

Result?

Linux (Local) Filesystems

Major Linux (local) Filesystems Feature ext 2/3 reiserfs xfs ext4 btrfs Data/Metadata Journaling / / / / CoW Journal internal/external / / / / CoW Offline extend/shrink / / / / / Online extend/shrink / / / / / Inode-Allocation-Map table u.b*-tree B+-tree table B-tree Sparse Files Tail Packing Defrag ExtAttr / ACLs / / / / / Quotas Subvol. max. Filesystemsize 16 TiB 16 TiB 8 EiB 1 EiB 16 EiB max. Filesize 2 TiB 1 EiB 8 EiB 1 EiB 16 EiB 18

Picking a File System Pick the right file system for the task Indexed metadata File sizes Number of files Workloads (database, mail server,...) AccessPaths Dump/Restore 19

File Systems: ReiserFS Applications that use many small files Mail servers NFS servers Database servers or other applications that use synchronous I/O Mature file system in maintenance mode. 20

File Systems: Ext-Family ext3 is default file system in SUSE Linux Enterprise 11 ext4 is default in other distributions Primary disadvantage Limitations in scaling by size Recommended sizes: ext3 < 01 TiB ext4 < 16 TiB 21

File Systems: XFS Best suited for Medium (>100GiB) to very large file systems (> 100 TiB) Large files/many files Streaming multimedia (low latencies) Special features and capabilities dump/restore online filesystem-check online-defragmentation 22

File Systems: XFS Advantages Maturity comes from IRIX ported to Linux > 10 years ago Track record for Performance Scalability Stability Active Development community Checksums Self-identifying metadata 23

Filesystems: btrfs Features Concepts Copy on Write B-Tree Extents Subvolume Snapshots Metadata Raw data 24

Filesystems: btrfs Copy on Write Disadvantages Performance impact on specific workloads, such as storing VMs Advantages Efficient Storage Deduplication Snapshots Integrity beyond Journalling 25

Filesystems: btrfs Maturity Mature / Supported Copy on Write Snapshots Subvolumes Metadata Integrity Data Integrity Online metadata scrubbing Manual Defragmentation Manual Deduplication Quota Groups Not (yet) mature Inode Cache Auto Defrag RAID Compression Send / Receive Hot add / remove Seeding devices Multiple Devices Big Metadata 26

Cluster File System: OCFS2 OCFS2 (Oracle Cluster File System) Shared access by multiple nodes Ensures data integrity in case of a node-failure Scale-out for data access Generic use POSIX-compliant Cluster-aware POSIX locking Higher throughput Parallel I/O Disaster Tolerance Integration with data replication for dual node 27

Tuning On The filesystem Layer

Dedicated Logging Devices ReiserFS mkreiserfs -j /dev/xxx -s 8192 /dev/xxy reiserfstune journal-new-device /dev/xxx -s 8192 Ext3/4 mke2fs -O journal_dev /dev/xxx mke2fs -j -J device=/dev/xxx,size=8192 /dev/xxy tune2fs -J device=/dev/xxx,size=8192 /dev/xxy XFS mkfs.xfs -l logdev=/dev/xxx,size=10000b /dev/xxy 29

File System Tuning Split file systems based on data access patterns Keep commit heavy data away from data that does not have to be synchronous Keep streaming writes and reads on different spindles than random I/O Consider disabling atime updates on files and directories # mount -o noatime,nodiratime 30

Tuning On The Block Layer

I/O Scheduler Flexible, pluggable I/O scheduler Selectable via boot parameter elevator=<x> noop deadline as (default in mainline kernels) cfq (default in SUSE Linux Enterprise) I/O Scheduler per device Check /sys/block/*dev*/queue/iosched Set echo SCHEDNAME > /sys/block/*dev*/queue/scheduler 32

I/O Scheduler: Noop No reordering, just merging Best for storage with extensive caching and scheduling of its own, such as: MultiPathing Virtual Machines on SSDs Activated by boot parameter elevator=noop 33

I/O Scheduler: Deadline Per-request service deadline Caps maximum latency per request Maintains good disk throughput Best for disk-intensive database applications Activated by boot parameter elevator=deadline 34

I/O Scheduler: CFQ Complete Fair Queuing Treat all competing processes equally by keeping a unique request queue for each and giving equal bandwidth to each queue Good compromise between throughput and latency Minimal worst case latency on all reads and writes Suitable for a wide variety of applications Default in SUSE Linux Enterprise Activated by boot parameter elevator=cfq 35

Block Layer Tuning Spreading the load across controllers Per-target locking for SCSI Software RAID bandwidth Battery backed caching 36

Blocker Layer Tunables Block read ahead buffer /sys/block/<sdx/hdx>/queue/read_ahead_kb Default is 128. Increase to 512 for fast storage (SCSI disks or RAID) May speed up streaming reads a lot Number of requests /sys/block/<sdx/hdx>/queue/nr_requests Default is 128. Increase to 256 with CFQ scheduler for fast storage Increases throughput at minor latency expense 37

Linux (Local) Filesystems

Major Linux (local) Filesystems Feature ext 2/3 reiserfs xfs ext4 btrfs Data/Metadata Journaling / / / / CoW Journal internal/external / / / / CoW Offline extend/shrink / / / / / Online extend/shrink / / / / / Inode-Allocation-Map table u.b*-tree B+-tree table B-tree Sparse Files Tail Packing Defrag ExtAttr / ACLs / / / / / Quotas Subvol. max. Filesystemsize 16 TiB 16 TiB 8 EiB 1 EiB 16 EiB max. Filesize 2 TiB 1 EiB 8 EiB 1 EiB 16 EiB 39

Filesystem Recommendations Yes New Filesystem? No OS Purpose? Data xfs Type? reiserfs ext2/3/4 Yes Snapshots? Snapshots? Yes No No xfs ext3 4 Convert btrfs 40

Copy On Write And What To Do With It

Using snapper To Manage Operating System Activities

What Is Snapper? Tool to manage btrfs snapshots Functions: create, modify, delete status (=compare), diff undochange cleanup Integration with Package management stack (e.g. yum, zypper) Systems management stack (SUSE: YaST) DBUS service 43

44 http://www.snapper.io/

Snapshotting On The Desktop /home/$user

Using Snapper For User Data Requirements /home/$user is a btrfs subvolume snapper with DBUS interface Snapper configuration per user Additional options Automated snapshotting on login/logout Requires pam-snapper Automated snapshotting on Suspend 46

Server Side Snapshots Btrfs as a Samba Backend Server Side Copy

Traditional File Copy Read Write File data takes disk and network round-trips Duplicate data stored on disk Read Write 48

Server-Side Copy Copy Server-Side Network round-trip avoided Server copies file data locally Duplicate data stored on disk Read Write 49

Btrfs Enhanced Server-Side Copy Copy Server-Side Data avoids network and disk round-trips No duplication of file data Clone Range 50

Server Side Snapshots Btrfs as a Samba Backend Recovery Point

Samba4 and btrfs, snapper Prototype Samba Implementation of Recovery Point Automatic snapshots by Snapper Previous versions of test.txt in Explorer File test.txt is changed Samba4 service File share Automated snapshots File test.txt is created Now Linux + Samba 4 Network share Windows 7, Vista or XP 52

Other Features Future

Conversion to btrfs btrfs-convert offline in-place migration from ext2/3/4 and reiserfs Keeps metadata of the old filesystem for a roll-back demonstration: convert reiserfs to btrfs 54

Continuously Running Systems Snapshot / Rollback for full system Based on btrfs Snapper Bootloader integration Booting directly from a btrfs snapshot Jump back to a former status of the OS, including kernel / initrd 55

Btrfs Planned Features Data de-duplication: De-duplication during writes Manual De-duplication Tiered storage e.g.: combine SSD and HDD Upstream since 2013-09-12 56

Summary

Filesystem Recommendations Yes New Filesystem? No OS Purpose? Data xfs Type? reiserfs ext2/3/4 Yes Snapshots? Snapshots? Yes No No xfs ext3 4 Convert btrfs 58

Go ahead, try xfs, btrfs, and ocfs2 today! Start Tuning! Your questions!? Thank you. 59

Corporate Headquarters Maxfeldstrasse 5 90409 Nuremberg Germany +49 911 740 53 0 (Worldwide) www.suse.com Join us on: www.opensuse.org 60

Unpublished Work of SUSE. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability. General Disclaimer This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.