Keeping Up With The Linux Kernel. Marc Dionne AFS and Kerberos Workshop Pittsburgh

Similar documents
Linux Kernel Evolution. OpenAFS. Marc Dionne Edinburgh

Linux multi-core scalability

Linux Filesystems and Storage Chris Mason Fusion-io

Red Hat Summit 2009 Rik van Riel

Disclaimer. This talk vastly over-simplifies things. See notes for full details and resources.

RCU. ò Walk through two system calls in some detail. ò Open and read. ò Too much code to cover all FS system calls. ò 3 Cases for a dentry:

VFS, Continued. Don Porter CSE 506

The Embedded Linux Problem

Virtual File System. Don Porter CSE 306

Flatpak and your distribution. Simon McVittie

Requirements for Building and Using the Kernel

An efficient method to avoid path lookup in file access auditing in IO path to improve file system IO performance

Disclaimer. This talk vastly over-simplifies things. See notes for full details and resources.

Status of the Linux NFS client

RCU. ò Dozens of supported file systems. ò Independent layer from backing storage. ò And, of course, networked file system support

Kernel driver maintenance : Upstream vs. Industry

Virtual File System. Don Porter CSE 506

SOFTWARE CONFIGURATION MANAGEMENT

API/ABI Stability and LTS: Current state and Future John McNamara, Ian Stokes, Luca Boccassi, Kevin Traynor, DPDK Summit Userspace - Dublin- 2017

To understand this, let's build a layered model from the bottom up. Layers include: device driver filesystem file

Ext3/4 file systems. Don Porter CSE 506

Filename encoding. and case-insensitive filesystems. Gabriel Krisman Bertazi

Background: I/O Concurrency

What is version control? (discuss) Who has used version control? Favorite VCS? Uses of version control (read)

kpatch Have your security and eat it too!

The kernel report. Jonathan Corbet LWN.net

OpenAFS Unix Cache Manager Performance Mark Vitale AFS and Kerberos Best Practices Workshop 20 August 2015

THE AFS NAMESPACE AND CONTAINERS

LINUX KERNEL UPDATES FOR AUTOMOTIVE: LESSONS LEARNED

TDDC88 Lab 4 Software Configuration Management

ECE 598 Advanced Operating Systems Lecture 19

Boot Camp. Dave Eckhardt Bruce Maggs

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1

W4118 Operating Systems. Instructor: Junfeng Yang

How to decide Linux Kernel for Embedded Products. Tsugikazu SHIBATA NEC 20, Feb Embedded Linux Conference 2013 SAN FRANCISCO

A Survivor's Guide to Contributing to the Linux Kernel

FILE SYSTEMS, PART 2. CS124 Operating Systems Winter , Lecture 24

Overlayfs And Containers. Miklos Szeredi, Red Hat Vivek Goyal, Red Hat

Summary: Open Questions:

ò Server can crash or be disconnected ò Client can crash or be disconnected ò How to coordinate multiple clients accessing same file?

NFS. Don Porter CSE 506

Revision Control. Software Engineering SS 2007

How To Write a Linux Security Module That Makes Sense For You. Casey Schaufler February 2016

Block Device Scheduling. Don Porter CSE 506

Block Device Scheduling

Chapter 3. Revision Control

Review of the Stable Realtime Release Process

Security features for UBIFS. Richard Weinberger sigma star gmbh

Choosing Hardware and Operating Systems for MySQL. Apr 15, 2009 O'Reilly MySQL Conference and Expo Santa Clara,CA by Peter Zaitsev, Percona Inc

Version Control. Second level Third level Fourth level Fifth level. - Software Development Project. January 17, 2018

416 Distributed Systems. Distributed File Systems 2 Jan 20, 2016

Efficient. Password. management: The key to increasing IT productivity.

Kernel maintainership: an oral tradition

ò Very reliable, best-of-breed traditional file system design ò Much like the JOS file system you are building now

Connecting the EDG front-end to LLVM. Renato Golin, Evzen Muller, Jim MacArthur, Al Grant ARM Ltd.

Read-Copy Update (RCU) Don Porter CSE 506

CSE 333 Lecture 9 - storage

openqa Avoiding Disasters of Biblical Proportions Marita Werner QA Project Manager

Choosing and Tuning Linux File Systems

File System Implementation

CS 318 Principles of Operating Systems

Chapter 10: File System Implementation

LOCKDEP, AN INSIDE OUT PERSPECTIVE. Nahim El

Global Software Distribution with CernVM-FS

You Can t Move Forward Unless You Can Roll Back. By: Michael Black

CSE506: Operating Systems CSE 506: Operating Systems

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Are you Really Helped by Upstream Kernel Code?

SUSE Linux Enterprise Kernel Back to the Future

Lustre overview and roadmap to Exascale computing

NFS Version 4 Open Source Project. William A.(Andy) Adamson Center for Information Technology Integration University of Michigan

What a difference a year makes

Enabling User Provided Kernels in Amazon EC2

Laboratorio di Programmazione. Prof. Marco Bertini

OPERATING SYSTEM. Chapter 12: File System Implementation

Chapter 11: Implementing File

FILE SYSTEMS, PART 2. CS124 Operating Systems Fall , Lecture 24

Secureworld Conference

Operating System Concepts Ch. 11: File System Implementation

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Open Source Storage. Ric Wheeler Architect & Senior Manager April 30, 2012

Kernel Scalability. Adam Belay

Chapter 11: Implementing File Systems. Operating System Concepts 9 9h Edition

WHITE PAPER NGINX An Open Source Platform of Choice for Enterprise Website Architectures

Topics. Operating System I. What is an Operating System? Let s Get Started! What is an Operating System? OS History.

Chapter 11: Implementing File Systems

MariaDB: Community Driven SQL Server. Kristian Nielsen MariaDB developer Monty Program AB

Who am I? I m a python developer who has been working on OpenStack since I currently work for Aptira, who do OpenStack, SDN, and orchestration

CVS. Computer Science and Engineering College of Engineering The Ohio State University. Lecture 21

Continuous Integration / Continuous Testing

416 Distributed Systems. Distributed File Systems 1: NFS Sep 18, 2018

Misc. Third Generation Batch Multiprogramming. Fourth Generation Time Sharing. Last Time Evolution of OSs

CNT 4603, Spring 2009: Introduction

Lecture 11: Linux ext3 crash recovery

Chapter 12: File System Implementation

Yaffs Tuning. Charles Manning

How technical excellence helps in LeSS adoption. Anton Bevzuk Dodo Pizza Chief Agile Officer

File System Case Studies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

1 / 22. CS 135: File Systems. General Filesystem Design

The State of Samba (June 2011) Jeremy Allison Samba Team/Google Open Source Programs Office

Transcription:

Keeping Up With The Linux Kernel Marc Dionne AFS and Kerberos Workshop Pittsburgh - 2015

The stage Linux is widely deployed as an AFS client platform OpenAFS client available in popular distributions Ubuntu, Debian Many large AFS sites rely heavily on Linux on both servers and clients The AFS Linux clients includes a kernel module Very sensitive to kernel changes

The battle Linux perspective All useful drivers and modules are in-tree, or should be in the tree Changing the module API/ABI is not a problem in-tree code is adapted as part of the change AFS perspective Can't join the party incompatible license Must adapt on its own, can't benefit from kernel developers Can't have all the goodies - part of the API is out of reach

Since Oct 2006 42 kernel releases (2.6.19 4.1) 480 thousand commits ~20 million lines changed in 60 thousand files Estimate of > 150 AFS commits linked directly or closely to kernel changes Kernel releases with no impact on AFS: 2

Linux development process Fast New release every ~3 months No fixed schedule, released when it's ready.. but fairly consistent Fast moving Thousands of commits per release Tens of thousands of lines of code changed Big Close to 1000 developers involved in each release Heavy corporate participation

The code Linux releases are cut directly from mainline - master branch of Linus' tree 2 week merge window per cycle Followed by ~10 weeks of fixes and stabilization over 6-9 RC releases Stable releases are handled by separate maintainers, in separate trees Many active stable releases in parallel Some releases are tagged as long term

linux-next Tree for integration testing Contains code targeted for next release cycle Most, but not all subsystems Rebuilt from scratch daily expensive to follow Not all code in -next will make it to mainline in the following cycle Not all code will show up in -next before hitting mainline

How we try to keep up Continuously run kernels very close to mainline Testing system runs daily builds Follow linux-kernel, linux-fsdevel discussions and patches particular attention to vfs layer.. and other related lists Frequent builds and tests of current OpenAFS and Auristor

How we try to keep up Keep an eye out for new warnings Often a symptom of an API/ABI change Do real testing Not all changes can be detected at compile time Keep an eye on the VFS tree Occasional test of linux-next Can provide early warning

The result OpenAFS master supports most Linux kernel releases before they're released Usually early in the RC cycle Better than most other external modules Stable releases are more of a challenge There's a speed mismatch.. and getting these changes to distributions is also challenge Schedules are not in sync Many have custom patches or packaging

The fixes Some fixes are mostly mechanical Typical case : A new configure test to identify a new behaviour Conditional code (ifdefs) to do things the new way In some cases, new compatibility helpers to hide the ifdef maze Even when the fix is trivial, it may need a lot of packaging Unfortunately many changes require more analysis

The fixes Occasional kernel fixes Ex: accidental GPL breakage 1 instance in x86 code, 1 instance in workqueue code Patches accepted and merged quickly

Challenges VFS changes are sometimes merged late in the cycle But getting better Stricter merging policy late in merge cycle Many VFS changes appear in mainline with little notice Compatibility with older releases Risk of breaking support for an older kernel Impossible to test everything Use mitigating strategies for configure tests Sprawling feature tests more than any other platform make -j 16 all = 40s./configure = 95s

More challenges Keeping the code manageable and readable Keep ifdef jungle under control Distributions Have their own schedule, packaging, custom patches, bug reporting, maintainers Shrinking API Many useful debug features are off limits ex: lockdep Can't support RT kernel, Fedora rawhide, etc So far core functionality has been spared Accidental breakage

Highlights

Syscall table OpenAFS relied on modifying the syscall table to hook the setgroups call and preserve PAGs In the early 2.6 kernels, the syscall table was unexported and made read-only The new keyring feature is now used to implement PAGs internally Special PAG groups are still set for legacy reasons they are no longer used to determine PAG membership

Inode abstraction Client keeps references to disk cache files so it can quickly open them as needed Traditional reference on Unix systems was the inode number On Linux, some filesystems can't guarantee stable inode numbers Problem reports (xfs, reiserfs) led to filesystem restrictions in afsd (ext2/3) Linux 2.6.25: the API to open a file by inode number is no longer available

Inode abstraction Solution: exportfs interface Linux API to get a stable opaque file handle from the filesystem, and later use it to open the file Used by NFSD supported by all exportable filesystems Implemented progressively Minimal change in 1.4 to deal with 2.6.25; create our own inode number based handles for ext2/3 Later, call filesystems to generate handles Finally, extend method to pre-2.6.25 kernels Side benefit: any exportable filesystem can now be used

Linux 3.0, 4.0 Numbering change no major new features Impact limited to the build system, packaging Some discussion about default sysname values

Credentials Internal kernel handling of security credentials has evolved Separate structure with a pointer in the task struct RCU based change mechanism Support for new security subsystems selinux, etc. OpenAFS changes Use the new cred structure directly, instead of rolling our own Open cache files with the initial cache manager credentials resolves issues for systems with selinux and AppArmor

aklog -setpag Stopped working at some point a process was not allowed to change its parent's credentials.. but a new syscall allowed a process to set a keyring in its parent Currently works for recent kernels

BKL Big Kernel Lock - global kernel wide lock Gradually replaced by more granular locking, RCU Last bits removed in kernel 2.6.39 By that time, OpenAFS master was mostly BKL free.. but making 1.4 safe for BKL removal would have been invasive EOL for new kernel support in 1.4

RCU based path walking Major VFS change to reduce lock contention by relying on RCU where possible Requires that several VFS callbacks don't sleep But most OpenAFS callbacks take the global lock (GLOCK), and can sleep Fallback mechanism filesystems can indicate that they don't support RCU path walking VFS calls back with locks taken Significant locking changes (ex: no more dcache_lock)

RCU path walking For OpenAFS Return appropriate error codes to trigger the fallback to locking mode Rework locking Resulted in a few hard to diagnose bugs where some configure tests caused the VFS to think we supported RCU mode

IMA Activated in Fedora and Red Hat Enterprise kernels API was unbalanced, impossible to use correctly and avoid syslog warnings Missing call was GPL only All (eventually) ended well API reworked in kernel mainline Backported in time for RHEL 6 release, with customer pressure Affected Fedora reached EOL

Exportfs API OpenAFS relies on this API for two uses Tracking and opening disk cache files Exporting AFS files via the NFS translator Many revisions to this API over the past few years, some major Translator no longer supported requires GPL only symbols

getname/putname Functions were unexported, but not at the same time When auditing is enabled, behaviour is different Name only freed at syscall exit Open coded version would do the wrong thing Use of both functions replaced by internal code

File operations Many revisions to the VFS file read/write operations Different sets of operations need to be define depending on kernel version Difficult to get the correct set for all versions write op was require for core dumps Removed write op, and use generic vfs_write helper

d_alias Consolidated into a union, now d_u.d_alias Included as part of a security fix series Backported to many older releases Required many updates in distributions

Recent changes Linux 4.2: symbolic link limit (40) Every directory in path may count ELOOP for long paths that are not in the OS dentry cache Patch in gerrit may help only count mountpoints and some special case directories

As of today.. 4.1 support in 1.6.14 release 4.2 support in master branch 4.2 currently at RC7, release in upcoming weeks

Closing thoughts After a quiet period (3.11-3.15), the pace of change has picked up again Kernel devs were receptive to fixing accidental GPL breakage and other issues Even proposed fixes to OpenAFS in one case No signs that things will slow down but no sign of impending doom

Thanks!