1
The Turtle And The Hare A Tale of Two Kernels Rik van Riel Senior Software Engineer, Red Hat September 3, 2009 2
The Turtle And The Hare A Tale of Two Kernels Rik van Riel Senior Software Engineer, Red Hat September 3, 2009 BUT WAIT, THERE'S MORE... 3
The Turtle And The Hare A Tale of Three Kernels Rik van Riel Senior Software Engineer, Red Hat September 3, 2009 4
A Tale of Two Kernels Comparing the RHEL and upstream kernels How fast is fast? How slow is slow? The flow of the code What can be included in RHEL updates? Backporting Upstream features that could be in RHEL 6 How does the Fedora kernel fit in? Conclusions 5
RHEL kernel vs upstream kernel Upstream kernel Goal: allow development to move forward, so Linux can be everything to everyone Fast moving, cutting edge RHEL kernel Goal: provide a stable base for people to run their applications on Slow moving Users want no changes, except for..., so compromises need to be made 6
How fast is fast? How slow is slow? Kernel changesets over the past year or so September 2009: RHEL 5.4 972 changesets June 2009: 2.6.30 13008 changesets March 2009: 2.6.29 12526 changesets January 2009: RHEL 5.3 986 changesets December 2008: 2.6.28 9784 changesets October 2008: 2.6.27 11371 changesets July 2008: 2.6.26 10487 changesets May 2008: RHEL 5.2 April 2008: 2.6.25 12707 changesets 7
Regressions Upstream kernel Typically 30-70 regressions found from one kernel version to the next Typically more than half (the serious and easy to fix ones) get fixed before the next release Do not stand in the way of progress RHEL kernel Typically 5-10 kernel regressions are found in QA between releases All are release blockers Better revert a patch than allow a regression 8
Numbers recap In a similar amount of time: RHEL kernel: less than 1000 changesets 5-10 regressions, all release blockers Upstream kernel: 25000-30000 changesets 2-3 releases with 30-70 regressions per release, more than half of which get fixed before the next release 2.6.31-rc5 has 76 regressions, 28 unsolved as of Aug 10 2.6.30 still has 39 unresolved regressions 2.6.31 will probably have some regressions from 2.6.30, 2.6.29 and even 2.6.28 9
The flow of the code Selected changes are backported Red Hat has an upstream first policy 2.6.0 2.6.9 2.6.18 2.6.30 Upstream RHEL 5 RHEL 4 10
What can be included in RHEL updates Security updates Bug fixes New hardware support More during the early lifetime of a RHEL distribution, much less later on Select new features 11
What can not go in RHEL updates Big risky changes to the code Cannot afford any regressions in RHEL updates Changes that break the kernel ABI (kabi) Would break compatibility with 3 rd party and customer built kernel modules Features that are not upstream Would not want a feature in RHEL 4 that does not exist in RHEL 5 and will not exist in RHEL 6 12
Upstream first Features must be upstream before they are accepted into RHEL Feature is automatically included in future RHEL versions (not necessarily the same implementation) The feature has a maintainer Upstream developers have also reviewed the code Millions of upstream users are a good QA crowd The entire Linux community benefits from the feature 13
What is backporting? Backporting is: Taking code from a newer kernel (upstream) and making it work on an older kernel (RHEL) To minimize risk, backport the minimal amount of code necessary To facilitate backporting of future bugfixes to the code, keep the code as similar as possible to the upstream version 14
The pitfalls of backporting Backporting is not without its challenges The desired code may depend on infrastructure that is not present in the older target kernel The kabi cannot change; required changes to data structures often cannot be made and need to be worked around The desired code may depend on new behavior in existing other code Virtualization: must remain compatible with hosts and guests of a different version 15
Examples of backported features Lots of new hardware support New drivers mostly backported wholesale, to get the same code that was tested upstream Ext4 and ecryptfs filesystems KVM virtualization Virtualization IOMMU (VT-d) support Nested page table support zseries SHARED_KERNEL support Generic Receive Offload network support 16
Upstream features that may be in RHEL 6 Things that could not and should not be backported CFS CPU scheduler Tickless kernel Lockless page cache Split LRU VM Resource control (cgroups) 17
CFS CPU scheduler CFS: Completely Fair Scheduler Calculates priority purely by CPU use, instead of using heuristics like the O(1) scheduler Fewer corner cases, more deterministic Merged upstream as part of the realtime effort Comes with better SMP and NUMA balancing code Slightly better SMP scalability and throughput Implemented by Ingo Molnar, Peter Zijlstra, Thomas Gleixner 18
Tickless kernel Get rid of periodic timer interrupt Instead, dynamically program system timer to go off at the next scheduled event No timer tick while idle Good for power saving Even better for virtualization Timer can go off at exactly the right time Nanosleep() wakeups more precisely timed Micro accounting of CPU time used by processes Implemented by Ingo Molnar, Thomas Gleixner 19
Lockless page cache Multiple CPUs can look up pages from the same file simultaneously Simultaneous access has always been possible, but lookups were serialized Especially useful for glibc, database shared memory segments, etc... Also speeds up single threaded performance Implemented by: Nick Piggin 20
Split LRU VM Split filesystem backed, memory/swap backed and mlocked pages onto separate LRU lists Better targetted pageout scanning Different eviction policies for file backed and memory/swap backed pages Pageout code scales to larger amounts of memory Implemented by Rik van Riel, Lee Schermerhorn, Kosaki Motohiro 21
Resource control (cgroups) Kernel infrastructure to Group processes Control resource use of each group CPU Memory IO (disk and network) Can be used with workload manager Large upstream effort with involvement by many 22
How does the Fedora kernel fit in? Fedora release kernels are for using Fedora Rawhide kernels are for testing... but maybe not the same definitions of using and testing that enterprise users are used to Fedora release kernels and Fedora Rawhide kernels both follow different upstream kernel branches 23
Upstream stable kernel forks Periodically, Linus releases a kernel Released kernels still have bugs, so others maintain a stable kernel tree 2.6.28 2.6.29 2.6.30 2.6.31-rc Fedora Rawhide 2.6.30.1 Fedora Release 2.6.29.1 2.6.29.2 24
Fedora release kernels Kernels for released distributions are moderately cutting edge, primarily meant for using Fedora releases follow upstream stable kernel branches Fedora releases periodically rebase to a newer upstream stable kernel branch The Fedora distribution often uses cutting edge kernel features Fedora 11 and 12 are good platforms to play with features that will be in RHEL 6 25
The Fedora Rawhide kernel The Rawhide (Fedora development) kernel tracks the latest changes applied to the Linux kernel development branch Beyond cutting edge. Bleeding edge? A good testing ground for upstream development: Rawhide users help discover bugs in the upstream development kernel Often patches are included in the Rawhide kernel before going upstream, as a test / validation platform 26
Conclusions RHEL kernel moves much slower than upstream or Fedora, also much lower risk RHEL kernel gets some upstream improvements as backports RHEL users will have to wait until the next major RHEL release for the other features Upstream first policy ensures that all RHEL 5 features will be there in RHEL 6 Fedora is a good development & test lab, providing testing for Red Hat and upstream 27
28