The failure of Operating Systems,

Similar documents
OS Containers. Michal Sekletár November 06, 2016

File System Performance (and Abstractions) Kevin Webb Swarthmore College April 5, 2018

HTCondor: Virtualization (without Virtual Machines)

Improving User Accounting and Isolation with Linux Kernel Features. Brian Bockelman Condor Week 2011

Spring 2017 :: CSE 506. Introduction to. Virtual Machines. Nima Honarmand

Docker A FRAMEWORK FOR DATA INTENSIVE COMPUTING

SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG

Red Hat Summit 2009 Rik van Riel

Container Adoption for NFV Challenges & Opportunities. Sriram Natarajan, T-Labs Silicon Valley Innovation Center

Engineering Robust Server Software

Efficient Memory Management on Mobile Devices

Zdeněk Kubala Senior QA

1 Virtualization Recap

Who stole my CPU? Leonid Podolny Vineeth Remanan Pillai. Systems DigitalOcean

Memory - Paging. Copyright : University of Illinois CS 241 Staff 1

Quick Prototyping+CI with LXC and Puppet

Travis Cardwell Technical Meeting

Containers and isolation as implemented in the Linux kernel

for Kerrighed? February 1 st 2008 Kerrighed Summit, Paris Erich Focht NEC

Letting Go. John Stultz August 30th

Virtualization and Performance

Hostless Xen Deployment

Container's Anatomy. Namespaces, cgroups, and some filesystem magic 1 / 59

Sandboxing. CS-576 Systems Security Instructor: Georgios Portokalidis Spring 2018

OS Virtualization. Linux Containers (LXC)

Deflating the hype: Embedded Virtualization in 3 steps

Beyond Init: systemd

Container mechanics in Linux and rkt FOSDEM 2016

Qualifying exam: operating systems, 1/6/2014

Administrative Details. CS 140 Final Review Session. Pre-Midterm. Plan For Today. Disks + I/O. Pre-Midterm, cont.

Alternatives to Solaris Containers and ZFS for Linux on System z

MultiLanes: Providing Virtualized Storage for OS-level Virtualization on Many Cores

File systems, databases, cloud storage

Each time a file is opened, assign it one of several access patterns, and use that pattern to derive a buffer management policy.

Virtualizaton: One Size Does Not Fit All. Nedeljko Miljevic Product Manager, Automotive Solutions MontaVista Software

Distributed File Systems Issues. NFS (Network File System) AFS: Namespace. The Andrew File System (AFS) Operating Systems 11/19/2012 CSC 256/456 1

Kubernetes The Path to Cloud Native

KVM Forum Vancouver, Daniel P. Berrangé

ECE 598 Advanced Operating Systems Lecture 19

Using Linux Containers as a Virtualization Option

Libvirt presentation and perspectives. Daniel Veillard

SAINT LOUIS JAVA USER GROUP MAY 2014

Making Applications Mobile

PAGE REPLACEMENT. Operating Systems 2015 Spring by Euiseong Seo

Introduction to containers

Passthrough in QEMU/KVM on Linux

Docker Deep Dive. Daniel Klopp

Linux-CR: Transparent Application Checkpoint-Restart in Linux

[Docker] Containerization

Azure Sphere: Fitting Linux Security in 4 MiB of RAM. Ryan Fairfax Principal Software Engineering Lead Microsoft

High Performance Containers. Convergence of Hyperscale, Big Data and Big Compute

EE 660: Computer Architecture Cloud Architecture: Virtualization

STATUS OF PLANS TO USE CONTAINERS IN THE WORLDWIDE LHC COMPUTING GRID

OS Security III: Sandbox and SFI

CNI, CRI, and OCI - Oh My!

More on file systems, Booting Todd Kelley CST8177 Todd Kelley 1

references Virtualization services Topics Virtualization

Introduction to Container Technology. Patrick Ladd Technical Account Manager April 13, 2016

Libvirt: a virtualization API and beyond

Outline. Cgroup hierarchies

Memory Allocation. Copyright : University of Illinois CS 241 Staff 1

Linux Containers Roadmap Red Hat Enterprise Linux 7 RC. Bhavna Sarathy Senior Technology Product Manager, Red Hat

Network stack virtualization for FreeBSD 7.0. Marko Zec

EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors. Junfeng Yang, Can Sar, Dawson Engler Stanford University

Rootless Containers with runc. Aleksa Sarai Software Engineer

Secure Sharing of an ICT Infrastructure Through Vinci

LXC(Linux Container) Lightweight virtual system mechanism Gao feng

KVM PV DEVICES.

CS2506 Quick Revision

1 / 23. CS 137: File Systems. General Filesystem Design

Filesystem Performance on FreeBSD

KVM PV DEVICES.

Red Hat Virtualization 4.1 Technical Presentation May Adapted for MSP RHUG Greg Scott

ECE 598 Advanced Operating Systems Lecture 22

Chapter 10: Case Studies. So what happens in a real operating system?

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

Outline. Cgroup hierarchies

LINUX CONTAINERS. Where Enterprise Meets Embedded Operating Environments WHEN IT MATTERS, IT RUNS ON WIND RIVER

Scheduling Mar. 19, 2018

Linux Tiny Penguin Weight Watchers. Thomas Petazzoni Free Electrons electrons.com

Virtualization. Pradipta De

Docker for HPC? Yes, Singularity! Josef Hrabal

Landlock LSM: toward unprivileged sandboxing

Real-Time KVM for the Masses Unrestricted Siemens AG All rights reserved

More on file systems, Booting Todd Kelley CST8177 Todd Kelley 1

Virtualization Overview NSRC

docker & HEP: containerization of applications for development, distribution and preservation

virtual machine block storage with the ceph distributed storage system sage weil xensummit august 28, 2012

Using ebpf for network traffic analysis. Samuele Sabella Luca Deri

Operating systems and security - Overview

Operating systems and security - Overview

Allowing Users to Run Services at the OLCF with Kubernetes

Scheduling in Xen: Present and Near Future

Project #4: Implementing NFS

Background: I/O Concurrency

Lecture 3: O/S Organization. plan: O/S organization processes isolation

Toward An Integrated Cluster File System

Virtual Memory III. Jo, Heeseung

IPv6 Neighbor Discovery (ND) Problems with Layer-2 Multicast State

Building blocks for Unix power tools

Transcription:

The failure of Operating Systems, and how we can fix it. Glauber Costa Lead Software Engineer August 30th, 2012 Linuxcon

Opening Notes I'll be doing Hypervisors vs Containers here. But: 2 2

Opening Notes I absolutely don't believe Hypervisors are useless. I will also not say Containers are better than Hypervisors (or vice-versa). That is dependent on your use-case, and the comparison between them is not the point of this talk. I am mostly talking about traditional Linux. Some of what I am presenting is already upstream, so you may get a sense of Oh, but Linux already does that. 3 3

A Brief History of Operating Systems 4 4

Introduction History books tell us that back in the day, a computer ran a single program. The white bearded guys in the audience may be able to confirm that. This is way too inefficient. 5 5

Smarter resource usage Whatever equivalent of Ingo Molnar existed at the time, came up with a scheduler. I/O is no longer wasted time. 6 6

Smarter resource usage Programs want to start at a fixed address, and want to point to its code and data at fixed locations. Meet the VM. 7 7

It has a number of limitations, though. Fire firefox (or chrome chromium) Start something else and use lots of memory. Watch the system page out. What do you think happens to your browser? Give an unprivileged user wanting CPU power a hand, He'll surely want an arm (even if running on x86). nice is not very nice for guaranteeing cpu time. Basic abstraction is a process, but services have plenty of them. 8 8

Memory eviction order Process 1 Process 2 Process 2 Process 2 LRU P1 will be penalized by P2's high memory usage. 9 9

No process schedule grouping Process 1 Process 2 50 % 50 % 10 10

No process schedule grouping Service 1 Service 1 Service 2 Service 1 75 % 25 % Dynamic adjusting those priorities is very hard. May not be fine grained enough. 11 11

Example exploit $ while true; do mkdir x; cd x; done Each 'x' will generate directory entry (dentry) Those dentries are pinned in memory. This is non-reclaimable, kernel memory! Will consume all memory before any disk quota can kick in. Unrelated processes will fail to be serviced. 12 12

Classical resource abuse $ :() { :;:& };: Fork bomb. Should be your problem, not mine. 13 13

The hypervisor solution and what it allows 14 14

Enter KVM (and others) Started by kivity.avi, Designed to make the Linux Kernel itself the HV in KVM case. Each VM is a (group of) process(es). Provides a N:1 CPU mapping; logical grouping. It also provides an fair view on memory If you don't overcommit VMs, you will never evict them. 15 15

Use cases I want to run web and mail servers, and databases. I don't want them to interfere with each other. Too much memory usage by one hurts another, forks can increase relative aggregate CPU throughput. I want to gather accurate service statistics. Preferably with standard tools like top. 16 16

Use cases I want each of them to have its own IP address. And the same standard ports 22, 25, 80, etc. This may also mean running an isolated userspace, With root in all of them, Maybe with their own init process. Compatible and certified stack (for old software). 17 17

Use cases I want to run different versions of Linux This can also be part of a certified stack. I want to run a heterogeneous datacenter Other OSes as well. 18 18

Containers 19 19

Use cases Usage Make sure processes high resource usage doesn't interfere with others. Logically map process to a single process. Provide logically grouped introspection. Provide processes with flexible view of resources, such as ports. Run different kernels. Should the OS do it? Yes Yes Yes Yes No 20 20

New additions to the Linux Kernel Network namespaces. Provides a unique IP per group of processes. Provides raw device view per-process as well. Easy packet filtering (per-device). 30 processes connecting to port 80? No problem. 21 21

New additions to the Linux Kernel Mount namespaces. Linux can chroot, but new mounts are globally visible. I may want a private mount. User namespaces. Allows more than one root user in the system. Of course, other users as well. 22 22

New additions to the Linux Kernel cgroup : logical grouping of processes. WebServer1, MailServer3, etc. Can attach controllers. I will briefly introduce two of them. 23 23

New additions to the Linux Kernel The cpu controller. The scheduler will first schedule among groups, then it will schedule inside each group. Inside each group, we can have another group. Can limit the maximum CPU time used. 24 24

New additions to the Linux Kernel The memory controller. Can limit the maximum amount of memory. Soft and Hard Limits. Hard Limits will page even if there is memory available in the system. Controlling resources used by the kernel is work in progress. Essential to prevent some exploits. 25 25

Other features Live Migration Came to be a killer feature for hypervisors. Can also be done by containers by checkpoint/restore Specialized loop device. Separate journal per-container, without heavy fs changes. inode number stability. 26 26

Containers, today. It is possible to run production containers today. Upstream Linux lacks the whole infrastructure. The OpenVZ fork provides a stable, secure, and mature Open Source container offer. You might have heard about LXC. Userspace tool, only run what Linux provides future only. 27 27

Work Status A lot of it is already in, and works all right cgroups basic infrastructure. CPU controller. Network Namespaces. Some of it in, needs improvements and the test of time User namespaces, fully unprivileged still not possible. Mount namespaces, joining still not possible. PID namespaces, joining still not possible. Block I/O controller. 28 28

Work Status To be merged Slab object accounting and fork bomb prevention. Group-aware kernel memory shrinking. Group-aware filesystem quotas. Specialized loop device (separate journal + inode# stability) Live Migration (Checkpoint/Restore) 29 29

Tooling LXC. OpenVZ tools are being patched as we speak to run with functionality already present in the Upstream Kernel. Love it or hate it, SystemD uses cgroups as one of its building blocks. The unshare command can run an arbitrary process in a separate namespace. 30 30