Is it safe to run applications in Linux Containers?

Similar documents

containerization: more than the new virtualization

Container Isolation at Scale (... and introducing gvisor) Dawn Chen and Zhengyu He

The failure of Operating Systems,

Linux Containers Roadmap Red Hat Enterprise Linux 7 RC. Bhavna Sarathy Senior Technology Product Manager, Red Hat

Container's Anatomy. Namespaces, cgroups, and some filesystem magic 1 / 59

Docker A FRAMEWORK FOR DATA INTENSIVE COMPUTING

Travis Cardwell Technical Meeting

Docker and Security. September 28, 2017 VASCAN Michael Irwin

Practical Techniques to Obviate Setuid-to-Root Binaries

ViryaOS RFC: Secure Containers for Embedded and IoT. A proposal for a new Xen Project sub-project

Engineering Robust Server Software

Landlock LSM: toward unprivileged sandboxing

CONTAINER AND MICROSERVICE SECURITY ADRIAN MOUAT

Portable, lightweight, & interoperable Docker containers across Red Hat solutions

Namespaces and Capabilities Overview and Recent Developments

1 Virtualization Recap

Container Security. Daniel J Walsh Consulting Engineer Blog: danwalsh.livejournal.com

Understanding user namespaces

Container mechanics in Linux and rkt FOSDEM 2016

Securing Containers on the High Seas. Jack OWASP Belgium September 2018

THE ROUTE TO ROOTLESS

Distribution Kernel Security Hardening with ftrace

Last time. Security Policies and Models. Trusted Operating System Design. Bell La-Padula and Biba Security Models Information Flow Control

Multi-tenancy Virtualization Challenges & Solutions. Daniel J Walsh Mr SELinux, Red Hat Date

SAINT LOUIS JAVA USER GROUP MAY 2014

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

Sandboxing. CS-576 Systems Security Instructor: Georgios Portokalidis Spring 2018

Azure Sphere: Fitting Linux Security in 4 MiB of RAM. Ryan Fairfax Principal Software Engineering Lead Microsoft

Sandboxing. (1) Motivation. (2) Sandboxing Approaches. (3) Chroot

Rootless Containers with runc. Aleksa Sarai Software Engineer

CS61 Scribe Notes Date: Topic: Fork, Advanced Virtual Memory. Scribes: Mitchel Cole Emily Lawton Jefferson Lee Wentao Xu

Secure Containers with EPT Isolation

Containers and isolation as implemented in the Linux kernel

File access-control per container with Landlock

OS Containers. Michal Sekletár November 06, 2016

Secure and Simple Sandboxing in SELinux

Hacking and Hardening Kubernetes

Container Adoption for NFV Challenges & Opportunities. Sriram Natarajan, T-Labs Silicon Valley Innovation Center

Who am I? I m a python developer who has been working on OpenStack since I currently work for Aptira, who do OpenStack, SDN, and orchestration

Singularity: Containers for High-Performance Computing. Grigory Shamov Nov 21, 2017

Intro to Container Security. Thomas Cameron Global Cloud Strategy Evangelist Mrunal Patel Principal Software Engineer

ISLET: Jon Schipp, AIDE jonschipp.com. An Attempt to Improve Linux-based Software Training

CSE 565 Computer Security Fall 2018

STATUS OF PLANS TO USE CONTAINERS IN THE WORLDWIDE LHC COMPUTING GRID

The Case for Security Enhanced (SE) Android. Stephen Smalley Trusted Systems Research National Security Agency

Introduction to Container Technology. Patrick Ladd Technical Account Manager April 13, 2016

Container Security. Marc Skinner Principal Solutions Architect

OS Security III: Sandbox and SFI

Docker Security. Mika Vatanen

LXC(Linux Container) Lightweight virtual system mechanism Gao feng

Infrastructure Security 2.0

Security Namespace: Making Linux Security Frameworks Available to Containers

Data Security and Privacy. Unix Discretionary Access Control

Software containers are likely to become a very important tool over the

So on the survey, someone mentioned they wanted to work on heaps, and someone else mentioned they wanted to work on balanced binary search trees.

Building a Smart Segmentation Strategy

Stop Cyber Threats With Adaptive Micro-Segmentation. Jeff Francis Regional Systems Engineer

Introduction to UNIX/LINUX Security. Hu Weiwei

BACKING UP LINUX AND OTHER UNIX(- LIKE) SYSTEMS

Mitigating Docker Security Issues 1 Robail Yasrab

ctio Computer Hygiene /R S E R ich

Running Network Services under User-Mode

Linux Kernel Security Overview

IBM Research Report. Docker and Container Security White Paper

[Docker] Containerization

docker & HEP: containerization of applications for development, distribution and preservation

General Pr0ken File System

The Wonderful World of Services VINCE

Security principles Host security

SELinux type label enforcement

P1_L3 Operating Systems Security Page 1

TEN LAYERS OF CONTAINER SECURITY

The State of Rootless Containers

This material is based on work supported by the National Science Foundation under Grant No

[This link is no longer available because the program has changed.] II. Security Overview

Keeping Up With The Linux Kernel. Marc Dionne AFS and Kerberos Workshop Pittsburgh

OS Virtualization. Linux Containers (LXC)

Virtualizaton: One Size Does Not Fit All. Nedeljko Miljevic Product Manager, Automotive Solutions MontaVista Software

Security Assurance Requirements for Linux Application Container Deployments

What's new in the Linux kernel

Linux in the connected car platform

How to Restrict a Login Shell Using Linux Namespaces

Murray Goldschmidt. Chief Operating Officer Sense of Security Pty Ltd. Micro Services, Containers and Serverless PaaS Web Apps? How safe are you?

Solved How To Manually Remove Old Kernels From Ubuntu 12.04

Linux Kernel Security Update LinuxCon Europe Berlin, 2016

shortcut Tap into learning NOW! Visit for a complete list of Short Cuts. Your Short Cut to Knowledge

Linux Kernel Evolution. OpenAFS. Marc Dionne Edinburgh

Think Small to Scale Big

TEN LAYERS OF CONTAINER SECURITY

Creating joints for the NovodeX MAX exporter

Do Containers Enhance Application Level Security? Benjy Portnoy, CISA, CISSP

Overlayfs And Containers. Miklos Szeredi, Red Hat Vivek Goyal, Red Hat

Welcome to Rootkit Country

Flatpak a technical walk-through. Alexander Larsson, Red Hat

CS61 Scribe Notes Lecture 18 11/6/14 Fork, Advanced Virtual Memory

CIS c. University of Pennsylvania Zachary Goldberg. Notes

Internet infrastructure

Transcription:

Is it safe to run applications in Linux Containers? Jérôme Petazzoni @jpetazzo Docker Inc. @docker

Is it safe to run applications in Linux Containers? And, can Docker do anything about it?

Question: Is it safe to run applications in Linux Containers?

...

Yes

/* * * * * * * * * * * * * * * * * * * * * * * * * * shocker: docker PoC VMM-container breakout (C) 2014 Sebastian Krahmer Demonstrates that any given docker image someone is asking you to run in your docker setup can access ANY file on your host, e.g. dumping hosts /etc/shadow or other sensitive info, compromising security of the host and any other docker VM's on it. docker using container based VMM: Sebarate pid and net namespace, stripped caps and RO bind mounts into container's /. However as its only a bind-mount the fs struct from the task is shared with the host which allows to open files by file handles (open_by_handle_at()). As we thankfully have dac_override and dac_read_search we can do this. The handle is usually a 64bit string with 32bit inodenumber inside (tested with ext4). Inode of / is always 2, so we have a starting point to walk the FS path and brute force the remaining 32bit until we find the desired file (It's probably easier, depending on the fhandle export function used for the FS in question: it could be a parent inode# or the inode generation which can be obtained via an ioctl). [In practise the remaining 32bit are all 0 :] tested with docker 0.11 busybox demo image on a 3.11 kernel: docker run -i busybox sh seems to run any program inside VMM with UID 0 (some caps stripped);

Wait

No!

Docker has changed its security status to It's complicated

Who am I? Why am I here? Jérôme Petazzoni (@jpetazzo) - Grumpy French Linux DevOps Operated dotcloud PAAS for 3+ years - hosts arbitrary code for arbitrary users - all services, all apps, run in containers - no major security issue yet (fingers crossed) Containerize all the things! - VPN-in-Docker, KVM-in-Docker, Xorg-in-Docker, Docker-in-Docker...

What are those containers? (1/3) Technically: ~chroot on steroids - a container is a set of processes (running on top of common kernel) - isolated* from the rest of the machine (cannot see/affect/harm host or other containers) - using namespaces to have private view of the system (network interfaces, PID tree, mountpoints...) - and cgroups to have metered/limited/reserved resources (to mitigate bad neighbor effect) *Limitations may apply.

What are those containers? (2/3) From a distance: looks like a VM - I can SSH into my container - I can have root access in it - I can install packages in it - I have my own eth0 interface - I can tweak routing table, iptables rules - I can mount filesystems - etc.

What are those containers? (3/3) Lightweight, fast, disposable... virtual environments - boot in milliseconds - just a few MB of intrinsic disk/memory usage - bare metal performance is possible The new way to build, ship, deploy, run your apps!

Why is this a hot topic? Containers: have been around for decades LXC (Linux Containers): have been around for years So, what?

Blame Docker

Why is this a hot topic? Containers: have been around for decades LXC (Linux Containers): have been around for years Tools like Docker have commoditized LXC (i.e. made it very easy to use) Everybody wants to deploy containers now But, oops, LXC wasn't made for security We want containers, and we want them now; how can we do that safely?

Some inspirational quotes

LXC is not yet secure. If I want real security I will use KVM. Dan Berrangé (famous LXC hacker) This was in 2011. The Linux Kernel has changed a tiny little bit since then.

From security point of view lxc is terrible and may not be consider as security solution. someone on Reddit (original spelling and grammar) Common opinion among security experts and paranoid people. To be fair, they have to play safe & can't take risks.

Basically containers are not functional as security containers at present, in that if you have root on a container you have root on the whole box. Gentoo Wiki That's just plain false, or misleading, and we'll see why.

Containers do not contain. Dan Walsh (Mr SELinux) This was earlier this year, and this guy knows what he's talking about. Are we in trouble?

For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate. J.R.R. Tolkien (also quoted in VAX/VMS Internals and Data Structures, ca. 1980)

Keyword: levels

Let's review one of those quotes: If you have root on a container you have root on the whole box. First things first: just don't give root in the container If you really have to give root, give looks-like-root If that's not enough, give root but build another wall

Root in the host Root in the container Uruks (intruders)

There are multiple threat models Regular applications - web servers, databases, caches, message queues,... System services (high level) - logging, remote access, periodic command execution,... System services (low level) - manage physical devices, networking, filesystems,... Kernel - security policies, drivers,... The special case of immutable infrastructure

Regular applications

Regular applications Apache, MySQL, PostgreSQL, MongoDB, Redis, Cassandra, Hadoop, RabbitMQ... Virtually all your programs in any language (services/web services, workers, everything!) They never ever need root privileges (except to install packages) Don't run them as root! Ever!

Regular applications Risk: they run arbitrary code - vector: by definition, they are arbitrary code - vector: security breach causes execution of malicious code Fix: nothing - by definition, we are willing to execute arbitrary code here Consequence: assume those apps can try anything to break out

Regular applications Risk: escalate from non-root to root - vector: vulnerabilities in SUID binaries Fix: defang SUID binaries - remove them - remove suid bit - mount filesystem with nosuid Docker: - you can remove SUID binaries easily - doesn't support nosuid mount (but trivial to add)

Regular applications Risk: execute arbitrary kernel code - vector: bogus syscall (e.g. vmsplice* in 2008) Fix: limit available syscalls - seccomp-bpf = whitelist/blacklist syscalls - Docker: seccomp available in LXC driver; not in libcontainer Fix: run stronger kernels - GRSEC is a good idea (stable patches for 3.14 since July 4th) - update often (i.e. have efficient way to roll out new kernels) - Docker: more experiments needed *More details about that: http://lwn.net/articles/268783/

Regular applications Risk: leak to another container - vector: bug in namespace code; filesystem leak (like the one showed in the beginning of this talk!) Fix: user namespaces - map UID in container to a different UID outside - two containers run a process with UID 1000, but it's 14298 and 15398 outside - Docker: PR currently being reviewed Fix: security modules (e.g. SELinux) - assign different security contexts to containers - those mechanisms were designed to isolate! - Docker: SELinux integration; AppArmor in the works

System services (high level)

System services (high level) SSH, cron, syslog... You use/need them all the time Bad news: they typically run as root Good news: they don't really need root Bad news: it's hard to run them as non-root Good news: they are not arbitrary code

System services (high level) Risk: running arbitrary code as root - vector: malformed data or similar (note: risk is pretty low for syslog/cron; much higher for SSH) Fix: isolate sensitive services - run SSH on bastion host, or in a VM - note: this is not container-specific (if someone hacks into your SSH server, you'll have a bad time anyway)

System services (high level) Risk: messing with /dev - vector: malicious code Fix: devices control group - whitelist/blacklist devices - fine-grained: can allow only read, write, none, or both - fine-grained: can specify major+minor number of device Docker: - sensible defaults - support for fine-grained access to devices in the works

System services (high level) Risk: use of root calls (mount, chmod, iptables...) - vector: malicious code Fix: capabilities - break down root into many permissions - e.g. CAP_NET_ADMIN (network configuration) - e.g. CAP_NET_RAW (generate and sniff traffic) - e.g. CAP_SYS_ADMIN (big can of worms ) - see capabilities(7) Docker: - sensible default capabilities - but: CAP_SYS_ADMIN! (see next slide)

Interlude: CAP_SYS_ADMIN Operations controlled by CAP_SYS_ADMIN... quotactl, mount, umount, swapon, swapoff sethostname, setdomainname IPC_SET, IPC_RMID on arbitrary System V IPC perform operations on trusted and security Extended Attributes set realtime priority (ioprio_set + IOPRIO_CLASS_RT) create new namespaces (clone and unshare + CLONE_NEWNS)

System services (high level) Risk: messing with /proc, /sys - vector: malicious code Fix: prevent unauthorized access control - Mandatory Access Control (AppArmor, SELinux) - remount read-only, then drop CAP_SYS_ADMIN to prevent remount Fix: wider implementation of namespaces - some parts of procfs/sysfs are namespace-aware - some aren't, but can be fixed (by writing kernel code) Docker: - locks down /proc and /sys

System services (high level) Risk: leaking with UID 0 - vector: malicious code Fix: user namespaces - already mentioned earlier - UID 0 in the container is mapped to some random UID outside - you break out: you're not root - you manage to issue weird syscalls: they're done as unprivileged UID Docker: work in progress Caveat: user namespaces are still new. We have to see how they behave with that!

System services (low level)

System services (low level) Device management (keyboard, mouse, screen), network and firewall config, filesystem mounts... You use/need some of them all the time But you don't need any of them in containers - physical device management is done by the host - network configuration and filesystems are setup by the host Exceptions: - custom mounts (FUSE) - network appliances

System services (low level) Risk: running arbitrary code as root - vector: malformed data or similar Fix: isolate sensitive functions - one-shot commands can be fenced in privileged context (think sudo but without even requiring sudo ) - everything else (especially processes that are long-running, or handle arbitrary input) runs in non-privileged context - works well for FUSE, some VPN services Docker: provides fine-grained sharing - e.g. docker run --net container: for network namespace - nsenter for other out-of-band operations

System services (low level) Risk: run arbitrary code with full privileges - vector: needs a process running with full privileges (rare!) - vector: malformed data, unchecked input classic exploit Fix: treat it as kernel - we'll see that immediately in the next section

Kernel

Kernel Drivers - can talk to the hardware, so can do pretty much anything - except: virtualize the bus and use e.g. driver domains (Xen) Network stacks - this probably has to live into the kernel for good performance - except: DPDK, OpenOnload... (networking stacks in userspace) Security policies - by definition, they control everything else - except: there might be nested security contexts some day

Kernel Risk: run arbitrary code with absolute privileges Fix:?

Reality check: if you run something which by definition needs full control over hardware or kernel, containers are not going to make it secure. Please stop trying to shoot yourself in the foot safely.

Reality check: if you run something which by definition needs full control over hardware or kernel, containers are not going to make it secure. Please stop trying to shoot yourself in the foot safely.

Kernel Risk: run arbitrary code with absolute privileges Fix: give it its own kernel and (virtual) hardware - i.e. run it in a virtual machine - that VM can run in a container - that VM can hold a container - run a privileged container, in Docker, in a VM, while the VM runs in a container, in a Docker https://github.com/jpetazzo/docker2docker - inb4 xzibit meme

Immutable immutable infrastructure

Immutable immutable infrastructure New rule: the whole container is read-only Compromise: if we must write, write to a noexec area Scalability has never been easier (if totally read-only) It's even harder for malicious users to do evil things

Recap (in no specific order!) don't run things as root drop capabilities enable user namespaces get rid of shady SUID binaries enable SELinux (or AppArmor) use seccomp-bpf get a GRSEC kernel update kernels often mount everything read-only ultimately, fence things in VMs

Recap (with Docker status) don't run things as root drop capabilities (you do it!) (but CAP_SYS_ADMIN!) enable user namespaces get rid of shady SUID binaries (work in progress) (but not enforced yet) enable SELinux (or AppArmor) use seccomp-bpf (SELinux) (on LXC driver) get a GRSEC kernel update kernels often (to be confirmed) (not Docker's job) mount everything read-only ultimately, fence things in VMs (not yet) (easy to do)

Recap (improvements needed) don't run things as root drop capabilities (you do it!) (but CAP_SYS_ADMIN!) enable user namespaces get rid of shady SUID binaries (work in progress) (but not enforced yet) enable SELinux (or AppArmor) use seccomp-bpf (SELinux) (on LXC driver) get a GRSEC kernel update kernels often (to be confirmed) (not Docker's job) mount everything read-only ultimately, fence things in VMs (not yet) (easy to do)

Thank you! Questions?