Secure Containers with EPT Isolation

Similar documents
Multi-tenancy Virtualization Challenges & Solutions. Daniel J Walsh Mr SELinux, Red Hat Date

What is KVM? KVM patch. Modern hypervisors must do many things that are already done by OSs Scheduler, Memory management, I/O stacks

Virtual Machines. Part 2: starting 19 years ago. Operating Systems In Depth IX 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

The Challenges of X86 Hardware Virtualization. GCC- Virtualization: Rajeev Wankar 36

CS-580K/480K Advanced Topics in Cloud Computing. VM Virtualization II

Hypervisor security. Evgeny Yakovlev, DEFCON NN, 2017

MultiNyx: A Multi-Level Abstraction Framework for Systematic Analysis of Hypervisors. Pedro Fonseca, Xi Wang, Arvind Krishnamurthy

Spring 2017 :: CSE 506. Introduction to. Virtual Machines. Nima Honarmand

64-bit ARM Unikernels on ukvm

IBM Research Report. Docker and Container Security White Paper

CSC 5930/9010 Cloud S & P: Virtualization

LINUX KVM FRANCISCO JAVIER VARGAS GARCIA-DONAS CLOUD COMPUTING 2017

W11 Hyper-V security. Jesper Krogh.

Introduction to SGX (Software Guard Extensions) and SGX Virtualization. Kai Huang, Jun Nakajima (Speaker) July 12, 2017

Container Adoption for NFV Challenges & Opportunities. Sriram Natarajan, T-Labs Silicon Valley Innovation Center

Privilege Escalation

Virtualization Device Emulator Testing Technology. Speaker: Qinghao Tang Title 360 Marvel Team Leader

ViryaOS RFC: Secure Containers for Embedded and IoT. A proposal for a new Xen Project sub-project

Module 1: Virtualization. Types of Interfaces

1 Virtualization Recap

Virtualization. Starting Point: A Physical Machine. What is a Virtual Machine? Virtualization Properties. Types of Virtualization

Scotch: Combining Software Guard Extensions and System Management Mode to Monitor Cloud Resource Usage

DOUG GOLDSTEIN STAR LAB XEN SUMMIT AUG 2016 ATTACK SURFACE REDUCTION

Virtualization. ! Physical Hardware Processors, memory, chipset, I/O devices, etc. Resources often grossly underutilized

CIT 480: Securing Computer Systems. Operating System Concepts

Optimizing and Enhancing VM for the Cloud Computing Era. 20 November 2009 Jun Nakajima, Sheng Yang, and Eddie Dong

Virtualization. Pradipta De

Advanced Exploitation: Xen Hypervisor VM Escape

Virtually Impossible

Micro VMMs and Nested Virtualization


Qiang Li && Zhibin Hu/Qihoo 360 Gear Team Ruxcon 2016

Virtualization (II) SPD Course 17/03/2010 Massimo Coppola

Meltdown and Spectre - understanding and mitigating the threats (Part Deux)

Arsenal. Shadow-Box: Lightweight Hypervisor-Based Kernel Protector. Seunghun Han, Jungwhan Kang (hanseunghun

Cauldron: A Framework to Defend Against Cache-based Side-channel Attacks in Clouds

Chapter 5 C. Virtual machines

Introduction to Cloud Computing and Virtualization. Mayank Mishra Sujesha Sudevalayam PhD Students CSE, IIT Bombay

LINUX Virtualization. Running other code under LINUX

Intel s Virtualization Extensions (VT-x) So you want to build a hypervisor?

Runtime VM Protection By Intel Multi-Key Total Memory Encryption (MKTME)

Meltdown and Spectre - understanding and mitigating the threats

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

CS 550 Operating Systems Spring Introduction to Virtual Machines

Virtualization. Adam Belay

Monitoring Hypervisor Integrity at Runtime. Student: Cuong Pham PIs: Prof. Zbigniew Kalbarczyk, Prof. Ravi K. Iyer ACC Meeting, Oct 2015

CSE543 - Computer and Network Security Module: Virtualization

EE 660: Computer Architecture Cloud Architecture: Virtualization


Making Nested Virtualization Real by Using Hardware Virtualization Features

Intel Clear Containers. Amy Leeland Program Manager Clear Linux, Clear Containers And Ciao

The speed of containers, the security of VMs

HW isolation for automotive environment BoF

How Container Runtimes matter in Kubernetes?

Virtual Machines. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

CS 470 Spring Virtualization and Cloud Computing. Mike Lam, Professor. Content taken from the following:

KVM as The NFV Hypervisor

Docker and Security. September 28, 2017 VASCAN Michael Irwin

Reducing World Switches in Virtualized Environment with Flexible Cross-world Calls

Intel Virtualization Technology Roadmap and VT-d Support in Xen

Linux and Xen. Andrea Sarro. andrea.sarro(at)quadrics.it. Linux Kernel Hacking Free Course IV Edition

Amir Zipory Senior Solutions Architect, Redhat Israel, Greece & Cyprus

Virtualization with XEN. Trusted Computing CS599 Spring 2007 Arun Viswanathan University of Southern California

Lecture 3: O/S Organization. plan: O/S organization processes isolation

Back To The Future: A Radical Insecure Design of KVM on ARM

Course Review. Hui Lu

Nested Virtualization Friendly KVM

EDGE COMPUTING & IOT MAKING IT SECURE AND MANAGEABLE FRANCK ROUX MARKETING MANAGER, NXP JUNE PUBLIC

Malware

Categorizing container escape methodologies in multi-tenant environments

Xen is not just paravirtualization

A Survey on Security Isolation of Virtualization, Containers, and Unikernels

Increase KVM Performance/Density

Computer Architecture Background

Testing System Virtual Machines

Achieve Low Latency NFV with Openstack*

Virtualization. Operating Systems, 2016, Meni Adler, Danny Hendler & Amnon Meisels

The Architecture of Virtual Machines Lecture for the Embedded Systems Course CSD, University of Crete (April 29, 2014)

Advanced Systems Security: Virtual Machine Systems

TEN LAYERS OF CONTAINER SECURITY

Virtual Machines. To do. q VM over time q Implementation methods q Hardware features supporting VM q Next time: Midterm?

Lecture 5. KVM for ARM. Christoffer Dall and Jason Nieh. 5 November, Operating Systems Practical. OSP Lecture 5, KVM for ARM 1/42

TURNING (PAGE) TABLES

Introduction to Qubes OS

CSE543 - Computer and Network Security Module: Virtualization

Spectre and Meltdown. Clifford Wolf q/talk

ARMv8 port of the Jailhouse hypervisor

Operating System Security

Multi-Hypervisor Virtual Machines: Enabling An Ecosystem of Hypervisor-level Services

KVM for IA64. Anthony Xu

Problem System administration tasks on a VM from the outside, e.g., issue administrative commands such as hostname and rmmod. One step ahead tradition

Advanced Systems Security: Virtual Machine Systems

Linux Containers Roadmap Red Hat Enterprise Linux 7 RC. Bhavna Sarathy Senior Technology Product Manager, Red Hat

TEN LAYERS OF CONTAINER SECURITY. Kirsten Newcomer Security Strategist

Live Demo: A New Hardware- Based Approach to Secure the Internet of Things

Xen and the Art of Virtualization. CSE-291 (Cloud Computing) Fall 2016

Dune: Safe User- level Access to Privileged CPU Features

Porting bhyve on ARM. Mihai Carabas, Peter Grehan BSDCan 2016 University of Ottawa Ottawa, Canada June 10 11, 2016

Virtual Machine Security

Secure In-VM Monitoring Using Hardware Virtualization

Transcription:

Secure Containers with EPT Isolation Chunyan Liu liuchunyan9@huawei.com Jixing Gu jixing.gu@intel.com

Presenters Jixing Gu: Software Architect, from Intel CIG SW Team, working on secure container solution architecture design and KVM support. Chunyan Liu: Software Engineer, from Huawei Central Software Institution, working on secure container solution design, guest OS support and Docker tooling integration. 2

Agenda Background Secure Container Solution Architecture Design Work Flow Security Cases & Demo Benchmark 3

Container Stack & Security Containers Stack App1 App2 Bin/Libs Bin/Libs Docker Engine Host OS Infrasturcture Security? Attack surface is large Namespace isolation is weak Bugs in Linux kernel can allow escape to the host and harm other containers Ease of Development/Deployment High performance, low overhead Shared kernel Namespace isolation

Security Features Supported by Docker Capability: restrict capabilities of process in container Seccomp: filter access to syscall SElinux: customize privileges for processes, users and files. User namespace: map root in container to non-root user on host Fuse: isolate /proc, useful for container monitoring system. CAN: restrict container capabilities and privileges, reduce chances container attacking kernel However, it s not enough CANNOT: container privilege escalation by exploiting kernel vulnerabilities

Security Problems We Address Problem-1: Privilege Escalation Problem-2: Memory Data Peek Container Attacker App Container Victim App $$ Attacker App Kernel Kernel By exploiting kernel vulnerability, like CVE-2017-6074, attackers are able to escalate to root privilege with ret2user By exploiting kernel modules, like /dev/mem device, attackers w/ root privilege are able to peek containers memory data

Our Solution: Namespace-alike Memory Isolation w/ EPT Protection-1: Defeat Privilege Escalation Container Protection-2: Defend Memory Data Peek Container Attacker App Victim App $$ Attacker App Kernel KVM Kernel KVM KVM creates a EPT memory region to isolate container user-space memory from kernel, and captures illegitimate cross-ept code execution or data access behaviors with EPT violation

Typical User Model of Containers VM1 VM2 container container container container App App App App Bin/Libs Bin/Libs Bin/Libs Bin/Libs memory Docker Engine Docker Engine Guest OS Guest OS EPT Table Host OS (Hypervisor) Infrastructure EPT Table 8

Our Ideas To Secure Containers VM1 VM2 container container container container App App App App Bin/Libs Bin/Libs Bin/Libs Bin/Libs memory Docker Engine Docker Engine Guest OS Guest OS EPT Table EPT Table Host OS (Hypervisor) EPT Table Infrastructure Extract container memory to a new EPT table, separated from VM EPT Table Strong EPT isolation between container and VM kernel. 9

Secure Container SW Stack Up EPT1 EPT2 Legend: VM Container 1 Container 2 Docker Tools Guest Kernel Patch KVM Patch QEMU Patch Secure Container Solution Guest OS Hypervisor ( KVM/QEMU & Host OS ) Intel VT Hardware Server HW System SC Components Existing Components Secure Container Solution Includes: Secure Container KVM Patch Extend KVM existing interfaces for secure container Create EPT for new container Add mem page into EPT view Delete mem page from EPT view Secure Container Guest Kernel Patch Handle secure container creation w/ extended interfaces from the underlying KVM Handle data exchange w/ extended interfaces from KVM Secure Container QEMU Patch Support VM management interfaces Tiny Changes To Docker Tools Differentiate Secure Container from Common Container

Typical Usage Scenarios Private Cloud: Isolate containers in a single shared guest OS Public Cloud: Isolate containers from same tenant & leverage VM to isolate between tenants EPT-1 EPT-N Container Container Guest OS Hypervisor(KVM & Host OS) Infrastructure VM-1 (for one tenant) VM-N (for one tenant) EPT-1 EPT-N EPT-1 EPT-N Container Container Container Container Guest OS Guest OS Hypervisor(KVM & Host OS) Infrastructure

Secure Container Arch and Data Flow

Secure Container Interfaces Calling Flow

EPT Violation Flow Switch View & Add Page Kernel to User Condition - Violation happens under view 0 - VM s CPL = 3 - RIP from user - GVA from user space - Page is present in specific user view User to Kernel Condition - Violation happens under user view - VM s CPL = 0 - RIP from kernel space - GVA from kernel space New page in current user view Condition - Requested page is not used by other user views - Requested page GVA is in user space

EPT Violation Flow Delete Page Delete Page is to delete all access permission for a specific page from an EPT view. It entails a page free from guest kernel. EPT violation handles a delayed page deletion in hypervisor. Hypervisor will erase all the data before deleting a page.

EPT Violation Flow Share Page Pages in file cache may be shared across containers. Hypervisor records share page status. Shared pages shall be granted to access by different containers. Share pages should be created per EPT view.

Data Exchange between user/kernel space struct task_struct { bool sc_flag; // if true, indicate the task is in a secure container. }; Pseudo Code int data_exchange(struct data_ex_cfg *cfg, uint64_t size) { if (current->sc_flag == false) { return 1; } //Transfer struct data_ex_cfg to hypervisor through hyper call. HyperCall( KVM_HC_SC, HC_DATA_EXCHANGE, cfg, sizeof(struct data_ex_cfg) ); }

Optimization: VMFUNC Switch EPT View Use VMFUNC in places that need to switch view (e.g. syscall)

Security Case: CVE-2017-6074 ret2user Attack Note: 1. Kernel trigger user space execution code; 2. CPL (current privilege level) is ring 0; 3. Privilege escalation succeeds;

Security Case: CVE-2017-6074 ret2user Attack Note: 1. EPT violation happens while kernel trying to execute user space code because kernel space and user space are isolated in different EPT views; 2. KVM checks CPL (current privilege level) is ring 0, and concludes it is not a reasonable system call return and rejects the operation. (CPL should be ring 3 for normal syscall_return); 3. Even SMAP and SMEP were both disabled, secure container can prevent the privilege escalation;

Demo Time

Benchmark Current data: SC vs CC Overhead Boot time 555ms vs 481 ms (~15%) Memory footprint CPU Performance Memory Performance Storage IO 18%~26% Network IO 28%~30% Notes Host memory has little overhead to manage more EPT tables Optimization focus (VMFUNC result not updated) Optimization focus (VMFUNC result not updated) Next Step: 1. Use VMFUNC to reduce switch view VMEXIT/VMENTRY overhead 2. Improve DataExchange performance by reducing VMCALL times

Thank you! 23