Hardware Acceleration for Tagging

Similar documents
Making Performance Understandable: Towards a Standard for Performance Counters on Manycore Architectures

Integrating the Par Lab Stack Running Damascene on SEJITS/ROS/RAMP Gold

Contour Detection on Mobile Platforms

Parallel Webpage Layout

Active Testing for Concurrent Programs

Code Generators for Stencil Auto-tuning

Code Generators for Stencil Auto-tuning

A Retrospective on Par Lab Architecture Research

Active Testing for Concurrent Programs

BERKELEY PAR LAB. RAMP Gold Wrap. Krste Asanovic. RAMP Wrap Stanford, CA August 25, 2010

Tessellation: Space-Time Partitioning in a Manycore Client OS

SEJITS: High Performance for Productivity Applications

Lecture 13: Address Translation

SPIN Operating System

Tessellation OS. Present and Future. Juan A. Colmenares, Sarah Bird, Andrew Wang, Krste Asanovid, John Kubiatowicz [Parallel Computing Laboratory]

Third party names are the property of their owners.

CS370 Operating Systems

Motivations for Virtual Memory Virtual Memory Oct. 29, Why VM Works? Motivation #1: DRAM a Cache for Disk

CIS Operating Systems Memory Management Cache. Professor Qiang Zeng Fall 2017

Virtual Memory Oct. 29, 2002

CS 152 Computer Architecture and Engineering. Lecture 11 - Virtual Memory and Caches

CIS Operating Systems Memory Management Cache and Demand Paging. Professor Qiang Zeng Spring 2018

virtual memory. March 23, Levels in Memory Hierarchy. DRAM vs. SRAM as a Cache. Page 1. Motivation #1: DRAM a Cache for Disk

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 23

Computer Science 146. Computer Architecture

End of Parlab talk. May 29, 2013 Gans Srinivasa Intel Corp

Chapter 8: Memory-Management Strategies

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Virtualization and memory hierarchy

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES

Labels and Information Flow

Chapter 8: Main Memory. Operating System Concepts 9 th Edition

Wedge: Splitting Applications into Reduced-Privilege Compartments

Chapter 8: Main Memory

CIS Operating Systems Memory Management Cache. Professor Qiang Zeng Fall 2015

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

CISC 360. Virtual Memory Dec. 4, 2008

Chapter 8: Main Memory

Asbestos Operating System

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 23

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

Random-Access Memory (RAM) Systemprogrammering 2007 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics

Random-Access Memory (RAM) Systemprogrammering 2009 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics! The memory hierarchy

Mondrian Memory Protection

Recall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory

Lecture 9 - Virtual Memory

Darek Mihocka, Emulators.com Stanislav Shwartsman, Intel Corp. June

Virtual Memory. Motivations for VM Address translation Accelerating translation with TLBs

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts

Xen and the Art of Virtualization. CSE-291 (Cloud Computing) Fall 2016

MICROKERNELS: MACH AND L4

virtual memory Page 1 CSE 361S Disk Disk

Introduction to Operating Systems

CIS Operating Systems Memory Management Address Translation for Paging. Professor Qiang Zeng Spring 2018

o Code, executable, and process o Main memory vs. virtual memory

COEN-4730 Computer Architecture Lecture 3 Review of Caches and Virtual Memory

Multithreading and the Tera MTA. Multithreading for Latency Tolerance

CS3600 SYSTEMS AND NETWORKS

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington

Memory. From Chapter 3 of High Performance Computing. c R. Leduc

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

OS Design Approaches. Roadmap. OS Design Approaches. Tevfik Koşar. Operating System Design and Implementation

CS 152 Computer Architecture and Engineering. Lecture 8 - Address Translation

CS 152 Computer Architecture and Engineering. Lecture 9 - Address Translation

CS 152 Computer Architecture and Engineering. Lecture 9 - Virtual Memory

Deterministic Process Groups in

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck

Chapter 8 Main Memory

Chapter 2. OS Overview

Virtual Memory Virtual memory first used to relive programmers from the burden of managing overlays.

Lecture 4: Memory Management & The Programming Interface

Operating Systems (2INC0) 2018/19. Introduction (01) Dr. Tanir Ozcelebi. Courtesy of Prof. Dr. Johan Lukkien. System Architecture and Networking Group

24-vm.txt Mon Nov 21 22:13: Notes on Virtual Machines , Fall 2011 Carnegie Mellon University Randal E. Bryant.

Performance analysis basics

CSC Operating Systems Fall Lecture - II OS Structures. Tevfik Ko!ar. Louisiana State University. August 27 th, 2009.

Announcements. Computer System Organization. Roadmap. Major OS Components. Processes. Tevfik Ko!ar. CSC Operating Systems Fall 2009

0x1A Great Papers in Computer Security

Portland State University ECE 587/687. Virtual Memory and Virtualization

CS 152 Computer Architecture and Engineering. Lecture 9 - Address Translation

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

JiST Java in Simulation Time An efficient, unifying approach to simulation using virtual machines

CSC 5930/9010 Cloud S & P: Virtualization

CS 152 Computer Architecture and Engineering. Lecture 8 - Address Translation

Explicit Information Flow in the HiStar OS. Nickolai Zeldovich, Silas Boyd-Wickizer, Eddie Kohler, David Mazières

Last 2 Classes: Introduction to Operating Systems & C++ tutorial. Today: OS and Computer Architecture

CHAPTER 8: MEMORY MANAGEMENT. By I-Chen Lin Textbook: Operating System Concepts 9th Ed.

Memory Management william stallings, maurizio pizzonia - sistemi operativi

Copyright Khronos Group Page 1. Vulkan Overview. June 2015

PROCESS VIRTUAL MEMORY. CS124 Operating Systems Winter , Lecture 18

CS307: Operating Systems

Four Components of a Computer System

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015

Motivation. Threads. Multithreaded Server Architecture. Thread of execution. Chapter 4

Hardware Enforcement of Application Security Policies Using Tagged Memory

Page 1. Goals for Today" Important Aspects of Memory Multiplexing" Virtualizing Resources" CS162 Operating Systems and Systems Programming Lecture 9

Chapter 1 Computer System Overview

CS 261 Fall Mike Lam, Professor. Virtual Memory

Chapter 7: Main Memory. Operating System Concepts Essentials 8 th Edition

Supercomputing and Mass Market Desktops

Transcription:

Parallel Hardware Parallel Applications IT industry (Silicon Valley) Parallel Software Users Hardware Acceleration for Tagging Sarah Bird, David McGrogan, John Kubiatowicz, Krste Asanovic June 5, 2008 1

Diagnosing Power/Performance Correctness Par Lab Research Overview Easy to write correct programs that run efficiently on manycore Personal Health Image Hearing, Speech Parallel Retrieval Music Browser Design Patterns/Motifs Composition & Coordination Language (C&CL) C&CL Compiler/Interpreter Static Verification Parallel Libraries Parallel Frameworks Type Systems Efficiency Sketching Languages Autotuners Legacy Communication & Schedulers Code Synch. Primitives Efficiency Language Compilers OS Libraries & Services Legacy OS Hypervisor Multicore/GPGPU RAMP Manycore Directed Testing Dynamic Checking Debugging with Replay 2

Diagnosing Power/Performance Correctness Par Lab Research Overview Easy to write correct programs that run efficiently on manycore Personal Health Image Hearing, Speech Parallel Retrieval Music Browser Design Patterns/Motifs Composition & Coordination Language (C&CL) C&CL Compiler/Interpreter Static Verification Parallel Libraries Parallel Frameworks Type Systems Efficiency Languages Legacy Code Sketching Autotuners Communication & Schedulers Synch. Primitives Efficiency Language Compilers OS Libraries & Services Legacy OS Hypervisor Multicore/GPGPU RAMP Manycore Directed Testing Dynamic Checking Debugging with Replay 3

Does Security Matter? Code bases are growing Linux 2.6.25 has 9 million lines of code* Difficult to verify all of the code More code is interacting on parallel platforms Need to provide isolation More 3 rd party software Mac Dashboard Widgets Browser Plug-ins Device drivers Growing amount of personal data on our computers and web Google has health information on the web Webservers access thousands of users personal data Need to prevent one users data from being leaked to a different user *www.heise-online.co.uk 4

Tagging 1 2 3 2 Read and Write access Contains a categories and clearance levels Labels form a partial order Used on threads, devices, data, messages, etc Information is allowed to flow from less tainted labels to more tainted labels. 5

Terminology In this talk, Categories (61 bits) Levels 0 1 2 3 4 5 * Labels (n categories & their levels) 3 Tags (compressed labels) 6

Tagging Management New Categories Threads can ask for a new category The thread now owns that category and it s added to the threads label Enforcement On read and writes, compare labels On access to devices, the devices label is compared to the data label Sharing Data The owning thread can give category permission and a max clearance level to another thread or device. * * 3 3 3 * 3 7

Does Tagging Scale? Protection information carried in labels Only requires local comparisons of the thread or device label with the data label Updates to permissions/labels need to only be updated on the device, data, or thread Allocation of new categories doesn t need to be global Allocating space needs only the calling thread s label 8

Tagging and Tessellation Secure L A Channel L B Allow if L A L B Mandatory Access Control on Channels and Objects, ala Asbestos/HiStar Create Secure channels in memory using labels Anarchistic Privilege Control Categories can be created on the fly by users Dynamic and Flexible Protection 9

Hardware vs. Software Software can be expensive 3x overhead for LIFT* 2 to 3x overhead for HiStar* in many cases Does the os scale? What about protection check in hardware? Reduces the trusted code Much lower overhead in steady state condition *http://opera.cs.uiuc.edu/paper/lift-micro06.pdf http://www.scs.stanford.edu/histar/ 10

Tag Everything 64 bit label for each 64 bit word 100% memory overhead 100% cache overhead 100% network overhead Protection checks are a simple comparison Variable Labels? Larger overheads! Complex memory and network controllers How do we even store this? In memory? In the cache? 11

Dealing with Tag Explosion: Memory Unlikely that nearby data has a different label Take a page out of fine-grained memory protection Linear Representation Takes advantage contiguous segments with the same label More compressed Insert must slide down everything Completely flexible representation Binary Search to look up Multilevel Page Table Simple look up algorithm Less flexible Easy insert Address 0x0000 0x0020 0x1000 Pointer Label/Tag Pointer 12

Percent Overhead Page Table 3-Level Page Table with 128 Byte cache line granularity 3 memory accesses per cache line on read Could be combined with translation 1 st Level 10 bits 2 nd Level 10 bits 3 rd Level 5 bits 0.14 0.12 0.1 0.08 0.06 0.04 0.02 Memory Overhead 0 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 Percent of non-contiguous 4k segments 13

Dealing with Tag Explosion: Cache Can t afford to store a variable sized label for every cache line Represent Labels with Tags! Local Relabeling Scheme Map active categories to bit vectors where each bit represents an active category Makes protection checks very easy 3 113 2 012 14

What about the network? Network overhead is still 100% or more Network controllers need to deal with the variable length labels All reads and writes have to go through the translation scheme To talk to a different node we have to translate from the bit vector back to the label and then to the bit vector for the receiving node 15

Dealing with Tag Explosion: Network Must change labels into Tags before the network Tags must be universally understood Nodes can communicate without translation Relabeling Engine Changes labels in 16-bit tags 3 2 2 15 Network overhead is now only 16 bits per Cache Line = 1.5% overhead 16

How do handle protection checks? Can t easy compare tags since they don t contain the label and level information Precalculate the comparison between 2 tags Done with software handler Store in memory Protection Lookaside Buffer Thread Tag Data Tag Read or Write 17

System View 18

Evaluation Simics Simulator Synthetic Benchmarks Pentium 4 processor Vector Write 20 Mhz 256 MB Memory Linux kernel 2.6.15 HiStar kernel (single core) Image Histogram 7-Point Stencil Custom Memory Hierarchy Insert delays for the relevant misses 19

Single Core Results Single core experiments comparing the runtime of a plain system without information flow control to Lighthouse and HiStar using synthetic vector benchmarks. The y axis is cycles. 20

Multicore Results MultiCore experiment comparing the runtime a plain system without information flow control to Lighthouse using the image histogram benchmark. The y axis is cycles and the x axis is processors. MultiCore experiment comparing the runtime a plain system without information flow control to Lighthouse using the stencil benchmark. The y axis is cycles and the x axis is processors. 21

Future Work Develop benchmarks that utilize categories to help determine best, worst, and average usage case. Implement a more realistic model of the structures Run experiments to analyze area vs. performance tradeoffs 22

Arch. OS Efficiency Productivity Diagnosing Power/Performance Bottlenecks Correctness Apps Conclusions Parallel is happening Security is being increasingly important Tagging provides a scalable mechanisms for enforcing isolation Software checks can be expensive Hardware support can provide a low overhead mechanism to support tagging Tagging could have many uses for security and debugging Easy to write correct programs that run efficiently and scale up on manycore Personal Health Legacy Code Image Retrieval Hearing, Music Design Patterns/Motifs Schedulers Sketching Speech Parallel Browser Composition & Coordination Language (C&CL) Efficiency Languages C&CL Compiler/Interpreter Communication & Synch. Primitives Efficiency Language Compilers Legacy OS Parallel Libraries Autotuners Parallel Frameworks OS Libraries & Services Hypervisor Static Verification Type Systems Directed Testing Dynamic Checking Debugging with Replay Multicore/GPGPU RAMP Manycore 23

Acknowledgments Thanks to Profs John Kubiatowicz, Krste Asanovic, Eric Brewer, Joe Hellerstein UCB Grad students David McGrogan, Henry Cook, Mark Murphy, Scott Beamer, Colleen Lewis, Cynthia Sturton HiStar Designer Nickolai Zeldovich 24

Questions? slbird@eecs.berkeley.edu 25

Tagging Management New Categories Threads can ask for a new category which is generated by encrypting a 61 bit counter The thread now owns that category and it s added to the threads label Enforcement On read and writes, compare the threads label with the data address label to see if it can read (thread >= data) or write (data >= thread). On access to devices, the devices label is compared to the data label Sharing Data The owning thread can give category permission and a max clearance level to another thread or device. Allocating Space A thread may allocate space to write in. The space is given the thread s current label. * * 3 3 3 * 3 3 3 26

Tagging Management New Categories Threads can ask for a new category The thread now owns that category and it s added to the threads label Enforcement Hardware instruction sends the thread s tag to relabeling engine. Relabeling engine allocates new category and a new tag and returns new tag and new category to the thread. On read and writes, compare the threads label with the data address label to see if it can read (thread >= data) or write (data >= thread). On access to devices, the devices label is compared to the data label. Comparator looks up thread tag and data tag in the PLB. If not in PLB then the two tags are sent to the relabeling engine. The relabeling engine looks them up in the Label Comparison Store. If there is a miss in the Label Comparison Store, a software handler is invoked to look up the labels and compare them. Sharing Data The owning thread can give category permission and a max clearance level to another thread or device. Hardware instruction sends the category and thread id to the relabeling engine which updates the max clearance level for that thread in the category. The thread may then ask to raise its actual clearance level in that category which results in a new tag. Allocating Space A thread may allocate space to write in. The space is given the thread s current label. Update the protection table for that area with the thread s tag. Allocating scheme needs to make sure that old data can t be read (clear on either allocate or deallocate). 27