Utilizing the IOMMU scalably

Size: px
Start display at page:

Download "Utilizing the IOMMU scalably"

Transcription

1 Utilizing the IOMMU scalably Omer Peleg, Adam Morrison, Benjamin Serebrin, and Dan Tsafrir USENIX ATC Shin Seok Ha

2 1 Introduction What is an IOMMU? Provides the translation between IO addresses and Physical addresses. Just as MMU does.

3 2 The importance of IOMMU Memory System data IO buf 1 X X IO buf 2 Device IOMMU Memory 4GB Device1 Device2 16GB A malicious or buggy device may interfere the area that should not be accessed. The range that device can access gets larger than the device itself can.

4 3 Introduction Non scalable IOMMU IO Virtual Address Allocator IOTLB Invalidation

5 4 IO Virtual Address Allocator Centralized IOVA allocator..... rb tree List of free IOVA ranges EiovaR : Reduces time to traverse the tree Request IOVA Attempting to acquire the lock Task1 Task2 Task3

6 5 IOTLB Invalidation IOMMU Memory Invalidation queue Device IOTLB IOVA 0x2000 0x3000 HPA 0x00FF 0x00FE IO page table Invalidation should occur in every unmap operation. Invalidation queue requires global lock. Task should wait for IO to be completed.

7 6 Deferred Invalidation Batch invalidation commands up to 250 and flush Or in every 10ms. Accessing the global queue still produces a huge overhead Security Issue while the mapping exists after unmap

8 16 core parallel netperf RR workload cycle breakdown and throughput (270 instances) 7 Invalidation overhead masks allocation overhead. Long invalidation period naturally throttles map/unmap operations. Both kinds of overheads should be treated altogether.

9 8 Suggested solutions Scalable IOVA allocation - Dynamic Identity Mapping - IOVA-kmalloc - Magazines Per-core flush queue

10 9 Dynamic Identity Mapping IOMMU Device IOVA 0x2000 0x3000 HPA 0x0020 0x0030 Physical Address 1 to 1 relation between HPA and IOVA No need to access rb tree. Physically contiguous buffer Eliminating works and locks used to manage a distinct space of IOVAs.

11 Each mapping is associated with a distinct IOVA. Several mappings in a same page can be linked with different page table entries. Page 0x Page table entry 0x x3000 0x3000 Reference count must be recorded to decide when to remove this entry -> atomic operation required

12 11 Dynamic Identity Mapping Conflicting access Permissions.. 48 bits.. 46 bits Read Write IOVA PA Spare bits Read, Write, R&W, and etc. When mapping is failed with this method. Go back to the traditional method.

13 Dynamic Identity Mapping When to go back to the old way? 12 Non contiguous buffers PTE reference count overflow - 10 reference bits

14 13 IOVA-kmalloc Obtain IOVA range using kmalloc allocator, and use the address of block as an IOVA. Easy implementation, well optimized. Uniquie address per allocation -> No worries whether IOMMU pages conflict.

15 14 kmalloc R/4096 bytes Memory PA 36 bits IOVA 36 bits as PFN 10 bits as page offset

16 15 IOVA-kmalloc Obstacles Pages not packed as dense as the old way. Required number of page tables should be larger. Linux does not reclaim pages used as page tables. Possibility to cause memory blowup Collisions might occur between two different IOVAs Use GFP_DMA flag to limit the area

17 16 Magazine allocator Per-core Cache of previously deallocated IOVA ranges. The global allocator will not be invoked until percore cache uses its all capacity. Core1 Core2 IOVA IOVA. Depot IOVA IOVA IOVA IOVA IOVA IOVA M elements. A Core s miss rate is bounded by 1/M

18 17 Scalable IOTLB Invalidation A global flush queue is overkill, with its associated lock contention. - There is no dependency between the invalidation process of distinct IOVA ranges. Per-core flush queue Requirements Until an entire IOVA range mapping is invalidated in IOTLB, 1. The IOVA range will not be released back to the IOVA allocator 2. The page tables that were mapping the IOVA range will not be reclaimed.

19 18 Scalable IOTLB Invalidation The owning core almost acquires the lock on the queue. (Except global invalidation) Core1 Core2 Flush queue Cyclic invalidation queue

20 19 Evaluation Rack server Processors Hyperthreading Memory NIC Server Dell PowerEdge R430 Client Dual 2.4GHz Intel Xeon E5-2630v3 8-core Disabled 32GB 2133MHz Broadcom NetXtreme II BCM Gb/s Intel Gb/s Base OS Ubuntu Ubuntu

21 20 Evaluation 15 rings on server NIC Benchmarks are executed in a round-robin fashion. Each benchmark runs once for 10 seconds for each possible number of cores. The cycle is run 5 times.

22 21 Benchmarks Netperf : network analysis tool 270 instances of the TCP RR test on the client - Measuring the throughput As many instances as we allow server cores - Measuring the parallel latency Memcached : a key-value store service used by web applications. 16 threads, 256 concurrent requests. - Measuring the throughput

23 22 Results - Throughput netperf memcached The suggested designs show 90%-95% throughtput of No-IOMMU design. IOMMU overhead is essentially constant and does not get worse with more concurrency.

24 23 Results - Latency At 16 cores, EiovaR shows a gap due to the lock Suggested designs are on par with No-IOMMU

25 Results Memory consumption 24 kmalloc : tradeoff between memory and performance DIM : no effort to pack pages densely Magazines : Not returning freed IOVAs to global allocator

26 25 Conclusion IOVA allocation and IOTLB invalidation are two main bottlenecks for scalability. Dynamic Identity mapping is efficient but make additional cost to manage IOMMU page table. IOVA-kmalloc is simple to implement with high performance, but suffers from unbounded page table blowup Magazine also shows high performance, however deferring freed IOVA return could be an obstacle.

27 26 Dynamic Identity Mapping Conflicting Access Permissions Page table entry 0x Read Write Must handle with the case that buffers with different permissions stay in a same page. Promoting the permission is prohibited.

Utilizing the IOMMU Scalably

Utilizing the IOMMU Scalably Utilizing the IOMMU Scalably USENIX Annual Technical Conference 2015 Omer Peleg, Adam Morrison, Benjamin Serebrin * and Dan Tsafrir * Google In This Talk IOMMU overview Main challenges to OSes Current

More information

Utilizing the IOMMU Scalably

Utilizing the IOMMU Scalably Utilizing the IOMMU Scalably Omer Peleg and Adam Morrison, Technion Israel Institute of Technology; Benjamin Serebrin, Google; Dan Tsafrir, Technion Israel Institute of Technology https://www.usenix.org/conference/atc15/technical-session/presentation/peleg

More information

Efficient Intra-Operating System Protection Against Harmful DMAs

Efficient Intra-Operating System Protection Against Harmful DMAs Efficient Intra-Operating System Protection Against Harmful DMAs Moshe Malka, Nadav Amit, and Dan Tsafrir, Technion Israel Institute of Technology https://www.usenix.org/conference/fast15/technical-sessions/presentation/malka

More information

Arrakis: The Operating System is the Control Plane

Arrakis: The Operating System is the Control Plane Arrakis: The Operating System is the Control Plane Simon Peter, Jialin Li, Irene Zhang, Dan Ports, Doug Woos, Arvind Krishnamurthy, Tom Anderson University of Washington Timothy Roscoe ETH Zurich Building

More information

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

IX: A Protected Dataplane Operating System for High Throughput and Low Latency IX: A Protected Dataplane Operating System for High Throughput and Low Latency Belay, A. et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Reviewed by Chun-Yu and Xinghao Li Summary In this

More information

Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740

Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740 Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740 A performance study with NVDIMM-N Dell EMC Engineering September 2017 A Dell EMC document category Revisions Date

More information

Operating System Performance and Large Servers 1

Operating System Performance and Large Servers 1 Operating System Performance and Large Servers 1 Hyuck Yoo and Keng-Tai Ko Sun Microsystems, Inc. Mountain View, CA 94043 Abstract Servers are an essential part of today's computing environments. High

More information

viommu/arm: full emulation and virtio-iommu approaches Eric Auger KVM Forum 2017

viommu/arm: full emulation and virtio-iommu approaches Eric Auger KVM Forum 2017 viommu/arm: full emulation and virtio-iommu approaches Eric Auger KVM Forum 2017 Overview Goals & Terminology ARM IOMMU Emulation QEMU Device VHOST Integration VFIO Integration Challenges VIRTIO-IOMMU

More information

Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet

Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Pilar González-Férez and Angelos Bilas 31 th International Conference on Massive Storage Systems

More information

Virtualization, Xen and Denali

Virtualization, Xen and Denali Virtualization, Xen and Denali Susmit Shannigrahi November 9, 2011 Susmit Shannigrahi () Virtualization, Xen and Denali November 9, 2011 1 / 70 Introduction Virtualization is the technology to allow two

More information

Improve VNF safety with Vhost-User/DPDK IOMMU support

Improve VNF safety with Vhost-User/DPDK IOMMU support Improve VNF safety with Vhost-User/DPDK IOMMU support No UIO anymore! Maxime Coquelin Software Engineer KVM Forum 2017 AGENDA Background Vhost-user device IOTLB implementation Benchmarks Future improvements

More information

RIGHTNOW A C E

RIGHTNOW A C E RIGHTNOW A C E 2 0 1 4 2014 Aras 1 A C E 2 0 1 4 Scalability Test Projects Understanding the results 2014 Aras Overview Original Use Case Scalability vs Performance Scale to? Scaling the Database Server

More information

CPHash: A Cache-Partitioned Hash Table Zviad Metreveli, Nickolai Zeldovich, and M. Frans Kaashoek

CPHash: A Cache-Partitioned Hash Table Zviad Metreveli, Nickolai Zeldovich, and M. Frans Kaashoek Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-211-51 November 26, 211 : A Cache-Partitioned Hash Table Zviad Metreveli, Nickolai Zeldovich, and M. Frans Kaashoek

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results Page 1 of 9 VMware VMmark V1.1 Results Vendor and Hardware Platform: Dell PowerEdge R710 Virtualization Platform: ESX build 150817 VMmark V1.1 Score = 24.00 @ 17 Tiles Tested By: Dell Inc. Test Date: 03/26/2009

More information

FAQ. Release rc2

FAQ. Release rc2 FAQ Release 19.02.0-rc2 January 15, 2019 CONTENTS 1 What does EAL: map_all_hugepages(): open failed: Permission denied Cannot init memory mean? 2 2 If I want to change the number of hugepages allocated,

More information

viommu/arm: full emulation and virtio-iommu approaches Eric Auger KVM Forum 2017

viommu/arm: full emulation and virtio-iommu approaches Eric Auger KVM Forum 2017 viommu/arm: full emulation and virtio-iommu approaches Eric Auger KVM Forum 2017 Overview Goals & Terminology ARM IOMMU Emulation QEMU Device VHOST Integration VFIO Integration Challenges VIRTIO-IOMMU

More information

WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES BIG AND SMALL SERVER PLATFORMS

WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES BIG AND SMALL SERVER PLATFORMS WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES ON BIG AND SMALL SERVER PLATFORMS Shuang Chen*, Shay Galon**, Christina Delimitrou*, Srilatha Manne**, and José Martínez* *Cornell University **Cavium

More information

The Price of Safety: Evaluating IOMMU Performance

The Price of Safety: Evaluating IOMMU Performance The Price of Safety: Evaluating IOMMU Performance Muli Ben-Yehuda 1 Jimi Xenidis 2 Michal Ostrowski 2 Karl Rister 3 Alexis Bruemmer 3 Leendert Van Doorn 4 1 muli@il.ibm.com 2 {jimix,mostrows}@watson.ibm.com

More information

High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han

High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han Seoul National University, Korea Dongduk Women s University, Korea Contents Motivation and Background

More information

Impact of Dell FlexMem Bridge on Microsoft SQL Server Database Performance

Impact of Dell FlexMem Bridge on Microsoft SQL Server Database Performance Impact of Dell FlexMem Bridge on Microsoft SQL Server Database Performance A Dell Technical White Paper Dell Database Solutions Engineering Jisha J Leena Basanthi October 2010 THIS WHITE PAPER IS FOR INFORMATIONAL

More information

Speeding up Linux TCP/IP with a Fast Packet I/O Framework

Speeding up Linux TCP/IP with a Fast Packet I/O Framework Speeding up Linux TCP/IP with a Fast Packet I/O Framework Michio Honda Advanced Technology Group, NetApp michio@netapp.com With acknowledge to Kenichi Yasukata, Douglas Santry and Lars Eggert 1 Motivation

More information

Bridging the Gap between Software and Hardware Techniques for I/O Virtualization

Bridging the Gap between Software and Hardware Techniques for I/O Virtualization Bridging the Gap between Software and Hardware Techniques for I/O Virtualization Jose Renato Santos, Yoshio Turner, G.(John) Janakiraman, Ian Pratt HP Laboratories HPL-28-39 Keyword(s): Virtualization,

More information

Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems

Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems 1 Presented by Hadeel Alabandi Introduction and Motivation 2 A serious issue to the effective utilization

More information

Falcon: Scaling IO Performance in Multi-SSD Volumes. The George Washington University

Falcon: Scaling IO Performance in Multi-SSD Volumes. The George Washington University Falcon: Scaling IO Performance in Multi-SSD Volumes Pradeep Kumar H Howie Huang The George Washington University SSDs in Big Data Applications Recent trends advocate using many SSDs for higher throughput

More information

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

IX: A Protected Dataplane Operating System for High Throughput and Low Latency IX: A Protected Dataplane Operating System for High Throughput and Low Latency Adam Belay et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Presented by Han Zhang & Zaina Hamid Challenges

More information

Accelerate Applications Using EqualLogic Arrays with directcache

Accelerate Applications Using EqualLogic Arrays with directcache Accelerate Applications Using EqualLogic Arrays with directcache Abstract This paper demonstrates how combining Fusion iomemory products with directcache software in host servers significantly improves

More information

Azor: Using Two-level Block Selection to Improve SSD-based I/O caches

Azor: Using Two-level Block Selection to Improve SSD-based I/O caches Azor: Using Two-level Block Selection to Improve SSD-based I/O caches Yannis Klonatos, Thanos Makatos, Manolis Marazakis, Michail D. Flouris, Angelos Bilas {klonatos, makatos, maraz, flouris, bilas}@ics.forth.gr

More information

Operating Systems Design Exam 2 Review: Spring 2012

Operating Systems Design Exam 2 Review: Spring 2012 Operating Systems Design Exam 2 Review: Spring 2012 Paul Krzyzanowski pxk@cs.rutgers.edu 1 Question 1 Under what conditions will you reach a point of diminishing returns where adding more memory may improve

More information

Advanced Operating Systems (CS 202) Virtualization

Advanced Operating Systems (CS 202) Virtualization Advanced Operating Systems (CS 202) Virtualization Virtualization One of the natural consequences of the extensibility research we discussed What is virtualization and what are the benefits? 2 Virtualization

More information

Nested Virtualization Update From Intel. Xiantao Zhang, Eddie Dong Intel Corporation

Nested Virtualization Update From Intel. Xiantao Zhang, Eddie Dong Intel Corporation Nested Virtualization Update From Intel Xiantao Zhang, Eddie Dong Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,

More information

PostgreSQL as a benchmarking tool

PostgreSQL as a benchmarking tool PostgreSQL as a benchmarking tool How it was used to check and improve the scalability of the DragonFly operating system François Tigeot ftigeot@wolfpond.org 1/21 About Me Independent consultant, sysadmin

More information

Operating Systems. 11. Memory Management Part 3 Kernel Memory Allocation. Paul Krzyzanowski Rutgers University Spring 2015

Operating Systems. 11. Memory Management Part 3 Kernel Memory Allocation. Paul Krzyzanowski Rutgers University Spring 2015 Operating Systems 11. Memory Management Part 3 Kernel Memory Allocation Paul Krzyzanowski Rutgers University Spring 2015 1 Kernel memory The kernel also needs memory User code calls malloc kernel functions

More information

references Virtualization services Topics Virtualization

references Virtualization services Topics Virtualization references Virtualization services Virtual machines Intel Virtualization technology IEEE xplorer, May 2005 Comparison of software and hardware techniques for x86 virtualization ASPLOS 2006 Memory resource

More information

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell EqualLogic Storage Arrays

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell EqualLogic Storage Arrays Dell EqualLogic Best Practices Series Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell EqualLogic Storage Arrays A Dell Technical Whitepaper Jerry Daugherty Storage Infrastructure

More information

(b) External fragmentation can happen in a virtual memory paging system.

(b) External fragmentation can happen in a virtual memory paging system. Alexandria University Faculty of Engineering Electrical Engineering - Communications Spring 2015 Final Exam CS333: Operating Systems Wednesday, June 17, 2015 Allowed Time: 3 Hours Maximum: 75 points Note:

More information

Netchannel 2: Optimizing Network Performance

Netchannel 2: Optimizing Network Performance Netchannel 2: Optimizing Network Performance J. Renato Santos +, G. (John) Janakiraman + Yoshio Turner +, Ian Pratt * + HP Labs - * XenSource/Citrix Xen Summit Nov 14-16, 2007 2003 Hewlett-Packard Development

More information

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1 Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 Recap Paged MMU: Two main Issues Translation speed can be slow TLB Table size is big Multi-level page table

More information

COL862 Programming Assignment-1

COL862 Programming Assignment-1 Submitted By: Rajesh Kedia (214CSZ8383) COL862 Programming Assignment-1 Objective: Understand the power and energy behavior of various benchmarks on different types of x86 based systems. We explore a laptop,

More information

Caches. Cache Memory. memory hierarchy. CPU memory request presented to first-level cache first

Caches. Cache Memory. memory hierarchy. CPU memory request presented to first-level cache first Cache Memory memory hierarchy CPU memory request presented to first-level cache first if data NOT in cache, request sent to next level in hierarchy and so on CS3021/3421 2017 jones@tcd.ie School of Computer

More information

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1 Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 CPU management Roadmap Process, thread, synchronization, scheduling Memory management Virtual memory Disk

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

Benchmark Performance Results for Pervasive PSQL v11. A Pervasive PSQL White Paper September 2010

Benchmark Performance Results for Pervasive PSQL v11. A Pervasive PSQL White Paper September 2010 Benchmark Performance Results for Pervasive PSQL v11 A Pervasive PSQL White Paper September 2010 Table of Contents Executive Summary... 3 Impact Of New Hardware Architecture On Applications... 3 The Design

More information

Xen Network I/O Performance Analysis and Opportunities for Improvement

Xen Network I/O Performance Analysis and Opportunities for Improvement Xen Network I/O Performance Analysis and Opportunities for Improvement J. Renato Santos G. (John) Janakiraman Yoshio Turner HP Labs Xen Summit April 17-18, 27 23 Hewlett-Packard Development Company, L.P.

More information

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores Xiaoning Ding The Ohio State University dingxn@cse.ohiostate.edu

More information

Status of the Linux Slab Allocators

Status of the Linux Slab Allocators Status of the Linux Slab Allocators David Rientjes rientjes@google.com SCALE 9X February 26, 2011 Los Angeles, California 1 of 13 Status of the Linux Slab Allocators As of 2.6.37.1, the latest stable release

More information

1-Gigabit TCP Offload Engine

1-Gigabit TCP Offload Engine White Paper 1-Gigabit TCP Offload Engine Achieving greater data center efficiencies by providing Green conscious and cost-effective reductions in power consumption. July 2009 Third party information brought

More information

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services SEDA: An Architecture for Well-Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University of California, Berkeley Operating Systems Principles

More information

Fairness Issues in Software Virtual Routers

Fairness Issues in Software Virtual Routers Fairness Issues in Software Virtual Routers Norbert Egi, Adam Greenhalgh, h Mark Handley, Mickael Hoerdt, Felipe Huici, Laurent Mathy Lancaster University PRESTO 2008 Presenter: Munhwan Choi Virtual Router

More information

Recall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory

Recall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory Recall: Address Space Map 13: Memory Management Biggest Virtual Address Stack (Space for local variables etc. For each nested procedure call) Sometimes Reserved for OS Stack Pointer Last Modified: 6/21/2004

More information

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr

More information

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays This whitepaper describes Dell Microsoft SQL Server Fast Track reference architecture configurations

More information

ffwd: delegation is (much) faster than you think Sepideh Roghanchi, Jakob Eriksson, Nilanjana Basu

ffwd: delegation is (much) faster than you think Sepideh Roghanchi, Jakob Eriksson, Nilanjana Basu ffwd: delegation is (much) faster than you think Sepideh Roghanchi, Jakob Eriksson, Nilanjana Basu int get_seqno() { } return ++seqno; // ~1 Billion ops/s // single-threaded int threadsafe_get_seqno()

More information

Consolidating OLTP Workloads on Dell PowerEdge R th generation Servers

Consolidating OLTP Workloads on Dell PowerEdge R th generation Servers Consolidating OLTP Workloads on Dell PowerEdge R720 12 th generation Servers B Balamurugan Phani MV Dell Database Solutions Engineering March 2012 This document is for informational purposes only and may

More information

Reduce Costs & Increase Oracle Database OLTP Workload Service Levels:

Reduce Costs & Increase Oracle Database OLTP Workload Service Levels: Reduce Costs & Increase Oracle Database OLTP Workload Service Levels: PowerEdge 2950 Consolidation to PowerEdge 11th Generation A Dell Technical White Paper Dell Database Solutions Engineering Balamurugan

More information

UNIT III MEMORY MANAGEMENT

UNIT III MEMORY MANAGEMENT UNIT III MEMORY MANAGEMENT TOPICS TO BE COVERED 3.1 Memory management 3.2 Contiguous allocation i Partitioned memory allocation ii Fixed & variable partitioning iii Swapping iv Relocation v Protection

More information

Use of the Internet SCSI (iscsi) protocol

Use of the Internet SCSI (iscsi) protocol A unified networking approach to iscsi storage with Broadcom controllers By Dhiraj Sehgal, Abhijit Aswath, and Srinivas Thodati In environments based on Internet SCSI (iscsi) and 10 Gigabit Ethernet, deploying

More information

COMP9242 Advanced OS. S2/2017 W03: Caches: What Every OS Designer Must

COMP9242 Advanced OS. S2/2017 W03: Caches: What Every OS Designer Must COMP9242 Advanced OS S2/2017 W03: Caches: What Every OS Designer Must Know @GernotHeiser Copyright Notice These slides are distributed under the Creative Commons Attribution 3.0 License You are free: to

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Spring 2018 Lecture 15: Multicore Geoffrey M. Voelker Multicore Operating Systems We have generally discussed operating systems concepts independent of the number

More information

Paging. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Paging. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Paging Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Paging Allows the physical address space of a process to be noncontiguous Divide virtual

More information

Keeping up with the hardware

Keeping up with the hardware Keeping up with the hardware Challenges in scaling I/O performance Jonathan Davies XenServer System Performance Lead XenServer Engineering, Citrix Cambridge, UK 18 Aug 2015 Jonathan Davies (Citrix) Keeping

More information

CHAPTER 2: PROCESS MANAGEMENT

CHAPTER 2: PROCESS MANAGEMENT 1 CHAPTER 2: PROCESS MANAGEMENT Slides by: Ms. Shree Jaswal TOPICS TO BE COVERED Process description: Process, Process States, Process Control Block (PCB), Threads, Thread management. Process Scheduling:

More information

Using Transparent Compression to Improve SSD-based I/O Caches

Using Transparent Compression to Improve SSD-based I/O Caches Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr

More information

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1 Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 Demand paging Concepts to Learn 2 Abstraction Virtual Memory (VM) 4GB linear address space for each process

More information

Virtual Switch Acceleration with OVS-TC

Virtual Switch Acceleration with OVS-TC WHITE PAPER Virtual Switch Acceleration with OVS-TC HARDWARE ACCELERATED OVS-TC PROVIDES BETTER CPU EFFICIENCY, LOWER COMPLEXITY, ENHANCED SCALABILITY AND INCREASED NETWORK PERFORMANCE COMPARED TO KERNEL-

More information

The Google File System

The Google File System The Google File System By Ghemawat, Gobioff and Leung Outline Overview Assumption Design of GFS System Interactions Master Operations Fault Tolerance Measurements Overview GFS: Scalable distributed file

More information

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3.

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3. 5 Solutions Chapter 5 Solutions S-3 5.1 5.1.1 4 5.1.2 I, J 5.1.3 A[I][J] 5.1.4 3596 8 800/4 2 8 8/4 8000/4 5.1.5 I, J 5.1.6 A(J, I) 5.2 5.2.1 Word Address Binary Address Tag Index Hit/Miss 5.2.2 3 0000

More information

Epilogue. Thursday, December 09, 2004

Epilogue. Thursday, December 09, 2004 Epilogue Thursday, December 09, 2004 2:16 PM We have taken a rather long journey From the physical hardware, To the code that manages it, To the optimal structure of that code, To models that describe

More information

Virtual Virtual Memory

Virtual Virtual Memory Virtual Virtual Memory Jason Power 3/20/2015 With contributions from Jayneel Gandhi and Lena Olson 4/17/2015 UNIVERSITY OF WISCONSIN 1 Virtual Machine History 1970 s: VMMs 1997: Disco 1999: VMWare (binary

More information

Tolerating Malicious Drivers in Linux. Silas Boyd-Wickizer and Nickolai Zeldovich

Tolerating Malicious Drivers in Linux. Silas Boyd-Wickizer and Nickolai Zeldovich XXX Tolerating Malicious Drivers in Linux Silas Boyd-Wickizer and Nickolai Zeldovich How could a device driver be malicious? Today's device drivers are highly privileged Write kernel memory, allocate memory,...

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results VMware VMmark V1.1 Results Vendor and Hardware Platform: HP ProLiant DL370 G6 Virtualization Platform: VMware ESX 4.0 build 148783 VMmark V1.1 Score = 23.96 @ 16 Tiles Performance Section Performance Tested

More information

What s An OS? Cyclic Executive. Interrupts. Advantages Simple implementation Low overhead Very predictable

What s An OS? Cyclic Executive. Interrupts. Advantages Simple implementation Low overhead Very predictable What s An OS? Provides environment for executing programs Process abstraction for multitasking/concurrency scheduling Hardware abstraction layer (device drivers) File systems Communication Do we need an

More information

Four-Socket Server Consolidation Using SQL Server 2008

Four-Socket Server Consolidation Using SQL Server 2008 Four-Socket Server Consolidation Using SQL Server 28 A Dell Technical White Paper Authors Raghunatha M Leena Basanthi K Executive Summary Businesses of all sizes often face challenges with legacy hardware

More information

Main Memory. Electrical and Computer Engineering Stephen Kim ECE/IUPUI RTOS & APPS 1

Main Memory. Electrical and Computer Engineering Stephen Kim ECE/IUPUI RTOS & APPS 1 Main Memory Electrical and Computer Engineering Stephen Kim (dskim@iupui.edu) ECE/IUPUI RTOS & APPS 1 Main Memory Background Swapping Contiguous allocation Paging Segmentation Segmentation with paging

More information

Virtual Memory. Virtual Memory

Virtual Memory. Virtual Memory Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 23

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 23 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 208 Lecture 23 LAST TIME: VIRTUAL MEMORY Began to focus on how to virtualize memory Instead of directly addressing physical memory, introduce a level of indirection

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

RiSE: Relaxed Systems Engineering? Christoph Kirsch University of Salzburg

RiSE: Relaxed Systems Engineering? Christoph Kirsch University of Salzburg RiSE: Relaxed Systems Engineering? Christoph Kirsch University of Salzburg Application: >10k #threads, producer/consumer, blocking Hardware: CPUs, cores, MMUs,, caches Application: >10k #threads, producer/consumer,

More information

Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy)

Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy) 2011 NVRAMOS Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy) 2011. 4. 19 Jongmoo Choi http://embedded.dankook.ac.kr/~choijm Contents Overview Motivation Observations Proposal:

More information

Non-Volatile Memory Through Customized Key-Value Stores

Non-Volatile Memory Through Customized Key-Value Stores Non-Volatile Memory Through Customized Key-Value Stores Leonardo Mármol 1 Jorge Guerra 2 Marcos K. Aguilera 2 1 Florida International University 2 VMware L. Mármol, J. Guerra, M. K. Aguilera (FIU and VMware)

More information

Oracle Database 12c: JMS Sharded Queues

Oracle Database 12c: JMS Sharded Queues Oracle Database 12c: JMS Sharded Queues For high performance, scalable Advanced Queuing ORACLE WHITE PAPER MARCH 2015 Table of Contents Introduction 2 Architecture 3 PERFORMANCE OF AQ-JMS QUEUES 4 PERFORMANCE

More information

Dongjun Shin Samsung Electronics

Dongjun Shin Samsung Electronics 2014.10.31. Dongjun Shin Samsung Electronics Contents 2 Background Understanding CPU behavior Experiments Improvement idea Revisiting Linux I/O stack Conclusion Background Definition 3 CPU bound A computer

More information

Motivations for Virtual Memory Virtual Memory Oct. 29, Why VM Works? Motivation #1: DRAM a Cache for Disk

Motivations for Virtual Memory Virtual Memory Oct. 29, Why VM Works? Motivation #1: DRAM a Cache for Disk class8.ppt 5-23 The course that gives CMU its Zip! Virtual Oct. 29, 22 Topics Motivations for VM Address translation Accelerating translation with TLBs Motivations for Virtual Use Physical DRAM as a Cache

More information

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8. Multiprocessor System Multiprocessor Systems Chapter 8, 8.1 We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than

More information

Designing a True Direct-Access File System with DevFS

Designing a True Direct-Access File System with DevFS Designing a True Direct-Access File System with DevFS Sudarsun Kannan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin-Madison Yuangang Wang, Jun Xu, Gopinath Palani Huawei Technologies

More information

Task Scheduling of Real- Time Media Processing with Hardware-Assisted Virtualization Heikki Holopainen

Task Scheduling of Real- Time Media Processing with Hardware-Assisted Virtualization Heikki Holopainen Task Scheduling of Real- Time Media Processing with Hardware-Assisted Virtualization Heikki Holopainen Aalto University School of Electrical Engineering Degree Programme in Communications Engineering Supervisor:

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Computer Systems Engineering: Spring Quiz I

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Computer Systems Engineering: Spring Quiz I Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.033 Computer Systems Engineering: Spring 2016 Quiz I There are 15 questions and 13 pages in this quiz booklet.

More information

Implementation and Evaluation of Moderate Parallelism in the BIND9 DNS Server

Implementation and Evaluation of Moderate Parallelism in the BIND9 DNS Server Implementation and Evaluation of Moderate Parallelism in the BIND9 DNS Server JINMEI, Tatuya / Toshiba Paul Vixie / Internet Systems Consortium [Supported by SCOPE of the Ministry of Internal Affairs and

More information

Chapter 12: File System Implementation

Chapter 12: File System Implementation Chapter 12: File System Implementation Silberschatz, Galvin and Gagne 2013 Chapter 12: File System Implementation File-System Structure File-System Implementation Allocation Methods Free-Space Management

More information

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES Anish Athalye and Patrick Long Mentors: Austin Clements and Stephen Tu 3 rd annual MIT PRIMES Conference Sequential

More information

Multiprocessor Systems. COMP s1

Multiprocessor Systems. COMP s1 Multiprocessor Systems 1 Multiprocessor System We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than one CPU to improve

More information

Designing Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services. Presented by: Jitong Chen

Designing Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services. Presented by: Jitong Chen Designing Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services Presented by: Jitong Chen Outline Architecture of Web-based Data Center Three-Stage framework to benefit

More information

Software Routers: NetMap

Software Routers: NetMap Software Routers: NetMap Hakim Weatherspoon Assistant Professor, Dept of Computer Science CS 5413: High Performance Systems and Networking October 8, 2014 Slides from the NetMap: A Novel Framework for

More information

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1 Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 CPU management Roadmap Process, thread, synchronization, scheduling Memory management Virtual memory Disk

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part I: Operating system overview: Memory Management 1 Hardware background The role of primary memory Program

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Spring 2018 Lecture 16: Virtual Machine Monitors Geoffrey M. Voelker Virtual Machine Monitors 2 Virtual Machine Monitors Virtual Machine Monitors (VMMs) are a hot

More information

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed

More information

Virtual Memory Oct. 29, 2002

Virtual Memory Oct. 29, 2002 5-23 The course that gives CMU its Zip! Virtual Memory Oct. 29, 22 Topics Motivations for VM Address translation Accelerating translation with TLBs class9.ppt Motivations for Virtual Memory Use Physical

More information

Linux Network Tuning Guide for AMD EPYC Processor Based Servers

Linux Network Tuning Guide for AMD EPYC Processor Based Servers Linux Network Tuning Guide for AMD EPYC Processor Application Note Publication # 56224 Revision: 1.00 Issue Date: November 2017 Advanced Micro Devices 2017 Advanced Micro Devices, Inc. All rights reserved.

More information

CS5460: Operating Systems Lecture 14: Memory Management (Chapter 8)

CS5460: Operating Systems Lecture 14: Memory Management (Chapter 8) CS5460: Operating Systems Lecture 14: Memory Management (Chapter 8) Important from last time We re trying to build efficient virtual address spaces Why?? Virtual / physical translation is done by HW and

More information

CIS Operating Systems Memory Management Cache and Demand Paging. Professor Qiang Zeng Spring 2018

CIS Operating Systems Memory Management Cache and Demand Paging. Professor Qiang Zeng Spring 2018 CIS 3207 - Operating Systems Memory Management Cache and Demand Paging Professor Qiang Zeng Spring 2018 Process switch Upon process switch what is updated in order to assist address translation? Contiguous

More information