Keeping up with the hardware


1 Keeping up with the hardware: Challenges in scaling I/O performance
Jonathan Davies, XenServer System Performance Lead, XenServer Engineering, Citrix, Cambridge, UK
18 Aug 2015

2 Outline
1. The virtualisation performance challenge
2. Networking performance
3. Storage performance

3 Outline: The virtualisation performance challenge
1. The virtualisation performance challenge
2. Networking performance
3. Storage performance

4 The virtualisation performance challenge: Recent hardware trends
[Chart: device speed over time on a log scale. NIC speeds climb from 1 Gb/s through 10, 40 and 100 Gb/s; disks move from HDD to SSD to NVMe; CPU speeds stay roughly flat.]

5 The virtualisation performance challenge: Virtualisation overhead is increasing
As I/O devices get faster while CPU speeds remain constant, the relative virtualisation overhead increases.
[Diagram: with old I/O devices, most of the time is spent on the physical device and little on virtualisation overhead; with modern I/O devices, the overhead dominates.]

6 Outline: Networking performance
1. The virtualisation performance challenge
2. Networking performance
3. Storage performance

7 Networking performance: Areas of weak networking performance
Metric: Xen's performance
- Intrahost VM-to-VM throughput: weak
- Intrahost aggregate throughput: weak
- Interhost from-VM transmit throughput: strong
- Interhost into-VM receive throughput: weak
- Interhost aggregate throughput: strong

8 Outline: Networking performance, Improving intrahost single-stream throughput
1. The virtualisation performance challenge
2. Networking performance: Improving intrahost single-stream throughput; Improving intrahost aggregate throughput; Summary
3. Storage performance

9 Networking performance, Improving intrahost single-stream throughput: Where do we stand?
Intrahost VM-to-VM single-stream throughput measurements (using CentOS 7).
[Chart: measured XenServer throughput in Gb/s against a 30 Gb/s target; more is better. Dell R720 (2x Xeon E v2).]

10 Networking performance, Improving intrahost single-stream throughput: It's even worse with an upstream guest kernel!
Intrahost VM-to-VM single-stream throughput measurements (using CentOS 7).
[Chart: XenServer throughput with 4.0-kernel guests, around 9 Gb/s, against the 30 Gb/s target; more is better. Dell R720 (2x Xeon E v2).]

11 Networking performance, Improving intrahost single-stream throughput: Datapath analysis with 4.0 kernel in guests
[Trace plot: TSC timeline of events through the TX and RX datapaths, from the sender's kernel and netfront through netback, the bridge, and the receiver's netback and netfront. Two CentOS 7.0 VMs (4.0.9 kernel) on Dell R720 (2x Xeon E v2).]

12 Networking performance, Improving intrahost single-stream throughput: Datapath analysis with 4.0 kernel in guests (continued)
[Same trace plot as the previous slide. Two CentOS 7.0 VMs (4.0.9 kernel) on Dell R720 (2x Xeon E v2).]

13 Networking performance, Improving intrahost single-stream throughput: Transmitter often stalls; only ever two packets in flight
[Trace plot as before; red boxes mark periods when netfront is not running. Two CentOS 7.0 VMs (4.0.9 kernel) on Dell R720 (2x Xeon E v2).]

14 Networking performance, Improving intrahost single-stream throughput: Principal bottleneck: high TX completion latency
High TX completion latency is a serious problem for guests using 4.x kernels, which aggressively limit the amount of uncompleted data.
Definition: TX completion latency is the time from when the guest generates the skb and puts the request in the TX ring until the response is received in the TX ring, after the request has been consumed by dom0.

15 Networking performance, Improving intrahost single-stream throughput: The transmitter waits for TX completion
[Trace plot as before; the yellow slice marks the point of TX completion. Two CentOS 7.0 VMs (4.0.9 kernel) on Dell R720 (2x Xeon E v2).]

16 Networking performance, Improving intrahost single-stream throughput: Principal bottleneck: high TX completion latency
Idea to reduce TX completion latency:
1. Pretend TX completion happens after netback consumes the request. This can be done using skb_orphan, which decouples freeing from skb accounting (see the sketch below).
Rationale: on physical NIC drivers, TX completion occurs when the packet has hit the wire, not when it has gone into the receiver's queue.
[Timeline: effective TX completion latency runs from the skb being generated by the guest and the request being put in the TX ring to the request being consumed by dom0, where the skb could be orphaned, rather than to the response being received in the TX ring.]
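
As a rough illustration of the idea, not the actual patch discussed in the talk, the snippet below shows how the stock kernel helper skb_orphan() could be used to "pretend" TX completion once the request has been consumed. The helper name and call site are hypothetical; in practice the orphaning would happen in the guest's netfront TX path.

```c
/* Sketch only: illustrates the "pretend TX completion happens early" idea.
 * xennet_pretend_tx_complete() and its call site are hypothetical; the real
 * netfront/netback code is organised differently.  skb_orphan() is the
 * standard kernel helper that runs skb->destructor and detaches the skb
 * from its socket, releasing the sender's socket write-space accounting.
 */
#include <linux/skbuff.h>

static void xennet_pretend_tx_complete(struct sk_buff *skb)
{
	/* Called once the TX request has been consumed by dom0.  Orphaning
	 * here means the guest's TCP stack stops charging this skb against
	 * its "uncompleted data" limits long before the TX response is
	 * written back, mimicking a physical NIC whose completion fires
	 * when the packet hits the wire.
	 */
	skb_orphan(skb);
}
```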

17 Networking performance, Improving intrahost single-stream throughput: Datapath analysis with 3.18 kernel in guests
[Trace plot: TSC timeline of the same datapath events as before. Two CentOS 7.0 VMs (3.18 kernel) on Dell R720 (2x Xeon E v2).]

18 Networking performance, Improving intrahost single-stream throughput: The main problem is still TX completion latency
[Trace plot as before; red boxes mark periods when netfront is not running. Two CentOS 7.0 VMs (3.18 kernel) on Dell R720 (2x Xeon E v2).]

19 Networking performance, Improving intrahost single-stream throughput: Next bottleneck: NAPI CPU utilisation
[Trace plot as before; red boxes mark periods when NAPI is not running. Two CentOS 7.0 VMs (3.18 kernel) on Dell R720 (2x Xeon E v2).]

20 Networking performance, Improving intrahost single-stream throughput: Next bottleneck: NAPI CPU utilisation
After TX completion latency, the next bottleneck is that netback's NAPI thread (softirq context) fully utilises a CPU.
Ideas to reduce NAPI CPU utilisation:
1. Avoid spilling over into a frag-list by copying more. Rationale: it is much more costly to handle an skb with a frag-list, so try to fit the data into a single skb. For intrahost VM-to-VM traffic, around 30% of skbs have a frag-list. (A sketch of this check follows this slide.)
2. Unbatch grant-map. Rationale: historically, batching was best because of the overheads of the hypercall, but recent improvements in grant-map locking mean it is no longer so expensive.
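
A minimal sketch of the single-skb test behind idea 1; the helper name is hypothetical and the real netback logic is considerably more involved.

```c
#include <linux/mm.h>
#include <linux/skbuff.h>

/* Sketch only (hypothetical helper, not the real netback code): a packet
 * of tot_len bytes can be built as a single skb when it fits into the
 * linear area plus at most MAX_SKB_FRAGS page fragments.  The "copy more"
 * idea is to prefer this path, and only chain a second skb onto the
 * frag-list when the payload genuinely cannot fit.
 */
static bool xenvif_fits_single_skb(unsigned int tot_len,
				   unsigned int linear_len)
{
	return tot_len <= linear_len + MAX_SKB_FRAGS * PAGE_SIZE;
}
```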

21 Networking performance, Improving intrahost single-stream throughput: Avoiding frag-lists and unbatching grant-map
[Trace plot as before; red boxes mark periods when NAPI is not running. Two CentOS 7.0 VMs (3.18 kernel) on Dell R720 (2x Xeon E v2).]

22 Networking performance, Improving intrahost single-stream throughput: NAPI CPU utilisation bottleneck
These ideas make the datapath look a lot cleaner, but don't reduce the CPU utilisation noticeably.
Conclusion: further work is required to increase the efficiency of the NAPI thread.

23 Outline: Networking performance, Improving intrahost aggregate throughput
1. The virtualisation performance challenge
2. Networking performance: Improving intrahost single-stream throughput; Improving intrahost aggregate throughput; Summary
3. Storage performance

24 Networking performance, Improving intrahost aggregate throughput: Intrahost aggregate throughput measurements
[Chart: measured XenServer aggregate throughput in Gb/s against a target; more is better. Dell R730 (2x Xeon E v3).]

25 Networking performance, Improving intrahost aggregate throughput: Intrahost aggregate throughput analysis
Intrahost aggregate throughput is typically limited by dom0 CPU utilisation.
Ideas to improve aggregate throughput:
1. Improve grant-map scalability: per-vCPU maptrack free lists (already in Xen 4.6); per-active-entry locking (already in Xen 4.6); avoid TLB flush on unmap (patches proposed by Malcolm Crossley).
2. Provide dom0 with more CPU power.

26 Networking performance, Improving intrahost aggregate throughput: Grant-map locking improvements have really helped
[Chart: aggregate intrahost throughput (Gb/s) for 40 VMs against the number of dom0 vCPUs, before and after the improvements. Dell R730 (2x Xeon E v3).]

27 Outline: Networking performance, Summary
1. The virtualisation performance challenge
2. Networking performance: Improving intrahost single-stream throughput; Improving intrahost aggregate throughput; Summary
3. Storage performance

28 Networking performance: Summary
Bottlenecks with intrahost VM-to-VM throughput (listed in order):
- TX completion latency: potential mitigation using skb_orphan
- NAPI CPU utilisation: prototype showed minimal improvement
Bottlenecks with aggregate intrahost throughput:
- dom0 CPU utilisation: already improved in Xen 4.6
Future work:
- Work to minimise TX completion latency is required to avoid a regression with recent kernels.
- Further optimisations need implementing.

29 Outline: Storage performance
1. The virtualisation performance challenge
2. Networking performance
3. Storage performance

30 Storage performance: Xen is weakest in single-VBD performance
Metric: Xen's performance
- Single-VBD throughput: weak
- Multiple-VBD aggregate throughput: strong
For example, consider 4 KB serial IOPS (a sketch of one way to measure this follows this slide):
[Chart: XenServer 6.5 IOPS against a target; more is better. Debian 6.0 VM on Dell R815 (Opteron 6272), Intel S3700 SSD.]
Deficiencies with single-VBD performance:
1. Latency is too high
2. Not enough data in-flight
3. Backend CPU utilisation is too high
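
For concreteness, here is a minimal user-space sketch of measuring "4 KB serial IOPS" on a block device: single-threaded, queue-depth 1, sequential 4 KB O_DIRECT reads. It is only illustrative; the talk does not say which tool was used, and the device path is a placeholder.

```c
/* Minimal 4 KB serial-read IOPS measurement (illustrative only).
 * Build: gcc -O2 iops.c -o iops
 * Usage: ./iops /dev/xvdb      (device path is an example)
 * Assumes the device is large enough for a 10-second sequential run.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define SECONDS    10

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <block device>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT requires an aligned buffer. */
	void *buf;
	if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE)) {
		perror("posix_memalign");
		return 1;
	}

	struct timespec start, now;
	clock_gettime(CLOCK_MONOTONIC, &start);
	long long ios = 0;
	off_t off = 0;

	do {
		/* Queue depth 1: issue the next read only after the
		 * previous one has completed. */
		if (pread(fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE) {
			perror("pread");
			return 1;
		}
		off += BLOCK_SIZE;
		ios++;
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while (now.tv_sec - start.tv_sec < SECONDS);

	printf("%lld IOPS\n", ios / SECONDS);
	free(buf);
	close(fd);
	return 0;
}
```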

31 Outline: Storage performance, Reduce latency
1. The virtualisation performance challenge
2. Networking performance
3. Storage performance: Reduce latency; Allow more data in-flight; Summary

32 Storage performance, Reduce latency
The problem: latency is too high. This especially impacts serial I/O with small block sizes. XenServer uses tapdisk3, a user-space backend using grant-copy via the gntdev.
Ideas to reduce latency:
1. Polling in the backend. Rationale: event-channel and backend-scheduling latency is too high. (A polling-loop sketch follows this slide.)
2. Use grant-map in the backend. Rationale: in principle, grant-copy should be slower than grant-map.
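
A minimal sketch of what "polling in the backend" could look like in a user-space backend such as tapdisk3: spin on the ring for a bounded window (around 1 ms, as on the next slide) before falling back to sleeping on the event channel. All names here (ring_has_requests, process_ring, wait_on_event_channel) are hypothetical placeholders, not the tapdisk3 API.

```c
/* Sketch of bounded polling in a user-space block backend (hypothetical
 * helpers; tapdisk3's real structure differs). */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define POLL_WINDOW_NS (1 * 1000 * 1000)   /* poll for ~1 ms */

/* Placeholder declarations, assumed to be provided elsewhere. */
bool ring_has_requests(void);
void process_ring(void);
void wait_on_event_channel(void);

static uint64_t now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

void backend_loop(void)
{
	for (;;) {
		uint64_t deadline = now_ns() + POLL_WINDOW_NS;

		/* Poll: pick up new requests with no event-channel or
		 * scheduling latency, at the cost of burning CPU. */
		while (now_ns() < deadline) {
			if (ring_has_requests()) {
				process_ring();
				deadline = now_ns() + POLL_WINDOW_NS;
			}
		}

		/* Nothing arrived within the window: block on the event
		 * channel so we stop eating CPU until the guest kicks us. */
		wait_on_event_channel();
		process_ring();
	}
}
```

The trade-off the slide warns about is visible here: the longer the poll window, the lower the per-request latency but the more CPU the backend burns while idle, which can hurt multi-VBD aggregate throughput.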

33 Storage performance, Reduce latency: Idea 1: Polling in the backend
[Chart: IOPS against block size (KB) for single-threaded sequential reads at queue-depth 1, with polling (1 ms) and without polling. Debian 6.0 VM on Dell R720 (2x Xeon E v2), Micron P320h SSD.]

34 Storage performance, Reduce latency: Idea 1: Polling in the backend
Polling for just 1 millisecond can yield a significant improvement (1). The faster the disk, the bigger the improvement (2).
Conclusion: XenServer will likely adopt polling in tapdisk3, but we need to be careful about eating too much CPU, which can hurt multi-VBD aggregate throughput.
(1) On blkback the improvement may be even larger.
(2) Until the tapdisk3 process fully utilises a CPU even when not polling, which is the next bottleneck.

35 Storage performance, Reduce latency: Idea 2: Grant-map in the backend
[Chart: IOPS against block size (KB) for single-threaded sequential reads at a fixed queue depth, comparing the grant-copy and grant-map backends. Debian 6.0 VM on Dell R720, Intel S3700 SSD.]

36 Storage performance, Reduce latency: Idea 2: Grant-map in the backend
So grant-copy is still faster in practice, despite recent improvements to grant-map locking. This suggests inefficiency issues with the gntdev...?
Conclusion: XenServer will likely retain grant-copy for now.

37 Outline: Storage performance, Allow more data in-flight
1. The virtualisation performance challenge
2. Networking performance
3. Storage performance: Reduce latency; Allow more data in-flight; Summary

38 Storage performance, Allow more data in-flight
The problem: each blkif ring supports 32 slots, each of which can address up to 44 KB, i.e. a total of roughly 1.4 MB (32 x 44 KB = 1408 KB) in flight. Meanwhile, modern disks and arrays can give better throughput when issued with more than this.
Ideas to get more data in-flight:
1. Multi-queue (patches proposed by Bob Liu). Rationale: more than one blkif ring per device.
2. Multi-page ring (patches proposed by Bob Liu). Rationale: a larger blkif ring.
3. Indirect descriptors (available since kernel 3.11). Rationale: the ability to address more data per ring slot.

39 Storage performance, Allow more data in-flight: Idea 1: Multi-queue measurements
[Chart: IOPS against block size (KB) for sequential reads, 8 threads, queue-depth 32, with and without multi-queue. Ubuntu VM using blkback on Dell R720 (2x Xeon E v2), Micron P320h SSD.]

40 Storage performance, Allow more data in-flight: Idea 1: Multi-queue measurements in context
[Chart: the same measurements shown alongside other configurations. IOPS against block size (KB) for sequential reads, 8 threads, queue-depth 32. Ubuntu VM using blkback on Dell R720 (2x Xeon E v2), Micron P320h SSD.]

41 Storage performance, Allow more data in-flight: Idea 1: Multi-queue
Adding multi-queue support hurts performance for small block sizes.
Explanation: the guest does no request merging, and we rely on merging to get good performance on modern disks for sequential I/O.
Conclusion: unless the sequential I/O performance obtained when requests are merged can be retained, XenServer will likely not adopt multi-queue.

42 Storage performance, Allow more data in-flight: Idea 2: Multi-page ring: good for random I/O
[Chart: IOPS against number of threads for random 4 KB reads at queue-depth 4, single-page versus multi-page ring. Ubuntu VM (16 vCPUs) using blkback on Dell R720 (2x Xeon E v2), Micron P320h SSD.]

43 Storage performance, Allow more data in-flight: Idea 2: Multi-page ring: poor for sequential I/O
[Chart: IOPS against block size (KB) for sequential reads, 8 threads, queue-depth 32, single-page versus multi-page ring. Ubuntu VM (4 vCPUs) using blkback on Dell R720 (2x Xeon E v2), Micron P320h SSD.]

44 Storage performance, Allow more data in-flight: Idea 2: Multi-page ring
Improves random I/O throughput by over 50% when the ring would otherwise be full, but reduces sequential I/O throughput for small block sizes and high queue depth.
Explanation: the guest kernel does not merge requests when there is a multi-page ring.
Conclusion: further work is needed to mitigate the effect on request merging. XenServer will likely retain the use of a single-page ring for now.

45 Storage performance, Allow more data in-flight: Idea 3: Indirect descriptors
Background: indirect descriptors have been available in blkfront/blkback since kernel 3.11. They allow up to 1 MB to be addressed per ring slot, meaning the total in-flight data can be 32 MB rather than roughly 1.4 MB. (The arithmetic is sketched below.)
But is this actually a good thing? Most modern disks respond better to smaller requests...
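
The capacity figures follow directly from the blkif ring geometry. The constants below are my reading of the blkif protocol (32 requests per single-page ring, 11 segments of 4 KB per ordinary request, up to 256 segments per indirect request) and should be treated as assumptions rather than values quoted from the talk.

```c
/* Back-of-the-envelope blkif in-flight capacity (illustrative constants). */
#include <stdio.h>

#define RING_SLOTS        32   /* requests per single-page blkif ring      */
#define SEG_SIZE          4096 /* one segment = one 4 KB page              */
#define SEGS_PER_REQ      11   /* ordinary request: 11 segments = 44 KB    */
#define SEGS_PER_INDIRECT 256  /* indirect request: up to 256 segs = 1 MB  */

int main(void)
{
	long direct   = (long)RING_SLOTS * SEGS_PER_REQ * SEG_SIZE;
	long indirect = (long)RING_SLOTS * SEGS_PER_INDIRECT * SEG_SIZE;

	printf("ordinary ring: %ld KB in flight (~%.1f MB)\n",
	       direct / 1024, direct / (1024.0 * 1024.0));
	printf("indirect ring: %ld KB in flight (%ld MB)\n",
	       indirect / 1024, indirect / (1024 * 1024));
	return 0;
}
```

Running this prints 1408 KB (about 1.4 MB) for an ordinary single-page ring and 32 MB with indirect descriptors, matching the figures on the slide.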

46 Storage performance, Allow more data in-flight: Idea 3: Indirect descriptors: is it worthwhile?
[Chart: throughput when reading directly from the physical disk, splitting requests into chunks issued in parallel, against chunk size (KB). Dell R720 (2x Xeon E v2), Micron P320h SSD.]

47 Storage performance, Allow more data in-flight: Idea 3: Indirect descriptors
Conclusion: on modern disks, throughput generally improves by splitting large requests into 44 KB chunks! Allowing bigger requests through can hurt performance. Ideally we need the Linux block layer to know the disk's optimal block size, and to split or merge requests accordingly. Then indirect descriptors would be an improvement, by allowing more data in flight. (A chunk-splitting sketch follows this slide.)
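
To make the "split large requests into chunks issued in parallel" experiment concrete, here is a user-space sketch using POSIX AIO to issue one large read as 44 KB chunks that are in flight simultaneously. It only illustrates the measurement idea; the chunk size, total request size, device path and the choice of POSIX AIO are assumptions, not details from the talk.

```c
/* Split one large read into 44 KB chunks issued in parallel (sketch).
 * Build: gcc -O2 split_read.c -o split_read -lrt
 */
#define _GNU_SOURCE
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK   (44 * 1024)        /* 44 KB, as in the slide            */
#define TOTAL   (1024 * 1024)      /* a 1 MB request split into chunks  */
#define NCHUNKS ((TOTAL + CHUNK - 1) / CHUNK)

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <block device>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	static struct aiocb cbs[NCHUNKS];
	static const struct aiocb *list[NCHUNKS];

	/* Issue every chunk of the large request at once. */
	for (int i = 0; i < NCHUNKS; i++) {
		memset(&cbs[i], 0, sizeof(cbs[i]));
		cbs[i].aio_fildes = fd;
		cbs[i].aio_buf    = malloc(CHUNK);
		cbs[i].aio_nbytes = CHUNK;
		cbs[i].aio_offset = (off_t)i * CHUNK;
		list[i] = &cbs[i];
		if (aio_read(&cbs[i])) {
			perror("aio_read");
			return 1;
		}
	}

	/* Wait for all chunks to complete. */
	for (int i = 0; i < NCHUNKS; i++) {
		while (aio_error(&cbs[i]) == EINPROGRESS)
			aio_suspend(list, NCHUNKS, NULL);
		if (aio_return(&cbs[i]) < 0) {
			perror("chunk read");
			return 1;
		}
	}

	printf("issued %d chunks of %d bytes in parallel\n",
	       (int)NCHUNKS, CHUNK);
	close(fd);
	return 0;
}
```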

48 Outline: Storage performance, Summary
1. The virtualisation performance challenge
2. Networking performance
3. Storage performance: Reduce latency; Allow more data in-flight; Summary

49 Storage performance: Summary
Reduce latency:
- Polling: promising results
- Grant-map: needs more work for a userspace backend
Allow more data in-flight:
- Multi-queue: prevents request merging
- Multi-page ring: prevents request merging
- Indirect descriptors: prevent use of the optimal block size
Future work:
- Improve the performance of the gntdev
- A better strategy for getting more data in-flight whilst ensuring that requests are of optimal size

50 Questions
Questions?

51 Extra slides: There's little benefit from batching nowadays
[Chart: Dell R220 (Xeon E v3).]
