Who fears the spectres?
Performance impacts of spectre/meltdown counter-measures and possible improvements


Who fears the spectres?
Performance impacts of spectre/meltdown counter-measures and possible improvements
Paolo Abeni - Red Hat
NetConf, Boston 2018

Outline
- Performance impact of PTI and retpoline on current net-next
- Possible improvements, PTI and retpoline-related
- Possible improvements, misc

UDP RX performances
64-byte IPv4 packets, single sink, different numbers of RX queues

Digging with perf
Topmost perf offenders for the UDP RX test (receiver process), compared:

No mitigations:
 11.41% copy_user_generic_unrolled
  9.12% udp_recvmsg
  5.25% slab_free
  5.20% page_frag_free
  4.55% sys_recvfrom
  4.38% entry_SYSCALL_64_after_hwframe
  4.08% do_syscall_64
  4.04% avc_has_perm
  3.75% _copy_to_iter

Retpoline only:
 10.95% udp_recvmsg (delta 2.73%)
  7.72% copy_user_generic_unrolled
  6.40% avc_has_perm (delta 2.36%)
  5.51% page_frag_free
  5.12% sys_recvfrom
  4.35% slab_free
  4.07% skb_recv_udp (delta ~1.54%)
  3.93% entry_SYSCALL_64_after_hwframe
  3.40% do_syscall_64

PTI only:
 13.49% syscall_return_via_sysret
 10.49% 0xfffffe000016601b
  7.11% copy_user_generic_unrolled
  6.40% udp_recvmsg
  4.15% page_frag_free
  3.72% sys_recvfrom
  3.61% do_syscall_64
  3.46% slab_free
  3.18% entry_SYSCALL_64_after_hwframe

QDisc performances
pktgen throughput with queue_xmit mode, 64-byte packets

Perf, again
Topmost perf offenders for the Qdisc test, compared:

No mitigations:
 11.86% pktgen_xmit
  9.27% ixgbe_xmit_frame_ring
  8.71% skb_unref.part.39
  6.76% pfifo_fast_dequeue
  5.81% ip_send_check
  3.82% dev_queue_xmit
  3.58% mod_cur_headers
  3.29% qdisc_run
  3.15% skb_put

Retpoline only:
 11.41% ixgbe_xmit_frame_ring (delta 2.14%)
 10.42% pktgen_xmit
 10.34% pfifo_fast_dequeue (delta 3.62%)
  4.98% ip_send_check
  4.74% skb_unref.part.39
  3.50% qdisc_run
  3.33% dev_queue_xmit
  2.97% mod_cur_headers
  2.60% build_skb

PVP performances
OVS kernel datapath, default flow configuration

One last perf comparison
Topmost perf offenders for the PVP test (vhost process), compared:

No mitigations:
  5.78% vhost_get_vq_desc
  5.47% tun_get_user
  5.37% masked_flow_lookup
  5.05% copy_user_generic_unrolled
  4.62% translate_desc
  4.01% iov_iter_advance
  3.54% ixgbe_xmit_frame_ring
  2.73% pfifo_fast_dequeue

Retpoline only:
  5.29% tun_get_user
  5.06% vhost_get_vq_desc
  4.77% masked_flow_lookup
  4.65% ixgbe_xmit_frame_ring (delta 1.11%)
  4.20% pfifo_fast_dequeue (delta 1.47%)
  4.20% copy_user_generic_unrolled
  4.04% translate_desc
  3.76% iov_iter_advance

PTI only:
  5.64% vhost_get_vq_desc
  5.18% tun_get_user
  5.08% masked_flow_lookup
  4.63% copy_user_generic_unrolled
  4.57% translate_desc
  3.92% iov_iter_advance
  3.41% ixgbe_xmit_frame_ring
  3.24% pfifo_fast_dequeue

Fighting spectres
- Bulking potentially reduces the impact of both retpolines and PTI, but actually reducing the retpoline cost is usually less straightforward (e.g. bulk_dequeue)
- Bulking is already there in several places (GSO, GRO, qdisc dequeue), but routing and forwarding have no support
- UDP is in a mixed state: GSO (and eventually GRO) for connected sockets, recvmmsg/sendmmsg for unconnected ones (?!?)
- Other options?
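The syscall-side bulking mentioned above can be illustrated from userspace: sendmmsg()/recvmmsg() move a batch of datagrams per syscall, amortizing the entry/exit cost that PTI inflates. A minimal sketch (loopback demo, not a benchmark; `bulk_udp_demo` and `BATCH` are illustrative names, not part of any real API):

```c
#define _GNU_SOURCE            /* for sendmmsg()/recvmmsg() */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

#define BATCH 4

/* Send BATCH datagrams with one syscall, then drain them with one
 * syscall, instead of paying 2 * BATCH syscall round-trips. */
static int bulk_udp_demo(void)
{
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
    socklen_t alen = sizeof(addr);

    if (rx < 0 || tx < 0)
        return -1;
    bind(rx, (struct sockaddr *)&addr, sizeof(addr));
    getsockname(rx, (struct sockaddr *)&addr, &alen); /* learn the bound port */
    connect(tx, (struct sockaddr *)&addr, sizeof(addr));

    char payload[BATCH][16];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        memset(payload[i], 'a' + i, sizeof(payload[i]));
        iov[i].iov_base = payload[i];
        iov[i].iov_len = sizeof(payload[i]);
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* one syscall to queue BATCH datagrams on loopback ... */
    int sent = sendmmsg(tx, msgs, BATCH, 0);
    /* ... one syscall to pick them all up */
    int rcvd = recvmmsg(rx, msgs, BATCH, 0, NULL);

    close(tx);
    close(rx);
    return sent == BATCH ? rcvd : -1;
}
```

The batch size trades latency for per-packet syscall overhead; the slide's point is that this kind of amortization helps PTI much more directly than it helps retpolines, which sit on per-packet indirect calls inside the stack.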

Still fighting spectres
Might as well [not indirect] jump: indirect calls we can avoid
- skb->destructor()
- Proposed by Hannes Sowa, originally to reduce the skbuff size
- Use an integer to demux the destruction action among the known ones
- Some drivers (e.g. chelsio) may still need an indirect call
- The expected gain is currently unknown
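The integer-demux idea can be sketched as follows: replace the function pointer with a small enum dispatched in a switch, so the compiler emits direct calls (or a jump table) instead of retpoline'd indirect calls, while keeping a pointer-based fallback for the odd driver. All names here (`fake_skb`, `skb_dtor_id`, the demo destructors) are illustrative, not the actual kernel code:

```c
#include <stddef.h>

/* Known destruction actions get an integer id; only the rare
 * driver-specific case keeps a real function pointer. */
enum skb_dtor_id {
    SKB_DTOR_NONE = 0,
    SKB_DTOR_SOCK_WFREE,
    SKB_DTOR_SOCK_RFREE,
    SKB_DTOR_INDIRECT,     /* fallback: drivers (e.g. chelsio) that
                            * still need an arbitrary callback */
};

struct fake_skb {
    enum skb_dtor_id dtor_id;
    void (*destructor)(struct fake_skb *); /* only for SKB_DTOR_INDIRECT */
    int *freed_counter;
};

static void demo_sock_wfree(struct fake_skb *skb) { *skb->freed_counter += 1; }
static void demo_sock_rfree(struct fake_skb *skb) { *skb->freed_counter += 10; }

static void skb_run_destructor(struct fake_skb *skb)
{
    switch (skb->dtor_id) {        /* direct calls: no retpoline cost */
    case SKB_DTOR_SOCK_WFREE:
        demo_sock_wfree(skb);
        break;
    case SKB_DTOR_SOCK_RFREE:
        demo_sock_rfree(skb);
        break;
    case SKB_DTOR_INDIRECT:        /* rare slow path keeps the pointer */
        if (skb->destructor)
            skb->destructor(skb);
        break;
    case SKB_DTOR_NONE:
        break;
    }
}

static int dtor_demo(void)
{
    int freed = 0;
    struct fake_skb a = { .dtor_id = SKB_DTOR_SOCK_WFREE, .freed_counter = &freed };
    struct fake_skb b = { .dtor_id = SKB_DTOR_SOCK_RFREE, .freed_counter = &freed };

    skb_run_destructor(&a);
    skb_run_destructor(&b);
    return freed;
}
```

As the slide notes, dropping the pointer field could also shrink struct sk_buff, at the cost of enumerating every known destructor in one place.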

Indirect calls we want to avoid [II]
- sch->enqueue and sch->dequeue
- We can check for built-in qdiscs and call the related ops directly, avoiding 2 indirect calls per packet
- We still need the indirect calls in some (most?!?) cases
- With jump labels we can avoid all the indirect calls in the default configuration, and fall back to the above after any configuration change
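The "check for built-in qdiscs" idea amounts to comparing the ops pointer against the well-known default and, on a match, calling the function directly, leaving the retpoline'd indirect call as the slow path for everything else. A minimal sketch under illustrative names (`fake_qdisc`, `pfifo_fast_*_demo`; the real pfifo_fast internals differ):

```c
struct fake_qdisc;

struct fake_qdisc_ops {
    int (*dequeue)(struct fake_qdisc *);
};

struct fake_qdisc {
    const struct fake_qdisc_ops *ops;
    int backlog;               /* packets still queued (demo stand-in) */
};

/* Stand-in for the default qdisc's dequeue: returns a nonzero
 * "packet" while the backlog lasts, 0 when empty. */
static int pfifo_fast_dequeue_demo(struct fake_qdisc *q)
{
    return q->backlog > 0 ? q->backlog-- : 0;
}

static const struct fake_qdisc_ops pfifo_fast_ops_demo = {
    .dequeue = pfifo_fast_dequeue_demo,
};

static int qdisc_dequeue(struct fake_qdisc *q)
{
    /* fast path: a direct call for the default qdisc, which the
     * compiler can emit without a retpoline ... */
    if (q->ops == &pfifo_fast_ops_demo)
        return pfifo_fast_dequeue_demo(q);
    /* ... slow path: the usual indirect call for everything else */
    return q->ops->dequeue(q);
}

static int dequeue_demo(void)
{
    struct fake_qdisc q = { .ops = &pfifo_fast_ops_demo, .backlog = 3 };
    int pkts = 0;

    while (qdisc_dequeue(&q))
        pkts++;
    return pkts;
}
```

The jump-label variant the slide mentions would go further: patch out even the pointer comparison while only built-in qdiscs are configured, and flip the key when a non-default qdisc is installed.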

More indirect calls we want to avoid
- GRO and offloads: a lot of indirect calls per packet there
- At least for GRO, removing all of them looks possible
- But some code uglification looks unavoidable
- Other targets?

Side-track: too many [virtual] switches
- 2 in-kernel datapaths for OVS (net/openvswitch and TC/flower); neither is near the performance requested (for SDN)
- But even bypass solutions do not meet the packet-rate requirements
- TC/flower is needed for H/W offload, but it still misses some features
- Do we need both of them? Can we move towards TC/flower only?
- Crazy idea: can we attach TC ingress to the XDP hook?

And now for something completely different
- Leverage a UMH to implement the compat code for xfrm, and remove compat kernel support from xtables (idea from Florian Westphal)
- Remote skb free (idea from Eric) - any more details here?
- Early demux for unconnected sockets (is that a dead cow?)

The hardships of an orphan[ed skb]
And now for something completely different - part II
- SKBs are orphaned in the xmit path when potentially crossing netns boundaries
- In presence of XPS this hurts TCP performance badly (due to out-of-order delivery and the lack of feedback towards the sender socket)
- Naive partial solution: disable XPS for orphaned skbs - hurts UDP performance and doesn't solve the lack of feedback
- Alternative solution: access skb->sk via a helper in netfilter; do not really clear skb->sk while scrubbing the skb, just mark it as not accessible (via the helper)
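The alternative solution can be sketched as follows: scrubbing sets a flag instead of clearing the pointer, an accessor used by netfilter hides the socket, while the xmit path still reaches it for XPS queue selection and feedback. A toy model with illustrative names (`fake_skb2`, `skb_sk`, `sk_scrubbed`; not the actual kernel fields):

```c
#include <stddef.h>
#include <stdbool.h>

struct fake_sock { int queue_hint; };

struct fake_skb2 {
    struct fake_sock *sk;
    bool sk_scrubbed;            /* set instead of clearing sk */
};

/* Crossing a netns: hide the socket rather than dropping it. */
static void skb_scrub(struct fake_skb2 *skb)
{
    skb->sk_scrubbed = true;
}

/* Accessor for netfilter & co.: a scrubbed skb looks orphaned. */
static struct fake_sock *skb_sk(const struct fake_skb2 *skb)
{
    return skb->sk_scrubbed ? NULL : skb->sk;
}

static int scrub_demo(void)
{
    struct fake_sock sk = { .queue_hint = 2 };
    struct fake_skb2 skb = { .sk = &sk };

    int visible_before = skb_sk(&skb) != NULL;   /* accessor sees sk */
    skb_scrub(&skb);
    int visible_after = skb_sk(&skb) != NULL;    /* accessor hides sk */
    int still_linked = skb.sk != NULL;           /* xmit path keeps it */

    /* encode the three observations as one value: 1/0/1 -> 101 */
    return visible_before * 100 + visible_after * 10 + still_linked;
}
```

The real patch would of course also have to audit every direct skb->sk user and sort out reference counting, which is exactly why the helper indirection is needed.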

THANK YOU