Operating Systems. 17. Sockets. Paul Krzyzanowski. Rutgers University. Spring /6/ Paul Krzyzanowski

Similar documents
Networking Subsystem in Linux. Manoj Naik IBM Almaden Research Center

Linux IP Networking. Antonio Salueña

Distributed Systems. 02. Networking. Paul Krzyzanowski. Rutgers University. Fall 2017

Tutorial 2. Linux networking, sk_buff and stateless packet filtering. Roei Ben-Harush Check Point Software Technologies Ltd.

Operating Systems. 16. Networking. Paul Krzyzanowski. Rutgers University. Spring /6/ Paul Krzyzanowski

CSCI-GA Operating Systems. Networking. Hubertus Franke

A Client-Server Exchange

Chapter 10: I/O Subsystems (2)

jelly-near jelly-far

TCP/IP Stack Introduction: Looking Under the Hood!

Interprocess Communication Mechanisms

shared storage These mechanisms have already been covered. examples: shared virtual memory message based signals

Chapter 10: I/O Subsystems (2)

Memory-Mapped Files. generic interface: vaddr mmap(file descriptor,fileoffset,length) munmap(vaddr,length)

Linux Kernel Application Interface

NETWORK PROGRAMMING. Instructor: Junaid Tariq, Lecturer, Department of Computer Science

ECE 650 Systems Programming & Engineering. Spring 2018

Our Small Quiz. Chapter 9: I/O Subsystems (2) Generic I/O functionality. The I/O subsystem. The I/O Subsystem.

Network Implementation

CSE/EE 461 Lecture 14. Connections. Last Time. This Time. We began on the Transport layer. Focus How do we send information reliably?

Light & NOS. Dan Li Tsinghua University

Operating Systems Design Exam 3 Review: Spring Paul Krzyzanowski

ELEC / COMP 177 Fall Some slides from Kurose and Ross, Computer Networking, 5 th Edition

Our Small Quiz. Chapter 10: I/O Subsystems (2) Generic I/O functionality. The I/O subsystem. The I/O Subsystem. The I/O Subsystem

Application Programming Interfaces

Networks and Operating Systems ( ) Chapter 10: I/O Subsystems (2)

Sockets Sockets Communication domains

CS 416: Operating Systems Design April 22, 2015

Elementary TCP Sockets

Operating Systems. Week 13 Recitation: Exam 3 Preview Review of Exam 3, Spring Paul Krzyzanowski. Rutgers University.

Packet Sniffing and Spoofing

Lecture 8: Other IPC Mechanisms. CSC 469H1F Fall 2006 Angela Demke Brown

Topics. Lecture 8: Other IPC Mechanisms. Socket IPC. Unix Communication

Chapter 8: I/O functions & socket options

(Refer Slide Time: 1:09)

Implementing the Wireless Token Ring Protocol As a Linux Kernel Module

Operating Systems Design Exam 3 Review: Spring 2011

Outline. 1) Introduction to Linux Kernel 2) How system calls work 3) Kernel-space programming 4) Networking in kernel 2/34

19: Networking. Networking Hardware. Mark Handley

ADRIAN PERRIG & TORSTEN HOEFLER ( ) 10: I/O

UNIVERSITY OF OSLO Department of Informatics. Implementing Rate Control in NetEm. Untying the NetEm/tc tangle. Master thesis. Anders G.

TCP/IP stack is the family of protocols that rule the current internet. While other protocols are also used in computer networks, TCP/IP is by far

Lecture 3. The Network Layer (cont d) Network Layer 1-1

Programming with TCP/IP. Ram Dantu

The Fundamentals. Port Assignments. Common Protocols. Data Encapsulation. Protocol Communication. Tevfik Ko!ar

Transport Layer. The transport layer is responsible for the delivery of a message from one process to another. RSManiaol

Lecture 8. Network Layer (cont d) Network Layer 1-1

QUIZ: Longest Matching Prefix

Advanced Computer Networks. End Host Optimization

User Datagram Protocol

Message Passing Architecture in Intra-Cluster Communication

Using Time Division Multiplexing to support Real-time Networking on Ethernet

Introduction to Computer Systems. Networks 2. c Theodore Norvell. The Sockets API

CSE506: Operating Systems CSE 506: Operating Systems

COPYRIGHTED MATERIAL INTRODUCTION

CS 378 (Spring 2003)

CSE506: Operating Systems CSE 506: Operating Systems

Network device drivers in Linux

Socket Programming. Sungkyunkwan University. Hyunseung Choo Copyright Networking Laboratory

Transport Layer (TCP/UDP)

Introduction to TCP/IP networking

CSEP 561 Connections. David Wetherall

Oral. Total. Dated Sign (2) (5) (3) (2)

Networking and Internetworking 1

Mike Anderson. TCP/IP in Embedded Systems. CTO/Chief Scientist The PTR Group, Inc.

Internet Technology 3/23/2016

Internet Layers. Physical Layer. Application. Application. Transport. Transport. Network. Network. Network. Network. Link. Link. Link.

1-1. Switching Networks (Fall 2010) EE 586 Communication and. October 25, Lecture 24

TRANSMISSION CONTROL PROTOCOL. ETI 2506 TELECOMMUNICATION SYSTEMS Monday, 7 November 2016

Application Note 2244 Implementing a Network Interface in TINI 1.1x

Unix Network Programming

ETSF05/ETSF10 Internet Protocols Network Layer Protocols

Linux Kernel Application Interface

UNIX Sockets. Developed for the Azera Group By: Joseph D. Fournier B.Sc.E.E., M.Sc.E.E.

CSCI Computer Networks

CSE 461 Connections. David Wetherall

Stream Control Transmission Protocol (SCTP)

CSCE 463/612 Networks and Distributed Processing Spring 2018

CS 326: Operating Systems. Networking. Lecture 17

CSC 546: Client/Server Fundamentals. Fall Major client/server protocols

Operating System Modifications for User-Oriented Addressing Model

The latency of user-to-user, kernel-to-kernel and interrupt-to-interrupt level communication

Hybrid of client-server and P2P. Pure P2P Architecture. App-layer Protocols. Communicating Processes. Transport Service Requirements

Lecture 11: IP routing, IP protocols

CSEP 561 Connections. David Wetherall

Session NM056. Programming TCP/IP with Sockets. Geoff Bryant Process software

TD(07)037. The 10th COST 290 MC Meeting. Technische Universität Wien Wien, Austria October 1 2, 2007

Objectives. Chapter 10. Upon completion you will be able to:

Connections. Topics. Focus. Presentation Session. Application. Data Link. Transport. Physical. Network

Data and Computer Communications. Chapter 2 Protocol Architecture, TCP/IP, and Internet-Based Applications

precise rules that govern communication between two parties TCP/IP: the basic Internet protocols IP: Internet protocol (bottom level)

The Network Stack. Chapter Network stack functions 216 CHAPTER 21. THE NETWORK STACK

CSE 461 The Transport Layer

Outline. Inter-Process Communication. IPC across machines: Problems. CSCI 4061 Introduction to Operating Systems

Chapter 4 Network Layer: The Data Plane

UNIT IV- SOCKETS Part A

Group-A Assignment No. 6

Network Programming. Introduction to Sockets. Dr. Thaier Hayajneh. Process Layer. Network Layer. Berkeley API

Socket Programming. Dr. -Ing. Abdalkarim Awad. Informatik 7 Rechnernetze und Kommunikationssysteme

... Application Note AN-531. PCI Express System Interconnect Software Architecture. Notes Introduction. System Architecture.

Transcription:

Operating Systems 17. Sockets Paul Krzyzanowski Rutgers University Spring 2015 1

Sockets Dominant API for transport layer connectivity Created at UC Berkeley for 4.2BSD Unix (1983) Design goals Communication between processes should not depend on whether they are on the same machine Communication should be efficient Interface should be compatible with files Support different protocols and naming conventions Sockets is not just for the Internet Protocol family 2

Socket Socket = Abstract object from which messages are sent and received Looks like a file descriptor Application can select particular style of communication Virtual circuit, datagram, message-based, in-order delivery Unrelated processes should be able to locate communication endpoints Sockets can have a name Name should be meaningful in the communications domain 3

Connection-Oriented (TCP) socket operations Client Server Create a socket socket Create a socket socket Name the socket (assign local address, port) bind Name the socket (assign local address, port) Connect to the other side bind connect Set the socket for listening Wait for and accept a connection; get a socket for the connection listen accept read / write byte streams read/write read / write byte streams read/write close the socket close close the socket close close the listening socket close April 6, 2015 2014 Paul Krzyzanowski 4

Connectionless (UDP) socket operations Client Server Create a socket socket Create a socket socket Name the socket (assign local address, port) bind Name the socket (assign local address, port) bind Send a message sendto Receive a message recvfrom Receive a message recvfrom Send a message sendto close the socket close close the socket close April 6, 2015 2014 Paul Krzyzanowski 5

Socket Internals 6

Logical View Socket layer data same socket buffer socket Transport layer TCP data same socket buffer Network layer IP TCP data same socket buffer Protocol input queue Network interface (driver) layer Ethernet ethernet IP TCP data Device interrupt socket buffer 7

Data Path Application writes data to socket From app Packet received by device From device Move packet to transport layer [TCP creates packet buffer & header] copy data from user process to kernel socket buffer Packet for host? copy data from device to kernel socket buffer drop Move packet to network layer [IP fills in header] Forward packet? Move packet to network layer [IP fills in header] Move packet to transport layer internal drop external Look up route Move packet to socket Move packet to net device driver [packet goes on send queue] Data goes to application buffer To app Transmit the packet To device 8

OS Network Stack System call Interface File calls Socket calls Socket-related system calls Generic network interface Sockets implementation layer Network Protocols Transport Layer Network Layer TCP/IPv4, UDP/IPv4, TCP/IPv6 IPv4, IPv6 Abstract Device Interface Device Driver netdevice, queuing discipline Link Layer Ethernet, Wi-Fi, SLIP 9

System call interface Two ways to communicate with the network: Socket-specific call (e.g., socket, bind, shutdown) Directed to sys_socketcall (socket.c) Goes to the target function File call (e.g., read, write, close) File descriptor socket Sockets reside in the process s file table Direct parallel of the VFS structure A socket s f_ops field points to a set of functions for socket operations A socket structure acts as a queuing point for data being transmitted & received A socket has send and receive queues associated with it High & low watermarks 10

Sockets layer All network communication takes place via a socket Two socket structures one within another 1. Generic sockets (aka BSD sockets) struct socket 2. Protocol-specific sockets (e.g., INET socket) struct sock socket structure Keeps all the state of a socket including the protocol and operations that can be performed on it Some key members of the structure: struct proto_ops *ops: protocol-specific functions that implement socket operations Common functions to support a variety of protocols: TCP, UDP, IP, raw ethernet, other networks Pointers to protocol functions: bind, connect, accept, listen, sendmsg, shutdown, struct inode *inode: points to in-memory inode associated with the socket struct sock *sk: protocol-specific (e.g., INET) socket E.g., this contains TCP/IP and UDP/IP specific data for an INET (Internet Address Domain) socket 11

Socket Buffer: struct sk_buff Component for managing the data movement for sockets through the networking layers Contains packet & state data for multiple layers of the protocol stack Don t waste time copying parameters & packet data from layer to layer of the network stack Data sits in a socket buffer (struct sk_buff) As we move through layers, data is only copied twice: 1. From user to kernel space 2. From kernel space to the device (via DMA if available) associated device source device sk_buff: next prev head data tail end dev dev_rx sk sk_buff sk_buff socket sk_buff sk_buff IP TCP data Packet data 12

Socket Buffer: struct sk_buff Each sent or received packet is associated with an sk_buff: Packet data in data->, tail-> Total packet buffer in head->, end-> Header pointers (MAC, IP, TCP header, etc.) Identifies device structure (net_device) rx_dev: points to the network device that received the packet dev: identifies net device on which the buffer operates If a routing decision has been made, this is the outbound interface Add or remove headers without reallocating memory Each socket (connection stream) is associated with a linked list of sk_buffs 13

Keeping track of packet data Example: Prepare an outgoing packet head data tail end ethernet tail room Allocate new socket buffer data skb = alloc_skb(len, GFP_KERNEL); No packet data: head = data = tail 14

Keeping track of packet data head data tail end head room tail room Make room for protocol headers. skb_reserve(skb, header_len) For IPv4, use sk->sk_prot->max_header Data size is still 0 15

Keeping track of packet data head data tail end head room User data tail room Add user data 16

Keeping track of packet data head data tail end head room TCP User data tail room Add TCP header 17

Keeping track of packet data head data tail end head room IP TCP User data tail room Add IP header 18

Keeping track of packet data head data tail end ethernet ethernet IP TCP User data Add ethernet header The outbound packet is complete! 19

Network protocols Define the specific protocols available (e.g., TCP, UDP) Each networking protocol has a structure called proto Associated with an address family (e.g., AF_INET) Address family is specified by the programmer when creating the socket Defines socket operations that can be performed from the sockets layer to the transport layer Close, connect, disconnect, accept, shutdown, sendmsg, recvmsg, etc. Modular: one module may define one or more protocols Initialized & registered at startup Initialization function: registers a family of protocols The register function adds the protocol to the active protocol list 20

Abstract device interface Layer that interfaces with network device drivers Common set of functions for low-level network device drivers to operate with the higher-level protocol stack 21

Abstract device interface Send a packet to a device Send sk_buff from the protocol layer to a device dev_queue_xmit function enqueues an sk_buff for transmission to the underlying driver Device is defined in sk_buff Device structure contains a method hard_start_xmit: driver function for actually transmitting the data in the sk_buff Receive a packet from a device & send to protocol stack Receive an sk_buff from a device Driver receives a packet and places it into an allocated sk_buff sk_buff passed to the network layer with a call to netif_rx Function enqueues the sk_buff to an upper-layer protocol's queue for processing through netif_rx_schedule 22

Device drivers Drivers to access the network device Examples: ethernet, 802.11n, SLIP Modular, like other devices Described by struct net_device Initialization Driver allocates a net_device structure Initializes it with its functions dev->hard_start_xmit: defines how to transmit a packet Typically the packet is moved to a hardware queue Register interrupt service routine Calls register_netdevice to make the device available to the network stack 23

Sending a message Write data to socket Socket calls appropriate send function (typically INET) Send function verifies status of socket & protocol type Sends data to transport layer routine (typically TCP or UDP) Transport layer Creates a socket buffer (struct sk_buff) Copies data from application layer; fills in header (port #, options, checksum) Passes buffer to the network layer (typically IP) Network layer Fills in buffer with its own headers (IP address, options, checksum) Look up destination route IP layer may fragment data into multiple packets Passes buffer to link layer: to destination route s device output function Link layer: move packet to the device s xmit queue Network driver Wait for scheduler to run the device driver s transmit code Sends the link header Transmit packet via DMA 24

Routing IP Network layer Two structures: 1. Forwarding Information Base (FIB) Keeps track of details for every known route 2. Cache for destinations in use (hash table) If not found here then check FIB. 25

Receiving a message part 1 Interrupt from network card: packet received Network driver top half Allocate new sk_buff Move data from the hardware buffer into the sk_buff (DMA) Call netif_rx, the generic network reception handler This moves the sk_buff to protocol processing (it s a work queue) When netif_rx returns, the service routine is finished Repeat until no more packets in the device buffers If the packet queue is full, the packet is discarded netif_rx is called in the interrupt service routine Must be quick. Main goal: queue the packet. 26

Receiving a packet part 2 Bottom half Bottom half = softirq = work queues Tuples containing < operation, data > Kernel schedules work to go through pending packet queue Call net_rx_action() Dequeue first sk_buff (packet) Go through list of protocol handlers Each protocol handler registers itself Identifies which protocol type they handle Go through each generic handler first Then go through the receive function registered for the packet s protocol 27

Receiving an IP packet part 3 Network layer Ethernet Protocol: IP IP is a registered as a protocol handler for ETH_P_IP packets Packet header identifies next level protocol E.g., Ethernet header states encapsulated protocol is IPv4 IPv4 header states encapsulated protocol is TCP IP handler will either route the packet, deliver locally, or discard Send either to an outgoing queue (if routing) or to the transport layer Look at protocol field inside the IP packet Calls transport-level handlers (tcp_v4_rcv, udp_rcv, icmp_rcv, ) IP handler includes Netfilter hooks Additional checks for packet filtering, port translation, and extensions 28

Receiving an IP packet part 4 Transport layer Next stage (usually): tcp_v4_rcv() or udp_rcv() Check for transport layer errors Look for a socket that should receive this packet (match local & remote addresses and ports) Call tcp_v4_do_rcv: passing it the sk_buff and socket (sock structure) Adds sk_buff to the end of that socket s receive queue The socket may have specific processing options defined If so, apply them Wake up the process (ready state) if it was blocked on the socket 29

Lots of Interrupts! Assume: Non-jumbo maximum payload size: 1500 bytes TCP acknowledgement (no data): 40 bytes Median packet size: 413 bytes Assume a steady flow of network traffic at: 1 Gbps: ~300,000 packets/second 100 Mbps: ~30,000 packets/second Even 9000-byte jumbo frames give us: 1 Gbps: 14,000 packets per second 14,000 interrupts/second One interrupt per received packet Network traffic can generate a LOT of interrupts!! 30

Interrupt Mitigation: Linux NAPI Linux NAPI: New API (c. 2009) Avoid getting thousands of interrupts per second Disable network device interrupts during high traffic Re-enable interrupts when there are no more packets Polling is better at high loads; interrupts are better at low loads Throttle packets If we get more packets than we can process, leave them in the network card s buffer and let them get overwritten (same as dropping a packet) Better to drop packets early than waste time processing them 31

The End 32