Servers: Concurrency and Performance. Jeff Chase, Duke University


HTTP Server An HTTP server creates a socket (socket), binds it to an address (bind), and listens to set up the accept backlog (listen). It can then call accept to block waiting for connections, or call select to check for data on multiple sockets. It then handles each request, e.g.: GET /index.html HTTP/1.0\n <optional body, multiple lines>\n \n
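As a rough sketch, the setup sequence above might look like this in C (the port number, backlog, and bare-bones request handling are illustrative; error handling is omitted):

    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void) {
        int sfd = socket(AF_INET, SOCK_STREAM, 0);     /* create a TCP socket */

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);      /* any local address */
        addr.sin_port = htons(80);                     /* the HTTP port */
        bind(sfd, (struct sockaddr *)&addr, sizeof(addr));  /* bind to an address */

        listen(sfd, 128);                              /* set up the accept backlog */

        for (;;) {
            int cfd = accept(sfd, NULL, NULL);         /* block waiting for a connection */
            char buf[4096];
            read(cfd, buf, sizeof(buf));               /* e.g., "GET /index.html HTTP/1.0" */
            /* ... parse the request and send a response ... */
            close(cfd);
        }
    }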

Inside your server [Figure: packets flow through the listen queue and the accept queue into the server application (Apache, Tomcat/Java, etc.); the measures of interest are offered load, response time, throughput, and utilization.]

Example: Video On Demand
client() { fd = connect("server"); write(fd, "video.mpg"); while (!eof(fd)) { read(fd, buf); display(buf); } }
server() { while (1) { cfd = accept(); read(cfd, name); fd = open(name); while (!eof(fd)) { read(fd, block); write(cfd, block); } close(cfd); close(fd); } }
How many clients can the server support? Suppose, say, 200 Kbit/s video on a 100 Mbit/s network link? [MIT/Morris]

Performance analysis Server capacity: network (100 Mbit/s), disk (20 Mbyte/s). Obtained performance: one client stream at a time. The server is limited by its software structure: if each video stream is 200 Kbit/s, a 100 Mbit/s network should support far more than one client, about 500. [MIT/Morris]

WebServer Flow Create ServerSocket; connsocket = accept(); read request from connsocket; read local file; write file to connsocket; close connsocket. [Figure: the TCP socket space on hosts 128.36.232.5 and 128.36.230.2, showing a listening socket (address {*.6789, *.*}) with its completed-connection queue, an established socket (address {128.36.232.5:6789, 198.69.10.10:1500}) with its send/receive buffers, and a second listening socket (address {*.25, *.*}).] Discussion: what does each step do, and how long does it take?

Web Server Processing Steps Accept client connection (may block waiting on the network); read HTTP request header; find file (may block waiting on disk I/O); send HTTP response header; read file; send data. We want to be able to process requests concurrently.

Process States and Transitions [Figure: process state diagram. Running (user) and running (kernel) are connected by trap/return and interrupt/exception; running (kernel) → blocked via Sleep; blocked → ready via Wakeup; ready → running via Run; running → ready via Yield.]

Server Blocking accept() blocks when no connect requests are waiting on the listen queue. What if the server has multiple ports to listen on, e.g., 80 for HTTP and 443 for HTTPS? open/read/write on server files may block. read() on a socket blocks if the client is sending too slowly; write() on a socket blocks if the client is receiving too slowly (yes, TCP has flow control, like pipes). What if the server blocks while serving one client, and another client has work to do?

Under the Hood [Figure: a queueing network: requests start at arrival rate λ, visit the CPU, may issue I/O requests to an I/O device and return to the CPU on I/O completion, and exit; throughput equals λ until some center saturates.]

Concurrency and Pipelining [Figure: CPU, DISK, and NET timelines. Before: the stages of each request run serially, one at a time. After: the stages of different requests overlap in a pipeline, keeping CPU, disk, and network busy concurrently.]

Better single-server performance Goal: run at the server's hardware speed, so that the disk or network is the bottleneck. Method: pipeline the blocks of each request, and multiplex requests from multiple clients. Two implementation approaches: a multithreaded server, or asynchronous I/O. [MIT/Morris]

Concurrent threads or processes Use multiple threads/processes so that only the flow processing a particular request is blocked. Java: extends Thread or implements the Runnable interface. Example: a multithreaded WebServer that creates a thread for each request.

Multiple Process Architecture Process 1 … Process N, in separate address spaces, each doing: Accept Conn, Read Request, Find File, Send Header, Read File, Send Data. Advantages: simple programming while addressing the blocking issue. Disadvantages: many processes with large context-switch overheads; consumes much memory; optimizations that share information among processes (e.g., caching) are harder.

Using Threads Thread 1 … Thread N, each doing: Accept Conn, Read Request, Find File, Send Header, Read File, Send Data. Advantages: lower context-switch overheads; the shared address space simplifies optimizations (e.g., caches). Disadvantages: need kernel-level threads (why?); some extra memory is needed to support multiple stacks; need thread-safe programs and synchronization.

Threads A thread is a schedulable stream of control, defined by CPU register values (PC, SP). Suspend: save the register values in memory. Resume: restore the registers from memory. Multiple threads can execute independently: they can run in parallel on multiple CPUs (physical concurrency), or arbitrarily interleaved on a single CPU (logical concurrency). Each thread must have its own stack.

Multithreaded server
server() { while (1) { cfd = accept(); read(cfd, name); fd = open(name); while (!eof(fd)) { read(fd, block); write(cfd, block); } close(cfd); close(fd); } }
for (i = 0; i < 10; i++) threadfork(server);
When a thread waits for I/O, the thread scheduler runs another thread. What about references to shared data? Synchronization. [MIT/Morris]
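A possible concrete rendering of this pattern with POSIX threads follows; listen_fd, serve_client, and the pool size are simplified placeholders rather than the slide's actual code:

    #include <pthread.h>
    #include <unistd.h>
    #include <sys/socket.h>

    int listen_fd;                            /* assumed: set up via socket/bind/listen */

    static void serve_client(int cfd) {
        char block[8192];
        ssize_t n;
        while ((n = read(cfd, block, sizeof block)) > 0)
            write(cfd, block, (size_t)n);     /* placeholder for the file-to-socket copy */
        close(cfd);
    }

    static void *server(void *arg) {          /* one instance runs per pool thread */
        (void)arg;
        for (;;) {
            int cfd = accept(listen_fd, NULL, NULL);   /* only this thread blocks here */
            if (cfd >= 0)
                serve_client(cfd);
        }
        return NULL;
    }

    void start_server_threads(void) {         /* the slide's threadfork() loop */
        pthread_t tid;
        for (int i = 0; i < 10; i++)
            pthread_create(&tid, NULL, server, NULL);
    }

Because each thread blocks independently in accept() or in its own reads and writes, a stall on one client's I/O no longer idles the whole server.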

Event-Driven Programming One execution stream: no CPU concurrency. Register interest in events (callbacks). The event loop waits for events and invokes handlers. No preemption of event handlers; handlers are generally short-lived. [Figure: an event loop dispatching to event handlers.] [Ousterhout 1995]

Single Process Event Driven (SPED) A single event dispatcher drives all stages: Accept Conn, Read Request, Find File, Send Header, Read File, Send Data. Single-threaded, with asynchronous (non-blocking) I/O. Advantages: a single address space; no synchronization. Disadvantage: in practice, disk reads still block.

Asynchronous Multi-Process Event Driven (AMPED) Like SPED (an event dispatcher driving Accept Conn, Read Request, Find File, Send Header, Read File, Send Data), but with helper processes/threads (Helper 1 … Helper N) for disk I/O, using IPC to communicate with the helpers. Advantages: a shared address space for most web server functions; concurrency for disk I/O. Disadvantage: IPC between the main thread and the helper threads. This hybrid model is used by the Flash web server.
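To make the helper idea concrete, here is a minimal sketch under stated assumptions (illustrative names, no framing protocol, one outstanding request at a time; this is not Flash's actual code): the main event loop forks a disk helper connected by a socketpair, writes a file name to it, and then receives the file bytes back as ordinary readable-descriptor events, so only the helper ever blocks on disk.

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/socket.h>

    int spawn_disk_helper(void) {
        int sv[2];
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv);    /* IPC channel to the helper */
        if (fork() == 0) {                          /* child: the disk helper */
            close(sv[0]);
            char name[256];
            ssize_t n;
            while ((n = read(sv[1], name, sizeof name - 1)) > 0) {
                name[n] = '\0';
                int fd = open(name, O_RDONLY);      /* the helper may block on disk */
                char buf[8192];
                ssize_t m;
                while (fd >= 0 && (m = read(fd, buf, sizeof buf)) > 0)
                    write(sv[1], buf, (size_t)m);   /* stream the file back to main */
                if (fd >= 0) close(fd);
            }
            _exit(0);
        }
        close(sv[1]);
        return sv[0];    /* main process: watch this fd in its event loop */
    }

The returned descriptor is just another fd in the dispatcher's read set, so disk completion looks to the main loop exactly like network readability.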

Event-Based Concurrent Servers Using I/O Multiplexing Maintain a pool of connected descriptors. Repeat the following forever: use the Unix select function to block until (a) a new connection request arrives on the listening descriptor, or (b) new data arrives on an existing connected descriptor. If (a), add the new connection to the pool of connections. If (b), read any available data from the connection, closing the connection on EOF and removing it from the pool. [CMU 15-213]

Select If a server has many open sockets, how does it know when one of them is ready for I/O? int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout); select has scalability issues, so alternative event interfaces have been offered (e.g., Linux epoll, BSD kqueue).
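A bare-bones sketch of the select()-based pool pattern from the previous slide (the port, buffer size, and do-nothing request handling are placeholders; error handling is omitted):

    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in a;
        memset(&a, 0, sizeof(a));
        a.sin_family = AF_INET;
        a.sin_addr.s_addr = htonl(INADDR_ANY);
        a.sin_port = htons(8080);
        bind(lfd, (struct sockaddr *)&a, sizeof(a));
        listen(lfd, 128);

        fd_set pool;                        /* the pool of connected descriptors */
        FD_ZERO(&pool);
        FD_SET(lfd, &pool);
        int maxfd = lfd;

        for (;;) {
            fd_set ready = pool;            /* select() overwrites its argument */
            select(maxfd + 1, &ready, NULL, NULL, NULL);   /* block until ready */
            for (int fd = 0; fd <= maxfd; fd++) {
                if (!FD_ISSET(fd, &ready)) continue;
                if (fd == lfd) {            /* (a) new connection request */
                    int cfd = accept(lfd, NULL, NULL);
                    FD_SET(cfd, &pool);
                    if (cfd > maxfd) maxfd = cfd;
                } else {                    /* (b) data on an existing connection */
                    char buf[4096];
                    ssize_t n = read(fd, buf, sizeof buf);
                    if (n <= 0) {           /* EOF or error: close and remove */
                        close(fd);
                        FD_CLR(fd, &pool);
                    }
                    /* else: handle the n bytes without blocking */
                }
            }
        }
    }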

Asynchronous I/O
struct callback { bool (*is_ready)(); void (*handler)(void *arg); void *arg; };
main() { while (1) { for (c = each callback) { if (c->is_ready()) c->handler(c->arg); } } }
Code is structured as a collection of handlers. Handlers are nonblocking: create a new handler for each blocking operation, and when the operation completes, call the handler. [MIT/Morris]

Asynchronous server
init() { on_accept(accept_cb); }
accept_cb(cfd) { on_readable(cfd, name_cb); }
on_readable(fd, fn) { c = new callback(test_readable, fn, fd); add c to callback list; }
name_cb(cfd) { read(cfd, name); fd = open(name); on_readable(fd, read_cb); }
read_cb(cfd, fd) { read(fd, block); on_writeable(cfd, write_cb); }
write_cb(cfd, fd) { write(cfd, block); on_readable(fd, read_cb); }
[MIT/Morris]

Multithreaded vs. Async
Multithreaded: hard to program (locking code); need to know what blocks; coordination is explicit; state is stored on the thread's stack; memory allocation is implicit; context switches may be expensive; exploits multiprocessors.
Async: hard to program (callback code); need to know what blocks; coordination is implicit; state is passed around explicitly; memory allocation is explicit; context switches are lightweight; uniprocessors only.
[MIT/Morris]

Coordination example Threaded server: a thread for the network interface; an interrupt wakes up the network thread; a buffer shared between the server threads and the network thread is protected by locks and condition variables. Asynchronous I/O: poll for packets (but how often to poll?), or have the interrupt generate an event. Be careful: disable interrupts when manipulating the callback queue. [MIT/Morris]

One View Threads!

Should You Abandon Threads? No: they are important for high-end servers (e.g., databases). But avoid threads wherever possible: use events, not threads, for GUIs, distributed systems, and low-end servers. Only use threads where true CPU concurrency is needed. Where threads are needed, isolate their usage in a threaded application kernel and keep most of the code single-threaded. [Figure: event-driven handlers wrapped around a threaded kernel.] [Ousterhout 1995]

Another view Events obscure control flow, for both programmers and tools. Compare a web server written with threads and with events [von Behren]:
Threads:
thread_main(int sock) { struct session s; accept_conn(sock, &s); read_request(&s); pin_cache(&s); write_response(&s); unpin(&s); }
pin_cache(struct session *s) { pin(s); if (!in_cache(s)) read_file(s); }
Events:
AcceptHandler(event e) { struct session *s = new_session(e); RequestHandler.enqueue(s); }
RequestHandler(struct session *s) { ...; CacheHandler.enqueue(s); }
CacheHandler(struct session *s) { pin(s); if (!in_cache(s)) ReadFileHandler.enqueue(s); else ResponseHandler.enqueue(s); }
...
ExitHandler(struct session *s) { ...; unpin(s); free_session(s); }
[Figure: request stages, Accept Conn → Read Request → Pin Cache → Write Response → Exit, with Read File branching off Pin Cache.]

Control Flow Events obscure control flow, for programmers and tools alike: the threaded version of the web server above reads straight down the page, while the event version scatters the same logic across handlers. [von Behren]

Exceptions Exceptions complicate control flow: they make program flow harder to understand and cause bugs in cleanup code. In the threaded web server, thread_main becomes: thread_main(int sock) { struct session s; accept_conn(sock, &s); if (!read_request(&s)) return; pin_cache(&s); write_response(&s); unpin(&s); } In the event version, RequestHandler becomes: RequestHandler(struct session *s) { ...; if (error) return; CacheHandler.enqueue(s); } [von Behren]

State Management Events require manual state management, and it is hard to know when to free state: use garbage collection or risk bugs. In the threaded web server, the session lives on the thread's stack and is reclaimed automatically; in the event version, each session must be allocated in new_session, passed from handler to handler, and explicitly freed by free_session in ExitHandler. [von Behren]

[Figure: the thread-per-request pipeline again, Thread 1 … Thread N each doing Accept Conn → Read Request → Find File → Send Header → Read File → Send Data.]

Internet Growth and Scale The Internet. How do you handle all those client requests raining down on your server?

Servers Under Stress [Figure: performance vs. load (concurrent requests, or arrival rate). The ideal curve rises and then stays flat; real servers peak when some resource hits its maximum, then degrade under overload as some resource thrashes.] [Von Behren]

Response Time Components Response time = wire time (request) + queuing time + service demand + wire time (response). Wire time depends on the cost/length of the request; queuing time depends on load conditions at the server. [Figure: latency vs. offered load.]

Queuing Theory for Busy People [Figure: an M/M/1 service center: an offered-load request stream arrives at rate λ, waits in a queue, and is processed with mean service demand D.] Big assumptions: the queue is first-come-first-served (FIFO, FCFS); request arrivals are independent (Poisson arrivals); requests have independent service demands; i.e., the arrival interval and the service demand are exponentially distributed (denoted "M").

Utilization What is the probability that the center is busy? Answer: some number between 0 and 1. What percentage of the time is the center busy? Answer: some number between 0 and 100. These are interchangeable, and both are called the utilization U. If the center is not saturated, i.e., it completes all its requests in some bounded time, then U = λD (arrival rate times mean service demand): the Utilization Law. The probability that the service center is idle is 1 - U.

Little's Law For an unsaturated queue in steady state, the mean response time R and the mean queue length N are governed by Little's Law: N = λR. To see why, suppose a task T is in the system for R time units. During that time, λR new tasks arrive, and N tasks depart (all the tasks ahead of T). In steady state, the flow in balances the flow out. Note: this also means that throughput X = λ.
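A quick sanity check with assumed numbers: if requests arrive at λ = 100 requests/second and each spends R = 0.05 seconds in the system, then on average N = λR = 100 × 0.05 = 5 requests are in the system (queued or in service), and the steady-state throughput is X = λ = 100 requests/second.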

Inverse Idle Time Law The service center saturates as 1/λ approaches D: small increases in λ cause large increases in the expected response time R. [Figure: R vs. U, with R growing without bound as U approaches 1 (100%).] Little's Law gives the response time R = D/(1 - U). Intuitively, each task T's response time is R = D + D·N. Substituting λR for N: R = D + DλR. Substituting U for λD: R = D + UR, so R - UR = D, hence R(1 - U) = D, hence R = D/(1 - U).
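Plugging assumed numbers into R = D/(1 - U) shows how sharp saturation is: with service demand D = 10 ms, R = 20 ms at U = 0.5, R = 100 ms at U = 0.9, and R = 1000 ms at U = 0.99. The last nine percentage points of utilization cost a tenfold increase in response time.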

What does this tell us about server behavior at saturation?

Under the Hood (revisited) [Figure: the same queueing network as before: requests start at arrival rate λ, visit the CPU, may loop through an I/O device, and exit; throughput equals λ until some center saturates.]

Common Bottlenecks No more file descriptors; sockets stuck in TIME_WAIT; high memory use (swapping); CPU overload; interrupt (IRQ) overload. [Aaron Bannert]

Scaling Server Sites: Clustering [Figure: clients reach a server array through a smart switch that advertises virtual IP addresses (VIPs) and switches at L4 (TCP) or L7 (HTTP, SSL, etc.).] Goals: server load balancing, failure detection, access control filtering, priorities/QoS, request locality, transparent caching. What to switch/filter on? L3: source IP and/or VIP. L4 (TCP): ports, etc. L7: URLs and/or cookies, SSL session IDs.

Scaling Services: Replication [Figure: a client reaches Site A or Site B across the Internet.] Distribute the service load across multiple sites. How do we select a server site for each client or request? Is it scalable?

Extra Slides (Any new information on the following slides will not be tested.)

Event-Based Concurrent Servers Using I/O Multiplexing Maintain a pool of connected descriptors. Repeat the following forever: use the Unix select function to block until (a) a new connection request arrives on the listening descriptor, or (b) new data arrives on an existing connected descriptor. If (a), add the new connection to the pool of connections. If (b), read any available data from the connection, closing the connection on EOF and removing it from the pool. [CMU 15-213]

Problems of Multi-Threaded Servers High resource usage, context switch overhead, and contended locks. Too many threads → throughput meltdown and response time explosion. Solution: bound the total number of threads.
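One common way to bound the thread count is a fixed worker pool fed by a bounded queue; load spikes then wait in the queue instead of spawning new threads. A sketch with illustrative names (not from the slides):

    #include <pthread.h>
    #include <unistd.h>

    #define NWORKERS 8
    #define QSIZE    64

    static int queue[QSIZE];
    static int head, tail, count;
    static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

    void enqueue_conn(int cfd) {              /* called by the accept() loop */
        pthread_mutex_lock(&lock);
        while (count == QSIZE)                /* queue full: the acceptor waits, */
            pthread_cond_wait(&not_full, &lock);   /* pushing back on offered load */
        queue[tail] = cfd;
        tail = (tail + 1) % QSIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0)
                pthread_cond_wait(&not_empty, &lock);
            int cfd = queue[head];
            head = (head + 1) % QSIZE;
            count--;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);
            /* ... serve the request on cfd ... */
            close(cfd);
        }
        return NULL;
    }

    void start_pool(void) {
        pthread_t t;
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&t, NULL, worker, NULL);
    }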

Event-Driven Programming Event-driven programming, also called asynchronous I/O, uses finite state machines (FSMs) to monitor the progress of requests. It yields efficient and scalable concurrency. Many examples: the Click router, the Flash web server, TP monitors, etc. Java: for an asynchronous I/O example, see http://www.cafeaulait.org/books/jnp3/examples/12/

Traditional Processes Expensive and heavyweight: one system call per process; fork overhead; coordination.

Events Need async I/O; need select. These weren't originally available, weren't standardized, and were immature, but are efficient. Code is distributed all through the program, making it harder to debug and understand.

Threads Separate the interface from the implementation. The Pthreads interface can be implemented at user level or in the kernel (native). A user-level implementation needs async I/O underneath, but hides it behind the thread interface.

Reference Valeria Cardellini, Emiliano Casalicchio, Michele Colajanni, and Philip S. Yu, "The State of the Art in Locally Distributed Web-Server Systems."