BENCHMARKING LIBEVENT AGAINST LIBEV

2011-01-11, Version 6

This document briefly describes the results of running the libevent benchmark program against both libevent and libev.

Libevent Overview

Libevent (first released on 2000-11-14) is a high-performance event loop that supports a simple API, two event types (either I/O+timeout or signal+timeout) and a number of "backends" (select, poll, epoll, kqueue and /dev/poll at the time of this writing). Algorithmically, it uses red-black trees to organise timers and doubly linked lists for the other event types. It can have at most one read and one write watcher per file descriptor or signal. It also offers a simple DNS resolver and server, HTTP server and client code, and a socket buffering abstraction.

Libev Overview

Libev (first released on 2007-11-12) is also a high-performance event loop, supporting eight event types (I/O, real-time timers, wall-clock timers, signals, child status changes, idle, check and prepare handlers). It uses a priority queue to manage timers and arrays as its fundamental data structure. It has no artificial limitations on the number of watchers waiting for the same event. It offers an emulation layer for libevent and optionally the same DNS, HTTP and buffer management (by reusing the corresponding libevent code through its emulation layer).

Benchmark Setup

The benchmark is very simple: first a number of socket pairs is created, then event watchers for those pairs are installed, and then a (smaller) number of "active clients" send and receive data on a subset of those sockets.

The benchmark program used was bench.c, taken from the libevent distribution and modified to collect the total time per test iteration, to optionally enable timeouts on the event watchers, to optionally use the native libev API, and to output the times differently.

For libevent, version 1.4.3 was used; for libev, version 3.31. Both libraries and the test program were compiled with gcc version 4.2.3 using the optimisation options -O3 -fno-guess-branch-probability -g, and run on a Core2 Quad at 3.6 GHz under Debian GNU/Linux (Linux version 2.6.24-1). Both libraries were configured to use the epoll interface (the most performant interface available in either library on the test machine).

The same benchmark program was used both to run the libevent vs. libevent-emulation benchmarks (the same code paths/source lines were executed in this case) and to run the native libev API benchmark (with different code paths, but functionally equivalent). The difference between the libevent and libev+libevent-emulation versions is strictly limited to the use of different header files (event.h from libevent, or the event.h emulation from libev).
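To make the set-up concrete, here is a minimal sketch of installing a watcher on one socket pair through the libevent API. It is illustrative only (the names and the single-pair structure are not taken from bench.c), but the same source compiles unchanged against libev's emulation layer, since only the event.h header differs:

    #include <sys/socket.h>
    #include <unistd.h>
    #include <event.h>      /* libevent's header, or libev's event.h emulation */

    static struct event ev;

    /* called whenever the read end of the socket pair becomes readable */
    static void read_cb(int fd, short which, void *arg)
    {
        char buf[64];
        read(fd, buf, sizeof(buf));
    }

    int main(void)
    {
        int sv[2];

        socketpair(AF_UNIX, SOCK_STREAM, 0, sv);  /* one of the many socket pairs */

        event_init();                             /* initialise the default event base */
        event_set(&ev, sv[1], EV_READ | EV_PERSIST, read_cb, NULL);
        event_add(&ev, NULL);                     /* no timeout in the first benchmark */

        write(sv[0], "x", 1);                     /* an "active client" generating work */
        event_loop(EVLOOP_ONCE);                  /* process the pending events once */
        return 0;
    }

The real benchmark creates many such pairs, installs a watcher on each, and measures both the watcher set-up time and the event-processing time.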

Each run of the benchmark program consists of two iterations, outputting the total time per iteration as well as the time spent handling the requests only. The program was run for various total numbers of file descriptors. Each data point is composed of six individual runs, and the result used is the minimum time from these runs, taken for the second iteration of the test. The test program was run on its own CPU, with realtime priority, to achieve stable timings.

First Benchmark: no timeouts, 100 and 1000 active clients

Without further ado, here are the results:

The left two graphs show the overall time spent setting up watchers, preparing the sockets and polling for events, while the right two graphs only include the actual poll processing. The top row represents 100 active clients (clients doing I/O), the bottom row uses 1000. All graphs have a logarithmic fd-axis to sensibly display the large range of file descriptor counts, from 100 to 100000 (in reality these are socket pairs, so the process actually has twice that number of file descriptors).

Discussion

The total time per iteration increases much faster for libevent than for libev, taking almost twice as much time as libev regardless of the number of clients. Both exhibit similar growth characteristics, though.

The polling time is also very similar, with libevent being consistently slower in the 1000-fd case and virtually identical timings in the 100-fd case. The absolute difference, however, is small (less than 5%). The native API timings are consistently better than the emulation API, but the absolute difference is again small.
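For reference, the native libev code path mentioned above uses libev's own watcher types and callback convention directly. The following is a rough sketch of an equivalent single-watcher set-up (illustrative only, not the modified bench.c):

    #include <sys/socket.h>
    #include <unistd.h>
    #include <ev.h>         /* native libev API */

    static ev_io io_watcher;

    /* called whenever the read end of the socket pair becomes readable */
    static void io_cb(struct ev_loop *loop, ev_io *w, int revents)
    {
        char buf[64];
        read(w->fd, buf, sizeof(buf));
        ev_io_stop(loop, w);            /* stop the watcher so the loop can return */
    }

    int main(void)
    {
        struct ev_loop *loop = ev_default_loop(0);
        int sv[2];

        socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

        ev_io_init(&io_watcher, io_cb, sv[1], EV_READ);
        ev_io_start(loop, &io_watcher); /* arm the I/O watcher */

        write(sv[0], "x", 1);           /* generate some activity */
        ev_loop(loop, 0);               /* run the event loop (ev_run in libev 4.x) */
        return 0;
    }

Functionally this is equivalent to the libevent sketch above, but it bypasses the emulation layer and its mapping of libevent watchers onto libev's own watchers.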

Interpretation

The cost of setting up or changing event watchers is clearly much higher for libevent than for libev, and API differences cannot account for this (the difference between the native API and the emulation API in libev is very small). This is important in practice, as the libevent API has no good interface for changing event watchers or timeouts on the fly. At higher numbers of file descriptors, libev is consistently faster than libevent. Also, the libevent API emulation itself results in only a small overhead in this case.

Second Benchmark: idle timeouts, 100 and 1000 active clients

Again, the results first:

The graph layout is identical to the first benchmark. The difference is that this time there is a random timeout attached to each socket. This was done to mirror real-world conditions, where network servers usually need to maintain idle timeouts per connection, and those idle timeouts have to be reset on activity. This is implemented by setting a random 10 to 11 second timeout during set-up and deleting/re-adding the event watcher each time a client receives data.

Discussion

Both sets of graphs have changed dramatically. The total time per iteration has increased dramatically for libevent, but only slightly for the libev curves. The difference between the native and emulated API has become more apparent.

The event processing graphs look very different now, with libev being consistently faster (by a factor of two to almost three) over the whole range of file descriptor numbers. The growth behaviour is roughly similar, but the curves are much lower for libev than for libevent.
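To illustrate the idle-timeout handling discussed above, here are two hedged fragments; the names are hypothetical and the exact handling in the modified bench.c may differ. The first mirrors the delete/re-add pattern through the libevent API:

    /* libevent API: delete and re-add the watcher with a fresh timeout
       (hypothetical fragment; one watcher shown, the benchmark uses many) */
    #include <stdlib.h>
    #include <sys/time.h>
    #include <unistd.h>
    #include <event.h>

    static struct event ev;

    static void read_cb(int fd, short which, void *arg)
    {
        struct timeval tv;
        char buf[64];

        if (which & EV_TIMEOUT)
            return;                      /* connection was idle for too long */

        read(fd, buf, sizeof(buf));      /* consume the activity */

        /* reset the idle timeout: delete and re-add with a fresh 10-11s value */
        tv.tv_sec  = 10;
        tv.tv_usec = rand() % 1000000;
        event_del(&ev);
        event_add(&ev, &tv);
    }

    static void setup(int fd)
    {
        struct timeval tv = { 10, rand() % 1000000 };  /* random 10-11s timeout */

        event_set(&ev, fd, EV_READ, read_cb, NULL);
        event_add(&ev, &tv);
    }

A natively written libev program would typically keep a separate repeating timer per connection and simply refresh it on activity, a commonly used libev idiom for idle timeouts (again a hypothetical fragment, not the benchmark code):

    /* native libev API: a separate repeating timer, refreshed with ev_timer_again */
    #include <stdlib.h>
    #include <unistd.h>
    #include <ev.h>

    static ev_io    io_watcher;
    static ev_timer timeout_watcher;

    static void timeout_cb(struct ev_loop *loop, ev_timer *w, int revents)
    {
        /* connection was idle for too long */
    }

    static void io_cb(struct ev_loop *loop, ev_io *w, int revents)
    {
        char buf[64];

        read(w->fd, buf, sizeof(buf));           /* consume the activity */
        ev_timer_again(loop, &timeout_watcher);  /* push the idle timeout into the future */
    }

    static void setup(struct ev_loop *loop, int fd)
    {
        ev_io_init(&io_watcher, io_cb, fd, EV_READ);
        ev_io_start(loop, &io_watcher);

        /* random 10-11 second idle timeout, as in the benchmark */
        ev_timer_init(&timeout_watcher, timeout_cb, 0., 10. + drand48());
        ev_timer_again(loop, &timeout_watcher);
    }

The interpretation below relates these two patterns to the number of timer-heap operations required.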

As for libev alone, the native API is consistently faster than the emulated API; the difference is noticeable compared to the first benchmark but, as far as poll times are concerned, still relatively small compared to the difference between libevent and libev. The overall time, however, has been almost halved.

Interpretation

Both libev and libevent use a binary heap for timer management (earlier versions of libevent used a red-black tree), which explains the similar growth characteristics. Apparently, libev makes better use of the binary heap than libevent, even with identical API calls (note that the upcoming 3.33 release of libev, not used in this benchmark, uses a cache-aligned 4-heap and benchmarks consistently faster than 3.31).

Another reason for the higher performance, especially in the set-up phase, might be that libevent calls the epoll_ctl syscall on each change (twice per fd for the delete/add), while libev only sends changes to the kernel before the next poll (using EPOLL_CTL_MOD where possible).

The native API is noticeably faster in this case (almost twice as fast overall). The most likely reason is again timer management, as libevent uses two O(log n) operations, while libev needs a single and simpler O(log n) operation.

Summary

The benchmark clearly shows that libev has much lower costs and is consequently faster than libevent. API design issues also play a role in the results, as the native API can do much better than the emulated API when timers are being used. Even though this puts libev at a disadvantage (the emulation layer has to emulate some aspects of libevent that its native API does not provide; it also has to map each libevent watcher to three of its own watchers, and has to run thunking code to map from those three watchers to the libevent user code, due to the different structure of their callbacks), libev is still much faster than libevent even when used through the libevent emulation API.

Appendix: Benchmark graphs for libev 4.03, libevent 1.4.13, libevent 2.0.10

Here are the benchmark graphs, redone with more current versions. Nothing drastic has changed: libevent2 seems to be a tiny bit slower (probably due to the extra thread locking), libev a tiny bit faster.

Author/Contact

Marc Alexander Lehmann <libev@schmorp.de>