Using Software Transactional Memory In Interrupt-Driven Systems

Similar documents
USING SOFTWARE TRANSACTIONAL MEMORY IN INTERRUPT-DRIVEN SYSTEMS

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Cost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University)

Timers 1 / 46. Jiffies. Potent and Evil Magic

Concurrent & Distributed Systems Supervision Exercises

Scheduling Transactions in Replicated Distributed Transactional Memory

6.852: Distributed Algorithms Fall, Class 20

Lock vs. Lock-free Memory Project proposal

Lecture 12 Transactional Memory

Department of Computer Science, Institute for System Architecture, Operating Systems Group. Real-Time Systems '08 / '09. Hardware.

Today s Topics. u Thread implementation. l Non-preemptive versus preemptive threads. l Kernel vs. user threads

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Chí Cao Minh 28 May 2008

Lecture 10: Avoiding Locks

Lecture 25: Board Notes: Threads and GPUs

OPERATING SYSTEM TRANSACTIONS

Synchronization Part 2. REK s adaptation of Claypool s adaptation oftanenbaum s Distributed Systems Chapter 5 and Silberschatz Chapter 17

G52CON: Concepts of Concurrency

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation

Locking: A necessary evil? Lecture 10: Avoiding Locks. Priority Inversion. Deadlock

Processes and Non-Preemptive Scheduling. Otto J. Anshus

Improving the Practicality of Transactional Memory

1 Publishable Summary

Enhancing Concurrency in Distributed Transactional Memory through Commutativity

Agenda. Threads. Single and Multi-threaded Processes. What is Thread. CSCI 444/544 Operating Systems Fall 2008

Real-Time Programming

CS533 Concepts of Operating Systems. Jonathan Walpole

OVERVIEW. Last Week: But if frequency of high priority task increases temporarily, system may encounter overload: Today: Slide 1. Slide 3.

Today s Topics. u Thread implementation. l Non-preemptive versus preemptive threads. l Kernel vs. user threads

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93

Synchronization. Clock Synchronization

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012

Multi-core Architecture and Programming

CSE398: Network Systems Design

Reminder from last time

Threads Implementation. Jo, Heeseung

An Evaluation of the Dynamic and Static Multiprocessor Priority Ceiling Protocol and the Multiprocessor Stack Resource Policy in an SMP System

Monitors; Software Transactional Memory

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)

Whatever can go wrong will go wrong. attributed to Edward A. Murphy. Murphy was an optimist. authors of lock-free programs LOCK FREE KERNEL

Operating Systems Design Fall 2010 Exam 1 Review. Paul Krzyzanowski

Multiprocessor and Real-Time Scheduling. Chapter 10

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

Hardware Transactional Memory Architecture and Emulation

CSE 4/521 Introduction to Operating Systems. Lecture 29 Windows 7 (History, Design Principles, System Components, Programmer Interface) Summer 2018

DBT Tool. DBT Framework

Transactifying Apache s Cache Module

Concurrent Preliminaries

Hardware Transactional Memory on Haswell

Motivation. Threads. Multithreaded Server Architecture. Thread of execution. Chapter 4

Semaphore. Originally called P() and V() wait (S) { while S <= 0 ; // no-op S--; } signal (S) { S++; }

Xinu on the Transputer

Lecture 2: September 9

A Predictable RTOS. Mantis Cheng Department of Computer Science University of Victoria

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition

Synchronization. Chapter 5

ECE 3055: Final Exam

Concurrent Programming. Implementation Alternatives. Content. Real-Time Systems, Lecture 2. Historical Implementation Alternatives.

COMP3151/9151 Foundations of Concurrency Lecture 8

Concurrent Programming

Transactional Memory

Four Components of a Computer System

Last 2 Classes: Introduction to Operating Systems & C++ tutorial. Today: OS and Computer Architecture

TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions

The Real-time Specification for Java

Conflict Detection and Validation Strategies for Software Transactional Memory

Operating Systems 2010/2011

Grand Central Dispatch

Operating Systems (2INC0) 2018/19. Introduction (01) Dr. Tanir Ozcelebi. Courtesy of Prof. Dr. Johan Lukkien. System Architecture and Networking Group

Threads. CS3026 Operating Systems Lecture 06

On Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions

INTRODUCTION. Hybrid Transactional Memory. Transactional Memory. Problems with Transactional Memory Problems

Interrupts and Time. Real-Time Systems, Lecture 5. Martina Maggio 28 January Lund University, Department of Automatic Control

Process Description and Control

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith

Introduction. CS3026 Operating Systems Lecture 01

Interrupts and Time. Interrupts. Content. Real-Time Systems, Lecture 5. External Communication. Interrupts. Interrupts

Outline. Introduction. Survey of Device Driver Management in Real-Time Operating Systems

CS533 Concepts of Operating Systems. Jonathan Walpole

CS 475. Process = Address space + one thread of control Concurrent program = multiple threads of control

Go Deep: Fixing Architectural Overheads of the Go Scheduler

CSC Operating Systems Fall Lecture - II OS Structures. Tevfik Ko!ar. Louisiana State University. August 27 th, 2009.

Announcements. Computer System Organization. Roadmap. Major OS Components. Processes. Tevfik Ko!ar. CSC Operating Systems Fall 2009

! Readings! ! Room-level, on-chip! vs.!

Computer Science Window-Constrained Process Scheduling for Linux Systems

4. Concurrency via Threads

Flexible Architecture Research Machine (FARM)

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Module 6: Process Synchronization

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

Monitors; Software Transactional Memory

RT extensions/applications of general-purpose OSs

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

Review: Creating a Parallel Program. Programming for Performance

Part III Transactions

Optimistic Concurrency Control. April 18, 2018

Transcription:

Using Software Transactional Memory In Interrupt-Driven Systems Department of Mathematics, Statistics, and Computer Science Marquette University Thesis Defense

Introduction Thesis Statement Software transactional memory can be used in interrupt-driven device drivers as a method to automate and provide fine-grained synchronization between the upper and lower halves.

Operating Systems operating system low-level software on a computer. application programming interface set of functions a programmer can use. device driver deals with high-level API and low-level hardware. interrupt external signal to processor. jitter variations in delay of interrupt handling.

Jitter Avoidance Don t disable interrupts! global integer our_var = 1 function increment(void) our_var++.data.globl our_var our_var: 0x0001.text increment: lw s0, 0x4c(zero) addiu s0, s0, 1 sw s0, 0x4c(zero) j ra

Jitter Avoidance Don t disable interrupts! global integer our_var = 1 function increment(void) our_var++.data.globl our_var our_var: 0x0001.text increment: lw s0, 0x4c(zero) addiu s0, s0, 1 sw s0, 0x4c(zero) j ra

Jitter Avoidance Don t disable interrupts! global integer our_var = 1 function increment(void) our_var++.data.globl our_var our_var: 0x0001.text increment: lw s0, 0x4c(zero) addiu s0, s0, 1 sw s0, 0x4c(zero) j ra

Related Work Transactional Memory Transactions began in database community. Brought to systems as concurrency control for multiprocessors. Hardware TM Suffer from physical memory boundaries. Software TM Blocking vs. non-blocking implementations. Blocking STM allows progress guarantees. Hybrid TM HTM is expensive and restrictive. STM needs more space and time to compute. Hybrid TM takes advantages from both (speed and reliability). Also takes disadvantages.

Related Work Operating Systems Lock-free kernels: Synthesis and Cache. Use CAS and DCAS opcodes for shared data structures. TxLinux uses HTM and cxspinlocks.

Related Work For Further Reading... Massalin and Pu. A lock-free multiprocessor OS kernel. Tech. Rep. CUCS-005-91, Columbia University. Gray and Reuter. Transaction Processing: Concepts and Techniques, Morgan Kaufmann, First Edition. Herlihy and Moss. Transactional memory: Architectural support for lock-free data structures. ISCA 93. Rossbach et al. TxLinux: using and managing hardware transactional memory in an operating system. SOSP 07. Ni et al. of Transactional Constructs for C/C++. OOPSLA 08.

Major Contributions Modernized port of the Xinu kernel for IA-32. Method for integrating STM library into kernel. Embedded Xinu augmented with STM: Transactional Xinu. Evaluation of interrupt-driven device drivers in Transactional Xinu.

Outline 1 Critical Sections Transactional System 2 Measurements Testing Methodology 3 Summary Future Work

Outline 1 Critical Sections Transactional System 2 Measurements Testing Methodology 3 Summary Future Work

Outline 1 Critical Sections Transactional System 2 Measurements Testing Methodology 3 Summary Future Work

Device Driver Structure Critical Sections Transactional System

Critical Sections Critical Sections Transactional System A critical section is a piece of code that accesses shared data or resources in the system and must not be accessed by two or more processes simultaneously.

Critical Sections Properties Critical Sections Transactional System atomically perform indivisible operations instantaneously. consistent no illegal system state will exist. isolation no thread will see intermediate state. durable no reversion after completion. These are known as the ACID properties. We are only interested in the A, C, and I properties.

Critical Sections Problems Critical Sections Transactional System Deadlock 1 2 A B context switch 1 2 A B context switch 1 2 A B context switch 1 2 A B t1 t2 t3 t4 Priority Inversion

Critical Sections Jitter Critical Sections Transactional System Variations in the amount of time the system takes to respond to incoming interrupts.

Critical Sections Transactional Memory Critical Sections Transactional System STM library assures that the ACI properties are followed. Compiler and library automatically handle rollbacks and commits.

Critical Sections Interrupt Handling Critical Sections Transactional System Interrupts are implicitly given highest priority in system. STM can be set up to give interrupt-driven transaction highest priority. User-level thread will not prevent interrupt from entering. I/O operations are difficult because they cannot be taken back.

Embedded Xinu The Beginning Critical Sections Transactional System Xinu was developed 25 years ago by Doug Comer. Provide a easy-to-understand O/S for teaching and research. Under 20,000 lines of code, but still a rich experimentation platform. lightweight thread model, shared memory space, preemptive multitasking priority scheduler, synchronization primitives, IPC, and device drivers.

Transactional Xinu Critical Sections Transactional System Built on top of Embedded Xinu. Provides needed components for transactions. POSIX-thread library. Thread-local storage. Intel s STM library.

POSIX-thread library Critical Sections Transactional System Keep the Xinu model: lightweight. pthread_key_create pthread_setspecific pthread_create pthread_join

Interrupt-local Storage Critical Sections Transactional System Thread-local storage gives private memory to every thread. Interrupts push state onto stack, does not update GS register. Interrupt-local storage switches GS context for interrupts.

Transactional Library Critical Sections Transactional System Several modes of operation: optimistic, pessimistic, serial, and obstinate. Optimistic and pessimistic are normal modes. Serial mode works with legacy and irrevocable operations. Obstinate allows for stubborn transactions.

Transactional Library Single Global Lock Atomicity Critical Sections Transactional System Creates an equivalence between global lock code and atomic code. STM library actually uses SGLA for serialized transactions. tm_atomic { wait(global_lock); Statements; Statements; } signal(global_lock);

Transactional Library Obstinate Mode Critical Sections Transactional System Library lets an obstinate transaction beat all conflicting transactions. Provides an interface allowing the programmer to declare an obstinate transaction. Transactional Xinu uses this in interrupt handlers.

Transactional Library Versioning and Logging Critical Sections Transactional System Read versioning tracks the version number of a variable. Writing will obtain a lock and increment the version number. Undo logging saves original values to private memory.

Transactional Library Versioning Example Critical Sections Transactional System Reader Code global integer our_var local integer my_var atomic if ( our_var = 1 ) my_var = our_var + 1 else my_var = 1 Writer Code global integer our_var atomic our_var = our_var + 1 {} {our_var : 2} {} {our_var : 2} {our_var : 3}

Transactional Library Two Phase Locking Critical Sections Transactional System 2PL for contention manager. Maps every contended memory location to a unique lock. Mapping performed at runtime, takes 4 MB of memory. Transactional Xinu minimizes size of atomic sections.

Transactional Device Drivers Critical Sections Transactional System Wrap upper half critical sections in tm_atomic. Wrap lower half shared data in tm_atomic. Every function call and incidental function call must be re-instrumented.

Code Size Differences Critical Sections Transactional System Kernel image Non-STM STM Increase % raw 365,628 595,251 229,623 62.80% stripped 351,048 540,504 189,456 53.97% excluding STM library (170,869 bytes) raw 365,628 424,382 58,754 16.07% stripped 351,048 369,635 18,587 5.29%

Transactional System Measurements Testing Methodology Transactional Xinu was built to run on real hardware, available today. Built on mid-model Pentium 4, 3.0 GHz processor ( Northwood ). No BIOS calls or Linux code used behind-the-scenes.

Loading the Kernel Measurements Testing Methodology

Measurements Measurements Testing Methodology On-chip: read timestamp counter (readtsc). In-system: timer counters (thread time, monotonic time). External: min, avg, max, mdev of round-trip time.

Ping Testing Measurements Testing Methodology Packet enters machine and raises interrupt (RX_TSC). Upper half call transfers incoming data (READ_TSC). Upper half call transfers outgoing data (WRITE_TSC). Final call in lower half of driver (TX_TSC).

Ping Testing Other notes Measurements Testing Methodology Data gathering occurs at runtime, stores in memory until needed. Timer interrupt fires every 1/10 of a millisecond, lightweight.

Ping Testing Ping 1000 Millisecond Interval Measurements Testing Methodology milliseconds 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 Average (Non STM) Average (STM) min avg max mdev measure

Ping Testing Ping 500 Millisecond Interval Measurements Testing Methodology milliseconds 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 Average (Non STM) Average (STM) min avg max mdev measure

Ping Testing Ping Flood (0 Millisecond Interval) Measurements Testing Methodology milliseconds 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 Average (Non STM) Average (STM) min avg max mdev measure

Ping Testing Timestamp Counter Measurements Testing Methodology Machine specific 100,000 micro-operations takes about 33 microseconds.

Ping Testing TSC 1000 Millisecond Interval Measurements Testing Methodology micro op cycles 0 20000 40000 60000 80000 100000 Average (Non STM) Average (STM) RX_TSC READ_TSC WRITE_TSC TX_TSC critical region

Ping Testing TSC 500 Millisecond Interval Measurements Testing Methodology micro op cycles 0 20000 40000 60000 80000 100000 Average (Non STM) Average (STM) RX_TSC READ_TSC WRITE_TSC TX_TSC critical region

Ping Testing TSC Flood (0 Millisecond Interval) Measurements Testing Methodology micro op cycles 0 20000 40000 60000 80000 100000 Average (Non STM) Average (STM) RX_TSC READ_TSC WRITE_TSC TX_TSC critical region

Timing Testing Methodology Measurements Testing Methodology Measuring jitter is difficult. Lightweight timestamp counter at beginning of handler. Packet generator, sending precisely timed packets.

Timing Testing Results Measurements Testing Methodology Average Minimum Maximum Std. Dev. Non-STM 30552911.53 30525638 30582946 13847.91 STM 30553291.98 30534954 30576856 13223.17 Difference 380.45 316-6090 -624.74

Summary Summary Future Work HTM systems are hard to build, use STM instead. Initial results suggest that TM might be able to reduce jitter. Transactional Xinu is built to use Intel s STM library and compiler. Adds POSIX library and interrupt-local storage. Specially instrumented interrupt handlers (tm_atomic and tm_callable) Experimentation shows software overhead is minimal in some scenarios. Many other components still disable interrupts.

Future Work Summary Future Work More tightly integrate STM library to Embedded kernel (scheduler, interrupts, etc.) Test scaling to multi-core systems (what TM was designed for). Test scaling with multiple network interfaces (ZigBee, WiFi, Bluetooth, GigE). Analyze real-time properties, if interrupts can always be received what happens ( interrupt overload ). mschul@mscs.mu.edu http://www.mscs.mu.edu/~mschul/