Software transactional memory

Similar documents
Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93

Scheduling Transactions in Replicated Distributed Transactional Memory

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Summary: Issues / Open Questions:

6.852: Distributed Algorithms Fall, Class 20

Lock vs. Lock-free Memory Project proposal

Chí Cao Minh 28 May 2008

Enhancing Concurrency in Distributed Transactional Memory through Commutativity

Transactional Locking II

Performance Evaluation of Adaptivity in STM. Mathias Payer and Thomas R. Gross Department of Computer Science, ETH Zürich

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 28 November 2014

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

Cost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University)

Linked Lists: The Role of Locking. Erez Petrank Technion

Transactional Memory. R. Guerraoui, EPFL

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Speculative Locks. Dept. of Computer Science

Transactional Memory

Transactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech

Portland State University ECE 588/688. Transactional Memory

Maintaining Consistent Transactional States without a Global Clock

Conflict Detection and Validation Strategies for Software Transactional Memory

! Part I: Introduction. ! Part II: Obstruction-free STMs. ! DSTM: an obstruction-free STM design. ! FSTM: a lock-free STM design

SOFTWARE TRANSACTIONAL MEMORY FOR MULTICORE EMBEDDED SYSTEMS

Transactional Memory. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Software Transactional Memory

Lowering the Overhead of Nonblocking Software Transactional Memory

EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat

Time-based Transactional Memory with Scalable Time Bases

Integrating Transactionally Boosted Data Structures with STM Frameworks: A Case Study on Set

TRANSACTION MEMORY. Presented by Hussain Sattuwala Ramya Somuri

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES

Commit Phase in Timestamp-based STM

Scalable and transparent parallelization of multiplayer games. Bogdan Simion

Concurrent Preliminaries

A Qualitative Survey of Modern Software Transactional Memory Systems

6 Transactional Memory. Robert Mullins

CS377P Programming for Performance Multicore Performance Synchronization

A Dynamic Instrumentation Approach to Software Transactional Memory. Marek Olszewski

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems

bool Account::withdraw(int val) { atomic { if(balance > val) { balance = balance val; return true; } else return false; } }

HydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran. Hot Topics in Parallelism (HotPar '12), Berkeley, CA

Issues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Serializability of Transactions in Software Transactional Memory

Chapter 5. Multiprocessors and Thread-Level Parallelism

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Design Tradeoffs in Modern Software Transactional Memory Systems

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

The Common Case Transactional Behavior of Multithreaded Programs

CS140 Operating Systems Midterm Review. Feb. 5 th, 2009 Derrick Isaacson

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Mohamed M. Saad & Binoy Ravindran

Integrating Caching and Prefetching Mechanisms in a Distributed Transactional Memory

THE concept of a transaction forms the foundation of

Implementing and Evaluating Nested Parallel Transactions in STM. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University

Database Management and Tuning

Multi-Core Computing with Transactional Memory. Johannes Schneider and Prof. Roger Wattenhofer

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012

Computer Science 146. Computer Architecture

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Chapter 5. Multiprocessors and Thread-Level Parallelism

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation

STMBench7: A Benchmark for Software Transactional Memory

Monitors; Software Transactional Memory

Concurrent Computing

Speculative Synchronization

Improving the Practicality of Transactional Memory

Building Efficient Concurrent Graph Object through Composition of List-based Set

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

The Multicore Transformation

Scheduler Activations. CS 5204 Operating Systems 1

Module 6: Process Synchronization

Software Speculative Multithreading for Java

Chapter 20: Database System Architectures

Chapter 6: Process Synchronization. Module 6: Process Synchronization

CS4021/4521 INTRODUCTION

Chapter 6: Process Synchronization. Operating System Concepts 8 th Edition,

Transactional Memory

Using Software Transactional Memory In Interrupt-Driven Systems

Module 6: Process Synchronization. Operating System Concepts with Java 8 th Edition

INTRODUCTION. Hybrid Transactional Memory. Transactional Memory. Problems with Transactional Memory Problems

TLRW: Return of the Read-Write Lock

Early Results Using Hardware Transactional Memory for High-Performance Computing Applications

Evaluating Contention Management Using Discrete Event Simulation

Lecture: Consistency Models, TM

What is the Race Condition? And what is its solution? What is a critical section? And what is the critical section problem?

ABORTING CONFLICTING TRANSACTIONS IN AN STM

Design and Implementation of Transactions in a Column-Oriented In-Memory Database System

A Practical Scalable Distributed B-Tree

Monitors; Software Transactional Memory

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea

Message Passing Improvements to Shared Address Space Thread Synchronization Techniques DAN STAFFORD, ROBERT RELYEA

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

Real-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo

Stretching Transactional Memory

A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014

Transcription:

Transactional locking II (Dice et. al, DISC'06) Time-based STM (Felber et. al, TPDS'08) Mentor: Johannes Schneider March 16 th, 2011

Motivation Multiprocessor systems Speed up time-sharing applications How to parallelize easily and eciently? Concurrent access to shared memory

Locking Coarse-grained or ne-grained Dicult to program when # of locks is large Problems: deadlocks, priority inversions, convoying Race conditions Other solutions?

Transactional memory (Herlihy et. al, ISCA'93) Architectural support for lock-free data structures Based on LL/SC Changes cache coherence protocols Instructions LT, LTX, ST COMMIT, ABORT VALIDATE NOT HERE YET

(Shavit et. al, PODC'95) Software method to support exible transactional programming of synchronization operations Transactional model Transaction = atomic sequence of steps Protect access to shared objects Only for static transactions Easier to program Higher performance than locks?

Transactional locking II Dice et. al, DISC'06 "If we implemented ne-grained locking with the same simplicity of coarse-grained, we would never think of building a transactional memory" Fits any memory lifecycle (GC, malloc/free) Safe consistent memory states Performance 10x faster than single locks on RBTs Better than all lock-based & non-blocking STMs

Transactional locking II (cont.) 1. Use commit-time locking instead of encounter-time locking Encounter-time locking (Ennals, Saha): quick reads of freshly written values in memory by the read-only transaction Commit-time locking: memory locked only during the commit phase Under high loads, better performance Works with malloc/free

Transactional locking II (cont.) 2. Use global version clock for validation Why a clock? Lock STMs inconsistent states Need validation periodically Shared global version clock Incremented by write transactions Read by all transactions State always consistent

Transactional locking II (cont.) 3. Locks to shared data? PO (per object): lock per shared object Insertion of lock elds PS (per stripe): array of locks per memory stripe Each transactable location is mapped to a stripe No changes to data structure

Transactional locking II (cont.) Read-only transactions RV Clock Speculative execution Write lock is free If lock value RV commit If lock value > RV abort

Transactional locking II (cont.) Write transactions RV Clock Speculative execution Write lock is free Lock value RV Maintain read/write set Lock write set

Transactional locking II (cont.) Write transactions RV Clock Speculative execution Write lock is free Lock value RV Maintain read/write set Lock write set WV increment(clock) Validate each lock value RV

Transactional locking II (cont.) Write transactions RV Clock Speculative execution Write lock is free Lock value RV Maintain read/write set Lock write set WV increment(clock) Validate each lock value RV Release locks with value WV

Transactional locking II (cont.) Small RBT: 30% put, 30% delete, 40% get/16-proc SunV890

Transactional locking II (cont.) Large RBT: 5% put, 5% delete, 90% get/16-proc SunV890

Transactional locking II (cont.) Speedup Large RBT 5% puts, 5% deletes, 90% gets

Transactional locking II (cont.) Conclusions STM scalability is comparable with hand-crafted, but overheads are much higher Read set and validation cost aect performance No meltdown under contention Seamless operation with malloc/free

LSA-STM Felber et. al, TPDS'08 Current trade-o between consistency and performance

LSA-STM (cont.) Lazy snapshot algorithm speeds up transactions for large data sets, while reducing the overhead of incremental validation How? Time-based algorithms allow to keep multiple versions of objects for RO transactions

LSA-STM (cont.) Global clock CT counts # of commits STM objects (A,B,C) have multiple versions Each version has a validity range R v relative to CT Most recent version has upper bound

LSA-STM (cont.) Every transaction maintains a snapshot with a validity range RT Snapshot = of the accessed versions' validity range Initialized to [S T, ] If snapshot == nonempty commit

LSA-STM (cont.) When a transaction T reads an object The version's validity range must T's snapshot Snapshot bounds are adjusted to the Validity range ends at time of the read

LSA-STM (cont.) If T's snapshot with the latest version's validity range No need to update the snapshot

LSA-STM (cont.) If T's snapshot does not with the latest version's validity range Extend snapshot (may fail) Read-only transactions can use old versions

LSA-STM (cont.) Extension tries to increase the upper bound of the snapshot Check if all read versions are valid If yes, snapshot's upper bound = CT (now)

LSA-STM (cont.) Extension may increase the lower bound of the snapshot = largest lower bound among the validity ranges of accessed versions

LSA-STM (cont.) Read-only transactions can commit if snapshot is not No need to extend range to CT

LSA-STM (cont.) Update transactions create new versions of modied objects when commiting at CT Validity range of new objects starts at C T

LSA-STM (cont.) Upon commit, an update transaction tries to acquire a new, unique commit timestamp at CT Transaction can commit i the snapshot can be extended to C T - 1

LSA-STM (cont.) How to program?

LSA-STM (cont.) How to program?

LSA-STM (cont.) Performance evaluation Java implementation Sun Fire T2000 8 core UltraSparc T1 processor (8-core Niagara) SXM: visible reads ASTM: invisible reads, incremental validation LSA: time-based invisible reads

LSA-STM (cont.) Linked list: 256 elements

LSA-STM (cont.) RBT: 65536 elements

LSA-STM (cont.) Conclusions High performance and consistency Obstruction-free implementation in Java Weakest guarantee for a system At least one thread makes progress

Multiplayer gaming

Multiplayer gaming (cont.) Parallelization of SynQuake (Lupei et. al, Eurosys'10) Why? Scale the game server SynQuake Extracts data structures and features of Q3 Driven with synthetic workload (game actions, hot-spot scenarios) libtm (Lupei et al., Interact'09) C/C++ support

Multiplayer gaming (cont.) Areanode tree Binary space partitioning tree (each node = specic map region) Ecient searching for all entities that a player interacts with By recursively dividing the map into submaps (median on X and Y)

Multiplayer gaming (cont.) 8 cores 2 Xeon Quad-Core @ 3GHz 600 to 2000 players 1000 server frames on a 1024 x 1024 map areanode tree depth = 8

Multiplayer gaming (cont.) Default setting: 4 quests with low/medium/high contention overload

Multiplayer gaming (cont.) Scalability vs. processing time

Conclusions STMs are a viable alternative to locks Dierent avors: TL2, LSA SynQuake Easier to program than locks Better performance for higher degree of concurrency Higher overheads Integration with existing languages Support in hardware...?