Parallelizing SPECjbb2000 with Transactional Memory

Similar documents
Parallelizing SPECjbb2000 with Transactional Memory

Transactional Memory

The Common Case Transactional Behavior of Multithreaded Programs

Tradeoffs in Transactional Memory Virtualization

Towards Pervasive Parallelism

Early Release: Friend or Foe?

Work Report: Lessons learned on RTM

Chí Cao Minh 28 May 2008

DBT Tool. DBT Framework

Improving the Practicality of Transactional Memory

Programming with Transactional Memory

DMP Deterministic Shared Memory Multiprocessing

<Insert Picture Here>

Potential violations of Serializability: Example 1

Architectural Semantics for Practical Transactional Memory

ATLAS: A Chip-Multiprocessor. with Transactional Memory Support

Transactional Programming In A Multi-core Environment

Incrementally Parallelizing. Twofold Speedup on a Quad-Core. Thread-Level Speculation. A Case Study with BerkeleyDB. What Am I Working on Now?

Transactional Memory Coherence and Consistency

Implementing and Evaluating Nested Parallel Transactions in STM. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University

Summary: Open Questions:

Thread-level Parallelism for the Masses. Kunle Olukotun Computer Systems Lab Stanford University 2007

Scheduling Transactions in Replicated Distributed Transactional Memory

An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees

Executing Java Programs with Transactional Memory

Outline. Exploiting Program Parallelism. The Hydra Approach. Data Speculation Support for a Chip Multiprocessor (Hydra CMP) HYDRA

Hardware Transactional Memory. Daniel Schwartz-Narbonne

Heuristics for Profile-driven Method- level Speculative Parallelization

EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics

Data Speculation Support for a Chip Multiprocessor Lance Hammond, Mark Willey, and Kunle Olukotun

Chapter 5. Multiprocessors and Thread-Level Parallelism

AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM

System Challenges and Opportunities for Transactional Memory

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Transactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech

DESIGNING AN EFFECTIVE HYBRID TRANSACTIONAL MEMORY SYSTEM

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

The Java Memory Model

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

Software Speculative Multithreading for Java

6 Transactional Memory. Robert Mullins

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

HydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran. Hot Topics in Parallelism (HotPar '12), Berkeley, CA

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Lecture 20: Transactional Memory. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Lecture 8: Transactional Memory TCC. Topics: lazy implementation (TCC)

Java Memory Model. Jian Cao. Department of Electrical and Computer Engineering Rice University. Sep 22, 2016

Exploiting Hardware Transactional Memory for Efficient In-Memory Transaction Processing. Hao Qian, Zhaoguo Wang, Haibing Guan, Binyu Zang, Haibo Chen

Lecture: Transactional Memory. Topics: TM implementations

The Common Case Transactional Behavior of Multithreaded Programs

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Scalable Software Transactional Memory for Chapel High-Productivity Language

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

Beyond Sequential Consistency: Relaxed Memory Models

Flexible Architecture Research Machine (FARM)

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism

Hardware Transactional Memory on Haswell

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat

Thread-Level Speculation on Off-the-Shelf Hardware Transactional Memory

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability

CS5412: TRANSACTIONS (I)

Characterization of TCC on Chip-Multiprocessors

A Scalable, Non-blocking Approach to Transactional Memory

Log-Based Transactional Memory

ECE/CS 757: Homework 1

Understanding Hardware Transactional Memory

TEST REPORT. AUGUST 2006 SPECjbb2005 performance and power consumption on Intel Xeon 51xx processor-based servers

Deterministic Shared Memory Multiprocessing

The Java Memory Model

Lecture 17: Transactional Memories I

Transactifying Apache s Cache Module

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Indistinguishability: Friend and Foe of Concurrent Data Structures. Hagit Attiya CS, Technion

Dependence-Aware Transactional Memory for Increased Concurrency. Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin

Atomicity via Source-to-Source Translation

Revisiting the Past 25 Years: Lessons for the Future. Guri Sohi University of Wisconsin-Madison

L20: Putting it together: Tree Search (Ch. 6)!

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Atomicity for Reliable Concurrent Software. Race Conditions. Motivations for Atomicity. Race Conditions. Race Conditions. 1. Beyond Race Conditions

Informatica Data Explorer Performance Tuning

Conflict Detection and Validation Strategies for Software Transactional Memory

Lecture 12: TM, Consistency Models. Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations

Workload Optimization on Hybrid Architectures

LIMITS OF ILP. B649 Parallel Architectures and Programming

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

(1) Measuring performance on multiprocessors using linear speedup instead of execution time is a good idea.

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea

Rococo: Extract more concurrency from distributed transactions

An Automated Framework for Decomposing Memory Transactions to Exploit Partial Rollback

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Foundations of the C++ Concurrency Memory Model

Database Workload. from additional misses in this already memory-intensive databases? interference could be a problem) Key question:

Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017!

Parallelism and runtimes

HyPer-sonic Combined Transaction AND Query Processing

Transcription:

Parallelizing SPECjbb2000 with Transactional Memory JaeWoong Chung, Chi Cao Minh,, Brian D. Carlstrom, Christos Kozyrakis Picture comes here Computer Systems Lab Stanford University http://tcc.stanford.edu 1

The question we all share TM provides Speculative parallelism for sequential applications Coarse-grain synchronization for parallel applications How can TM help parallelize complex applications? Beyond basic data-structures Can we get 90% of performance at 10% of the effort? We parallelized SPECjbb2000 with transactions Irregular code from the enterprise domain 2

Contents SPECjbb2000 overview Methodology Transactional programming with Flat transaction Closed nesting Open nesting Other interesting ideas Conclusion 3

SPECjbb2000 overview (1) 3 tier enterprise system Client Tier Transaction Server Tier Database Tier Driver Threads Driver Threads Shared data Transaction Manager District newid (Integer) Warehouse Warehouse B-tree structure in database tier Shared variables in transaction server tier Shared warehouse ordertable (B-Tree) itemtable (B-Tree) stocktable (B-Tree) itemtable (B-Tree) stocktable (B-Tree) 4

SPECjbb2000 overview (2) TransactionManager::go() 5 types of e-commerce e transactions We worked on this loop. while (worktodo( worktodo) ) { switch( e-commerce e tx type ) { case new_order: case payment: case order_status: case delivery : case stock_level: } } 5

Methodology Execution-driven simulator Transactional Coherence and Consistency 8 PowerPC core 32K L1 and 256K L2 cache 16 bytes bus Java environment JikesRVM (JVM) GNU classpath (Java runtime library) synchronized blocks are removed. For SPECjbb2000, too 6

Flat transaction Speculative parallelism No analysis on potential races 1 transaction for 1 e-commerce e transaction Equivalent to having 1 global lock case new_order: atomic { // generate new order }; break; case payment: atomic { // make payment }; break; case order_status: atomic { // check order status }; break; case delivery : atomic { // make delivery }; break; case stock_level: atomic { // check stock }; break; 3.09x speedup over coarse-grain locking 62.7 % cycles lost due to violation 7

Analysis of violations Profiler provides us a violation report Violation sources JikesRVM,, GNU classpath Minor impact SPECjbb2000 New_order type takes almost 50% of all transactions. Case new_order: Shared Variable B-Tree // 1. initialize a new order e-commerce e TX // 2. assign a new order ID (newid( newid++) // 3. retrieve items/stocks from warehouse (itemtable( itemtable, stocktable) // 4. calculate the cost and update warehouse // 5. record the order for delivery (ordertable( ordertable) B-Tree // 6. display the processing result 8

Closed nesting (1) Child TX is merged to parent TX at commit. Reduction of violation penalty Parent RW-set <= Parent RW-set U Child RW-set Closed nesting doesn t t break the atomicity of original TX. Core 0 // A is initially 0; atomic {... atomic { A++; // 1. } } Core 1 A = ; = A; // 0 = A; // 1 9

Closed nesting (2) 2 closed nested transactions Case new_order: // 1. initialize a new order TX // 2. assign a new order ID (newid( newid++) // 3. retrieve items/stocks from warehouse (itemtable( itemtable, stocktable) // 4. calculate the cost and update warehouse // 5. record the order for delivery (ordertable( ordertable) // 6. display the result 47.9 % reduction in violation cycles 5.36x speedup 10

Open nesting (1) Child TX communicates to all the other TXes Child W-set W is broadcasted through system. Communication in the middle of a transaction Child R-set R is cleaned out. Elimination of violations Core 0 // A is initially 0; atomic {... open_atomic { A++; // 1. } } No conflict! Core 1 A = ; = A; // 1 A = ; 11

Open nesting (2) 1 open nested transaction Case new_order: // 1. initialize a new order // 2. assign a new order ID (newid( newid++) // 3. retrieve items/stocks from warehouse (itemtable( itemtable, stocktable) // 4. calculate the cost and update warehouse // 5. record the order for delivery (ordertable( ordertable) // 6. display the result newid++ 12 % reduction in the number of violation 4.96x speedup Compensation code for rollback Here rollback results in only a gap in newid. 12

Other interesting ideas Mixture of open/close nesting Advantages from both nested transactions Smaller flat transactions newid is incremeted in a separate flat transaction. In general, programmers should guarantee the correctness. Composability is a challenge. Early release For B-tree B structure See talk on Early Release: Friend or Foe? 13

Conclusion We parallelized SPECjbb2000 with transactions. Flat transaction for speculative parallelism A reasonable speedup is obtained. Closed nesting The violation penalty is reduced. Open nesting Violations are eliminated. Good speedup with small changes in source code A couple of nested transactions We are heading for a transactional benchmark suite. Realistic transactional applications 14

Questions? Whew~! Jae Woong Chung jwchung@stanford.edu Computer Systems Lab. Stanford University http://tcc.stanford.edu 15