System Challenges and Opportunities for Transactional Memory
|
|
- Eileen Stafford
- 5 years ago
- Views:
Transcription
1 System Challenges and Opportunities for Transactional Memory JaeWoong Chung Computer System Lab Stanford University
2 My thesis is about Computer system design that help leveraging hardware parallelism Transactional memory (TM) for easy parallel programming Contribution Challenges to building an efficient and practical TM system Opportunities to use TM beyond parallel programming 2
3 Multi Core Processors No more frequency race The era of multi cores Parallel programming is not easy Performance Split a sequential task into multiple sub tasks Pentium (1993) Pentium 4 (2000) Core Duo (2006) Year 3
4 Locking is hard to use Synchronize access to shared data Coarse-grain locking Easy to program The other task is blocked Fine-grain locking High concurrency Hard to use Dead lock, priority inversion, High locking overhead : Object : Reference Task1 Task Object reference graph (e.g. Java and C++) 4
5 Transactional Memory
6 Transactional Memory Atomic and isolated execution of instructions Atomicity : All or nothing Isolation : No intermediate results Programmer A transaction encloses instructions logically sequential execution of transactions TM system TX_Begin TX_Begin // Instructions // Instructions // for Task1 // for Task2 TX_End TX_End Transactions are executed in parallel without conflict If conflict, one of them is aborted and restarts 6
7 TM Example R R Tx1 Tx W 5 6 W R Tx 1 : R 13 Tx 2 : R 2 W W 5 6 Data versioning At TX_Begin,, save register values At write, save old memory values Conflict detection Read-set and write-set per transaction Conflict detection with set comparison 7
8 TM Benefits Logically sequential execution of transactions Optimistic concurrency control for parallel transaction execution No dead lock, priority inversion, and convoying TM system handles pathological cases Composability Error Recovery 8
9 TM System Design Many proposals in hardware and software Hardware acceleration for TM is crucial for performance HTM is 2 ~3 times faster than STM Correctness : strong isolation Hardware TM In the beginning register checkpoint At memory access Set read/write bits per cache line Buffer new values in cache or log old values Conflict detection With cache coherence protocol With transaction validation protocol 9
10 Hardware TM Example TM hardware Tx1 Core 1 ADDR : DATA : R : W 1 XXX XXX 10 3 XXX XXX 0 5 XXX XXX 01 L1 cache BUS Regs 0 Conflict 0 Load Load 1 5 Memory Tx2 Core 2 ADDR : DATA : R : W 0 L1 cache Regs TM programs R R Tx W 5 R Tx2 6 W 10 R
11 Challenges and Opportunities How to build efficient TM system tuned for common case? How to build practical TM system to deal with uncommon case? Can we use TM to support system software? Can we use TM to improve other important system metrics? 11
12 Contributions Challenges to building TM systems Common case behavior of parallel programs Extract architectural parameters for efficient TM system design TM virtualization Overcome the limitation of TM hardware Opportunity for system beyond parallel programming Multithreading for dynamic binary translation Guarantee correctness of DBT Support for reliability, security, and fast memory snapshot Improve important system metrics other than performance 12
13 Outline Software parallelization : a major issue for performance Transactional memory Challenges to building TM systems Common case behavior of parallel programs TM virtualization Opportunities for systems beyond parallel programming Multithreading for dynamic binary translation Support for reliability, security, and fast memory snapshot Conclusion 13
14 Challenges to Building TM Systems
15 Goal Challenge 1 : Common Case Behavior of Parallel Programs Understand the common case behavior of TM programs Few TM programs available More TM programs now but for research purpose Few efficient TM systems as development tool chicken & egg problem 15
16 Inferring Transactions in Multithread Programs Analyze existing parallel programs Assumption : the inherent parallelism remains regardless of programming tools Mapping programming primitives to transactions Programming primitive Lock/Unlock Parallel_For Transaction primitive Begin/End Begin/End 16
17 Parallel Applications Different domains and different language Languages Java Pthread ANL OpenMP Applications MolDyn, MonteCarlo, RayTracer,, Crypt, LUFact, Series, SOR, SparseMatmult,, SPECjbb2000, PMD, HSQLDB Apache, Kingate,, Bp-vision, Localize, Ultra Tic Tac Toe, MPEG2, AOL Server Barnes, Mp3d, Ocean, Radix, FMM, Cholesky, Radiosity,, FFT, Volrend,, Water-N2, Water-Spatial APPLU, Equake,, Art, Swim, CG, BT, IS 17
18 Key Metrics of TM Programs Measure the key metrics of TM programs Use the metrics to make suggestions for TM designs Key metrics Transaction length Read-/write-set size Write-set to length ratio Frequency of nesting & I/O in transactions Architectural parameters TM primitive overhead Buffer size Transaction commit/abort overhead Support for nesting and system calls 18
19 Transaction Length Number of instructions executed in transaction Length in Instructions Application Avg 50th % 95th % Max Java average Pthreads average ANL average Observation : Up to 95% of transactions < 5000 instructions Suggestion : Light-weight transactional primitives Observation : Rare but long transactions Suggesion : Transaction over context-switching 19
20 Read-/Write /Write-Set Size Bytes of data read/written by transaction Read Set Size in Kbytes th ANL Java Pthreads 80th Write Set Size in Kbytes th ANL Java Pthreads 80th Percentile of Transactions Percentile of Transactions Observation : 98% transactions <16KB read-set and <6KB write set Suggestion : 32K L1 cache is sufficient Observation : Few very large transaction > 32K Suggestion : Need for buffer space virtualization 20
21 Outline Software parallelization : a major issue for performance Transactional memory Challenges to building TM systems Common case behavior of parallel programs TM virtualization Opportunities for systems beyond parallel programming Multithreading for dynamic binary translation Support for reliability, security, and fast memory snapshot Conclusion 21
22 Problem Challenge 2 : TM Virtualization Limited hardware resources tuned for common cases E.g. buffer size for 99% transactions How do we cover uncommon cases as well? Cache as buffer for transactional data What if cache capacity is exhausted? => space virtualization What if a transaction is interrupted? Time virtualization What if transactions are nested deeply? Depth virtualization 22
23 Goals XTM: extended TM Virtualize TM space, time, and depth at low HW cost Completely transparent to user SW Minimize interference with coexisting HW transactions Assumption Overflows, interrupts, and deep nesting are rare Approach Transactional data and metadata in virtual memory Using virtual memory support in OS Data versioning & conflict detection at page granularity Similar to page-based software DSM systems 23
24 XTM Overview Basic operation On HTM overflow, rollback and restart in SW mode At the first access, create a copy of original (master) page Change the address mapping to the copy (private page) Transactional data in private page, committed data in master page At commit, make the private page the new master page All orchestrated by the operating system (no HW) Conflict detection Use TLB shoot-downs to gain exclusive page access Hardware requirement Overflow exception 24
25 XTM Example Timeline Page table pointer Master page table Master page HTM Overflow Xtn Write Xtn Read Commit Per-Tx Page table For Level 0 XTM Metadata Private Copy R W 25
26 Hardware Acceleration XTM-g Gradual page-by by-page switching Reduce the switch overhead from hardware mode to software mode A portion of transactional data in private pages, the rest in the cache Hardware requirement : OV(overflow) ) bit in page table XTM-e Additional buffer for overflowed read/write bits Reduce false conflict at page granularity Hardware requirement : Eviction buffer 26
27 Time Virtualization Interrupt and context-switch procedure Interrupt Naïve Approach Can wait? No Young transction? No Switch to software mode Yes Yes Wait for short transaction to finish Abort young transaction Rare case 27
28 Performance Analysis Normalized Execution Time Versioning Validation Commit Violations Idle Useful 0 XTM XTM-g XTM-e VTM XTM XTM-g XTM-e VTM XTM XTM-g XTM-e VTM XTM XTM-g XTM-e VTM XTM XTM-g XTM-e VTM XTM XTM-g XTM-e VTM tomcatv [37.7%] volrend [0.01%] radix [0.26%] mic ro- P10 [39.2%] mic ro- P20 [60.3%] mic ro- P30 [60.8%] XTM causes no cost for applications without overflow XTM-g g presents a good cost/performance tradeoff point 20% faster to 50% slower than a fully-hardware solution 28
29 Outline Software parallelization : a major issue for performance Transactional memory Challenges to building TM systems Common case behavior of parallel programs TM virtualization Opportunities for systems beyond parallel programming Multithreading for dynamic binary translation Support for reliability, security, and fast memory snapshot Conclusion 29
30 Opportunities for Systems beyond Parallel Programming
31 DBT Opportunity 1 : Dynamic Binary Translation Binary code is translated in run-time PIN, Valgrind, DynamoRIO, StarDBT,, etc Original Binary DBT Tool DBT Framework Translated Binary DBT use cases Translation on new target architecture JIT optimizations in virtual machines Binary instrumentation Profiling, security, debugging, 31
32 Example: Dynamic Information Flow Tracking (DIFT) t = XX ; // untrusted data from network taint(t) = 1;. Variables swap t, u1; swap taint(t), taint(u1); Taint bits u2 = u1; taint(u2) = taint(u1); t u1 u2 Track untrusted data A taint bit per memory byte Security policy uses the taint bit. E.g. no syscall with untrusted data 32
33 Problem : DBT with Parallel Program Thread 1 swap t, u1; swap taint(t), taint(u1); Thread2 u2 = u1; taint(u2) = taint(u1); t u1 u2 Variables Taint bits XX 1 XX Security Breach!! Atomicity between original and instrumented instructions for correctness 33
34 How to Guarantee Atomicity? Easy but unsatisfactory solutions No multithreaded programs (StarDBT( StarDBT) Serialization (Valgrind( Valgrind) Hard solution : Locking Idea : Enclose original and instrumented instruction with lock Fine-grained locks locking overhead, convoying, limited scope of DBT optimizations Coarse-grained locks performance degradation Lock nesting between app & DBT locks potential deadlock Tool developers should be feature + multithreading experts 34
35 Idea Transactional Memory for Correctness of DBT Original and instrumented instructions in a transaction Thread 1 TX_Begin swap t, u1; swap taint(t), taint(u1); TX_End Thread2 TX_Begin u2 = u1; taint(u2) = taint(u1); TX_End Advantages Atomic execution High performance through optimistic concurrency Support for nested transactions 35
36 Granularity of Transaction Per instruction : short Instrumentation High overhead of executing TX_Begin and TX_End Limited scope for DBT optimizations Per basic block : long Amortizing the TX_Begin and TX_End overhead Easy to match TX_Begin and TX_End Per trace : longer Further amortization of the overhead Potentially high transaction conflict Profile-based sizing : dynamic Optimize transaction size based on transaction abort ratio 36
37 Baseline Performance Results Normalized Overhead (% ) 60% 50% 40% 30% 20% 10% 0% Barnes Equake Fmm Radiosity Radix Swim Tomcatv Water Waterspatial 8 CPUs Software TM and DIFT on PIN 41 % overhead on the average Transaction at the DBT trace granularity 37
38 Hardware Acceleration Normailized Overhead (%) 60% 50% 40% 30% 20% 10% 0% B arnes E quake F MMR adios ityr adix S wim Tomc atvwater Waterspatial SW Only Register Checkpoint Reg. + HW Signature Full HW Overhead reduction 28% with register checkpoint 12% with register checkpoint + hardware signature 6% with full hardware TM 38
39 Outline Software parallelization : a major issue for performance Transactional memory Challenges to building TM systems Common case behavior of parallel programs TM virtualization Opportunities for systems beyond parallel programming Multithreading for dynamic binary translation Support for reliability, security, and fast memory snapshot Conclusion 39
40 Opportunity 2 : Improving Other System Metrics TM hardware consists of Fine-grain data versioning HW Fine-grain access tracking HW Fast exception handlers Can use such HW for other purposes Reliability, Security, The benefits for SW Finer granularity (compared to VM-based approach) User-level event handling (compared to VM-based approach) No instrumentation overhead (compared to DBT-based approach) Simplified code (compared to DBT-based approach) 40
41 Outline for TM Hardware Application Reliability Global & local checkpoints (data versioning) Security Fine-grain read/write barriers (address tracking) Isolated execution (data versioning) Memory snapshot (data versioning) Concurrent garbage collector Dynamic memory profiler 41
42 Memory Snapshot Snapshot Read-only image Multiple regions Shared by multiple threads mutator Read-only Snapshot Memory collector Applications Service threads that analyze memory in parallel with app threads Garbage collection, memory profiling (heap & stack), mutator collector < Garbage Collection > 42
43 TM Hardware Snapshot Feature correspondence TM metadata track data written since snapshot TM versioning storage for progressive snapshot Including virtualization mechanism TM conflict detection catch errors Writes to read-only snapshot Differences & additions Data versioning for single thread Vs. multiple thread Table to record snapshot regions Resulting snapshot system Fast : O(# CPUs) Scan (create) and O(1) write/read Small memory footprint : O(# memory locations written) 43
44 GC Overhead Parallel GC: stop app threads & run GC threads 20% to 30% overhead for memory intensive apps Snapshot GC GC is essentially free Fast : Stop app, take snapshot, then run GC & app concurrently Simple : +100 lines over parallel GC by Boehm Fundamentally simpler than any other concurrent GC 44
45 Conclusion Challenges to building TM systems Common case behavior of parallel programs Extract architectural parameters for efficient TM system design TM virtualization Overcome the limitation of TM hardware Opportunity for system beyond parallel programming Multithreading for dynamic binary translation Fix correctness issue for DBT Support for reliability, security, and fast memory snapshot Improve important system metrics other than performance 45
46 Acknowledgement KyungHae,, wife Parents, brother, in-laws Prof. Kozyrakis,, advisor Prof. Olukotun,, associate advisor Prof. Garcia-Molina Prof. Saraswat TCC group mates and research colleagues Korean mafia 46
The Common Case Transactional Behavior of Multithreaded Programs
The Common Case Transactional Behavior of Multithreaded Programs JaeWoong Chung Hassan Chafi,, Chi Cao Minh, Austen McDonald, Brian D. Carlstrom, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab
More informationDBT Tool. DBT Framework
Thread-Safe Dynamic Binary Translation using Transactional Memory JaeWoong Chung,, Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University http://csl.stanford.edu
More informationTradeoffs in Transactional Memory Virtualization
Tradeoffs in Transactional Memory Virtualization JaeWoong Chung Chi Cao Minh, Austen McDonald, Travis Skare, Hassan Chafi,, Brian D. Carlstrom, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab Stanford
More informationThread-Safe Dynamic Binary Translation using Transactional Memory
Thread-Safe Dynamic Binary Translation using Transactional Memory JaeWoong Chung, Michael Dalton, Hari Kannan, Christos Kozyrakis Computer Systems Laboratory Stanford University {jwchung, mwdalton, hkannan,
More informationThe Common Case Transactional Behavior of Multithreaded Programs
The Common Case Transactional Behavior of Multithreaded Programs JaeWoong Chung, Hassan Chafi, Chi Cao Minh, Austen McDonald, Brian D. Carlstrom Christos Kozyrakis, Kunle Olukotun Computer Systems Laboratory
More informationSYSTEM CHALLENGES AND OPPORTUNITIES FOR TRANSACTIONAL MEMORY
SYSTEM CHALLENGES AND OPPORTUNITIES FOR TRANSACTIONAL MEMORY A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL
More informationMaking the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory
Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory Colin Blundell (University of Pennsylvania) Joe Devietti (University of Pennsylvania) E Christopher Lewis (VMware,
More informationLecture 20: Transactional Memory. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 20: Transactional Memory Parallel Computer Architecture and Programming Slide credit Many of the slides in today s talk are borrowed from Professor Christos Kozyrakis (Stanford University) Raising
More informationChí Cao Minh 28 May 2008
Chí Cao Minh 28 May 2008 Uniprocessor systems hitting limits Design complexity overwhelming Power consumption increasing dramatically Instruction-level parallelism exhausted Solution is multiprocessor
More informationLogTM: Log-Based Transactional Memory
LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood 12th International Symposium on High Performance Computer Architecture () 26 Mulitfacet
More informationATLAS: A Chip-Multiprocessor. with Transactional Memory Support
ATLAS: A Chip-Multiprocessor with Transactional Memory Support Njuguna Njoroge, Jared Casper, Sewook Wee, Yuriy Teslyar, Daxia Ge, Christos Kozyrakis, and Kunle Olukotun Transactional Coherence and Consistency
More informationTransactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech
Transactional Memory Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech (Adapted from Stanford TCC group and MIT SuperTech Group) Motivation Uniprocessor Systems Frequency
More informationTransactional Memory
Transactional Memory Architectural Support for Practical Parallel Programming The TCC Research Group Computer Systems Lab Stanford University http://tcc.stanford.edu TCC Overview - January 2007 The Era
More informationImproving the Practicality of Transactional Memory
Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter
More informationLog-Based Transactional Memory
Log-Based Transactional Memory Kevin E. Moore University of Wisconsin-Madison Motivation Chip-multiprocessors/Multi-core/Many-core are here Intel has 1 projects in the works that contain four or more computing
More informationTransactional Memory. Lecture 19: Parallel Computer Architecture and Programming CMU /15-618, Spring 2015
Lecture 19: Transactional Memory Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of the slides in today s talk are borrowed from Professor Christos Kozyrakis
More informationTransactional Memory Coherence and Consistency
Transactional emory Coherence and Consistency all transactions, all the time Lance Hammond, Vicky Wong, ike Chen, rian D. Carlstrom, ohn D. Davis, en Hertzberg, anohar K. Prabhu, Honggo Wijaya, Christos
More information6 Transactional Memory. Robert Mullins
6 Transactional Memory ( MPhil Chip Multiprocessors (ACS Robert Mullins Overview Limitations of lock-based programming Transactional memory Programming with TM ( STM ) Software TM ( HTM ) Hardware TM 2
More informationAn Effective Hybrid Transactional Memory System with Strong Isolation Guarantees
An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees Chi Cao Minh, Martin Trautmann, JaeWoong Chung, Austen McDonald, Nathan Bronson, Jared Casper, Christos Kozyrakis, Kunle
More informationTransactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28
Transactional Memory or How to do multiple things at once Benjamin Engel Transactional Memory 1 / 28 Transactional Memory: Architectural Support for Lock-Free Data Structures M. Herlihy, J. Eliot, and
More informationReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors
ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors Milos Prvulovic, Zheng Zhang*, Josep Torrellas University of Illinois at Urbana-Champaign *Hewlett-Packard
More informationOutline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014
Outline Database Tuning Nikolaus Augsten University of Salzburg Department of Computer Science Database Group 1 Unit 8 WS 2013/2014 Adapted from Database Tuning by Dennis Shasha and Philippe Bonnet. Nikolaus
More informationParallelizing SPECjbb2000 with Transactional Memory
Parallelizing SPECjbb2000 with Transactional Memory JaeWoong Chung, Chi Cao Minh,, Brian D. Carlstrom, Christos Kozyrakis Picture comes here Computer Systems Lab Stanford University http://tcc.stanford.edu
More informationExecuting Java Programs with Transactional Memory
Executing Java Programs with Transactional Memory Brian D. Carlstrom, JaeWoong Chung, Hassan Chafi, Austen McDonald, Chi Cao Minh, Lance Hammond, Christos Kozyrakis, Kunle Olukotun Computer Systems Laboratory,
More informationConcurrent Preliminaries
Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures
More informationFlexible Architecture Research Machine (FARM)
Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense
More informationTransactional Memory. Lecture 18: Parallel Computer Architecture and Programming CMU /15-618, Spring 2017
Lecture 18: Transactional Memory Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017 Credit: many slides in today s talk are borrowed from Professor Christos Kozyrakis (Stanford
More informationLecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM
Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of lazy TM 1 Transactions Access to shared variables is encapsulated within transactions the system gives the illusion
More informationSpeculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution
Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Ravi Rajwar and Jim Goodman University of Wisconsin-Madison International Symposium on Microarchitecture, Dec. 2001 Funding
More informationSpeculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois
Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications José éf. Martínez * and Josep Torrellas University of Illinois ASPLOS 2002 * Now at Cornell University Overview Allow
More informationArchitectural Support for Software Transactional Memory
Architectural Support for Software Transactional Memory Bratin Saha, Ali-Reza Adl-Tabatabai, Quinn Jacobson Microprocessor Technology Lab, Intel Corporation bratin.saha, ali-reza.adl-tabatabai, quinn.a.jacobson@intel.com
More informationMcRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime
McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs
More informationSpeculative Synchronization
Speculative Synchronization José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization
More informationAtomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?
CPSC-4/6: Operang Systems Atomic Transactions The Transaction Model / Primitives Serializability Implementation Serialization Graphs 2-Phase Locking Optimistic Concurrency Control Transactional Memory
More informationByteSTM: Java Software Transactional Memory at the Virtual Machine Level
ByteSTM: Java Software Transactional Memory at the Virtual Machine Level Mohamed Mohamedin Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment
More informationEliminating Global Interpreter Locks in Ruby through Hardware Transactional Memory
Eliminating Global Interpreter Locks in Ruby through Hardware Transactional Memory Rei Odaira, Jose G. Castanos and Hisanobu Tomari IBM Research and University of Tokyo April 8, 2014 Rei Odaira, Jose G.
More informationEarly Release: Friend or Foe?
Early Release: Friend or Foe? Travis Skare Christos Kozyrakis WTW 2006 Computer Systems Lab Stanford University http://tcc.stanford.edu tcc.stanford.edu Motivation for Early Release Consider multiple transactions
More informationReminder from last time
Concurrent systems Lecture 7: Crash recovery, lock-free programming, and transactional memory DrRobert N. M. Watson 1 Reminder from last time History graphs; good (and bad) schedules Isolation vs. strict
More informationDatabase Management and Tuning
Database Management and Tuning Concurrency Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 8 May 10, 2012 Acknowledgements: The slides are provided by Nikolaus
More informationSoftware-Controlled Multithreading Using Informing Memory Operations
Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University
More informationHeuristics for Profile-driven Method- level Speculative Parallelization
Heuristics for Profile-driven Method- level John Whaley and Christos Kozyrakis Stanford University Speculative Multithreading Speculatively parallelize an application Uses speculation to overcome ambiguous
More informationDeterministic Replay and Data Race Detection for Multithreaded Programs
Deterministic Replay and Data Race Detection for Multithreaded Programs Dongyoon Lee Computer Science Department - 1 - The Shift to Multicore Systems 100+ cores Desktop/Server 8+ cores Smartphones 2+ cores
More informationPotential violations of Serializability: Example 1
CSCE 6610:Advanced Computer Architecture Review New Amdahl s law A possible idea for a term project Explore my idea about changing frequency based on serial fraction to maintain fixed energy or keep same
More informationLecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation
Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end
More informationA Scalable, Non-blocking Approach to Transactional Memory
A Scalable, Non-blocking Approach to Transactional Memory Hassan Chafi Jared Casper Brian D. Carlstrom Austen McDonald Chi Cao Minh Woongki Baek Christos Kozyrakis Kunle Olukotun Computer Systems Laboratory
More informationWork Report: Lessons learned on RTM
Work Report: Lessons learned on RTM Sylvain Genevès IPADS September 5, 2013 Sylvain Genevès Transactionnal Memory in commodity hardware 1 / 25 Topic Context Intel launches Restricted Transactional Memory
More informationHydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran. Hot Topics in Parallelism (HotPar '12), Berkeley, CA
HydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran Hot Topics in Parallelism (HotPar '12), Berkeley, CA Motivation & Objectives Background Architecture Program Reconstruction Implementation
More informationVulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically
Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically Abdullah Muzahid, Shanxiang Qi, and University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu MICRO December
More informationEigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics
EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics Pervasive Parallelism Laboratory, Stanford University Sungpack Hong Tayo Oguntebi Jared Casper Nathan Bronson Christos Kozyrakis
More informationLegato: Bounded Region Serializability Using Commodity Hardware Transactional Memory. Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni
Legato: Bounded Region Serializability Using Commodity Hardware al Memory Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni 1 Programming Language Semantics? Data Races C++ no guarantee of semantics
More informationScheduling Transactions in Replicated Distributed Transactional Memory
Scheduling Transactions in Replicated Distributed Transactional Memory Junwhan Kim and Binoy Ravindran Virginia Tech USA {junwhan,binoy}@vt.edu CCGrid 2013 Concurrency control on chip multiprocessors significantly
More informationRelaxing Concurrency Control in Transactional Memory. Utku Aydonat
Relaxing Concurrency Control in Transactional Memory by Utku Aydonat A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of The Edward S. Rogers
More informationAgenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?
Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based
More informationLecture: Transactional Memory. Topics: TM implementations
Lecture: Transactional Memory Topics: TM implementations 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock 2 Design Space Data Versioning
More informationNikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris
Early Experiences on Accelerating Dijkstra s Algorithm Using Transactional Memory Nikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris Computing Systems Laboratory School of Electrical
More informationDecoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor
1 Decoupling Dynamic Information Flow Tracking with a Dedicated Coprocessor Hari Kannan, Michael Dalton, Christos Kozyrakis Presenter: Yue Zheng Yulin Shi Outline Motivation & Background Hardware DIFT
More informationHardware Transactional Memory. Daniel Schwartz-Narbonne
Hardware Transactional Memory Daniel Schwartz-Narbonne Hardware Transactional Memories Hybrid Transactional Memories Case Study: Sun Rock Clever ways to use TM Recap: Parallel Programming 1. Find independent
More informationMultithreaded Processors. Department of Electrical Engineering Stanford University
Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread
More informationEvolution of Virtual Machine Technologies for Portability and Application Capture. Bob Vandette Java Hotspot VM Engineering Sept 2004
Evolution of Virtual Machine Technologies for Portability and Application Capture Bob Vandette Java Hotspot VM Engineering Sept 2004 Topics Virtual Machine Evolution Timeline & Products Trends forcing
More informationSoftware Speculative Multithreading for Java
Software Speculative Multithreading for Java Christopher J.F. Pickett and Clark Verbrugge School of Computer Science, McGill University {cpicke,clump}@sable.mcgill.ca Allan Kielstra IBM Toronto Lab kielstra@ca.ibm.com
More informationA new Mono GC. Paolo Molaro October 25, 2006
A new Mono GC Paolo Molaro lupus@novell.com October 25, 2006 Current GC: why Boehm Ported to the major architectures and systems Featurefull Very easy to integrate Handles managed pointers in unmanaged
More informationImplementing and Evaluating Nested Parallel Transactions in STM. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University
Implementing and Evaluating Nested Parallel Transactions in STM Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University Introduction // Parallelize the outer loop for(i=0;i
More informationChapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationPOSIX Threads: a first step toward parallel programming. George Bosilca
POSIX Threads: a first step toward parallel programming George Bosilca bosilca@icl.utk.edu Process vs. Thread A process is a collection of virtual memory space, code, data, and system resources. A thread
More informationLecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations
Lecture 21: Transactional Memory Topics: Hardware TM basics, different implementations 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end locks are blocking,
More informationLecture 8: Transactional Memory TCC. Topics: lazy implementation (TCC)
Lecture 8: Transactional Memory TCC Topics: lazy implementation (TCC) 1 Other Issues Nesting: when one transaction calls another flat nesting: collapse all nested transactions into one large transaction
More informationLast 2 Classes: Introduction to Operating Systems & C++ tutorial. Today: OS and Computer Architecture
Last 2 Classes: Introduction to Operating Systems & C++ tutorial User apps OS Virtual machine interface hardware physical machine interface An operating system is the interface between the user and the
More informationGoldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea
Programming With Locks s Tricky Multicore processors are the way of the foreseeable future thread-level parallelism anointed as parallelism model of choice Just one problem Writing lock-based multi-threaded
More informationIntroduction to Parallel Computing
Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen
More informationUnderstanding Hardware Transactional Memory
Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence
More informationParallelizing SPECjbb2000 with Transactional Memory
Parallelizing SPECjbb2000 with Transactional Memory JaeWoong Chung, Chi Cao Minh, Brian D. Carlstrom, Christos Kozyrakis Computer Systems Laboratory Stanford University {jwchung, caominh, bdc, kozyraki}@stanford.edu
More informationOS-caused Long JVM Pauses - Deep Dive and Solutions
OS-caused Long JVM Pauses - Deep Dive and Solutions Zhenyun Zhuang LinkedIn Corp., Mountain View, California, USA https://www.linkedin.com/in/zhenyun Zhenyun@gmail.com 2016-4-21 Outline q Introduction
More informationThread-level Parallelism for the Masses. Kunle Olukotun Computer Systems Lab Stanford University 2007
Thread-level Parallelism for the Masses Kunle Olukotun Computer Systems Lab Stanford University 2007 The World has Changed Process Technology Stops Improving! Moore s law but! Transistors don t get faster
More informationLecture 12 Transactional Memory
CSCI-UA.0480-010 Special Topics: Multicore Programming Lecture 12 Transactional Memory Christopher Mitchell, Ph.D. cmitchell@cs.nyu.edu http://z80.me Database Background Databases have successfully exploited
More informationTransactional Memory. Yaohua Li and Siming Chen. Yaohua Li and Siming Chen Transactional Memory 1 / 41
Transactional Memory Yaohua Li and Siming Chen Yaohua Li and Siming Chen Transactional Memory 1 / 41 Background Processor hits physical limit on transistor density Cannot simply put more transistors to
More informationMulti-core Architecture and Programming
Multi-core Architecture and Programming Yang Quansheng( 杨全胜 ) http://www.njyangqs.com School of Computer Science & Engineering 1 http://www.njyangqs.com Process, threads and Parallel Programming Content
More informationHybrid Static-Dynamic Analysis for Statically Bounded Region Serializability
Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Programming
More informationTowards Parallel, Scalable VM Services
Towards Parallel, Scalable VM Services Kathryn S McKinley The University of Texas at Austin Kathryn McKinley Towards Parallel, Scalable VM Services 1 20 th Century Simplistic Hardware View Faster Processors
More informationCMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on
More informationZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS
ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200
More informationTransactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93
Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 What are lock-free data structures A shared data structure is lock-free if its operations
More informationZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS
ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200
More informationShared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network
Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache
More informationComparing Memory Systems for Chip Multiprocessors
Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University
More informationTo Everyone... iii To Educators... v To Students... vi Acknowledgments... vii Final Words... ix References... x. 1 ADialogueontheBook 1
Contents To Everyone.............................. iii To Educators.............................. v To Students............................... vi Acknowledgments........................... vii Final Words..............................
More informationFlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes
FlexBulk: Intelligently Forming Atomic Blocks in Blocked-Execution Multiprocessors to Minimize Squashes Rishi Agarwal and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu
More informationLock vs. Lock-free Memory Project proposal
Lock vs. Lock-free Memory Project proposal Fahad Alduraibi Aws Ahmad Eman Elrifaei Electrical and Computer Engineering Southern Illinois University 1. Introduction The CPU performance development history
More informationAtomicity via Source-to-Source Translation
Atomicity via Source-to-Source Translation Benjamin Hindman Dan Grossman University of Washington 22 October 2006 Atomic An easier-to-use and harder-to-implement primitive void deposit(int x){ synchronized(this){
More informationChapter 20: Database System Architectures
Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types
More informationDESIGNING AN EFFECTIVE HYBRID TRANSACTIONAL MEMORY SYSTEM
DESIGNING AN EFFECTIVE HYBRID TRANSACTIONAL MEMORY SYSTEM A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT
More informationDHTM: Durable Hardware Transactional Memory
DHTM: Durable Hardware Transactional Memory Arpit Joshi, Vijay Nagarajan, Marcelo Cintra, Stratis Viglas ISCA 2018 is here!2 is here!2 Systems LLC!3 Systems - Non-volatility over the memory bus - Load/Store
More informationLight64: Ligh support for data ra. Darko Marinov, Josep Torrellas. a.cs.uiuc.edu
: Ligh htweight hardware support for data ra ce detection ec during systematic testing Adrian Nistor, Darko Marinov, Josep Torrellas University of Illinois, Urbana Champaign http://iacoma a.cs.uiuc.edu
More informationSummary: Open Questions:
Summary: The paper proposes an new parallelization technique, which provides dynamic runtime parallelization of loops from binary single-thread programs with minimal architectural change. The realization
More informationXen and the Art of Virtualization. CSE-291 (Cloud Computing) Fall 2016
Xen and the Art of Virtualization CSE-291 (Cloud Computing) Fall 2016 Why Virtualization? Share resources among many uses Allow heterogeneity in environments Allow differences in host and guest Provide
More informationFall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012
18-742 Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II Prof. Onur Mutlu Carnegie Mellon University 10/12/2012 Past Due: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi
More informationHardware Transactional Memory Architecture and Emulation
Hardware Transactional Memory Architecture and Emulation Dr. Peng Liu 刘鹏 liupeng@zju.edu.cn Media Processor Lab Dept. of Information Science and Electronic Engineering Zhejiang University Hangzhou, 310027,P.R.China
More informationMartin Kruliš, v
Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal
More informationArchitectural Semantics for Practical Transactional Memory
Architectural Semantics for Practical Transactional Memory Austen McDonald, JaeWoong Chung, Brian D. Carlstrom, Chi Cao Minh, Hassan Chafi, Christos Kozyrakis and Kunle Olukotun Computer Systems Laboratory
More informationUsing Software Transactional Memory In Interrupt-Driven Systems
Using Software Transactional Memory In Interrupt-Driven Systems Department of Mathematics, Statistics, and Computer Science Marquette University Thesis Defense Introduction Thesis Statement Software transactional
More informationRuntime Monitoring on Multicores via OASES
Runtime Monitoring on Multicores via OASES Vijay Nagarajan and Rajiv Gupta University of California, CSE Department, Riverside, CA 92521 {vijay,gupta}@cs.ucr.edu Abstract Runtime monitoring support serves
More information