Hardware Transactional Memory Architecture and Emulation
Transcription
1 Hardware Transactional Memory Architecture and Emulation
  Dr. Peng Liu 刘鹏
  Media Processor Lab, Dept. of Information Science and Electronic Engineering
  Zhejiang University, Hangzhou, P.R. China
2 Outline
  Motivation
  Introduce basic TM concepts & interfaces
  TM implementation tradeoffs
  Discuss opportunities beyond parallelism
  Related work
3 Motivation: The Parallel Programming Crisis
  Multi-core chips, inflection point for SW development
    Scalable performance now requires parallel programming
  Parallel programming up until now
    Limited to people with access to large parallel systems
    Using low-level concurrency features in languages
      Thin veneer over underlying hardware
    Too cumbersome for mainstream software developers
      Difficult to write, debug, maintain and even get some speedup
  We need better concurrency abstractions
    Goal = easy to use + good performance
    90% of the speedup with 10% of the effort
4 Parallel Programming is Hard
  Thread-level parallelism is great until we want to share data
    Defining & implementing synchronization
    Races, deadlock avoidance, memory model issues
  Fundamentally, it's hard to work on shared data at the same time
    So we don't: we enforce mutual exclusion via locks
  Locks have problems
    Performance/correctness, fine/coarse tradeoff
    Deadlocks and failure recovery
5 Transactional Memory (TM)
  Memory transaction [Knight 86, Herlihy & Moss 93]
    Inspired by database transactions
    Execute large, programmer-defined regions atomically and in isolation
      atomic { x = x + y; }
  Atomicity (all or nothing)
    At commit, all memory writes take effect at once
    On abort, none of the writes appear to take effect
  Isolation
    No other code can observe writes before commit
  Serializability
    Transactions seem to commit in a single serial order
    The exact order is not guaranteed though
6 Advantages of TM
  Easy-to-use synchronization construct
    As easy to use as coarse-grain locks
    Programmer declares, system implements
  Performs as well as fine-grain locks
    Automatic read-read & fine-grain concurrency
    No tradeoff between performance & correctness
  Failure atomicity & recovery
    No lost locks when a thread fails
    Failure recovery = transaction abort + restart
  Composability
    Safe & scalable composition of software modules
7 Programming with TM
  Basic atomic blocks: atomic {}
  User-triggered abort: abort
  Conditional synchronization: retry
  Composing code sequences: orelse
  Integration with parallel models: ?
8 TM Caveats and Open Issues
  TM vs. locks
  I/O and unrecoverable actions
  Interaction with non-transactional code
9 Atomic() ≠ Lock() + Unlock()
  The difference
    Atomic: high-level declaration of atomicity
      Does not specify implementation/blocking behavior
      Does not provide a consistency model
    Lock: low-level blocking primitive
      Does not provide atomicity or isolation on its own
  Keep in mind
    Locks can be used to implement atomic()
    Locks can be used for purposes beyond atomicity
      Cannot replace all lock regions with atomic regions
    Atomic eliminates many data races
    Atomic blocks can still suffer from atomicity violations
      E.g. an action that the algorithm needs to be atomic is split into two atomic blocks
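The last point on this slide, an atomicity violation caused by splitting one logical action across two atomic blocks, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `atomic()` helper is a hypothetical stand-in built from one global lock (the slide notes that locks can implement atomic()), and the account names and amounts are invented. The `observe` callback marks the point where another thread could run.

```python
import threading
from contextlib import contextmanager

# Assumption: one global lock stands in for a real TM runtime.
_global_lock = threading.Lock()

@contextmanager
def atomic():
    """Hypothetical atomic block, emulated with a single global lock."""
    with _global_lock:
        yield

accounts = {"a": 100, "b": 0}  # invented example data

def total():
    """Read both balances inside one atomic block."""
    with atomic():
        return accounts["a"] + accounts["b"]

def transfer_split(amount, observe):
    """Buggy: the transfer is split into two atomic blocks."""
    with atomic():
        accounts["a"] -= amount
    seen = observe()  # another thread could observe the state here
    with atomic():
        accounts["b"] += amount
    return seen

def transfer_atomic(amount, observe):
    """Correct: the whole transfer is one atomic block."""
    with atomic():
        accounts["a"] -= amount
        accounts["b"] += amount
    return observe()  # observers run entirely before or after the transfer

seen_split = transfer_split(30, total)    # observer sees money "missing"
seen_atomic = transfer_atomic(30, total)  # observer sees a consistent total
print(seen_split, seen_atomic)
```

Neither version contains a data race, yet the split version exposes an intermediate state in which the total is wrong. Atomic blocks only protect what the programmer places inside a single block.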
10 I/O and Other Irrevocable Actions
  Challenge: difficult to undo output & redo input
    I/O devices, I/O registers
  Alternative solutions (open problem)
    Buffer output & log input
      Finalize output & clear log at commit
      Does not work if the atomic block does input after output
    Guarantee that the transaction will not abort
      Abort interfering transactions or sequentialize the system
      Does not work with abort(), input-after-output
    Transaction-based systems
      Multiple transactional devices
      Manager coordinates transactions across devices
11 Interactions with Non-Transactional Code
  Two basic alternatives
    Weak atomicity
      Transactions are serializable only against other transactions
      No guarantees about interactions with non-transactional code
    Strong atomicity
      Transactions are serializable against all memory accesses
      Non-transactional loads/stores are 1-instruction transactions
  The tradeoff
    Strong atomicity seems intuitive
      Predictable interactions for a wide range of coding patterns
    But strong atomicity has high overheads for software TM
12 Why TM?
  TM = declarative synchronization
    User specifies requirement (atomicity & isolation)
    System implements in best possible way
  Motivation for TM
    Difficult for user to get explicit sync right
      Correctness vs. performance vs. complexity
    Explicit sync is difficult to scale
      Locking scheme for 4 CPUs is not the best for 64
    Difficult to do explicit sync with composable SW
      Need a global locking strategy
    Other advantages: failure atomicity, ...
  TM applicability
    Apps with irregular or unstructured parallelism
      Difficult to prove independence in advance
      Difficult to partition data in advance
    TM does not generate new parallelism
      It just helps you tap into what is there
  TM target: 90% of the speedup with 10% of the work
13 Implementation Requirements for TM
  To build TM, you need
    Data versioning: where do you put the new x until commit?
      atomic { x = x + y; }
    Conflict detection: how do you detect that reads/writes to x need to be serialized?
      T0: atomic { x = x + y; }
      T1: atomic { x = x / 8; }
    Conflict resolution: how do you enforce serialization when required?
      [Figure: timeline of T0 (x = x + y;) and T1 (x = x / 8; shown twice)]
  Design space tradeoffs
14 TM Implementation Basics
  TM systems must provide atomicity and isolation
    Without sacrificing concurrency
  Basic implementation requirements
    Data versioning
    Conflict detection & resolution
  Implementation options
    Hardware transactional memory (HTM)
    Software transactional memory (STM)
    Hybrid transactional memory
      Hardware-accelerated STMs and dual-mode systems
15 Data Versioning
  Manage uncommitted (new) and committed (old) versions of data for concurrent transactions
  Eager versioning (undo-log based)
    Update memory location directly
    Maintain undo info in a log
    + Faster commit, direct reads (SW)
    - Slower aborts, fault tolerance issues
  Lazy versioning (write-buffer based)
    Buffer data until commit in a write-buffer
    Update actual memory location on commit
    + Faster abort, no fault tolerance issues
    - Slower commits, indirect reads (SW)
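The two versioning policies can be contrasted with a small single-threaded Python sketch. It is only an illustration under stated assumptions: memory is modeled as a dict, and the class names `EagerTx` and `LazyTx` are hypothetical, not taken from any TM system named in the slides. Conflict detection is deliberately left out.

```python
class EagerTx:
    """Eager versioning: update memory in place, keep old values in an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # (address, old value) pairs in write order

    def read(self, addr):
        return self.memory[addr]  # direct read from memory

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value

    def commit(self):
        self.undo_log.clear()  # fast commit: memory is already up to date

    def abort(self):
        # Slower abort: restore old values in reverse write order.
        for addr, old in reversed(self.undo_log):
            self.memory[addr] = old
        self.undo_log.clear()


class LazyTx:
    """Lazy versioning: buffer writes, update memory only on commit."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def read(self, addr):
        # Indirect read: the transaction must see its own buffered writes.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        return self.memory[addr]

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        self.memory.update(self.write_buffer)  # slower commit: copy to memory
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()  # fast abort: memory was never touched


# atomic { x = x + y; } under both policies (example values are invented)
mem_eager = {"x": 8, "y": 2}
t = EagerTx(mem_eager)
t.write("x", t.read("x") + t.read("y"))
eager_during = mem_eager["x"]  # new value is already visible in memory
t.abort()

mem_lazy = {"x": 8, "y": 2}
u = LazyTx(mem_lazy)
u.write("x", u.read("x") + u.read("y"))
lazy_during = mem_lazy["x"]    # memory still holds the old value
u.commit()
```

The sketch mirrors the +/- entries on the slide: the eager transaction pays on abort (log replay), the lazy one pays on commit (buffer copy) and on every read (buffer lookup).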
16 Conflict Detection
  Detect and handle conflicts between transactions
    Read-write and (often) write-write conflicts
    Must track the transaction's read-set and write-set
      Read-set: addresses read within the transaction
      Write-set: addresses written within the transaction
  Pessimistic detection
    Check for conflicts during loads or stores
      SW: SW barriers using locks and/or version numbers
      HW: check through coherence actions
    Use contention manager to decide to stall or abort
      Various priority policies to handle common case fast
  Optimistic detection
    Detect conflicts when a transaction attempts to commit
      SW: validate write/read-set using locks or version numbers
      HW: validate write-set using coherence actions
        Get exclusive access for cache lines in write-set
    On a conflict, give priority to committing transaction
      Other transactions may abort later on
      On conflicts between committing transactions, use contention manager to decide priority
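As a rough illustration of optimistic detection in software, the sketch below tracks a read-set and a write-set per transaction and validates the read-set against per-address version numbers at commit. It is a simplified, single-threaded model: the `Memory` and `Tx` classes are hypothetical, and a real STM would also have to make the validate-and-write-back step itself atomic (for example with commit-time locks), which is omitted here.

```python
class Memory:
    """Shared memory with one version number per address (assumed layout)."""

    def __init__(self, values):
        self.values = dict(values)
        self.versions = {addr: 0 for addr in values}


class Tx:
    """Lazy versioning + optimistic conflict detection, validated at commit."""

    def __init__(self, mem):
        self.mem = mem
        self.read_set = {}   # address -> version seen at first read
        self.write_set = {}  # address -> buffered new value

    def read(self, addr):
        if addr in self.write_set:
            return self.write_set[addr]
        self.read_set.setdefault(addr, self.mem.versions[addr])
        return self.mem.values[addr]

    def write(self, addr, value):
        self.write_set[addr] = value

    def commit(self):
        # Validate: every address we read must still have the version we saw.
        for addr, seen in self.read_set.items():
            if self.mem.versions[addr] != seen:
                return False  # conflict detected, caller should retry
        # Publish buffered writes and bump their versions.
        for addr, value in self.write_set.items():
            self.mem.values[addr] = value
            self.mem.versions[addr] += 1
        return True


mem = Memory({"x": 8, "y": 8})  # invented example values

t0 = Tx(mem)  # T0: atomic { x = x + y; }
t1 = Tx(mem)  # T1: atomic { x = x / 8; }
t0.write("x", t0.read("x") + t0.read("y"))
t1.write("x", t1.read("x") // 8)

t0_ok = t0.commit()  # commits first, x becomes 16
t1_ok = t1.commit()  # fails validation: x changed after T1 read it

retry = Tx(mem)      # T1 re-executes against the committed state
retry.write("x", retry.read("x") // 8)
retry_ok = retry.commit()
```

The committing transaction wins and the conflicting one re-executes, which matches the "give priority to committing transaction" policy on the slide.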
17 Conflict Detection Tradeoffs
  Pessimistic conflict detection (aka encounter or eager)
    + Detect conflicts early
      Undo less work, turn some aborts to stalls
    - No forward progress guarantees, more aborts in some cases
    - Locking issues (SW), fine-grain communication (HW)
  Optimistic conflict detection (aka commit or lazy)
    + Forward progress guarantees
    + Potentially fewer conflicts, shorter locking (SW), bulk communication (HW)
    - Detects conflicts late, still has fairness problems
18 Conflict Detection Granularity
  Object granularity (SW/hybrid)
    + Reduced overhead (time/space)
    + Close to programmer's reasoning
    - False sharing on large objects (e.g. arrays)
  Word granularity
    + Minimize false sharing
    - Increased overhead (time/space)
  Cache line granularity
    + Compromise between object & word
    + Works for both HW/SW
  Mix & match -> best of both worlds
    Word-level for arrays, object-level for other data, ...
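The false-sharing tradeoff can be made concrete with a short sketch that maps byte addresses to tracking units of different sizes. The 4-byte word, the 64-byte cache line and the example addresses are assumptions chosen for illustration, not values from the slides.

```python
WORD_BYTES = 4   # assumed word size
LINE_BYTES = 64  # assumed cache line size

def units(addresses, unit_bytes):
    """Map byte addresses to the tracking units (words or lines) they fall in."""
    return {addr // unit_bytes for addr in addresses}

def conflicts(writes_a, accesses_b, unit_bytes):
    """True if any unit written by A is also accessed by B at this granularity."""
    return bool(units(writes_a, unit_bytes) & units(accesses_b, unit_bytes))

t0_writes = {0x1000}  # T0 writes array element 0
t1_reads = {0x1004}   # T1 reads the neighbouring element 1

word_conflict = conflicts(t0_writes, t1_reads, WORD_BYTES)
line_conflict = conflicts(t0_writes, t1_reads, LINE_BYTES)
print(word_conflict, line_conflict)
```

The two transactions touch different words, so word granularity reports no conflict, while cache-line granularity reports a false one because both addresses fall in the same 64-byte line.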
19 TM Implementation Space
  Hardware TM systems
    Lazy + optimistic: Stanford TCC
    Lazy + pessimistic: MIT LTM, Intel VTM
    Eager + pessimistic: Wisconsin LogTM
  Software TM systems
    Lazy + optimistic (rd/wr): Sun TL2
    Lazy + optimistic (rd)/pessimistic (wr): MS OSTM
    Eager + optimistic (rd)/pessimistic (wr): Intel STM
    Eager + pessimistic (rd/wr): Intel STM
  Optimal design is still an open question
    May be different for HW, SW, and hybrid
20 Hardware or Software TM?
  Can be implemented in HW or SW
  SW is slow
    Bookkeeping is expensive: 2-8x slowdown
  SW has correctness pitfalls
    Even for correctly synchronized code!
    Lack of strong atomicity
  Let's use hardware for TM
21 Types of Hardware Support
  Hardware-accelerated STM systems (HASTM, SigTM, USTM, FlexTM, ...)
    Start with STM system & identify key bottlenecks
    Provide (simple) HW primitives for acceleration
  Hardware-based TM systems (TCC, LTM, VTM, LogTM, ...)
    Versioning & conflict detection directly in HW
  Hybrid TM systems (Sun Rock, ...)
    Combine an HTM with an STM by switching modes when needed
      Based on transaction characteristics, available resources, ...
22 Hardware TM
  Data versioning in caches
    Cache the write-buffer or the undo-log
    Cache metadata to track read-set and write-set
    Can do with private, shared, and multi-level caches
  Conflict detection through cache coherence protocol
    Coherence lookups detect conflicts between transactions
    Works with snooping & directory coherence
  Notes
    Register checkpoint must be taken at transaction begin
    Virtualization of hardware resources
    HTM support is similar to that for TLS and speculative lock elision
      Some hardware can actually support all three models
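How per-line cache metadata and coherence lookups combine into conflict detection can be sketched as follows. This is a toy model under stated assumptions: the `TxCache` class, the 64-byte line size and the addresses are hypothetical, and real hardware would act on the conflict (stall or abort) rather than just report it.

```python
LINE_BYTES = 64  # assumed cache line size

class TxCache:
    """Toy model of a private cache with per-line R and W metadata bits."""

    def __init__(self):
        self.r_bits = set()  # lines read by the running transaction
        self.w_bits = set()  # lines written by the running transaction

    def load(self, addr):
        self.r_bits.add(addr // LINE_BYTES)

    def store(self, addr):
        self.w_bits.add(addr // LINE_BYTES)

    def snoop(self, addr, is_write):
        """Check an incoming coherence request against the local read/write sets."""
        line = addr // LINE_BYTES
        if is_write:
            # A remote write conflicts with anything we have read or written.
            return line in self.r_bits or line in self.w_bits
        # A remote read conflicts only with our speculative writes.
        return line in self.w_bits

    def end_transaction(self):
        """Commit or abort: clear the metadata bits."""
        self.r_bits.clear()
        self.w_bits.clear()


core0 = TxCache()
core0.load(0x100)   # transaction on core 0 reads one line
core0.store(0x200)  # and writes another

read_read = core0.snoop(0x100, is_write=False)   # read-read: no conflict
write_read = core0.snoop(0x100, is_write=True)   # remote write to a line we read
read_write = core0.snoop(0x200, is_write=False)  # remote read of a line we wrote
unrelated = core0.snoop(0x300, is_write=True)    # line outside both sets
```

Because the check rides on requests the coherence protocol already sends, tracking adds no extra instructions to the transaction, which is the "zero-overhead tracking" claimed for HTM.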
23 HTM Advantages
  Transparent
    No need for SW barriers, function cloning, ...
  Fast common-case behavior
    Zero-overhead tracking of read-set & write-set
    Zero-overhead versioning
    Fast commit & abort without data movement
    Continuous validation of read-set
  Strong isolation
    Conflicts detected on non-transactional loads/stores as well
  Can simplify multi-core hardware
    Replace existing coherence with transactional coherence
24 HTM Challenges and Opportunities
  1. What's the best implementation in hardware?
    Many available options
  2. What's the right HW/SW interface?
    HTM support for a flexible SW environment
  3. What happens when HW resources are exhausted?
    Virtualization of hardware resources
    Time virtualization
      Interrupts, paging, and context switches within a transaction
      What happens to the state in caches?
    Space virtualization
      Where is the write-buffer or log stored?
      How are R & W bits stored and checked?
  Most transactions are currently small
    Small read-sets & write-sets
    Short in terms of instructions
25 Project Aims
  The self-tuning transactional memory system
    Dynamically adapts its policies to best suit the application behavior
    Configurable, parameterized application programming interface (API) to improve scalability and flexibility
  Develop a loop-closed debugger for HTM based on our FPGA prototype platform
  Validate the self-tuning memory hierarchy on a platform that can support both software-managed memories and a cache-coherent or transactional memory system
26 Processing Elements Concepts
  Memory wall
    Processor frequency vs. DRAM memory latency
    Latency introduced by multiple levels of memory
  Attack on the memory wall
    3-level memory model: main storage; tightly coupled memory (TCM) and HTM cache; register file
    Streaming DMA architecture
  RISC processor
  RTOS support for the real-time world
27 Hardware Transactional Memory Architecture
28 TM Version, Conflict, Contention
  Implement an atomic and isolated transactional region:
    Versioning: eager and lazy
    Conflict detection: optimistic and pessimistic
    Contention management
  To make a transactional code region appear atomic, all its modifications must be stored and kept isolated from other transactions until commit time.
  To ensure serializability between transactions, conflicts must be detected and resolved.
29 Designer-defined Interface
  Define the TM instructions in three models
  Basic model
    XSTART: marks the beginning of a transaction
    XEND: marks the end of a transaction
  Extension model
    XSTART_OPEN: independent atomicity and isolation for nested transactions
    XSTART_CLOSED: independent rollback & restart for nested transactions
    XABR: abort a running transaction
    XVLD: validate a running transaction
  User mode
    UCLEAR: clear the read-set and write-set data
    USTORE: store data to memory without changing the speculative cache state
    ULOAD: load data from memory without changing the speculative cache state
  Problems:
    How to integrate these instructions with the instruction set architecture and processor pipeline
    How to write application programs using these primitives
30 Emulation Platform Framework
  Architecture research relies on software simulators, which are too slow to facilitate interesting experiments.
  An alternative to simulation is to develop FPGA-based platforms for parallel computing.
  For the HTM project, we have developed the Transactional system Emulation Accelerator (TEA) platform to validate the HTM design and to support programming models and application development.
  We can also use FPGA-based technology for prototyping modern CMP systems.
31 TEA Architecture
  [Figure: TEA block diagram. Labels: FPGA E, FPGA S, FPGA N, FPGA W, FPGA M; RISC/DSP cores, each with I$, HTM, DMA and TCM; switches; router; token arbiter; DDR2 DRAM controller; Linux RISC; I/O.]
  Each user FPGA (East, South, West, North) contains two RISC/DSP cores enhanced with an HTM and a DMA mechanism.
  FPGA M connects all the processors to the shared memory and I/O devices.
  The router interfaces with the token arbiter, the DDR controller and the RISC32E core that runs the Linux OS/RTOS.
32 Breakdown of TEA's Bandwidth
  FPGA-FPGA
    i) LVCMOS link
      Control FPGA to user FPGA link: 100 MHz x 80 bit = 8.0 Gb/s
      User FPGA to user FPGA link: 100 MHz x 100 bit = 10.0 Gb/s
    ii) GTP link
      Control FPGA to user FPGA link: 2 GTPs
      User FPGA to user FPGA link: 6 GTPs
  Memory
    Capacity: 10 GB DDR2 per FPGA
    Bandwidth: 64 bit x 150 MHz = 9.6 Gb/s
  I/O
    Control FPGA: 8 SFP
    User FPGA: 2 SFP
    Supports both 10-Gigabit Ethernet and 10-Gigabit InfiniBand standards
    Bandwidth: 2.5 Gb/s
    In addition, one Gbit Ethernet port per FPGA for supplementary,
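The bandwidth figures on this slide all follow from clock rate times bus width. The short sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes one transfer per clock cycle, which is what the slide's own numbers imply.

```python
def link_bandwidth_gbps(clock_mhz, width_bits):
    """Raw bandwidth in Gb/s, assuming one transfer per clock cycle."""
    return clock_mhz * width_bits / 1000  # MHz x bits = Mb/s, then to Gb/s

control_to_user = link_bandwidth_gbps(100, 80)   # LVCMOS, control FPGA to user FPGA
user_to_user = link_bandwidth_gbps(100, 100)     # LVCMOS, user FPGA to user FPGA
memory = link_bandwidth_gbps(150, 64)            # DDR2 interface per FPGA

print(control_to_user, user_to_user, memory)
```

These are raw signalling rates; protocol overhead and any double-data-rate transfers are not modeled.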
33 TEA Platform Photo
  [Photo: TEA platform]
  Cache coherence and TM emulation
  On-chip interconnection network and protocol
  Verification of MPSoC
34 Contributions
  Evaluated hardware TM systems
    The best system from efficiency/complexity and application standpoint
  Replaced coherence and consistency with only transactions
    Using only transactions for communication is advantageous and efficient
  Devised a hardware/software interface for TM
    Simple primitives provide TM with flexible and needed semantics
35 Problems
  Software simulator: user-level or full-system?
  Hardware emulator?
  Is TM a panacea?
  How to attack the memory wall?
36 Related Work
  Cell processor and Roadrunner
  RAMP (Research Accelerator for Multiple Processors) project, an FPGA-based hardware emulator in computer architecture.
  Smart Memory (Stanford University)
    A. Firoozshahian, et al., A memory system design framework: creating smart memories, ISCA
  Sun's Rock is a highly speculative multicore processor with an isolating hardware checkpointing feature.
    M. Tremblay and S. Chaudhry, A third-generation 65nm 16-core 32-thread plus 32-scout-thread CMT SPARC processor, ISSCC
  TCC project
  LogTM
    K. E. Moore, et al., LogTM: log-based transactional memory, HPCA
  EazyHTM
    S. Tomić, et al., EazyHTM: eager-lazy hardware transactional memory, MICRO
  MetaTM
    Rossbach et al., "TxLinux and MetaTM: transactional memory and the operating system," Communications of the ACM,
  FlexTM
    S. Arrvindh et al. Flexible decoupled transactional memory support, ISCA
  TM research community
    TM bibliography:
37 Selected References
  TM overview
    Larus & Rajwar. Transactional Memory, Morgan & Claypool Publishers, 2007, 2011
    Larus & Kozyrakis. Transactional Memory. Communications of the ACM,
    Harris et al. Transactional Memory: An Overview, IEEE Micro,
  Basics
    Herlihy & Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures, ISCA,
    Hammond, et al. Transactional Memory Coherence and Consistency, ISCA,
    Rajwar et al. Virtualizing Transactional Memory. ISCA,
    Moore et al. LogTM: Log-Based Transactional Memory, HPCA,
    Ceze et al. BulkSC: Bulk Enforcement of Sequential Consistency, ISCA,
    McDonald. Architectures for Transactional Memory, Dissertation, Stanford University,
    McDonald. Architectural Semantics for Practical Transactional Memory, ISCA,
    Moravan. Supporting Nested Transactional Memory in LogTM, ASPLOS,
    Wee et al. A practical FPGA-based Framework for Novel CMP Research, FPGA,
    Njoroge et al. ATLAS: A Chip-Multiprocessor with Transactional Memory Support, DATE,
    Lupon et al. A Dynamically Adaptable Hardware Transactional Memory, Microarchitecture,
    Kozyrakis. Transactional Memory, Concepts, Implementations, & Opportunities,
Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationLecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory
Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section
More informationFlexTM. Flexible Decoupled Transactional Memory Support. Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science
FlexTM Flexible Decoupled Transactional Memory Support Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science 1 Transactions: Our Goal Lazy Txs (i.e., optimistic conflict
More informationConflict Detection and Validation Strategies for Software Transactional Memory
Conflict Detection and Validation Strategies for Software Transactional Memory Michael F. Spear, Virendra J. Marathe, William N. Scherer III, and Michael L. Scott University of Rochester www.cs.rochester.edu/research/synchronization/
More informationAgenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?
Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based
More informationSOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS
SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CSAIL IAP MEETING MAY 21, 2013 Research Agenda Lack of technology progress Moore s Law still alive Power
More informationCMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on
More informationTransactional Memory Coherence and Consistency
Transactional emory Coherence and Consistency all transactions, all the time Lance Hammond, Vicky Wong, ike Chen, rian D. Carlstrom, ohn D. Davis, en Hertzberg, anohar K. Prabhu, Honggo Wijaya, Christos
More informationNON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 28 November 2014
NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 28 November 2014 Lecture 8 Problems with locks Atomic blocks and composition Hardware transactional memory Software transactional memory
More informationAtomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?
CPSC-4/6: Operang Systems Atomic Transactions The Transaction Model / Primitives Serializability Implementation Serialization Graphs 2-Phase Locking Optimistic Concurrency Control Transactional Memory
More informationComparing Memory Systems for Chip Multiprocessors
Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationConcurrent Preliminaries
Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures
More informationHardware Transactional Memory. Daniel Schwartz-Narbonne
Hardware Transactional Memory Daniel Schwartz-Narbonne Hardware Transactional Memories Hybrid Transactional Memories Case Study: Sun Rock Clever ways to use TM Recap: Parallel Programming 1. Find independent
More informationTransactional Memory. review articles. Is TM the answer for improving parallel programming?
Is TM the answer for improving parallel programming? by james larus and christos Kozyrakis doi: 10.1145/1364782.1364800 Transactional Memory As computers evolve, programming changes as well. The past few
More informationSOFTWARE TRANSACTIONAL MEMORY FOR MULTICORE EMBEDDED SYSTEMS
SOFTWARE TRANSACTIONAL MEMORY FOR MULTICORE EMBEDDED SYSTEMS A Thesis Presented by Jennifer Mankin to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements for
More informationOverview of Transaction Management
Overview of Transaction Management Chapter 16 Comp 521 Files and Databases Fall 2010 1 Database Transactions A transaction is the DBMS s abstract view of a user program: a sequence of database commands;
More informationARCHITECTURES FOR TRANSACTIONAL MEMORY
ARCHITECTURES FOR TRANSACTIONAL MEMORY A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
More informationHigh Performance Computing on GPUs using NVIDIA CUDA
High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and
More informationChris Rossbach, Owen Hofmann, Don Porter, Hany Ramadan, Aditya Bhandari, Emmett Witchel University of Texas at Austin
Chris Rossbach, Owen Hofmann, Don Porter, Hany Ramadan, Aditya Bhandari, Emmett Witchel University of Texas at Austin Hardware Transactional Memory is a reality Sun Rock supports HTM Solaris 10 takes advantage
More informationCommit Algorithms for Scalable Hardware Transactional Memory. Abstract
Commit Algorithms for Scalable Hardware Transactional Memory Seth H. Pugsley, Rajeev Balasubramonian UUCS-07-016 School of Computing University of Utah Salt Lake City, UT 84112 USA August 9, 2007 Abstract
More informationHandout 3 Multiprocessor and thread level parallelism
Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed
More informationSYSTEM CHALLENGES AND OPPORTUNITIES FOR TRANSACTIONAL MEMORY
SYSTEM CHALLENGES AND OPPORTUNITIES FOR TRANSACTIONAL MEMORY A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL
More informationMultiprocessors and Locking
Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access
More informationIntroduction to Parallel Computing
Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen
More informationSystem Challenges and Opportunities for Transactional Memory
System Challenges and Opportunities for Transactional Memory JaeWoong Chung Computer System Lab Stanford University My thesis is about Computer system design that help leveraging hardware parallelism Transactional
More informationMultiprocessor Synchronization
Multiprocessor Systems Memory Consistency In addition, read Doeppner, 5.1 and 5.2 (Much material in this section has been freely borrowed from Gernot Heiser at UNSW and from Kevin Elphinstone) MP Memory
More informationMcRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime
McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs
More informationZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS
ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200
More informationLock vs. Lock-free Memory Project proposal
Lock vs. Lock-free Memory Project proposal Fahad Alduraibi Aws Ahmad Eman Elrifaei Electrical and Computer Engineering Southern Illinois University 1. Introduction The CPU performance development history
More informationZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS
ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200
More informationTDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading
Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5
More informationAn Effective Hybrid Transactional Memory System with Strong Isolation Guarantees
An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees Chi Cao Minh, Martin Trautmann, JaeWoong Chung, Austen McDonald, Nathan Bronson, Jared Casper, Christos Kozyrakis, Kunle
More informationHardware Support For Serializable Transactions: A Study of Feasibility and Performance
Hardware Support For Serializable Transactions: A Study of Feasibility and Performance Utku Aydonat Tarek S. Abdelrahman Edward S. Rogers Sr. Department of Electrical and Computer Engineering University
More informationAn Introduction to Parallel Programming
An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe
More informationIntro to Transactions
Reading Material CompSci 516 Database Systems Lecture 14 Intro to Transactions [RG] Chapter 16.1-16.3, 16.4.1 17.1-17.4 17.5.1, 17.5.3 Instructor: Sudeepa Roy Acknowledgement: The following slides have
More informationLock Elision and Transactional Memory Predictor in Hardware. William Galliher, Liang Zhang, Kai Zhao. University of Wisconsin Madison
Lock Elision and Transactional Memory Predictor in Hardware William Galliher, Liang Zhang, Kai Zhao University of Wisconsin Madison Email: {galliher, lzhang432, kzhao32}@wisc.edu ABSTRACT Shared data structure
More informationGoldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea
Programming With Locks s Tricky Multicore processors are the way of the foreseeable future thread-level parallelism anointed as parallelism model of choice Just one problem Writing lock-based multi-threaded
More informationExploiting Distributed Software Transactional Memory
Exploiting Distributed Software Transactional Memory Christos Kotselidis Research Fellow Advanced Processor Technologies Group The University of Manchester Outline Transactional Memory Distributed Transactional
More informationLecture 17: Transactional Memories I
Lecture 17: Transactional Memories I Papers: A Scalable Non-Blocking Approach to Transactional Memory, HPCA 07, Stanford The Common Case Transactional Behavior of Multi-threaded Programs, HPCA 06, Stanford
More informationTransactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93
Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 What are lock-free data structures A shared data structure is lock-free if its operations
More information