AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM
|
|
- Derick Cummings
- 6 years ago
- Views:
Transcription
1 FRACTAL AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM SUVINAY SUBRAMANIAN, MARK C. JEFFREY, MALEEN ABEYDEERA, HYUN RYONG LEE, VICTOR A. YING, JOEL EMER, DANIEL SANCHEZ ISCA 017
2 Current speculative systems scale poorly Speculative parallelization, e.g. TM, simplifies parallel programming Performs poorly on real world applications because applications comprise large atomic tasks FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM
3 Large atomic tasks limit performance Database Transaction Millions of cycles Prone to aborts Challenging to track Serial (misses parallelism) FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 3
4 Large atomic tasks have abundant nested parallelism! qry K qry Y qry Y upd J qry S upd L qry M qry U upd V How to - extract parallelism? - maintain atomicity? - achieve high performance? FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 4
5 Prior TMs fail to exploit nested parallelism 1. Merging of nested speculative state with parent A X Large speculative state, prone to aborts B Time Core 1 Core Core 3 Core 4 X A. Cyclic dependence between parent and nested children Deadlock and livelock issues FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 5 B Y Y See the paper for more details!
6 Ordering tasks to guarantee atomicity X Y 1 X A B Y Core 1 Core Core 3 Core 4 X Y A B Time X A B Y FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 6
7 Fractal decouples atomicity from parallelism 1. Decouples unit of atomicity from unit of parallelism Domain: All tasks belonging to a domain appear to execute atomically. Implementation guarantees atomicity by ordering tasks No merging speculative state Benefits of Fractal Tiny tasks Easy to track Composable speculative parallelism FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 7
8 Fractal Execution Model DECOUPLING ATOMICITY FROM PARALLELISM FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 8
9 Domains to group tasks into atomic units Fractal programs consist of atomic tasks Tasks may access arbitrary data Tasks may create child tasks Tasks belong to a hierarchy of nested domains FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 9
10 Semantics across domains Each task: can create a single subdomain can enqueue child tasks to subdomain or current domain A X Root domain B L Y M (All tasks in domain + creator of domain) à C D N O Appear to execute as single atomic unit E P FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 10
11 Semantics within a domain Unordered Arbitrary order while respecting parent-child dependences Timestamp-ordered Tasks appear to execute in increasing timestamp order Children appear to execute after parent A C E X Root domain B L D N P Y M O FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 11
12 Fractal software API Creating and enqueuing tasks Creating sub-domains fractal::enqueue(function_pointer, timestamp, arguments...); fractal::create_subdomain(<domain_type>); High-level programming interface, e.g. forall(), callcc(), parallel_reduce() FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 1
13 Example: Database transactions in Fractal TXN 1 Root domain TXN T 1 qry Z 4 5 qry U upd V 1 qry A qry B upd C FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 13
14 Fractal Implementation ATOMICITY THROUGH ORDERING FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 14
15 Fractal Virtual Time (VT) Fractal assigns a fractal virtual time (VT) to each task Captures the ordering of tasks across domains, within a domain Fractal VT= Domain VT FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 15
16 Example: Database transactions in Fractal TXN 1 1 Root domain TXN qry Z 4 5 qry U upd V qry A qry B upd C T 5 FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 16
17 Example: Database transactions in Fractal TXN 1 1 Root domain TXN qry Z 4 5 qry U upd V qry A qry B upd C 5 3 Fractal VT captures all ordering information FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 16 3 T 5
18 Swarm[MICRO 15] : An efficient substrate for ordered speculation Swarm executes tasks speculatively and out of order Large hardware task queues Scalable ordered commits Scalable ordered speculation 64-tile, 56-core chip Mem / IO Mem / IO Mem / IO Tile Mem / IO Tile organization Router L3 slice L L1I/D L1I/D L1I/D L1I/D Core Core Core Core Task unit Efficiently supports tiny speculative tasks FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 17
19 Fractal features Fractal VT construction requires no centralized structures Fractal VT assigns order dynamically Hardware supports a few number of concurrent depths Zooming operations allow for unbounded nesting Spill tasks from shallower domains to memory Parallelism compounds quickly with depth See the paper for more details! FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 18
20 1 Core 1 Core Core 3 Core 4 TXN 1 TXN Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
21 1 Core 1 Core Core 3 Core 4 qry U 1 1 TXN 1 TXN Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
22 1 Core 1 Core Core 3 Core qry U 1 TXN qry Z TXN upd V Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
23 1 Core 1 Core Core 3 Core T qry U qry Z TXN 1 TXN upd V Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
24 Core 1 Core Core 3 Core T qry U qry Z qry A TXN 1 TXN upd V Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
25 Core 1 Core Core 3 Core T qry U qry Z qry A TXN 1 TXN upd V 5 qry B Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
26 Core 1 Core Core 3 Core T qry U qry Z qry A TXN 1 TXN upd V qry B Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
27 Core 1 Core Core 3 Core T qry U qry Z qry A TXN 1 TXN upd V qry B Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
28 Core 1 Core Core 3 Core T qry U Abort task qry Z qry A TXN 1 TXN upd V qry B Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
29 Core 1 Core Core 3 Core T qry U Abort task qry Z qry A TXN 1 TXN upd V qry B Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
30 Core 1 Core Core 3 Core T qry U Abort task qry Z qry A TXN 1 TXN upd V qry B Tracking, conflict detection at level of fine-grain tasks Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
31 Core 1 Core Core 3 Core T qry U Abort task qry Z qry A TXN 1 TXN upd V qry B Tracking, conflict detection at level of fine-grain tasks Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
32 1 1 1 T qry U Abort task qry Z qry A TXN 1 TXN Time Core 1 Core Core 3 Core 4 upd V qry B Tracking, conflict detection at level of fine-grain tasks Selective aborts waste less work FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
33 Core 1 Core Core 3 Core T qry U Abort task qry Z qry A TXN 1 TXN upd V qry B 3 upd C Task-level tracking Task-level CD Selective aborts Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
34 Core 1 Core Core 3 Core T qry U Abort task qry Z qry A TXN 1 TXN upd V qry B 3 upd C Task-level tracking Task-level CD Selective aborts Time FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
35 1 1 1 T qry U Abort task qry Z qry A TXN 1 TXN Time Core 1 Core Core 3 Core 4 upd V qry B 3 upd C Task-level tracking Task-level CD Selective aborts FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 19
36 Core 1 Core Core 3 Core T qry U Abort task qry Z qry A TXN 1 TXN Time upd V qry B 3 upd C Task-level tracking Task-level CD Selective aborts FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 0
37 Core 1 Core Core 3 Core T qry U Abort task qry Z qry A TXN 1 TXN Time upd V qry B 3 upd C Task-level tracking Task-level CD Selective aborts FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 0
38 Core 1 Core Core 3 Core 4 T qry U Abort task qry Z qry A TXN 1 TXN Time upd V qry B 3 upd C Task-level tracking Task-level CD Selective aborts FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 0
39 Commit parent before child completes Core 1 Core Core 3 Core 4 qry U qry Z qry A TXN 1 TXN Time upd V 1 3 Abort task qry B 3 upd C Task-level tracking Task-level CD Selective aborts FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 0 T
40 Commit parent before child completes TXN 1 = X Core 1 Core Core 3 Core 4 T qry U Abort task qry Z qry A Fractal unlocks upd the V benefits of fine-grain parallelism Time qry B 3 upd C Task-level tracking Task-level CD Selective aborts TXN = Y FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 1
41 Evaluation FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM
42 Methodology Event-driven, Pin-based simulator Target system: 56-core, 64-tile chip Scalability experiments from 1 56 cores Scaled-down systems have fewer tiles Applications Unordered (STAMP): labyrinth, bayes Ordered: color, msf, silo, maxflow, mis Mem / IO Mem / IO Mem / IO Tile Mem / IO Router L3 slice L L1I/D L1I/D L1I/D L1I/D Core Core Core Core Task unit 64 MB shared L3 (1MB/tile) 16K task queue entries (64/core) 4K commit queue entries (16/core) 56 KB per-tile Ls 16 KB per-core L1s In-order, single-issue, scoreboarded FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 3
43 Fractal uncovers abundant nested parallelism Flat Fractal Large atomic tasks Nested parallelism exposed through fine-grained tasks FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 4
44 Fractal uncovers abundant nested parallelism 56 maxflow 3x 56 Flat bayes Fractal 18 labyrinth Flat Speedup x 4.9x Fractal 1 1c 18c 56c 1 1c 18c 56c 1 1c 18c 56c 88x 3x Flat M 16 M Fractal Average task length (cycles) FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 5
45 Fractal avoids over-serialization Speedup c mis 145x Flat 18c 56c Swarm color 1c 18c 56c Fractal c msf 18c 56c Flat 6x 98x Swarm 1x 119x Fractal 40x 145x Flat Fractal Average task length (cycles) FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 6
46 Conclusion Speculative systems must extract nested parallelism in order to scale large, complex, real-world applications Fractal: An execution model for fine-grain nested speculative parallelism Decouple atomicity from parallelism Guarantee atomicity by ordering tasks Fractal unlocks the benefits of fine-grain speculative parallelism Parallelizes many challenging workloads Enables composition of speculative parallel algorithms FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 7
47 Thank You! Questions? Speculative systems must extract nested parallelism in order to scale large, complex, real-world applications Fractal: An execution model for fine-grain nested speculative parallelism Decouple atomicity from parallelism Guarantee atomicity by ordering tasks Fractal unlocks the benefits of fine-grain speculative parallelism Parallelizes many challenging workloads Enables composition of speculative parallel algorithms FRACTAL: AN EXECUTION MODEL FOR FINE-GRAIN NESTED SPECULATIVE PARALLELISM 8
Data-Centric Execution of Speculative Parallel Programs MARK JEFFREY, SUVINAY SUBRAMANIAN, MALEEN ABEYDEERA, JOEL EMER, DANIEL SANCHEZ MICRO 2016
Data-entric Execution of Speculative Parallel Programs MARK JEFFREY, SUVINAY SUBRAMANIAN, MALEEN ABEYDEERA, JOEL EMER, DANIEL SANHEZ MIRO 26 Executive summary Many-cores must exploit cache locality to
More informationFractal: An Execution Model for Fine-Grain Nested Speculative Parallelism
ABSTRACT Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism Suvinay Subramanian Mark C. Jeffrey Maleen Abeydeera Hyun Ryong Lee Victor A. Ying Joel Emer Daniel Sanchez Massachusetts
More informationA Scalable Architecture for Ordered Parallelism
Scalable rchitecture for Ordered Parallelism Mark Jeffrey, Suvinay Subramanian, Cong Yan, Joel mer, aniel Sanchez MICRO 2015 Multicores Target asy Parallelism 2 Multicores Target asy Parallelism 2 Regular:
More informationHarmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism
Appears in the Proceedings of the 5st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 208 Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism
More informationImproving the Practicality of Transactional Memory
Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter
More informationData-Centric Execution of Speculative Parallel Programs
Appears in the Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 206 Data-Centric Execution of Speculative Parallel Programs Mark C. Jeffrey Suvinay Subramanian
More informationLecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation
Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end
More informationLecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory
Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section
More informationLecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations
Lecture 21: Transactional Memory Topics: Hardware TM basics, different implementations 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end locks are blocking,
More informationDatabase Management Systems
Database Management Systems Concurrency Control Doug Shook Review Why do we need transactions? What does a transaction contain? What are the four properties of a transaction? What is a schedule? What is
More informationATLAS: A Chip-Multiprocessor. with Transactional Memory Support
ATLAS: A Chip-Multiprocessor with Transactional Memory Support Njuguna Njoroge, Jared Casper, Sewook Wee, Yuriy Teslyar, Daxia Ge, Christos Kozyrakis, and Kunle Olukotun Transactional Coherence and Consistency
More informationRethinking Serializable Multi-version Concurrency Control. Jose Faleiro and Daniel Abadi Yale University
Rethinking Serializable Multi-version Concurrency Control Jose Faleiro and Daniel Abadi Yale University Theory: Single- vs Multi-version Systems Single-version system T r Read X X 0 T w Write X Multi-version
More informationA Scalable Architecture for Ordered Parallelism
A Scalable Architecture for Ordered Parallelism Mark C. Jeffrey Suvinay Subramanian Cong Yan Joel Emer Daniel Sanchez MIT CSAIL NVIDIA / MIT CSAIL {mcj, suvinay, congy, emer, sanchez}@csail.mit.edu ABSTACT
More informationGraph-based protocols are an alternative to two-phase locking Impose a partial ordering on the set D = {d 1, d 2,..., d h } of all data items.
Graph-based protocols are an alternative to two-phase locking Impose a partial ordering on the set D = {d 1, d 2,..., d h } of all data items. If d i d j then any transaction accessing both d i and d j
More informationChí Cao Minh 28 May 2008
Chí Cao Minh 28 May 2008 Uniprocessor systems hitting limits Design complexity overwhelming Power consumption increasing dramatically Instruction-level parallelism exhausted Solution is multiprocessor
More informationImplementing and Evaluating Nested Parallel Transactions in STM. Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University
Implementing and Evaluating Nested Parallel Transactions in STM Woongki Baek, Nathan Bronson, Christos Kozyrakis, Kunle Olukotun Stanford University Introduction // Parallelize the outer loop for(i=0;i
More informationLecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM
Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of lazy TM 1 Transactions Access to shared variables is encapsulated within transactions the system gives the illusion
More informationNikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris
Early Experiences on Accelerating Dijkstra s Algorithm Using Transactional Memory Nikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris Computing Systems Laboratory School of Electrical
More informationLow Overhead Concurrency Control for Partitioned Main Memory Databases. Evan P. C. Jones Daniel J. Abadi Samuel Madden"
Low Overhead Concurrency Control for Partitioned Main Memory Databases Evan P. C. Jones Daniel J. Abadi Samuel Madden" Banks" Payment Processing" Airline Reservations" E-Commerce" Web 2.0" Problem:" Millions
More informationTransactional Memory
Transactional Memory Architectural Support for Practical Parallel Programming The TCC Research Group Computer Systems Lab Stanford University http://tcc.stanford.edu TCC Overview - January 2007 The Era
More informationTIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS. Xiangyao Yu, Srini Devadas CSAIL, MIT
TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS Xiangyao Yu, Srini Devadas CSAIL, MIT FOR FIFTY YEARS, WE HAVE RIDDEN MOORE S LAW Moore s Law and the scaling of clock frequency = printing press for the currency
More informationTransactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech
Transactional Memory Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech (Adapted from Stanford TCC group and MIT SuperTech Group) Motivation Uniprocessor Systems Frequency
More informationSOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS
SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CSAIL IAP MEETING MAY 21, 2013 Research Agenda Lack of technology progress Moore s Law still alive Power
More informationOpen Nesting in Software Transactional Memory. Paper by: Ni et al. Presented by: Matthew McGill
Open Nesting in Software Transactional Memory Paper by: Ni et al. Presented by: Matthew McGill Transactional Memory Transactional Memory is a concurrent programming abstraction intended to be as scalable
More informationParallelizing SPECjbb2000 with Transactional Memory
Parallelizing SPECjbb2000 with Transactional Memory JaeWoong Chung, Chi Cao Minh,, Brian D. Carlstrom, Christos Kozyrakis Picture comes here Computer Systems Lab Stanford University http://tcc.stanford.edu
More informationAdvanced Databases. Lecture 9- Concurrency Control (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Advanced Databases Lecture 9- Concurrency Control (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch www.mniazi.ir Multiple Granularity Allow data items to be of various sizes and
More informationEXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION
EXPLOITING SEMANTIC COMMUTATIVITY IN HARDWARE SPECULATION GUOWEI ZHANG, VIRGINIA CHIU, DANIEL SANCHEZ MICRO 2016 Executive summary 2 Exploiting commutativity benefits update-heavy apps Software techniques
More informationScheduling Transactions in Replicated Distributed Transactional Memory
Scheduling Transactions in Replicated Distributed Transactional Memory Junwhan Kim and Binoy Ravindran Virginia Tech USA {junwhan,binoy}@vt.edu CCGrid 2013 Concurrency control on chip multiprocessors significantly
More informationEazyHTM: Eager-Lazy Hardware Transactional Memory
EazyHTM: Eager-Lazy Hardware Transactional Memory Saša Tomić, Cristian Perfumo, Chinmay Kulkarni, Adrià Armejach, Adrián Cristal, Osman Unsal, Tim Harris, Mateo Valero Barcelona Supercomputing Center,
More informationFall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012
18-742 Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II Prof. Onur Mutlu Carnegie Mellon University 10/12/2012 Past Due: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi
More informationAn Automated Framework for Decomposing Memory Transactions to Exploit Partial Rollback
An Automated Framework for Decomposing Memory Transactions to Exploit Partial Rollback Aditya Dhoke Virginia Tech adityad@vt.edu Roberto Palmieri Virginia Tech robertop@vt.edu Binoy Ravindran Virginia
More informationThread-level Parallelism for the Masses. Kunle Olukotun Computer Systems Lab Stanford University 2007
Thread-level Parallelism for the Masses Kunle Olukotun Computer Systems Lab Stanford University 2007 The World has Changed Process Technology Stops Improving! Moore s law but! Transistors don t get faster
More informationGoal of Concurrency Control. Concurrency Control. Example. Solution 1. Solution 2. Solution 3
Goal of Concurrency Control Concurrency Control Transactions should be executed so that it is as though they executed in some serial order Also called Isolation or Serializability Weaker variants also
More informationFlexible Architecture Research Machine (FARM)
Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense
More information6 Transactional Memory. Robert Mullins
6 Transactional Memory ( MPhil Chip Multiprocessors (ACS Robert Mullins Overview Limitations of lock-based programming Transactional memory Programming with TM ( STM ) Software TM ( HTM ) Hardware TM 2
More informationTicToc: Time Traveling Optimistic Concurrency Control
TicToc: Time Traveling Optimistic Concurrency Control Authors: Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, Srinivas Devadas Presented By: Shreejit Nair 1 Background: Optimistic Concurrency Control ØRead
More informationCost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University)
Cost of Concurrency in Hybrid Transactional Memory Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University) 1 Transactional Memory: a history Hardware TM Software TM Hybrid TM 1993 1995-today
More informationPerformance Improvement via Always-Abort HTM
1 Performance Improvement via Always-Abort HTM Joseph Izraelevitz* Lingxiang Xiang Michael L. Scott* *Department of Computer Science University of Rochester {jhi1,scott}@cs.rochester.edu Parallel Computing
More informationZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS
ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200
More informationOn Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions
On Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions Ahmed Hassan Preliminary Examination Proposal submitted to the Faculty of the Virginia Polytechnic
More informationLog-Based Transactional Memory
Log-Based Transactional Memory Kevin E. Moore University of Wisconsin-Madison Motivation Chip-multiprocessors/Multi-core/Many-core are here Intel has 1 projects in the works that contain four or more computing
More informationZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS
ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200
More informationBigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng
Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:
More informationDependence-Aware Transactional Memory for Increased Concurrency. Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin
Dependence-Aware Transactional Memory for Increased Concurrency Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin Concurrency Conundrum Challenge: CMP ubiquity Parallel
More informationNePaLTM: Design and Implementation of Nested Parallelism for Transactional Memory Systems
NePaLTM: Design and Implementation of Nested Parallelism for Transactional Memory Systems Haris Volos 1, Adam Welc 2, Ali-Reza Adl-Tabatabai 2, Tatiana Shpeisman 2, Xinmin Tian 2, and Ravi Narayanaswamy
More informationChapter 15 : Concurrency Control
Chapter 15 : Concurrency Control What is concurrency? Multiple 'pieces of code' accessing the same data at the same time Key issue in multi-processor systems (i.e. most computers today) Key issue for parallel
More informationConcurrency Control. R &G - Chapter 19
Concurrency Control R &G - Chapter 19 Smile, it is the key that fits the lock of everybody's heart. Anthony J. D'Angelo, The College Blue Book Review DBMSs support concurrency, crash recovery with: ACID
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 22: More Transaction Implementations 1 Review: Schedules, schedules, schedules The DBMS scheduler determines the order of operations from txns are executed
More informationCS5412: TRANSACTIONS (I)
1 CS5412: TRANSACTIONS (I) Lecture XVII Ken Birman Transactions 2 A widely used reliability technology, despite the BASE methodology we use in the first tier Goal for this week: in-depth examination of
More informationLecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM
Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM 1 Handling Reads When the home receives a read request, it looks up memory (speculative read) and directory in
More informationCSE 530A. B+ Trees. Washington University Fall 2013
CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key
More informationDatabase Management Systems Concurrency Control
atabase Management Systems Concurrency Control B M G 1 BMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHOS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files ata Files
More informationDistributed Systems (ICE 601) Transactions & Concurrency Control - Part1
Distributed Systems (ICE 601) Transactions & Concurrency Control - Part1 Dongman Lee ICU Class Overview Transactions Why Concurrency Control Concurrency Control Protocols pessimistic optimistic time-based
More informationronny@mit.edu www.cag.lcs.mit.edu/scale Introduction Architectures are all about exploiting the parallelism inherent to applications Performance Energy The Vector-Thread Architecture is a new approach
More informationWork Report: Lessons learned on RTM
Work Report: Lessons learned on RTM Sylvain Genevès IPADS September 5, 2013 Sylvain Genevès Transactionnal Memory in commodity hardware 1 / 25 Topic Context Intel launches Restricted Transactional Memory
More informationEnhancing Concurrency in Distributed Transactional Memory through Commutativity
Enhancing Concurrency in Distributed Transactional Memory through Commutativity Junwhan Kim, Roberto Palmieri, Binoy Ravindran Virginia Tech USA Lock-based concurrency control has serious drawbacks Coarse
More information5/10/11 Locking Protocols and SoDware TransacGonal Memory 1 LOCKING PROTOCOLS & SOFTWARE TRANSACTIONAL MEMORY
5/10/11 Locking Protocols and SoDware TransacGonal Memory 1 LOCKING PROTOCOLS & SOFTWARE TRANSACTIONAL MEMORY 5/10/11 Locking Protocols and SoDware TransacGonal Memory 2 Why Study (SoDware) TransacGons
More informationAdvances in Data Management Transaction Management A.Poulovassilis
1 Advances in Data Management Transaction Management A.Poulovassilis 1 The Transaction Manager Two important measures of DBMS performance are throughput the number of tasks that can be performed within
More informationThe Common Case Transactional Behavior of Multithreaded Programs
The Common Case Transactional Behavior of Multithreaded Programs JaeWoong Chung Hassan Chafi,, Chi Cao Minh, Austen McDonald, Brian D. Carlstrom, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab
More informationLecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)
Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,
More informationCS 541 Database Systems. Two Phase Locking 2
CS 541 Database Systems Two Phase Locking 2 Phantoms Consider a banking application with two files: Accounts (number, location, balance); and Assets (branch, total). Two txns: T 1 checks total for some
More informationThis chapter is based on the article Distributed Computing and the Multicore Revolution by Maurice Herlihy and Victor Luchangco. Thanks!
Chapter 25 Multi-Core Computing This chapter is based on the article Distributed Computing and the Multicore Revolution by Maurice Herlihy and Victor Luchangco. Thanks! 25.1 Introduction In the near future,
More informationSoftware Speculative Multithreading for Java
Software Speculative Multithreading for Java Christopher J.F. Pickett and Clark Verbrugge School of Computer Science, McGill University {cpicke,clump}@sable.mcgill.ca Allan Kielstra IBM Toronto Lab kielstra@ca.ibm.com
More information6.033: Fault Tolerance: Isolation Lecture 17 Katrina LaCurts,
6.033: Fault Tolerance: Isolation Lecture 17 Katrina LaCurts, lacurts@mit.edu 0. Introduction - Last time: Atomicity via logging. We're good with atomicity now. - Today: Isolation - The problem: We have
More informationDeadlock Prevention (cont d) Deadlock Prevention. Example: Wait-Die. Wait-Die
Deadlock Prevention Deadlock Prevention (cont d) 82 83 When there is a high level of lock contention and an increased likelihood of deadlocks Prevent deadlocks by giving each Xact a priority Assign priorities
More informationHardware Transactional Memory on Haswell
Hardware Transactional Memory on Haswell Viktor Leis Technische Universität München 1 / 15 Introduction transactional memory is a very elegant programming model transaction { transaction { a = a 10; c
More informationLecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks
Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock
More informationSCXML State Chart XML. Previously, in this course...
SCXML State Chart XML Previously, in this course... Previously, in this course... Running Example all actions omitted wasn t it supposed to help? Previously, in this course... Running Example all actions
More informationMulti-Core Computing with Transactional Memory. Johannes Schneider and Prof. Roger Wattenhofer
Multi-Core Computing with Transactional Memory Johannes Schneider and Prof. Roger Wattenhofer 1 Overview Introduction Difficulties with parallel (multi-core) programming A (partial) solution: Transactional
More informationSpeculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution
Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Ravi Rajwar and Jim Goodman University of Wisconsin-Madison International Symposium on Microarchitecture, Dec. 2001 Funding
More informationMcRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime
McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs
More informationLecture 21 Concurrency Control Part 1
CMSC 461, Database Management Systems Spring 2018 Lecture 21 Concurrency Control Part 1 These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used from
More informationChapter 13 : Concurrency Control
Chapter 13 : Concurrency Control Chapter 13: Concurrency Control Lock-Based Protocols Timestamp-Based Protocols Validation-Based Protocols Multiple Granularity Multiversion Schemes Insert and Delete Operations
More informationLecture: Transactional Memory. Topics: TM implementations
Lecture: Transactional Memory Topics: TM implementations 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock 2 Design Space Data Versioning
More informationDYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun
DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:
More informationMY WEAK CONSISTENCY IS STRONG WHEN BAD THINGS DO NOT COME IN THREES ZECHAO SHANG JEFFREY XU YU
MY WEAK CONSISTENCY IS STRONG WHEN BAD THINGS DO NOT COME IN THREES ZECHAO SHANG JEFFREY XU YU DISCLAIMER: NOT AN OLTP TALK HOW TO GET ALMOST EVERYTHING FOR NOTHING SHARED-MEMORY SYSTEM IS BACK shared
More informationSCXML State Chart XML
SCXML State Chart XML Previously, in this course... Previously, in this course... Running Example all actions omitted wasn t it supposed to help? Previously, in this course... Running Example all actions
More informationCS54200: Distributed Database Systems
CS54200: Distributed Database Systems Timestamp Ordering 28 January 2009 Prof. Chris Clifton Timestamp Ordering The key idea for serializability is to ensure that conflicting operations are not executed
More informationTransactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28
Transactional Memory or How to do multiple things at once Benjamin Engel Transactional Memory 1 / 28 Transactional Memory: Architectural Support for Lock-Free Data Structures M. Herlihy, J. Eliot, and
More information! A lock is a mechanism to control concurrent access to a data item! Data items can be locked in two modes :
Lock-Based Protocols Concurrency Control! A lock is a mechanism to control concurrent access to a data item! Data items can be locked in two modes : 1 exclusive (X) mode Data item can be both read as well
More informationTransactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93
Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 What are lock-free data structures A shared data structure is lock-free if its operations
More informationBeyond ILP. Hemanth M Bharathan Balaji. Hemanth M & Bharathan Balaji
Beyond ILP Hemanth M Bharathan Balaji Multiscalar Processors Gurindar S Sohi Scott E Breach T N Vijaykumar Control Flow Graph (CFG) Each node is a basic block in graph CFG divided into a collection of
More informationLecture 14: Multithreading
CS 152 Computer Architecture and Engineering Lecture 14: Multithreading John Wawrzynek Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~johnw
More informationTRIPS: Extending the Range of Programmable Processors
TRIPS: Extending the Range of Programmable Processors Stephen W. Keckler Doug Burger and Chuck oore Computer Architecture and Technology Laboratory Department of Computer Sciences www.cs.utexas.edu/users/cart
More informationByteSTM: Java Software Transactional Memory at the Virtual Machine Level
ByteSTM: Java Software Transactional Memory at the Virtual Machine Level Mohamed Mohamedin Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment
More informationDatabase Workload. from additional misses in this already memory-intensive databases? interference could be a problem) Key question:
Database Workload + Low throughput (0.8 IPC on an 8-wide superscalar. 1/4 of SPEC) + Naturally threaded (and widely used) application - Already high cache miss rates on a single-threaded machine (destructive
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationConcurrency Control Service 7
Concurrency Control Service 7 7.1 Service Description The purpose of the Concurrency Control Service is to mediate concurrent access to an object such that the consistency of the object is not compromised
More informationConcurrency Control In Distributed Main Memory Database Systems. Justin A. DeBrabant
In Distributed Main Memory Database Systems Justin A. DeBrabant debrabant@cs.brown.edu Concurrency control Goal: maintain consistent state of data ensure query results are correct The Gold Standard: ACID
More informationDataflow Architectures. Karin Strauss
Dataflow Architectures Karin Strauss Introduction Dataflow machines: programmable computers with hardware optimized for fine grain data-driven parallel computation fine grain: at the instruction granularity
More informationPerformance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing
Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing Richard Yoo, Christopher Hughes: Intel Labs Konrad Lai, Ravi Rajwar: Intel Architecture Group Agenda
More informationLecture: Consistency Models, TM
Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency
More informationTransactional Interference-less Balanced Tree
Transactional Interference-less Balanced Tree Technical Report Ahmed Hassan, Roberto Palmieri, and Binoy Ravindran Virginia Tech, Blacksburg, VA, USA. Abstract. In this paper, we present TxCF-Tree, a balanced
More informationPortland State University ECE 588/688. Transactional Memory
Portland State University ECE 588/688 Transactional Memory Copyright by Alaa Alameldeen 2018 Issues with Lock Synchronization Priority Inversion A lower-priority thread is preempted while holding a lock
More informationEECS 570 Final Exam - SOLUTIONS Winter 2015
EECS 570 Final Exam - SOLUTIONS Winter 2015 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points 1 / 21 2 / 32
More informationBigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13
Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University
More informationVerifiable Hierarchical Protocols with Network Invariants on Parametric Systems
Verifiable Hierarchical Protocols with Network Invariants on Parametric Systems Opeoluwa Matthews, Jesse Bingham, Daniel Sorin http://people.duke.edu/~om26/ FMCAD 2016 - Mountain View, CA Problem Statement
More informationGoldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea
Programming With Locks s Tricky Multicore processors are the way of the foreseeable future thread-level parallelism anointed as parallelism model of choice Just one problem Writing lock-based multi-threaded
More informationIntel Thread Building Blocks, Part IV
Intel Thread Building Blocks, Part IV SPD course 2017-18 Massimo Coppola 13/04/2018 1 Mutexes TBB Classes to build mutex lock objects The lock object will Lock the associated data object (the mutex) for
More informationFine- grain Memory Deduplica4on for In- memory Database Systems. Heiner Litz, David Cheriton, Pete Stevenson Stanford University
Fine- grain Memory Deduplica4on for In- memory Database Systems Heiner Litz, David Cheriton, Pete Stevenson Stanford University 1 Memory Capacity Challenge In- memory databases Limited by memory capacity
More information