Advanced Multiprocessor Programming Project Topics and Requirements
1 Advanced Multiprocessor Programming: Project Topics and Requirements
Jesper Larsson Träff, TU Wien, May 5th, 2017
J. L. Träff AMP SS17, Projects 1 / 21
2 Projects
Goal: Get practical, hands-on experience of your own with concurrent algorithms and data structures, including their performance and the obstacles to obtaining the performance that may naively be expected.
- Projects are done in one- or two-person groups
- Content and effort of the project are independent of group size
- Each group selects (only!) one project from the list
- Implementation of material from the lecture
- But you are allowed to be creative and bring in your own ideas
- Implementation in C, C++ (with native threads or pthreads), or other
- But if you choose other, you are on your own
- You are allowed (and encouraged) to study and use additional papers (as well as internet information), but state your sources clearly!
3 Timeline
- Today: Announcement of project topics and rules
- We can have regular Q&A (Mondays, Thursdays)
- Project status presentation, all groups: Thursday
- Deadline for hand-in: Monday, , at midnight
- Exam from June 30th to July 4th (sign up in TISS)
4 All projects
- Test for correctness first, use assertions where possible, start sequentially, then gradually increase the number of threads
- Define a good benchmark to measure latency (time per operation), throughput (number of operations in some given time slot), and fairness, each as a function of the number of threads
- Compare to a well-chosen baseline
- Use good experimental practice to be able to make well-founded claims that some implementation is better than another (see HPC lecture). Repeat each experiment a large number of times (> 30), report averages with confidence intervals
- State properties of the algorithms and give worst-case bounds where possible
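The reporting rule above (more than 30 repetitions, averages with confidence intervals) could be sketched as follows. This is a minimal illustration, not the course's prescribed method: the names Summary and summarize are hypothetical, and the 95% interval uses the normal approximation (z = 1.96), which is an assumption that is only reasonable for larger sample counts.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Summary of repeated measurements: the sample mean and the half-width of an
// approximate 95% confidence interval, i.e. mean +/- half_width.
struct Summary {
    double mean;
    double half_width;
};

// Assumption: with more than 30 repetitions the normal quantile z = 1.96 is
// used in place of a Student-t quantile.
Summary summarize(const std::vector<double>& samples) {
    assert(samples.size() > 1);  // the sample variance needs at least two runs
    const double n = static_cast<double>(samples.size());
    double sum = 0.0;
    for (double x : samples) sum += x;
    const double mean = sum / n;
    double squares = 0.0;
    for (double x : samples) squares += (x - mean) * (x - mean);
    const double stddev = std::sqrt(squares / (n - 1.0));  // unbiased variance
    return {mean, 1.96 * stddev / std::sqrt(n)};
}
```

Each entry of samples would be one repetition of the same experiment, for example the throughput of one timed run at a fixed thread count.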
5 Benchmarking
- Many papers use throughput as the measure of performance. Throughput is hoped/expected to scale (linearly?) with the number of threads/cores.
- For some benchmarks and ideas, see Vincent Gramoli: More than you ever wanted to know about synchronization: Synchrobench, measuring the impact of the synchronization on concurrent algorithms. PPOPP 2015: 1-10
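One way to measure throughput as described above is a fixed-duration run: all threads start on a common signal, apply the operation in a loop until told to stop, and the per-thread counts are summed. The sketch below is an assumption about how such a harness might look; RunResult and run_timed are hypothetical names, it is not Synchrobench, and the minimum and maximum per-thread counts are only a crude fairness indicator.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Result of one timed run (illustrative names, not from the slides).
struct RunResult {
    std::uint64_t total_ops;  // operations completed by all threads
    std::uint64_t min_ops;    // fewest operations completed by one thread
    std::uint64_t max_ops;    // most operations completed by one thread
    double ops_per_sec;       // throughput over the measured interval
};

// Runs op(thread_id) in a loop on num_threads threads for about `millis`
// milliseconds and reports the aggregate throughput.
template <typename Op>
RunResult run_timed(int num_threads, int millis, Op op) {
    assert(num_threads > 0 && millis > 0);
    std::atomic<bool> go{false}, stop{false};
    std::vector<std::uint64_t> counts(static_cast<std::size_t>(num_threads));
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            while (!go.load()) std::this_thread::yield();  // common start
            std::uint64_t local = 0;  // thread-private, avoids shared writes
            while (!stop.load()) { op(t); ++local; }
            counts[static_cast<std::size_t>(t)] = local;  // read after join()
        });
    }
    const auto begin = std::chrono::steady_clock::now();
    go.store(true);
    std::this_thread::sleep_for(std::chrono::milliseconds(millis));
    stop.store(true);
    for (auto& th : threads) th.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - begin;
    RunResult r{0, counts[0], counts[0], 0.0};
    for (std::uint64_t c : counts) {
        r.total_ops += c;
        if (c < r.min_ops) r.min_ops = c;
        if (c > r.max_ops) r.max_ops = c;
    }
    r.ops_per_sec = static_cast<double>(r.total_ops) / elapsed.count();
    return r;
}
```

Calling run_timed for 1, 2, 4, ... threads and repeating each configuration many times would give the data for a throughput-versus-threads plot.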
6 Expectations and requirements
- Efficient and correct (!) implementation
- Good theoretical analysis (not necessarily formal proofs): invariants, linearizability, progress guarantees
- Good benchmark analysis
- Short document (English or German), 6-10 pages excluding plots and source code:
  - Statement of problem and expectations
  - Description of data structure
  - Theoretical analysis
  - Benchmark (results, how obtained)
- Code must be available, part of hand-in (explain how to compile and run)
- Project status presentation, short, all (on Thursday )
- Final project presentation, examination (per group)
7 Machines
- For benchmarks, use the TU Wien Parallel Computing group systems: saturn (48-core AMD), and possibly (if needed) mars (80-core Intel) and ceres (64-core, 8-way = 512-thread Fujitsu/Sparc)
- Develop gradually, start with your own system at home if possible
- Good practice so that others can reproduce findings: state the properties of machine, compiler, and environment (required!). E.g., gcc version (Debian )
8 What to hand in
- Your solution/hand-in consists of the report describing problems and solutions and benchmark analysis with plots/tables, and the source code, including, if necessary, Makefile, README and other things necessary to compile and run the code.
- Either the report or the README should give instructions for compiling and running
- The hand-in must be uploaded in TUWEL as a single .zip, .tgz, or .tar.gz file.
- Name the file clearly: your names followed by _amp_project and the number of the project you choose.
9 Project 1: Register Locks
Implement:
- Filter-lock (generalized Peterson)
- Tournament tree of 2-thread Peterson locks (see exercise)
- Lamport Bakery, Herlihy-Shavit version
- Lamport Bakery, Lamport's original version
Which is better? For baseline performance, compare to the following locks: pthreads or native C11 locks, simple test-and-set lock, simple test-and-test-and-set lock
Challenge: Memory behavior. Ensure that memory (register) updates become visible in the required order! Explain what happens if not (Peterson).
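The memory-behavior challenge can be illustrated with the 2-thread Peterson lock. The sketch below is one possible rendering, assuming C++11 atomics with their default sequentially consistent ordering; the names PetersonLock and run_peterson are hypothetical. With plain variables or relaxed orderings, the store to a thread's own flag could become visible after its read of the other thread's flag, in which case both threads may enter the critical section.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Peterson lock for exactly two threads with ids 0 and 1.
class PetersonLock {
    std::atomic<bool> flag_[2] = {{false}, {false}};  // "I want to enter"
    std::atomic<int> victim_{0};                      // who yields on a tie
public:
    void lock(int me) {
        assert(me == 0 || me == 1);
        const int other = 1 - me;
        flag_[me].store(true);  // seq_cst: must be visible before the reads
        victim_.store(me);      // seq_cst: let the other thread go first
        while (flag_[other].load() && victim_.load() == me) {
            // spin until the other thread leaves or becomes the victim
        }
    }
    void unlock(int me) { flag_[me].store(false); }
};

// Two threads each add `iters` to a plain counter under the lock; the result
// is 2 * iters only if mutual exclusion holds.
long run_peterson(long iters) {
    PetersonLock lock;
    long counter = 0;  // deliberately not atomic
    auto work = [&](int me) {
        for (long i = 0; i < iters; ++i) {
            lock.lock(me);
            ++counter;
            lock.unlock(me);
        }
    };
    std::thread a(work, 0), b(work, 1);
    a.join();
    b.join();
    return counter;
}
```

A counter test like this can expose a broken lock but cannot prove a correct one, so it complements rather than replaces the theoretical analysis.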
10 Project 2: Bounded-timestamp register locks (I)
Implement:
- Aravind's solution (Alex A. Aravind: Yet Another Simple Solution for the Concurrent Programming Control Problem. IEEE Trans. Parallel Distrib. Syst. 22(6): , 2011)
- Black-white Bakery (Gadi Taubenfeld: The Black-White Bakery Algorithm and Related Bounded-Space, Adaptive, Local-Spinning and FIFO Algorithms. DISC 2004: 56-70)
- Lamport's Bakery (lecture version or original)
Which is better? For baseline performance, compare to the following locks: pthreads or native C11 locks, simple test-and-set lock, simple test-and-test-and-set lock
Challenge: Memory behavior. Ensure that memory (register) updates become visible in the required order!
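The two simple baseline locks named above might look roughly like this. It is a sketch under the assumption that C++11 atomics are used; the class names TASLock and TTASLock and the helper locked_count are illustrative. The test-and-test-and-set variant spins on an ordinary read and only attempts the atomic exchange once the lock looks free.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Test-and-set lock: every spin iteration performs an atomic exchange.
class TASLock {
    std::atomic<bool> held_{false};
public:
    void lock() {
        while (held_.exchange(true, std::memory_order_acquire)) {}
    }
    void unlock() { held_.store(false, std::memory_order_release); }
};

// Test-and-test-and-set lock: spin on a read, exchange only when it looks free.
class TTASLock {
    std::atomic<bool> held_{false};
public:
    void lock() {
        for (;;) {
            while (held_.load(std::memory_order_relaxed)) {}  // read-only spin
            if (!held_.exchange(true, std::memory_order_acquire)) return;
        }
    }
    void unlock() { held_.store(false, std::memory_order_release); }
};

// Each thread adds `iters` to a plain counter while holding the lock.
template <typename Lock>
long locked_count(int num_threads, long iters) {
    Lock lock;
    long counter = 0;  // protected only by the lock
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (long i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

The same locked_count helper could be reused for any lock with parameterless lock() and unlock(), which makes it easy to compare the register locks against these baselines.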
11 Project 3: Bounded-timestamp register locks (II)
Implement:
- Szymanski's solution (Boleslaw K. Szymanski: A simple solution to Lamport's concurrent programming problem with linear wait. ICS 1988: )
- Black-white Bakery (Gadi Taubenfeld: The Black-White Bakery Algorithm and Related Bounded-Space, Adaptive, Local-Spinning and FIFO Algorithms. DISC 2004: 56-70)
- Lamport's Bakery (lecture version or original)
Which is better? For baseline performance, compare to the following locks: pthreads or native C11 locks, simple test-and-set lock, simple test-and-test-and-set lock
Challenge: Memory behavior. Ensure that memory (register) updates become visible in the required order!
12 Project 4: Double word CAS
The universal hardware instruction CAS (compare-and-swap) can be used to implement generalized CAS operations that work on several locations atomically. Implement either:
- A k-cas from Victor Luchangco, Mark Moir, Nir Shavit: Nonblocking k-compare-single-swap. Theory Comput. Syst. 44(1): (2009)
- An n-cas from Timothy L. Harris, Keir Fraser, Ian A. Pratt: A Practical Multi-word Compare-and-Swap Operation. DISC 2002:
or combinations thereof. Compare performance to the baseline CAS instruction of your machine. How does the performance change with increasing number of threads?
13 Project 5: Queue locks
Implement, from the lectures:
- Array lock
- CLH Lock
- MCS Lock
Which is better? For baseline performance, compare to the following locks: pthreads or native C11 locks, simple test-and-set lock, simple test-and-test-and-set lock.
Challenge: Memory management!
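As an illustration of the memory-management question for queue locks, here is a sketch of an MCS lock in which every caller supplies its own queue node, so the lock itself never allocates or frees memory. The interface (MCSNode passed to lock and unlock) and the helper mcs_count are assumptions made for this example and may differ from the lecture version; all atomics use the default sequentially consistent ordering for simplicity.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Queue node owned by the calling thread; it can be reused once unlock()
// has returned.
struct MCSNode {
    std::atomic<MCSNode*> next{nullptr};
    std::atomic<bool> locked{false};
};

class MCSLock {
    std::atomic<MCSNode*> tail_{nullptr};
public:
    void lock(MCSNode& me) {
        me.next.store(nullptr);
        me.locked.store(true);
        MCSNode* pred = tail_.exchange(&me);  // enqueue at the tail
        if (pred != nullptr) {
            pred->next.store(&me);            // link behind the predecessor
            while (me.locked.load()) {}       // spin on our own node only
        }
    }
    void unlock(MCSNode& me) {
        MCSNode* succ = me.next.load();
        if (succ == nullptr) {
            MCSNode* expected = &me;
            // No successor visible: try to reset the queue to empty.
            if (tail_.compare_exchange_strong(expected, nullptr)) return;
            // A successor is enqueueing; wait until it has linked itself.
            while ((succ = me.next.load()) == nullptr) {}
        }
        succ->locked.store(false);  // hand the lock to the successor
    }
};

// Each thread adds `iters` to a plain counter under the MCS lock.
long mcs_count(int num_threads, long iters) {
    MCSLock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            MCSNode node;  // lives on this thread's stack
            for (long i = 0; i < iters; ++i) {
                lock.lock(node);
                ++counter;
                lock.unlock(node);
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

A CLH lock differs here because a thread spins on its predecessor's node and typically recycles that node afterwards, which is where node ownership needs more care.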
14 Project 6: Hierarchical locks
Implement, on your own, either:
- Milind Chabbi, John M. Mellor-Crummey: Contention-conscious, locality-preserving locks. PPOPP 2016: 22:1-22:14
- David Dice, Virendra J. Marathe, Nir Shavit: Lock Cohorting: A General Technique for Designing NUMA Locks. TOPC 1(2): 13 (2015)
Which is better? For baseline performance, compare to the following locks: pthreads or native C11 locks, simple test-and-set lock, simple test-and-test-and-set lock.
15 Project 7: Readers-writers locks
Be creative: Find a good recent suggestion from the literature for a readers-writers lock, and implement it. Compare to a native C11 or pthreads baseline.
An example: Irina Calciu, David Dice, Yossi Lev, Victor Luchangco, Virendra J. Marathe, Nir Shavit: NUMA-aware reader-writer locks. PPOPP 2013:
16 Project 8: List-based set (I)
Implement (from the lecture):
- List-based set with fine-grained locks
- List-based set with optimistic synchronization
- List-based set with lazy synchronization
- Lock-free list-based set
Compare to a baseline implementation with a global lock on all set operations. Which are better, for which operations? Discuss possible experimental designs.
Challenge: Memory management!
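The global-lock baseline mentioned above could be sketched as a sorted linked list with two sentinel nodes and a single mutex around every operation. The class name CoarseListSet and the choice of int keys strictly between INT_MIN and INT_MAX are assumptions for this illustration. Under a global lock a removed node can be deleted immediately, which is exactly what the finer-grained and lock-free variants cannot do so easily.

```cpp
#include <cassert>
#include <climits>
#include <mutex>

// Sorted singly linked list protected by one global mutex.
class CoarseListSet {
    struct Node {
        int key;
        Node* next;
    };
    Node* head_;  // sentinel with key INT_MIN, followed by the INT_MAX sentinel
    std::mutex mutex_;

    // Returns the last node whose key is smaller than `key` (lock held).
    Node* find_pred(int key) const {
        Node* pred = head_;
        while (pred->next->key < key) pred = pred->next;
        return pred;
    }
public:
    CoarseListSet() : head_(new Node{INT_MIN, new Node{INT_MAX, nullptr}}) {}
    ~CoarseListSet() {
        while (head_ != nullptr) {
            Node* following = head_->next;
            delete head_;
            head_ = following;
        }
    }
    bool add(int key) {
        assert(key > INT_MIN && key < INT_MAX);  // sentinels are reserved
        std::lock_guard<std::mutex> guard(mutex_);
        Node* pred = find_pred(key);
        if (pred->next->key == key) return false;  // already present
        pred->next = new Node{key, pred->next};
        return true;
    }
    bool remove(int key) {
        assert(key > INT_MIN && key < INT_MAX);
        std::lock_guard<std::mutex> guard(mutex_);
        Node* pred = find_pred(key);
        Node* curr = pred->next;
        if (curr->key != key) return false;  // not present
        pred->next = curr->next;
        delete curr;  // safe: no other thread can hold a reference
        return true;
    }
    bool contains(int key) {
        std::lock_guard<std::mutex> guard(mutex_);
        return find_pred(key)->next->key == key;
    }
};
```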
17 Project 9: List-based set (II)
Implement:
- List-based set with fine-grained locks
- Binary tree with fine-grained locks (with wait-free contains?) (develop your own algorithm, be creative)
Compare to a baseline implementation with a global lock, and an efficient, sequential implementation. Compare this baseline also to the performance of the list-based set. Is there an advantage to using a good, sequential data structure? Look in the literature for fine-grained or lock-free algorithms for tree data structures.
Bonus: Implement what is useful or looks interesting
18 Project 10: Queues and stacks
Implement:
- Unbounded, lock-free queue
- Unbounded, lock-free stack
- Elimination back-off stack
Creative alternative: Lock-free queue by Morrison (Adam Morrison, Yehuda Afek: Fast concurrent queues for x86 processors. PPOPP 2013: )
Compare to baseline implementations with a global lock (or more fine-grained locking, if you can come up with a good implementation; for the queue use the two-lock implementation from the lecture)
Challenge: Memory management!
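To make the memory-management challenge concrete, the following sketch shows an unbounded lock-free stack built on CAS in which popped nodes are never freed while the stack is in use. They are moved to a retired list and released only in the destructor. This deferral is an assumption chosen to keep the example safe (no node address is reused while other threads may still hold it, so the ABA problem and use-after-free are avoided), not a recommended reclamation scheme; the name LockFreeStack is illustrative.

```cpp
#include <atomic>
#include <cassert>

template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next;          // written once, before the node is published
        Node* retired_next;  // link used only on the retired list
    };
    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> retired_{nullptr};

    // Frees a list that is linked through the given member pointer.
    static void free_list(Node* n, Node* Node::*link) {
        while (n != nullptr) {
            Node* following = n->*link;
            delete n;
            n = following;
        }
    }
public:
    LockFreeStack() = default;
    LockFreeStack(const LockFreeStack&) = delete;
    LockFreeStack& operator=(const LockFreeStack&) = delete;
    // Assumption: the destructor runs only after all threads are done.
    ~LockFreeStack() {
        free_list(head_.load(), &Node::next);
        free_list(retired_.load(), &Node::retired_next);
    }
    void push(const T& value) {
        Node* n = new Node{value, head_.load(), nullptr};
        // On failure the CAS refreshes n->next with the current head.
        while (!head_.compare_exchange_weak(n->next, n)) {}
    }
    bool pop(T& out) {
        Node* old = head_.load();
        // On failure the CAS refreshes `old` with the current head.
        while (old != nullptr &&
               !head_.compare_exchange_weak(old, old->next)) {}
        if (old == nullptr) return false;  // stack was empty
        out = old->value;
        // Retire instead of delete: other threads may still read old->next.
        old->retired_next = retired_.load();
        while (!retired_.compare_exchange_weak(old->retired_next, old)) {}
        return true;
    }
};
```

Memory use grows with the number of pops in this sketch, so a real solution would need a proper reclamation technique from the literature.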
19 Project 11: Wait-free stack?
Implement:
- Unbounded, lock-free stack
- The wait-free stack from Seep Goel, Pooja Aggarwal, Smruti R. Sarangi: A Wait-Free Stack. ICDCIT 2016:
Compare to baseline implementations with a global lock (or more fine-grained locking, if you can come up with a good implementation; for the queue use the two-lock implementation from the lecture)
Challenge: Memory management!
20 Project 12: Efficient FAA queue
Implement an efficient queue using fetch-and-add (and CAS, why?) from Chaoran Yang, John M. Mellor-Crummey: A wait-free queue as fast as fetch-and-add. PPOPP 2016: 16:1-16:13
Compare to baseline implementations with a global lock or more fine-grained locking.
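As background for the two primitives named above, this small sketch contrasts an increment done with fetch-and-add against one done with a CAS retry loop. It does not reproduce any part of the queue from the cited paper; the function names faa_increment and cas_increment are hypothetical. Fetch-and-add completes in one atomic step, while the CAS version may have to retry whenever another thread changed the value in between.

```cpp
#include <atomic>
#include <cassert>

// Increment with fetch-and-add; returns the value before the increment.
long faa_increment(std::atomic<long>& counter) {
    return counter.fetch_add(1);
}

// Increment with a CAS retry loop; returns the value before the increment
// and adds the number of failed attempts to `retries`.
long cas_increment(std::atomic<long>& counter, long& retries) {
    long seen = counter.load();
    // On failure, compare_exchange_weak reloads the current value into `seen`.
    while (!counter.compare_exchange_weak(seen, seen + 1)) ++retries;
    return seen;
}
```

Counting retries per thread in a benchmark is one way to see how CAS contention grows with the number of threads.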
21 Project 13: Skip-lists
Implement:
- Lock-based lazy skip-list
- Lock-free skip-list
For baseline performance, compare to a sequential skip-list (own implementation!) with global locks on all set operations
Challenge: Memory management!
More informationPartial Snapshot Objects
Partial Snapshot Objects Hagit Attiya Technion Rachid Guerraoui EPFL Eric Ruppert York University ABSTRACT We introduce a generalization of the atomic snapshot object, which we call the partial snapshot
More informationEarly Results Using Hardware Transactional Memory for High-Performance Computing Applications
Early Results Using Hardware Transactional Memory for High-Performance Computing Applications Sverker Holmgren sverker.holmgren@it.uu.se Karl Ljungkvist kalj0193@student.uu.se Martin Karlsson martin.karlsson@it.uu.se
More informationAgenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?
Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based
More informationConcurrent Computing
Concurrent Computing Introduction SE205, P1, 2017 Administrivia Language: (fr)anglais? Lectures: Fridays (15.09-03.11), 13:30-16:45, Amphi Grenat Web page: https://se205.wp.imt.fr/ Exam: 03.11, 15:15-16:45
More informationPermissiveness in Transactional Memories
Permissiveness in Transactional Memories Rachid Guerraoui, Thomas A. Henzinger, and Vasu Singh EPFL, Switzerland Abstract. We introduce the notion of permissiveness in transactional memories (TM). Intuitively,
More informationDesign Tradeoffs in Modern Software Transactional Memory Systems
Design Tradeoffs in Modern Software al Memory Systems Virendra J. Marathe, William N. Scherer III, and Michael L. Scott Department of Computer Science University of Rochester Rochester, NY 14627-226 {vmarathe,
More informationConcurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis
oncurrent programming: From theory to practice oncurrent Algorithms 2015 Vasileios Trigonakis From theory to practice Theoretical (design) Practical (design) Practical (implementation) 2 From theory to
More informationLinked Lists: Locking, Lock-Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Linked Lists: Locking, Lock-Free, and Beyond Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Objects Adding threads should not lower throughput Contention
More informationLock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap
Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap Håkan Sundell Philippas Tsigas OPODIS 2004: The 8th International Conference on Principles of Distributed Systems
More informationDistributed Operating Systems
Distributed Operating Systems Synchronization in Parallel Systems Marcus Völp 203 Topics Synchronization Locks Performance Distributed Operating Systems 203 Marcus Völp 2 Overview Introduction Hardware
More informationDistributed Operating Systems Memory Consistency
Faculty of Computer Science Institute for System Architecture, Operating Systems Group Distributed Operating Systems Memory Consistency Marcus Völp (slides Julian Stecklina, Marcus Völp) SS2014 Concurrent
More informationThe Universality of Consensus
Chapter 6 The Universality of Consensus 6.1 Introduction In the previous chapter, we considered a simple technique for proving statements of the form there is no wait-free implementation of X by Y. We
More informationFine-grained synchronization & lock-free data structures
Lecture 19: Fine-grained synchronization & lock-free data structures Parallel Computer Architecture and Programming Redo Exam statistics Example: a sorted linked list struct Node { int value; Node* next;
More informationA Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines
A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines Mauro Ianni, Alessandro Pellegrini DIAG Sapienza Università di Roma, Italy Email: {mianni,pellegrini}@dis.uniroma1.it
More informationSynchronization I. Jo, Heeseung
Synchronization I Jo, Heeseung Today's Topics Synchronization problem Locks 2 Synchronization Threads cooperate in multithreaded programs To share resources, access shared data structures Also, to coordinate
More informationUnit 6: Indeterminate Computation
Unit 6: Indeterminate Computation Martha A. Kim October 6, 2013 Introduction Until now, we have considered parallelizations of sequential programs. The parallelizations were deemed safe if the parallel
More informationImportant Lessons. A Distributed Algorithm (2) Today's Lecture - Replication
Important Lessons Lamport & vector clocks both give a logical timestamps Total ordering vs. causal ordering Other issues in coordinating node activities Exclusive access to resources/data Choosing a single
More information