TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS. Xiangyao Yu, Srini Devadas CSAIL, MIT
|
|
- Brendan McGee
- 6 years ago
- Views:
Transcription
1 TIME TRAVELING HARDWARE AND SOFTWARE SYSTEMS Xiangyao Yu, Srini Devadas CSAIL, MIT
2 FOR FIFTY YEARS, WE HAVE RIDDEN MOORE S LAW Moore s Law and the scaling of clock frequency = printing press for the currency of performance
3 10,000,000 1,000, ,000 10,000 1, TECHNOLOGY SCALING u Transistors x 1000 Clock frequency (MHz) Power (W) Cores Each generation of Moore s law doubles the number of transistors but clock frequency has stopped increasing.
4 10,000,000 1,000, ,000 10,000 1, TECHNOLOGY SCALING u Transistors x 1000 Clock frequency (MHz) Power (W) Cores To increase performance, need to exploit parallelism.
5 DIFFERENT KINDS OF PARALLELISM - 1 Instruction Level Transaction Level a = b + c d = e + f g = d + b Read A Read B Compute C Read A Read D Compute E Read C Read E Compute F 5
6 DIFFERENT KINDS OF PARALLELISM - 2 Thread Level A x B = C Task Level Search( image ) Cloud Different thread computes each entry of product matrix C 6
7 DIFFERENT KINDS OF PARALLELISM - 3 Thread Level A x B = C User Level Search( image ) Lookup( data ) Cloud Different thread computes each entry of product matrix C Query( record ) 7
8 DEPENDENCY DESTROYS PARALLELISM for i = 1 to n a[b[i]] = (a[b[i - 1]] + b[i]) / c[i] Need to compute i th entry after i - 1 th has been computed L 8
9 Read A Read A DIFFERENT KINDS OF DEPENDENCY No dependency! Write A Write A WAW: Semantics decide order Write A Read A RAW: Read needs new value Read A Write A WAR: We have flexibility here! 9
10 DEPENDENCE IS ACROSS TIME, BUT WHAT IS TIME? Time can be physical time Time could correspond to logical timestamps assigned to instructions Time could be a combination of the above à Time is a definition of ordering 10
11 Read A WAR DEPENDENCE Initially A = 10 Thread 0 Thread 1 Logical Order Write A Physical Time Order A=13 Local copy of A = 10 Read happens later than Write in physical time but is before Write in logical time. 11
12 WHAT IS CORRECTNESS? We define correctness of a parallel program based on its outputs in relation to the program run sequentially 12
13 SEQUENTIAL CONSISTENCY A B C D Global Memory Order A B C D A C B D C D A B C B A D Can we exploit this freedom in correct execution to avoid dependency? 13
14 AVOIDING DEPENDENCY ACROSS THE STACK Circuit Multicore Processor Multicore Database Distributed Database Distributed Shared Memory Efficient atomic instructions Tardis coherence protocol TicToc concurrency control with Andy Pavlo and Daniel Sanchez Distributed TicToc Transaction processing with fault tolerance 14
15 SHARED MEMORY SYSTEMS Cache Coherence Concurrency Control Multi-core Processor OLTP Database 15
16 DIRECTORY-BASED COHERENCE P P P P P P P P P Data replicated and cached locally for access Uncached data copied to local cache, writes invalidate data copies 16
17 CACHE COHERENCE SCALABILITY 250% Storage Overhead 200% 150% 100% 50% 0% Today Core Count Read A Read A Write A Invalidation O(N) Sharer List 17
18 LEASE-BASED COHERENCE Ld(A) Core 0 Core 1 Write Timestamp (wts) Ld(A) Read Timestamp (rts) Ld(A) St(A) Program Timestamp (pts) t= A read gets a lease on a cacheline Lease renewal aler lease expires A store can only commit aler leases expire Tardis: logical leases Logical Time Timestamp 18
19 LOGICAL TIMESTAMP Invalidation Tardis (No Invalidation) Physical Time Order Logical Time Order (concept borrowed from database) Old Version New Version logical time 19
20 TIMESTAMP MANAGEMENT Core A state A B pts=5 wts Shared LLC Cache S 0 10 S 0 10 S 0 5 rts Program Timestamp (pts) Timestamp of last memory operation Write Timestamp (wts) Data created at wts Read Timestamp (rts) Data valid from wts to rts wts Lease rts Logical Time 20
21 TWO-CORE EXAMPLE Core 0 pts=0 Cache Core 1 pts=0 Cache Core 0 store A load B load B 3 4 Core 1 store B load A A S 0 0 B S 0 0 physical time 21
22 STORE CORE 0 Core 0 pts=0 pts=1 A M Cache 1 1 ST(A) Req Core 1 pts=0 Cache Core 0 store A load B load B 3 4 Core 1 store B load A A MS 0 0 B S 0 0 M owner:0 Write at pts = 1 22
23 LOAD CORE 0 Core 0 pts=1 Cache A M 1 1 B LD(B) Req Core 1 pts=0 Cache Core 0 store A load B load B 3 4 Core 1 store B load A A M owner:0 B S Reserve rts to pts + lease = 11 23
24 STORE CORE 1 Core 0 pts=1 Cache A M 1 1 B S 0 11 Core 1 pts=0 pts=12 Cache B M ST(B) Req Core 0 store A load B load B 3 4 Core 1 store B load A A M B MS 0 11 owner:0 M owner:1 Exclusive ownership returned No invalidation 24
25 Two VERSIONS COEXIST Core 0 pts=1 Cache A M 1 1 B S 0 11 Core 1 pts=12 Cache B M Core 0 store A load B load B 3 4 Core 1 store B load A A B M owner:0 M owner:1 Core 1 traveled ahead in time Versions ordered in logical time 25
26 LOAD CORE 1 Core 0 pts=1 Cache A M S B S 0 11 WB(A) Req S M owner: A B Core 1 A B M owner:1 pts=12 Cache M LD(A) Req Core 0 store A load B load B Write back request to Core 0 Downgrade from M to S Core 1 store B load A Reserve rts to pts + lease =
27 LOAD CORE 0 Core 0 pts=1 A Core 1 M owner:0 B M owner:1 pts=12 Cache Cache A S 1 22 A S 1 22 B S 0 11 B M Core 0 store A load B physical time 5 load B Core 1 store B load A logical timestamp 5 > physical 4, 5 < logical 4 time 3 4 time global memory order physical time order 27
28 Core 0 store A load B RAW load B SUMMARY OF EXAMPLE Directory RAW Core 1 WAR store B load A physical time Core 0 store A load B load B Tardis RAW WAR Core 1 store B load A physical time order physical + logical time order 28
29 PHYSIOLOGICAL TIME Core 0 store A (1) load B (1) load B (1) Tardis Core 1 store B (12) load A (12) Global Memory Order Physical Time Logical Time Physiological Time X < PL Y := X < L Y or (X = L Y and X < P Y) Thm: Tardis obeys Sequential Consistency 29
30 TARDIS PROS AND CONS Scalability No Invalidation, Multicast or Broadcast Lease Renew Speculative Read Timestamp Size Timestamp Compression Time Stands Still Livelock Avoidance 30
31 EVALUATION Storage overhead per cacheline (N cores) Directory: N bits per cacheline Tardis: Max(Const, log(n)) bits per cacheline 250% 200% 150% 100% 50% 0% Directory Tardis
32 SPEEDUP Normalized Speedup Graphite Multi-core Simulator (64 cores) DIRECTORY TARDIS TARDIS AT 256 CORES 32
33 NETWORK TRAFFIC Normalized Traffic DIRECTORY TARDIS TARDIS AT 256 CORES 33
34 Network Traffic Storage Overhead Snoopy Coherence Directory Coherence Optimized Directory TARDIS High Complexity Performance Degradation 34
35 CONCURRENCY CONTROL BEGIN COMMIT BEGIN BEGIN COMMIT BEGIN COMMIT BEGIN COMMIT Serializability COMMIT BEGIN COMMIT BEGIN COMMIT BEGIN COMMIT Results should correspond to some serial order of atomic execution 35
36 CONCURRENCY CONTROL BEGIN COMMIT BEGIN COMMIT BEGIN COMMIT BEGIN COMMIT Can t Have This BEGIN BEGIN COMMIT COMMIT BEGIN COMMIT BEGIN COMMIT Results should correspond to some serial order of atomic execution 36
37 BOTTLENECK 1: TIMESTAMP ALLOCATION Centralized Allocator Timestamp allocation is a scalability bottleneck Throughput (Million txn/s) T/O 2PL Synchronized Clock Clock skew causes unnecessary aborts Thread Count 37
38 BOTTLENECK 2: STATIC ASSIGNMENT Time Timestamps assigned before a transaction starts T1@ts=1 T2@ts=2 BEGIN BEGIN READ(A) WRITE(A) COMMIT COMMIT Suboptimal assignment leads to unnecessary aborts. T1@ts=2 T2@ts=1 BEGIN BEGIN READ(A) WRITE(A) COMMIT ABORT 38
39 KEY IDEA: DATA DRIVEN TIMESTAMP MANAGEMENT Traditional T/O 1. Acquire timestamp (TS) 2. Determine tuple visibility using TS Timestamp Allocation Static Timestamp Assignment TicToc 1. Access tuples and remember their timestamp info. 2. Compute commit timestamp (CommitTS) No Timestamp Allocation Dynamic Timestamp Assignment 39
40 TICTOC TRANSACTION EXECUTION BEGIN READ PHASE VALIDATION PHASE WRITE PHASE COMMIT Read & Write Tuples Execute Transaction Compute CommitTS Decide Commit/Abort Update Database wts : rts : last data wts last data rts data valid between wts and rts Tuple Format Data wts (Write Timestamp) rts (Read Timestamp) 40
41 TICTOC EXAMPLE Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B Database States 5 commit? 6 commit? T1 Transaction Local States T2 41
42 LOAD A FROM T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B 5 commit? 6 commit? T1 Load a snapshot of tuple A - Data, wts and rts T2 42
43 LOAD A FROM T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B 5 commit? 6 commit? T1 Load a snapshot of tuple A - Data, wts and rts T2 43
44 STORE B FROM T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B 5 commit? 6 commit? T1 Store B to local write set T2 44
45 LOAD B FROM T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B 5 commit? 6 commit? T1 Load a snapshot of tuple B - Data, wts and rts T2 45
46 COMMIT PHASE OF T Logical Time Tuple A Tuple B v1 v1 Transaction load A store B commit? Transaction load A load B commit? T1 T2 Compute CommitTS Write Set: tuple.rts+1 CommitTs Read Set: tuple.wts CommitTs tuple.rts 46
47 COMMIT PHASE OF T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 1 3 load A store B 2 4 load A load B T1 TS=2 5 commit? 6 commit? T1 rts extension on tuple A T2 47
48 COMMIT PHASE OF T Logical Time Tuple A v1 Transaction 1 Transaction 2 Tuple B v1 v2 1 3 load A store B 2 4 load A load B T1 TS=2 5 commit? 6 commit? T1 Copy tuple B from write set to database T2 48
49 COMMIT PHASE OF T Logical Time Tuple A Tuple B v1 v1 v2 Transaction load A store B commit? Transaction load A load B commit? T1 T2 T1 TS=2 Compute CommitTS for T2 Find consistent read time for T2 (no writes in T2) 49
50 FINAL STATE Logical Time Tuple A Tuple B v1 v1 v2 Transaction load A store B commit? Transaction load A load B commit? T1 T2 T2 TS=0 T1 TS=2 Txn 1 < Txn 2 physical time Txn 2 < logical Txn 1 time Thm: Serializability = All operations valid at CommitTS 50
51 EXPERIMENTAL SETUP DBx1000: Main Memory DBMS No logging No B-tree (hash indexing) Concurrency Control Algorithms MVCC: HEKATON (Microsoft) OCC: SILO (Harvard/MIT) 2PL: DL_DETECT, NO_WAIT 10 GB YCSB Benchmark 51
52 EVALUATION TICTOC HEKATON DL_DETECT NO_WAIT SILO YCSB -- Medium Contention YCSB -- High Contention Throughput (Million txn/s) 2 1 Throughput (Million txn/s) Thread Count Thread Count 52
53 TICTOC DISCUSSION Thm: Serializability = All ops valid at CommitTS Transactions may have same CommitTS Logical timestamp growing rate indicates inherent parallelism Commit Timestamp TS ALLOC High Medium # of CommiCed Txns 53
54 PHYSIOLOGICAL TIME ACROSS THE STACK Circuit Multicore Processor Multicore Database Distributed Database Distributed Shared Memory Efficient atomic instructions Tardis coherence protocol TicToc concurrency control Distributed TicToc Transaction processing with fault tolerance 54
55 ATOMIC INSTRUCTION (LR/SC) ABA Problem Detect ABA using timestamp (wts) Core 0 Core 1 LR(x); # x = A ST(x); # x = B ST(x); # x = A SC(x); # x = A ABA? 55
56 TARDIS CACHE COHERENCE Simple: No Invalidation Scalable: O(log N) storage No Broadcast, No Multicast No Clock Synchronization Support Relaxed Consistency Models 56
57 T1000: PROPOSED 1000-CORE SHARED MEMORY PROCESSOR 57
58 TICTOC CONCURRENCY CONTROL Data Driven Timestamp Management No Central Timestamp Allocation Dynamic Timestamp Assignment 58
59 DISTRIBUTED TICTOC Data Driven Timestamp Management Efficient Two-Phase Commit Protocol Support Local Caching of Remote Data 59
60 FAULT TOLERANT DISTRIBUTED SHARED MEMORY Transactional Programming Model Distributed Command Logging Dynamic Dependency Tracking Among Transactions (WAR dependency can be ignored) 60
61 TIME TRAVELING TO ELIMINATE WAR Xiangyao Yu, Srini Devadas CSAIL, MIT
TicToc: Time Traveling Optimistic Concurrency Control
TicToc: Time Traveling Optimistic Concurrency Control Authors: Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, Srinivas Devadas Presented By: Shreejit Nair 1 Background: Optimistic Concurrency Control ØRead
More informationTowards Thousand-Core RISC-V Shared Memory Systems. Quan Nguyen Massachusetts Institute of Technology 30 November 2016
Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen Massachusetts Institute of Technology 30 November 2016 Outline The Tardis cache coherence protocol Example Scalability advantages Thousand-core
More informationLogical Leases: Scalable Hardware and Software Systems through Time Traveling. Xiangyao Yu
Logical Leases: Scalable Hardware and Software Systems through Time Traveling by Xiangyao Yu B.S. in Electronic Engineering, Tsinghua University (2012) S.M. in Electrical Engineering and Computer Science,
More informationLast Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications
Last Class Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#23: Concurrency Control Part 3 (R&G ch. 17) Lock Granularities Locking in B+Trees The
More informationTicToc: Time Traveling Optimistic Concurrency Control
TicToc: Time Traveling Optimistic Concurrency Control Xiangyao Yu Andrew Pavlo Daniel Sanchez Srinivas Devadas CSAIL MIT Carnegie Mellon University CSAIL MIT CSAIL MIT yy@csail.mit.edu pavlo@cs.cmu.edu
More informationRethinking Serializable Multi-version Concurrency Control. Jose Faleiro and Daniel Abadi Yale University
Rethinking Serializable Multi-version Concurrency Control Jose Faleiro and Daniel Abadi Yale University Theory: Single- vs Multi-version Systems Single-version system T r Read X X 0 T w Write X Multi-version
More informationSundial: Harmonizing Concurrency Control and Caching in a Distributed OLTP Database Management System
Sundial: Harmonizing Concurrency Control and Caching in a Distributed OLTP Database Management System Xiangyao Yu, Yu Xia, Andrew Pavlo Daniel Sanchez, Larry Rudolph, Srinivas Devadas Massachusetts Institute
More informationSundial: Harmonizing Concurrency Control and Caching in a Distributed OLTP Database Management System
: Harmonizing Concurrency Control and Caching in a Distributed OLTP Database Management System Xiangyao Yu, Yu Xia, Andrew Pavlo Daniel Sanchez, Larry Rudolph, Srinivas Devadas Massachusetts Institute
More informationA Scalable Lock Manager for Multicores
A Scalable Lock Manager for Multicores Hyungsoo Jung Hyuck Han Alan Fekete NICTA Samsung Electronics University of Sydney Gernot Heiser NICTA Heon Y. Yeom Seoul National University @University of Sydney
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationTaurus: A Parallel Transaction Recovery Method Based on Fine-Granularity Dependency Tracking
Taurus: A Parallel Transaction Recovery Method Based on Fine-Granularity Dependency Tracking Xiangyao Yu Siye Zhu Justin Kaashoek CSAIL MIT Phillips Academy Lexington High School yxy@csail.mit.edu szhu@andover.edu
More informationAn Evaluation of Distributed Concurrency Control. Harding, Aken, Pavlo and Stonebraker Presented by: Thamir Qadah For CS590-BDS
An Evaluation of Distributed Concurrency Control Harding, Aken, Pavlo and Stonebraker Presented by: Thamir Qadah For CS590-BDS 1 Outline Motivation System Architecture Implemented Distributed CC protocols
More informationarxiv: v2 [cs.dc] 23 Sep 2015
Tardis: Time Traveling Coherence lgorithm for Distributed hared Memory Xiangyao Yu CIL, MIT Cambridge, M, U yxy@mit.edu rinivas Devadas CIL, MIT Cambridge, M, U devadas@mit.edu arxiv:5.454v [cs.dc] 3 ep
More informationData Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 14: Data Replication Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database Replication What is database replication The advantages of
More informationChapter 4: Distributed Systems: Replication and Consistency. Fall 2013 Jussi Kangasharju
Chapter 4: Distributed Systems: Replication and Consistency Fall 2013 Jussi Kangasharju Chapter Outline n Replication n Consistency models n Distribution protocols n Consistency protocols 2 Data Replication
More informationHeckaton. SQL Server's Memory Optimized OLTP Engine
Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability
More informationStephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, Samuel Madden Presenter : Akshada Kulkarni Acknowledgement : Author s slides are used with
Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, Samuel Madden Presenter : Akshada Kulkarni Acknowledgement : Author s slides are used with some additions/ modifications 1 Introduction Design Evaluation
More informationLast Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications
Last Class Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#23: Concurrency Control Part 2 (R&G ch. 17) Serializability Two-Phase Locking Deadlocks
More informationHigh Performance Transactions in Deuteronomy
High Performance Transactions in Deuteronomy Justin Levandoski, David Lomet, Sudipta Sengupta, Ryan Stutsman, and Rui Wang Microsoft Research Overview Deuteronomy: componentized DB stack Separates transaction,
More informationMultiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types
Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon
More informationLecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections
Lecture 18: Coherence and Synchronization Topics: directory-based coherence protocols, synchronization primitives (Sections 5.1-5.5) 1 Cache Coherence Protocols Directory-based: A single location (directory)
More informationSTAR: Scaling Transactions through Asymmetric Replication
STAR: Scaling Transactions through Asymmetric Replication arxiv:1811.259v2 [cs.db] 2 Feb 219 ABSTRACT Yi Lu MIT CSAIL yilu@csail.mit.edu In this paper, we present STAR, a new distributed in-memory database
More informationMULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationCSE 344 MARCH 9 TH TRANSACTIONS
CSE 344 MARCH 9 TH TRANSACTIONS ADMINISTRIVIA HW8 Due Monday Max Two Late days Exam Review Sunday: 5pm EEB 045 CASE STUDY: SQLITE SQLite is very simple More info: http://www.sqlite.org/atomiccommit.html
More informationL i (A) = transaction T i acquires lock for element A. U i (A) = transaction T i releases lock for element A
Lock-Based Scheduler Introduction to Data Management CSE 344 Lecture 20: Transactions Simple idea: Each element has a unique lock Each transaction must first acquire the lock before reading/writing that
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 21: Transaction Implementations CSE 344 - Winter 2017 1 Announcements WQ7 and HW7 are out Due next Mon and Wed Start early, there is little time! CSE 344
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationA Scalable Ordering Primitive for Multicore Machines. Sanidhya Kashyap Changwoo Min Kangnyeon Kim Taesoo Kim
A Scalable Ordering Primitive for Multicore Machines Sanidhya Kashyap Changwoo Min Kangnyeon Kim Taesoo Kim Era of multicore machines 2 Scope of multicore machines Huge hardware thread parallelism How
More informationCSE 344 MARCH 25 TH ISOLATION
CSE 344 MARCH 25 TH ISOLATION ADMINISTRIVIA HW8 Due Friday, June 1 OQ7 Due Wednesday, May 30 Course Evaluations Out tomorrow TRANSACTIONS We use database transactions everyday Bank $$$ transfers Online
More informationEITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor
EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently
More informationLeveraging Lock Contention to Improve Transaction Applications. University of Washington
Leveraging Lock Contention to Improve Transaction Applications Cong Yan Alvin Cheung University of Washington Background Pessimistic locking-based CC protocols Poor performance under high data contention
More informationHandout 3 Multiprocessor and thread level parallelism
Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationEN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University
EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,
More informationConcurrency control 12/1/17
Concurrency control 12/1/17 Bag of words... Isolation Linearizability Consistency Strict serializability Durability Snapshot isolation Conflict equivalence Serializability Atomicity Optimistic concurrency
More informationLecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization
Lecture 25: Multiprocessors Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 Snooping-Based Protocols Three states for a block: invalid,
More informationLecture 25: Multiprocessors
Lecture 25: Multiprocessors Today s topics: Virtual memory wrap-up Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 TLB and Cache Is the cache indexed
More informationarxiv: v1 [cs.dc] 24 May 2015
A Proof of Correctness for the Tardis Cache Coherence Protocol Xiangyao Yu, Muralidaran Vijayaraghavan, Srinivas Devadas {yxy, vmurali, devadas}@mit.edu arxiv:1505.06459v1 [cs.dc] 24 May 2015 Massachusetts
More informationDefining properties of transactions
Transactions: ACID, Concurrency control (2P, OCC) Intro to distributed txns The transaction Definition: A unit of work: May consist of multiple data accesses or updates Must commit or abort as a single
More informationTransactional Consistency and Automatic Management in an Application Data Cache Dan R. K. Ports MIT CSAIL
Transactional Consistency and Automatic Management in an Application Data Cache Dan R. K. Ports MIT CSAIL joint work with Austin Clements Irene Zhang Samuel Madden Barbara Liskov Applications are increasingly
More informationNo compromises: distributed transactions with consistency, availability, and performance
No compromises: distributed transactions with consistency, availability, and performance Aleksandar Dragojevi c, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam,
More informationTransactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech
Transactional Memory Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech (Adapted from Stanford TCC group and MIT SuperTech Group) Motivation Uniprocessor Systems Frequency
More informationReplication. Consistency models. Replica placement Distribution protocols
Replication Motivation Consistency models Data/Client-centric consistency models Replica placement Distribution protocols Invalidate versus updates Push versus Pull Cooperation between replicas Client-centric
More informationUNIT I (Two Marks Questions & Answers)
UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More information1. Memory technology & Hierarchy
1. Memory technology & Hierarchy Back to caching... Advances in Computer Architecture Andy D. Pimentel Caches in a multi-processor context Dealing with concurrent updates Multiprocessor architecture In
More informationConcurrency Control II and Distributed Transactions
Concurrency Control II and Distributed Transactions CS 240: Computing Systems and Concurrency Lecture 18 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material.
More informationOverview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware
Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and
More informationOverview: Shared Memory Hardware
Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Unit 7: Transactions Schedules Implementation Two-phase Locking (3 lectures) 1 Class Overview Unit 1: Intro Unit 2: Relational Data Models and Query Languages Unit
More informationCicada: Dependably Fast Multi-Core In-Memory Transactions
Cicada: Dependably Fast Multi-Core In-Memory Transactions Hyeontaek Lim Carnegie Mellon University hl@cs.cmu.edu Michael Kaminsky Intel Labs michael.e.kaminsky@intel.com ABSTRACT dga@cs.cmu.edu able timestamp
More informationReplication and Consistency. Fall 2010 Jussi Kangasharju
Replication and Consistency Fall 2010 Jussi Kangasharju Chapter Outline Replication Consistency models Distribution protocols Consistency protocols 2 Data Replication user B user C user A object object
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 22: More Transaction Implementations 1 Review: Schedules, schedules, schedules The DBMS scheduler determines the order of operations from txns are executed
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL
More informationSpeculative Synchronization
Speculative Synchronization José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization
More informationCache Coherence in Distributed and Replicated Transactional Memory Systems. Technical Report RT/4/2009
Technical Report RT/4/2009 Cache Coherence in Distributed and Replicated Transactional Memory Systems Maria Couceiro INESC-ID/IST maria.couceiro@ist.utl.pt Luis Rodrigues INESC-ID/IST ler@ist.utl.pt Jan
More informationReferences. Concurrency Control. Administração e Optimização de Bases de Dados 2012/2013. Helena Galhardas e
Administração e Optimização de Bases de Dados 2012/2013 Concurrency Control Helena Galhardas DEI@Técnico e DMIR@INESC-ID Chpt 15 Silberchatz Chpt 17 Raghu References 1 Summary Lock-Based Protocols Multiple
More informationReview: Multiprocessor. CPE 631 Session 21: Multiprocessors (Part 2) Potential HW Coherency Solutions. Bus Snooping Topology
Review: Multiprocessor CPE 631 Session 21: Multiprocessors (Part 2) Department of Electrical and Computer Engineering University of Alabama in Huntsville Basic issues and terminology Communication: share
More informationTransaction Management & Concurrency Control. CS 377: Database Systems
Transaction Management & Concurrency Control CS 377: Database Systems Review: Database Properties Scalability Concurrency Data storage, indexing & query optimization Today & next class Persistency Security
More informationParallel Methods for Verifying the Consistency of Weakly-Ordered Architectures. Adam McLaughlin, Duane Merrill, Michael Garland, and David A.
Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader Challenges of Design Verification Contemporary hardware
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google SOSP 03, October 19 22, 2003, New York, USA Hyeon-Gyu Lee, and Yeong-Jae Woo Memory & Storage Architecture Lab. School
More informationNFS: Naming indirection, abstraction. Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency
Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency Local file systems Disks are terrible abstractions: low-level blocks, etc. Directories, files, links much
More informationTopics in Reliable Distributed Systems
Topics in Reliable Distributed Systems 049017 1 T R A N S A C T I O N S Y S T E M S What is A Database? Organized collection of data typically persistent organization models: relational, object-based,
More informationChapter 15 : Concurrency Control
Chapter 15 : Concurrency Control What is concurrency? Multiple 'pieces of code' accessing the same data at the same time Key issue in multi-processor systems (i.e. most computers today) Key issue for parallel
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in
More informationExam 2 Review. Fall 2011
Exam 2 Review Fall 2011 Question 1 What is a drawback of the token ring election algorithm? Bad question! Token ring mutex vs. Ring election! Ring election: multiple concurrent elections message size grows
More informationChapter-4 Multiprocessors and Thread-Level Parallelism
Chapter-4 Multiprocessors and Thread-Level Parallelism We have seen the renewed interest in developing multiprocessors in early 2000: - The slowdown in uniprocessor performance due to the diminishing returns
More informationSquall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases
Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases AARON J. ELMORE, VAIBHAV ARORA, REBECCA TAFT, ANDY PAVLO, DIVY AGRAWAL, AMR EL ABBADI Higher OLTP Throughput Demand for High-throughput
More informationA lock is a mechanism to control concurrent access to a data item Data items can be locked in two modes:
Concurrency Control Concurrency Control Lock-Based and Tree-Based Protocols Timestamp-Based Protocols Validation-Based Protocols Multiple Granularity Multiversion Schemes Insert and Delete Operations Concurrency
More informationCSE 344 MARCH 5 TH TRANSACTIONS
CSE 344 MARCH 5 TH TRANSACTIONS ADMINISTRIVIA OQ6 Out 6 questions Due next Wednesday, 11:00pm HW7 Shortened Parts 1 and 2 -- other material candidates for short answer, go over in section Course evaluations
More informationIntroduction to Data Management CSE 414
Introduction to Data Management CSE 414 Lecture 23: Transactions CSE 414 - Winter 2014 1 Announcements Webquiz due Monday night, 11 pm Homework 7 due Wednesday night, 11 pm CSE 414 - Winter 2014 2 Where
More informationLIMITS OF ILP. B649 Parallel Architectures and Programming
LIMITS OF ILP B649 Parallel Architectures and Programming A Perfect Processor Register renaming infinite number of registers hence, avoids all WAW and WAR hazards Branch prediction perfect prediction Jump
More informationImportant Lessons. A Distributed Algorithm (2) Today's Lecture - Replication
Important Lessons Lamport & vector clocks both give a logical timestamps Total ordering vs. causal ordering Other issues in coordinating node activities Exclusive access to resources/data Choosing a single
More informationCurriculum 2013 Knowledge Units Pertaining to PDC
Curriculum 2013 Knowledge Units Pertaining to C KA KU Tier Level NumC Learning Outcome Assembly level machine Describe how an instruction is executed in a classical von Neumann machine, with organization
More informationAnalyzing the Impact of System Architecture on the Scalability of OLTP Engines for High-Contention Workloads
Analyzing the Impact of System Architecture on the Scalability of OLTP Engines for High-Contention Workloads Raja Appuswamy raja.appuswamy@epfl.ch Mustafa K. Iman mustafa.iman@epfl.ch Angelos C. Anadiotis
More informationLock-based concurrency control. Q: What if access patterns rarely, if ever, conflict? Serializability. Concurrency Control II (OCC, MVCC)
Concurrency Control II (CC, MVCC) CS 418: Distributed Systems Lecture 18 Serializability Execution of a set of transactions over multiple items is equivalent to some serial execution of s Michael Freedman
More informationDatabase Systems CSE 414
Database Systems CSE 414 Lecture 27: Transaction Implementations 1 Announcements Final exam will be on Dec. 14 (next Thursday) 14:30-16:20 in class Note the time difference, the exam will last ~2 hours
More informationLecture 5: Directory Protocols. Topics: directory-based cache coherence implementations
Lecture 5: Directory Protocols Topics: directory-based cache coherence implementations 1 Flat Memory-Based Directories Block size = 128 B Memory in each node = 1 GB Cache in each node = 1 MB For 64 nodes
More informationData/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)
Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) A 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software
More informationComputer Architecture
Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core
More informationLecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations
Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations 1 Split Transaction Bus So far, we have assumed that a coherence operation (request, snoops, responses,
More informationRelaxing Concurrency Control in Transactional Memory. Utku Aydonat
Relaxing Concurrency Control in Transactional Memory by Utku Aydonat A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of The Edward S. Rogers
More informationFast HTAP over loosely coupled Nodes
Wildfire: Fast HTAP over loosely coupled Nodes R. Barber, C. Garcia-Arellano, R. Grosman, V. Raman, R. Sidle, M. Spilchen, A. Storm, Y. Tian, P. Tozun, D. Zilio, M. Huras, C. Mohan, F. Ozcan, H. Pirahesh
More informationCS654 Advanced Computer Architecture Lec 14 Directory Based Multiprocessors
CS654 Advanced Computer Architecture Lec 14 Directory Based Multiprocessors Peter Kemper Adapted from the slides of EECS 252 by Prof. David Patterson Electrical Engineering and Computer Sciences University
More informationCLOUD-SCALE FILE SYSTEMS
Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients
More informationSamsara: Efficient Deterministic Replay in Multiprocessor. Environments with Hardware Virtualization Extensions
Samsara: Efficient Deterministic Replay in Multiprocessor Environments with Hardware Virtualization Extensions Shiru Ren, Le Tan, Chunqi Li, Zhen Xiao, and Weijia Song June 24, 2016 Table of Contents 1
More informationConcurrency Control! Snapshot isolation" q How to ensure serializability and recoverability? " q Lock-Based Protocols" q Other Protocols"
Concurrency Control! q How to ensure serializability and recoverability? q Lock-Based Protocols q Lock, 2PL q Lock Conversion q Lock Implementation q Deadlock q Multiple Granularity q Other Protocols q
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 5: Concurrency Control Topics Data Database design Queries Decomposition Localization Optimization Transactions Concurrency control Reliability
More informationTransaction Processing: Concurrency Control. Announcements (April 26) Transactions. CPS 216 Advanced Database Systems
Transaction Processing: Concurrency Control CPS 216 Advanced Database Systems Announcements (April 26) 2 Homework #4 due this Thursday (April 28) Sample solution will be available on Thursday Project demo
More informationShared Memory Multiprocessors
Shared Memory Multiprocessors Jesús Labarta Index 1 Shared Memory architectures............... Memory Interconnect Cache Processor Concepts? Memory Time 2 Concepts? Memory Load/store (@) Containers Time
More informationTransactions. Kathleen Durant PhD Northeastern University CS3200 Lesson 9
Transactions Kathleen Durant PhD Northeastern University CS3200 Lesson 9 1 Outline for the day The definition of a transaction Benefits provided What they look like in SQL Scheduling Transactions Serializability
More informationCS5412: TRANSACTIONS (I)
1 CS5412: TRANSACTIONS (I) Lecture XVII Ken Birman Transactions 2 A widely used reliability technology, despite the BASE methodology we use in the first tier Goal for this week: in-depth examination of
More information5/17/17. Announcements. Review: Transactions. Outline. Review: TXNs in SQL. Review: ACID. Database Systems CSE 414.
Announcements Database Systems CSE 414 Lecture 21: More Transactions (Ch 8.1-3) HW6 due on Today WQ7 (last!) due on Sunday HW7 will be posted tomorrow due on Wed, May 24 using JDBC to execute SQL from
More informationScalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Scalable Cache Coherence Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hierarchical Cache Coherence Hierarchies in cache organization Multiple levels
More informationModule 9: "Introduction to Shared Memory Multiprocessors" Lecture 16: "Multiprocessor Organizations and Cache Coherence" Shared Memory Multiprocessors
Shared Memory Multiprocessors Shared memory multiprocessors Shared cache Private cache/dancehall Distributed shared memory Shared vs. private in CMPs Cache coherence Cache coherence: Example What went
More informationAbstract. 1. Introduction. 2. Design and Implementation Master Chunkserver
Abstract GFS from Scratch Ge Bian, Niket Agarwal, Wenli Looi https://github.com/looi/cs244b Dec 2017 GFS from Scratch is our partial re-implementation of GFS, the Google File System. Like GFS, our system
More informationMain-Memory Databases 1 / 25
1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low
More informationConsistency & Replication
Objectives Consistency & Replication Instructor: Dr. Tongping Liu To understand replication and related issues in distributed systems" To learn about how to keep multiple replicas consistent with each
More information