Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance
Jonathan Lifflander*, Esteban Meneses, Harshitha Menon*, Phil Miller*, Sriram Krishnamoorthy, Laxmikant V. Kale*
*University of Illinois Urbana-Champaign (UIUC); University of Pittsburgh; Pacific Northwest National Laboratory (PNNL)
September 23, 2014
Deterministic Replay & Fault Tolerance
Fault tolerance often crosses over into replay territory!
Popular uses:
- Online fault tolerance
- Parallel debugging
- Reproducing results
Types of replay:
- Data-driven replay: application/system data is recorded (content of messages sent/received, etc.)
- Control-driven replay: the ordering of events is recorded (our focus)
Online Fault Tolerance: Hard Failures
- Researchers have predicted that hard faults will increase at exascale
- Machines are getting larger: projected to house more than 200,000 sockets
- Hard failures may be frequent, yet affect only a small percentage of nodes
Online Fault Tolerance: Approaches
Checkpoint/restart (C/R):
- Well-established method: save a snapshot of system state, roll back to the previous snapshot in case of failure
Motivation beyond C/R:
- If a single node experiences a hard fault, why must all the nodes roll back?
- Recovering from C/R is expensive at large machine scales
- Complicated because it depends on many factors (e.g., checkpointing frequency)
Solutions:
- Application-specific fault tolerance
- Other system-level approaches
- Message logging!
Hard Failure System Model
- P processes that communicate via message passing
- Communication is across non-FIFO channels: messages are sent asynchronously and may arrive out of order
- Messages are guaranteed to arrive eventually if the recipient process has not failed
- Fail-stop model for all failures: failed processes do not recover, and they do not behave maliciously (non-Byzantine failures)
Sender-Based Causal Message Logging (SB-ML)
- Combination of data-driven and control-driven replay
- Data-driven: messages sent are recorded
- Control-driven: determinants are recorded to store the order of events
- Incurs time and storage overhead during forward execution
- Periodic checkpoints reduce storage overhead: recovery effort is limited to work executed after the latest checkpoint, and data stored before the checkpoint can be discarded
- Scalable implementation in Charm++
Example Execution with SB-ML
[Figure: timeline of tasks A-E exchanging messages m1-m7 along the forward path, with a checkpoint, a failure, and the subsequent restart and recovery]
Motivation: Overheads with SB-ML
[Figure: application progress over time with and without fault tolerance, showing the performance overhead during forward execution and the progress slowdown during recovery after a failure]
Forward Execution Overhead with SB-ML
Logging the messages:
- Just requires saving a pointer; the message is not deallocated
- Increases memory pressure
Determinants, 4-tuples of the form <SPE, SSN, RPE, RSN>:
- Sender processor (SPE)
- Sender sequence number (SSN)
- Receiver processor (RPE)
- Receiver sequence number (RSN)
- Must be stored stably based on the reliability requirements: propagated to n processors
- Unacknowledged determinants are piggybacked onto new messages (to avoid frequent synchronizations)
Recovery:
- Messages must be replayed in a total order
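The forward-path bookkeeping above can be sketched in Python. This is a minimal illustrative model, not the Charm++ implementation: the class names, the in-memory lists standing in for stable storage, and the direct-delivery flow are all assumptions made for clarity.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Determinant:
    """SB-ML determinant <SPE, SSN, RPE, RSN>: who sent, in what send order,
    to whom, and in what delivery order."""
    spe: int  # sender processor
    ssn: int  # sender sequence number
    rpe: int  # receiver processor
    rsn: int  # receiver sequence number

@dataclass
class Process:
    pe: int
    ssn: int = 0  # last sender sequence number issued
    rsn: int = 0  # last receiver sequence number issued
    msg_log: list = field(default_factory=list)  # data-driven: sender-side copies
    unacked: list = field(default_factory=list)  # determinants not yet stored stably

    def send(self, dest: "Process", payload) -> None:
        self.ssn += 1
        self.msg_log.append((dest.pe, self.ssn, payload))  # log, don't deallocate
        # Piggyback unacknowledged determinants on the outgoing message,
        # avoiding a separate synchronization round.
        piggyback, self.unacked = self.unacked, []
        dest.receive(self.pe, self.ssn, payload, piggyback)

    def receive(self, spe, ssn, payload, piggyback) -> None:
        self.rsn += 1
        # Control-driven: record the delivery order as a new determinant;
        # in the real protocol it is then propagated to n processors.
        self.unacked.append(Determinant(spe, ssn, self.pe, self.rsn))
        # Piggybacked determinants from the sender would be saved here as well.
```

A checkpoint would let a process discard `msg_log` entries and determinants older than the snapshot, matching the storage-reduction point above.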
Forward Execution Microbenchmark (SB-ML)

Component                      | Overhead (%)
-------------------------------|-------------
Determinants                   | 84.75
Bookkeeping                    | 11.65
Message-envelope size increase |  3.10
Message storage                |  0.50

- Using the LeanMD (molecular dynamics) benchmark, measured on 256 cores of Ranger
- The largest source of overhead is determinants: creating, storing, sending, etc.
Benchmarks: Runtime System
Charm++:
- Decompose the parallel computation into objects that communicate
- More objects than processors
- Objects communicate by sending messages
- The computation is oblivious to the processors
Benefits:
- Load balancing, message-driven execution, fault tolerance, etc.
Benchmarks: Configuration & Experimental Setup

Benchmark                     | Configuration
------------------------------|--------------
STENCIL3D                     | matrix: , chunk: 64^3
LEANMD (mini-app for NAMD)    | 600K atoms, 2-away XY, 75 atoms/cell
LULESH (shock hydrodynamics)  | matrix: 1024x512^2, chunk: 16x8^2

- All experiments on IBM Blue Gene/P (BG/P), the Intrepid system
- Each node: one quad-core 850 MHz PowerPC 450, 2 GB DDR2 memory
- Compiler: IBM XL C/C++ Advanced Edition for Blue Gene/P, V9.0
- Runtime: Charm++
Forward Execution Overhead with SB-ML
[Figure: percent overhead for Stencil3D, LeanMD, and LULESH across core counts up to 132k]
- The finer-grained benchmarks, LeanMD and LULESH, suffer significant overhead
Reducing the Overhead of Determinants: Design Criteria
- We must maintain full determinism
- We must degrade gracefully in all cases (even highly non-deterministic programs)
- We need to consider tasks or lightweight objects
Reducing the Overhead of Determinants: Intrinsic Determinism
Many researchers have noticed that programs have internal determinism:
- Causality tracking (1988: Fidge, "Partial orders for parallel debugging")
- Racing messages (1992: Netzer et al., "Optimal tracing and replay for debugging message-passing parallel programs")
- Theoretical races (1993: Damodaran-Kamal, "Nondeterminancy: testing and debugging in message passing parallel programs")
- Block races (1995: Clemencon, "An implementation of race detection and deterministic replay with MPI")
- MPI and non-determinism (2000: Kranzlmuller, "Event graph analysis for debugging massively parallel programs")
- ...
- Send-determinism (2011: Guermouche et al., "Uncoordinated checkpointing without domino effect for send-deterministic MPI applications")
Our Approach
- In many cases, only a partial order must be stored for full determinism
- Program = internal determinism + non-determinism + commutativity
- Internally deterministic events require no determinants!
- Commutative events require no determinants!
- Approach: use determinants to store a partial order for the non-deterministic events that are not commutative
Ordering Algebra: Ordered Sets, O
- O(n, d): a set of n events and d dependencies that can be accurately replayed from a given starting point
- The d dependencies can be among the events in the set, or on preceding events
- Intuitively, these are ordered sets of events
- Define the sequencing operation ∘: O(1, d1) ∘ O(1, d2) = O(2, d1 + d2 + 1)
- Intuitively, if we have two atomic events, we need a single dependency to tell us which one comes first
- Generalization: O(n1, d1) ∘ O(n2, d2) = O(n1 + n2, d1 + d2 + 1)
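The sequencing rule is just arithmetic on (event count, dependency count) pairs, so it can be checked with a tiny sketch (the pair representation is my own encoding, not notation from the talk):

```python
# Represent an ordered set O(n, d) as the pair (n, d).
def seq(a, b):
    """Sequence two ordered sets: one extra dependency orders b after a."""
    (n1, d1), (n2, d2) = a, b
    return (n1 + n2, d1 + d2 + 1)

# Two atomic events with no prior dependencies need exactly one dependency:
assert seq((1, 0), (1, 0)) == (2, 1)
```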
Ordering Algebra: Unordered Sets, U
- U(n, d): an unordered set of n events and d dependencies
- Example: several messages sent to a single endpoint; depending on the order of arrival, the eventual state will be different
- We decompose this into atomic events with an additional dependency between each successive pair:
  U(n, d) = O(1, d1) ∘ O(1, d2) ∘ ... ∘ O(1, dn) = O(n, d + n − 1), where d = Σ di
- Result: an additional n − 1 dependencies are required to fully order n events
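Totalizing an unordered set is repeated sequencing, which makes the n − 1 extra dependencies visible directly (again using the (n, d) pair encoding as an illustrative sketch):

```python
from functools import reduce

def seq(a, b):
    """Sequence two ordered sets O(n1, d1) ∘ O(n2, d2) = O(n1+n2, d1+d2+1)."""
    (n1, d1), (n2, d2) = a, b
    return (n1 + n2, d1 + d2 + 1)

def totalize(atoms):
    """Fully order U(n, d), given as n atomic O(1, d_i) sets, by chaining seq."""
    return reduce(seq, atoms)

# Four atomic events with no prior dependencies: totalizing costs n - 1 = 3.
assert totalize([(1, 0)] * 4) == (4, 3)
```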
Ordering Algebra: Interleaving Multiple Independent Sets, the ⊗ Operator
Lemma. Any possible interleaving of two ordered sets of events A = O(m, d) and B = O(n, e), where A ∩ B = ∅, is given by:
  O(m, d) ⊗ O(n, e) = O(m + n, d + e + min(m, n))
Lemma. Any possible ordering of n ordered sets of events O(m1, d1), O(m2, d2), ..., O(mn, dn), where the sets are pairwise disjoint, can be represented as:
  ⊗(i=1..n) O(mi, di) = O(m, d + m − max_i mi), where m = Σ mi and d = Σ di
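Both lemmas also reduce to arithmetic on the (event count, dependency count) pairs, and the two-set case is consistent with the n-way case since m − max(m1, m2) = min(m1, m2). A sketch under the same illustrative encoding:

```python
def interleave(a, b):
    """Two disjoint ordered sets: min(m, n) extra dependencies capture any interleaving."""
    (m, d), (n, e) = a, b
    return (m + n, d + e + min(m, n))

def interleave_n(sets):
    """n-way case: O(m, d + m - max_i m_i), with m = sum(m_i) and d = sum(d_i)."""
    m = sum(mi for mi, _ in sets)
    d = sum(di for _, di in sets)
    return (m, d + m - max(mi for mi, _ in sets))
```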
Internal Determinism, D
- D(n) = O(n, 0): n deterministically ordered events are structurally equivalent to an ordered set of n events with no associated explicit dependencies!
- What happens if we interleave internal determinism with something else?
- k interruption points ⇒ O(k, k − 1)
Commutative Events, C
- Some events in programs are commutative: regardless of the execution order, the resulting state is identical
- All existing message-logging theories record a total order on them
- However, we can reduce a commutative set to C(n) = O(2, 1): a beginning and an end event sequenced together
- Sequencing other sets of events around the region simply places them before and after
- Interleaving other events puts them in three buckets: (1) before the begin event, (2) during the commutative region, (3) after the end event
- This corresponds exactly to an ordered set of two events!
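In the (n, d) pair encoding used above, the payoff is that the cost of a commutative region is constant in its size, whereas totally ordering the same events costs n − 1 dependencies:

```python
def commutative(n):
    """A commutative region of any size collapses to a begin event and an end
    event joined by one dependency: C(n) = O(2, 1), independent of n."""
    begin, end = (1, 0), (1, 0)
    (n1, d1), (n2, d2) = begin, end
    return (n1 + n2, d1 + d2 + 1)

# 1000 commutative events: 1 dependency, versus 999 for a total order.
assert commutative(1000) == (2, 1)
```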
Applying the Theory: PO-REPLAY, a Partial-Order Message Identification Scheme
Properties:
- Tracks causality with Lamport clocks
- Uniquely identifies a sent message, whether or not its order is transposed
- Requires exactly the number of determinants and dependencies produced by the ordering algebra
Determinant composition (3-tuple) <SRN, SPE, CPI>:
- SRN: sender region number, incremented for every send outside a commutative region and incremented once when a commutative region starts
- SPE: sender processor endpoint
- CPI: commutative path identifier, a sequence of bits representing the path to the root of the commutative region
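The SRN increment rule can be sketched as follows. This is an illustrative model of the counter behavior only; the class, method names, and the flat bit-string CPI are assumptions, not the PO-REPLAY implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PODeterminant:
    srn: int  # sender region number
    spe: int  # sender processor endpoint
    cpi: str  # commutative path identifier (bit string; empty outside regions)

class Sender:
    def __init__(self, pe: int):
        self.pe = pe
        self.srn = 0
        self.cpi = ""
        self.in_region = False

    def begin_commutative(self, path_bits: str = "") -> None:
        self.srn += 1            # incremented once when a commutative region starts
        self.in_region = True
        self.cpi = path_bits

    def end_commutative(self) -> None:
        self.in_region = False
        self.cpi = ""

    def send(self) -> PODeterminant:
        if not self.in_region:
            self.srn += 1        # incremented for every send outside a region
        return PODeterminant(self.srn, self.pe, self.cpi)
```

Sends inside a region share one SRN and are told apart by the CPI, which is how the scheme avoids one determinant per commutative message.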
Experimental Results: Forward Execution Overhead, Stencil3D
[Figure: percent overhead across core counts up to 132k for Stencil3D with partial-determinism FT vs. full-determinism FT]
- Coarse-grained; shows a small improvement over SB-ML
Experimental Results: Forward Execution Overhead, LeanMD
[Figure: percent overhead across core counts up to 132k for LeanMD with partial-determinism FT vs. full-determinism FT]
- Fine-grained; overhead reduced from 11-19% to under 5%
Experimental Results: Forward Execution Overhead, LULESH
[Figure: percent overhead across core counts up to 132k for LULESH with partial-determinism FT vs. full-determinism FT]
- Medium-grained with many messages; overhead reduced from 17% to under 4%
Experimental Results: Fault Injection
- Measure the recovery time for the different protocols
- We inject a simulated fault on a random node, at approximately the middle of the checkpoint period
- We calculate the optimal checkpoint period duration using Daly's formula, assuming a 64K-1M socket count and an MTBF of 10 years
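The talk cites Daly's formula for the optimal period; the first-order Young/Daly approximation τ ≈ √(2δM) illustrates how such a period is derived. The 60 s checkpoint cost and the reading of the 10-year MTBF as a per-socket figure (so the system MTBF shrinks linearly with socket count) are assumptions for this sketch, not values from the talk:

```python
import math

def system_mtbf(component_mtbf_s: float, components: int) -> float:
    """System MTBF shrinks linearly with the number of components."""
    return component_mtbf_s / components

def optimal_period(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order Young/Daly approximation: tau = sqrt(2 * delta * M)."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

ten_years = 10 * 365.25 * 24 * 3600          # per-socket MTBF in seconds
M = system_mtbf(ten_years, 64 * 1024)        # ~80 minutes at 64K sockets
tau = optimal_period(60.0, M)                # assumed 60 s checkpoint cost
```

At 1M sockets the system MTBF drops to a few minutes, which is why a single-node fault forcing a full rollback (as in plain C/R) becomes so costly at scale.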
Experimental Results: Recovery Time Speedup over C/R
[Figure: recovery-time speedup over checkpoint/restart across core counts for LeanMD, Stencil3D, and LULESH]
- LeanMD shows the most speedup due to its fine-grained, overdecomposed nature
- We achieve a speedup in recovery time in all cases
Experimental Results: Recovery Time Speedup over SB-ML
[Figure: recovery-time speedup over SB-ML across core counts for LeanMD, Stencil3D, and LULESH]
- Speedup increases with scale, due to the expense of coordinating determinants and ordering
Experimental Results: Summary
- Our new message-logging protocol has under 5% overhead for the benchmarks tested
- Recovery is significantly faster than with C/R or causal message logging
- Depending on the frequency of faults, it may perform better than C/R
Future Work
- More benchmarks
- Study a broader range of programming models
- The memory overhead of message logging makes it infeasible for some applications
- Automated extraction of ordering and interleaving properties
- Programming language support?
85 Conclusion
A comprehensive approach for reasoning about execution orderings and interleavings
The information stored can be reduced in proportion to the knowledge of order flexibility
Programming paradigms should make this cost model clearer!
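The conclusion's claim that stored information shrinks in proportion to order flexibility can be made concrete with a toy sketch (hypothetical names and logging scheme, not the paper's actual protocol): with a total order, one determinant is logged per receive; with partial-order dependencies, receives whose handlers are declared commutative need no ordering record.

```python
# Toy sketch of determinant logging. `receives` is a list of
# (sender, tag) events; `commutative` is the set of tags whose handlers
# may execute in any order. Both names are invented for illustration.

def total_order_determinants(receives):
    """Log one (sender, sequence-number) determinant per receive,
    capturing a full total order of message delivery."""
    return [(sender, n) for n, (sender, _tag) in enumerate(receives)]

def partial_order_determinants(receives, commutative):
    """Log ordering only for receives whose handler is order-sensitive;
    commutative receives can be replayed in any order, so no
    determinant is stored for them."""
    return [(sender, n) for n, (sender, tag) in enumerate(receives)
            if tag not in commutative]

rx = [("p1", "reduce"), ("p2", "reduce"), ("p3", "boundary"),
      ("p1", "reduce"), ("p2", "boundary")]
print(len(total_order_determinants(rx)))                # 5 determinants
print(len(partial_order_determinants(rx, {"reduce"})))  # 2 determinants
```

The more receives an application can declare order-flexible, the smaller the log, which is the cost model the conclusion argues programming paradigms should expose.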
88 Questions?