OPERATING SYSTEM SUPPORT FOR REDUNDANT MULTITHREADING. Björn Döbel (TU Dresden)
|
|
- Clemence Kelley
- 5 years ago
- Views:
Transcription
1 OPERATING SYSTEM SUPPORT FOR REDUNDANT MULTITHREADING Björn Döbel (TU Dresden) Brussels,
2 Hardware Faults Radiation-induced soft errors Mainly an issue in avionics+space 1 DRAM errors in large data centers Google Study: > 2% failing DRAM DIMMs per year 2 ECC is not going to even detect a significant amount 3 Disk failure rate about 5% 4 Furthermore: decreasing transistor sizes, higher rate of transient errors in CPU functional units 5 1 Shirvani, McCluskey: Fault-Tolerant Systems in A Space Environment: The CRC ARGOS Project, Schroeder, Pinheiro, Weber: DRAM Errors in the Wild: A Large-Scale Field Study, SIGMETRICS Hwang, Stefanovici, Schroeder: Cosmic Rays Don t Strike Twice: Understanding the Nature of DRAM Errors and the Implications for System Design, ASPLOS Pinheiro, Weber, Barroso: Failure Trends in a Large Disk Drive Population, FAST Shivakumar, Kistler, Keckler: Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic, DSN 2002 Operating System Support for Redundant Multithreading slide 1 of 19
3 Fault Tolerance: State of the Union Software errors Hardware errors non- COTS COTS Operating System Support for Redundant Multithreading slide 2 of 19
4 Fault Tolerance: State of the Union Software errors Hardware errors RAD-hard CPUs Redundant Multithr. non- COTS COTS Operating System Support for Redundant Multithreading slide 2 of 19
5 Fault Tolerance: State of the Union Software errors IBM z/os HP NonStop Hardware errors RAD-hard CPUs Redundant Multithr. non- COTS COTS Operating System Support for Redundant Multithreading slide 2 of 19
6 Fault Tolerance: State of the Union Software errors SeL4 Minix3 IBM z/os HP NonStop Carburizer Hardware errors RAD-hard CPUs Redundant Multithr. non- COTS COTS Operating System Support for Redundant Multithreading slide 2 of 19
7 Fault Tolerance: State of the Union Software errors SeL4 Minix3 IBM z/os HP NonStop Carburizer SWIFT Hardware errors RAD-hard CPUs Redundant Multithr. Encoded Processing non- COTS COTS Operating System Support for Redundant Multithreading slide 2 of 19
8 Fault Tolerance: State of the Union Software errors SeL4 Minix3 IBM z/os HP NonStop Carburizer SWIFT Hardware errors RAD-hard CPUs Redundant Multithr. Encoded Romain Processing non- COTS COTS Operating System Support for Redundant Multithreading slide 2 of 19
9 Transparent Replication as OS Service Application L4 Runtime Environment L4/Fiasco.OC microkernel Operating System Support for Redundant Multithreading slide 3 of 19
10 Transparent Replication as OS Service Replicated Application L4 Runtime Environment Romain L4/Fiasco.OC microkernel Operating System Support for Redundant Multithreading slide 3 of 19
11 Transparent Replication as OS Service Unreplicated Application Replicated Application L4 Runtime Environment Romain L4/Fiasco.OC microkernel Operating System Support for Redundant Multithreading slide 3 of 19
12 Transparent Replication as OS Service Replicated Driver Unreplicated Application Replicated Application L4 Runtime Environment Romain L4/Fiasco.OC microkernel Operating System Support for Redundant Multithreading slide 3 of 19
13 Transparent Replication as OS Service Replicated Driver Unreplicated Application Replicated Application L4 Runtime Environment Romain L4/Fiasco.OC microkernel Reliable Computing Base Operating System Support for Redundant Multithreading slide 3 of 19
14 Process-Level Redundancy 6 Binary recompilation Complex, unprotected compiler Architecture-dependent System calls for replica synchronization Virtual memory fault isolation Restricted to Linux user-level programs 6 Shye, Blomsted, Moseley, Reddi, Connors: PLR: A software approach to transient fault tolerance for multicore architectures, DSN 2009 Operating System Support for Redundant Multithreading slide 4 of 19
15 Process-Level Redundancy 6 Binary recompilation Complex, unprotected compiler Architecture-dependent Reuse OS mechanisms System calls for replica synchronization Additional synchronization events Virtual memory fault isolation Restricted to Linux user-level programs Microkernel-based 6 Shye, Blomsted, Moseley, Reddi, Connors: PLR: A software approach to transient fault tolerance for multicore architectures, DSN 2009 Operating System Support for Redundant Multithreading slide 4 of 19
16 Why A Microkernel? Small components Microrebootable 7 Custom-tailor reliability to application needs 8 7 Herder: Building a Dependable Operating System Fault Tolerance in MINIX3, PhD Thesis, Sridharan, Kaeli: Eliminating microarchitectural dependency from architectural vulnerability, HPCA 2009 Operating System Support for Redundant Multithreading slide 5 of 19
17 Why A Microkernel? Small components Microrebootable 7 Custom-tailor reliability to application needs 8 That s what we do in Dresden (tm). Reuse Fiasco.OC mechanisms instead of adding new code to the RCB 7 Herder: Building a Dependable Operating System Fault Tolerance in MINIX3, PhD Thesis, Sridharan, Kaeli: Eliminating microarchitectural dependency from architectural vulnerability, HPCA 2009 Operating System Support for Redundant Multithreading slide 5 of 19
18 Why A Microkernel? Small components Microrebootable 7 Custom-tailor reliability to application needs 8 That s what we do in Dresden (tm). Reuse Fiasco.OC mechanisms instead of adding new code to the RCB Lean system call interface Need to add special handling to fewer syscalls 7 Herder: Building a Dependable Operating System Fault Tolerance in MINIX3, PhD Thesis, Sridharan, Kaeli: Eliminating microarchitectural dependency from architectural vulnerability, HPCA 2009 Operating System Support for Redundant Multithreading slide 5 of 19
19 Romain: Structure Master Operating System Support for Redundant Multithreading slide 6 of 19
20 Romain: Structure Replica Replica Replica Master Operating System Support for Redundant Multithreading slide 6 of 19
21 Romain: Structure Replica Replica Replica = Master Operating System Support for Redundant Multithreading slide 6 of 19
22 Romain: Structure Replica Replica Replica Resource Manager = System Call Proxy Master Operating System Support for Redundant Multithreading slide 6 of 19
23 Resource Management: Capabilities Replica Operating System Support for Redundant Multithreading slide 7 of 19
24 Resource Management: Capabilities Replica Replica Operating System Support for Redundant Multithreading slide 7 of 19
25 Resource Management: Capabilities Replica Replica Master Operating System Support for Redundant Multithreading slide 7 of 19
26 Partitioned Capability Tables Replica Replica Marked used Master Master private Operating System Support for Redundant Multithreading slide 8 of 19
27 Overhead vs. Unreplicated Execution 9 9 Döbel, Härtig, Engel: Operating System Support for Redundant Multithreading, EMSOFT 2012 Operating System Support for Redundant Multithreading slide 9 of 19
28 Romain Lines of Code Base code (main, logging, locking) 325 Application loader 375 Replica manager 628 Redundancy 153 Memory manager 445 System call proxy 311 Shared memory 281 Total 2,518 Fault injector 668 GDB server stub 1,304 Operating System Support for Redundant Multithreading slide 10 of 19
29 Hardening the RCB We need: Dedicated mechanisms to protect the RCB (HW or SW) We have: Full control over software Use FT-encoding compiler? Has not been done for kernel code yet Only protects SW components RAD-hardened hardware? Too expensive Operating System Support for Redundant Multithreading slide 11 of 19
30 Hardening the RCB We need: Dedicated mechanisms to protect the RCB (HW or SW) We have: Full control over software Use FT-encoding compiler? Has not been done for kernel code yet Only protects SW components RAD-hardened hardware? Too expensive Our proposal: Split HW into ResCores and NonRes-Cores NonRes Core NonRes Core NonRes Core NonRes Core NonRes Core ResCore NonRes Core NonRes Core NonRes Core NonRes Core NonRes Core Operating System Support for Redundant Multithreading slide 11 of 19
31 Signaling Performance Exploring master replica communication 10 12x Intel Core2 2.6 GHz Replicas pinned to dedicated physical cores Overhead in % Hyperthreading off Overhead by notification method Local Faults Migration Sync IPC Shared Mem susan CRC32 DMR susan CRC32 TMR 10 Döbel, Härtig: Who watches the watchmen? Protecting Operating System Reliability Mechanisms, HotDep 2012 Operating System Support for Redundant Multithreading slide 12 of 19
32 How About Multithreading? A 1 A 1 A 2 A 2 A 3 A 3 A 4 A 4 Operating System Support for Redundant Multithreading slide 13 of 19
33 How About Multithreading? A 1 A 1 B 1 A 2 B 1 A 2 A 3 B 2 A 3 B 2 B 3 A 4 B 3 A 4 Operating System Support for Redundant Multithreading slide 13 of 19
34 How About Multithreading? A 1 C 1 A 1 B 1 C 1 A 2 B 1 C 2 A 2 C 2 A 3 B 2 C 3 A 3 B 2 B 3 C 3 A 4 B 3 C 4 A 4 C 4 Operating System Support for Redundant Multithreading slide 13 of 19
35 Problem: Nondeterminism A 1 C 1 A 1 B 1 C 1 A 2 B 1 C 2 A 2 C 2 A 3 B 2 C 3 A 3 B 2 B 3 C 3 A 4 B 3 C 3 A 4 C 4 Operating System Support for Redundant Multithreading slide 14 of 19
36 Problem: Nondeterminism A 1 C 1 A 1 B 1 C 1 A 2 B 1 C 2 A 2 B 2 A 3 B 2 C 3 A 3 B 3 C 2 C 3 A 4 B 3 C 3 A 4 B 4 Operating System Support for Redundant Multithreading slide 14 of 19
37 Deterministic Multithreading Related work: Debugging multithreaded programs Slightly different requirement: determinism across runs 11 Liu, Curtsinger, Berger: DThreads: Efficient Deterministic Multithreading, OSDI Olszewski, Ansel, Amarasinghe: Kendo: Efficient Deterministic Multithreading in Software, ASPLOS 2009 Operating System Support for Redundant Multithreading slide 15 of 19
38 Deterministic Multithreading Related work: Debugging multithreaded programs Slightly different requirement: determinism across runs Strong Determinism: All accesses to shared resources happen in the same order 11. Requires heavy involvement with shared memory accesses Replicating SHM is slow 11 Liu, Curtsinger, Berger: DThreads: Efficient Deterministic Multithreading, OSDI Olszewski, Ansel, Amarasinghe: Kendo: Efficient Deterministic Multithreading in Software, ASPLOS 2009 Operating System Support for Redundant Multithreading slide 15 of 19
39 Deterministic Multithreading Related work: Debugging multithreaded programs Slightly different requirement: determinism across runs Strong Determinism: All accesses to shared resources happen in the same order 11. Requires heavy involvement with shared memory accesses Replicating SHM is slow Weak Determinism: All lock acquisitions in a program happen in the same order 12 Intercept calls to pthread mutex {lock,unlock} 11 Liu, Curtsinger, Berger: DThreads: Efficient Deterministic Multithreading, OSDI Olszewski, Ansel, Amarasinghe: Kendo: Efficient Deterministic Multithreading in Software, ASPLOS 2009 Operating System Support for Redundant Multithreading slide 15 of 19
40 Externally Enforced Determinism Patch entries to pthread mutex {lock,unlock} Catch exception for every call Enforce ordering in side the master Microbenchmark: 2 threads, global counter, 1 lock Replication kind Execution time Overhead Native execution 0.24 s 1.00 x Unreplicated RomainMT 4.50 s x RomainMT: DMR s x RomainMT: TMR s x Operating System Support for Redundant Multithreading slide 16 of 19
41 Internal Determinism Exception for every lock/unlock hurts a lot! Non-contention case: get progress without exception Operating System Support for Redundant Multithreading slide 17 of 19
42 Internal Determinism Exception for every lock/unlock hurts a lot! Non-contention case: get progress without exception Replication-aware pthreads library Shared memory between replicas Exchange per-lock progress information Only block if lock currently used by different thread in other replica Operating System Support for Redundant Multithreading slide 17 of 19
43 Internal Determinism Exception for every lock/unlock hurts a lot! Non-contention case: get progress without exception Replication-aware pthreads library Shared memory between replicas Exchange per-lock progress information Only block if lock currently used by different thread in other replica Problems: Replacing pthreads vs. binary-only support pthreads is a shared lib anyway No support for different synchronization mechanisms New pthreads code becomes part of the RCB. Operating System Support for Redundant Multithreading slide 17 of 19
44 Internal Determinism Microbenchmark: 2 threads, global counter, 1 lock Replication kind Execution time Overhead External det. Native execution 0.24 s 1.00 x Unreplicated RomainMT 0.27 s 1.13 x x RomainMT: DMR 0.46 s 1.92 x x RomainMT: TMR 1.43 s 6.01 x x Operating System Support for Redundant Multithreading slide 18 of 19
45 Conclusion Redundant Multithreading as an OS service Support for binary-only applications Benefit from microkernel by reuse and design Overheads <30%, often <5% Multithreading external vs. internal determinism Work in progress (not in this talk): Shared memory handling is slow Bounded detection latency using a watchdog Dynamic adjustment of replication level and resource usage Operating System Support for Redundant Multithreading slide 19 of 19
46 Nothing to see here This slide intentionally left blank. Except for above text. Operating System Support for Redundant Multithreading slide 20 of 19
47 What about signalling failures? Missed CPU exceptions detected by watchdog Spurious CPU exceptions detected by watchdog / state comparison Transmission of corrupt state detected during state comparison Overwriting remote state during transmission NonResCore memory Accessible by ResCores, but not by other NonResCores Prevents overwriting other states Already available in HW: IBM/Cell Operating System Support for Redundant Multithreading slide 21 of 19
48 Romain Operating System Support for Redundant Multithreading slide 22 of 19
HAFT Hardware-Assisted Fault Tolerance
HAFT Hardware-Assisted Fault Tolerance Dmitrii Kuvaiskii Rasha Faqeh Pramod Bhatotia Christof Fetzer Technische Universität Dresden Pascal Felber Université de Neuchâtel Hardware Errors in the Wild Online
More informationWhere Have all the Cycles Gone? Investigating Runtime Overheads of OS-Assisted Replication
Where Have all the Cycles Gone? Investigating Runtime Overheads of OS-Assisted Replication Björn Döbel, Hermann Härtig Operating Systems Group TU Dresden Nöthnitzer Str. 46 01187 Dresden {doebel,haertig}@tudos.org
More informationOperating Systems Meet Fault Tolerance
Operating Systems Meet Fault Tolerance Microkernel-Based Operating Systems Maksym Planeta Björn Döbel 30.01.2018 1/53 If there s more than one possible outcome of a job or task, and one of those outcome
More informationUsing Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance
Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance Outline Introduction and Motivation Software-centric Fault Detection Process-Level Redundancy Experimental Results
More informationCommercial-Off-the-shelf Hardware Transactional Memory for Tolerating Transient Hardware Errors
Commercial-Off-the-shelf Hardware Transactional Memory for Tolerating Transient Hardware Errors Rasha Faqeh TU- Dresden 19.01.2015 Dresden, 23.09.2011 Transient Error Recovery Motivation Folie Nr. 12 von
More informationDESIGN AND ANALYSIS OF TRANSIENT FAULT TOLERANCE FOR MULTI CORE ARCHITECTURE
DESIGN AND ANALYSIS OF TRANSIENT FAULT TOLERANCE FOR MULTI CORE ARCHITECTURE DivyaRani 1 1pg scholar, ECE Department, SNS college of technology, Tamil Nadu, India -----------------------------------------------------------------------------------------------------------------------------------------------
More informationEliminating Single Points of Failure in Software Based Redundancy
Eliminating Single Points of Failure in Software Based Redundancy Peter Ulbrich, Martin Hoffmann, Rüdiger Kapitza, Daniel Lohmann, Reiner Schmid and Wolfgang Schröder-Preikschat EDCC May 9, 2012 SYSTEM
More informationReplicating Device Drivers on L4Re
Großer Beleg Replicating Device Drivers on L4Re Maurice Bailleu 14. September 2015 Technische Universität Dresden Fakultät Informatik Institut für Systemarchitektur Professur Betriebssysteme Betreuender
More informationCS 470 Spring Fault Tolerance. Mike Lam, Professor. Content taken from the following:
CS 47 Spring 27 Mike Lam, Professor Fault Tolerance Content taken from the following: "Distributed Systems: Principles and Paradigms" by Andrew S. Tanenbaum and Maarten Van Steen (Chapter 8) Various online
More informationFAULT TOLERANT SYSTEMS
FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 18 Chapter 7 Case Studies Part.18.1 Introduction Illustrate practical use of methods described previously Highlight fault-tolerance
More informationDeterministic Process Groups in
Deterministic Process Groups in Tom Bergan Nicholas Hunt, Luis Ceze, Steven D. Gribble University of Washington A Nondeterministic Program global x=0 Thread 1 Thread 2 t := x x := t + 1 t := x x := t +
More informationTyped Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts
Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts Toshiyuki Maeda and Akinori Yonezawa University of Tokyo Quiz [Environment] CPU: Intel Xeon X5570 (2.93GHz)
More informationAUTOBEST: A United AUTOSAR-OS And ARINC 653 Kernel. Alexander Züpke, Marc Bommert, Daniel Lohmann
AUTOBEST: A United AUTOSAR-OS And ARINC 653 Kernel Alexander Züpke, Marc Bommert, Daniel Lohmann alexander.zuepke@hs-rm.de, marc.bommert@hs-rm.de, lohmann@cs.fau.de Motivation Automotive and Avionic industry
More informationCS 31: Intro to Systems Threading & Parallel Applications. Kevin Webb Swarthmore College November 27, 2018
CS 31: Intro to Systems Threading & Parallel Applications Kevin Webb Swarthmore College November 27, 2018 Reading Quiz Making Programs Run Faster We all like how fast computers are In the old days (1980
More informationMajor Project, CSD411
Major Project, CSD411 Deterministic Execution In Multithreaded Applications Supervisor Prof. Sorav Bansal Ashwin Kumar 2010CS10211 Himanshu Gupta 2010CS10220 1 Deterministic Execution In Multithreaded
More informationGraphene-SGX. A Practical Library OS for Unmodified Applications on SGX. Chia-Che Tsai Donald E. Porter Mona Vij
Graphene-SGX A Practical Library OS for Unmodified Applications on SGX Chia-Che Tsai Donald E. Porter Mona Vij Intel SGX: Trusted Execution on Untrusted Hosts Processing Sensitive Data (Ex: Medical Records)
More informationLecture 5: Scheduling and Reliability. Topics: scheduling policies, handling DRAM errors
Lecture 5: Scheduling and Reliability Topics: scheduling policies, handling DRAM errors 1 PAR-BS Mutlu and Moscibroda, ISCA 08 A batch of requests (per bank) is formed: each thread can only contribute
More informationSIListra. Coded Processing in Medical Devices. Dr. Martin Süßkraut (TU-Dresden / SIListra Systems)
SIListra making systems safer Coded Processing in Medical Devices Dr. Martin Süßkraut (TU-Dresden / SIListra Systems) martin.suesskraut@se.inf.tu-dresden.de Embedded goes Medical 5./6. Oct. 2011 1 SIListra
More informationBjörn Döbel. Microkernel-Based Operating Systems. Exercise 3: Virtualization
Faculty of Computer Science Institute for System Architecture, Operating Systems Group Björn Döbel Microkernel-Based Operating Systems Exercise 3: Virtualization Emulation Virtualization Emulation / Simulation
More informationTransient Fault Detection and Reducing Transient Error Rate. Jose Lugo-Martinez CSE 240C: Advanced Microarchitecture Prof.
Transient Fault Detection and Reducing Transient Error Rate Jose Lugo-Martinez CSE 240C: Advanced Microarchitecture Prof. Steven Swanson Outline Motivation What are transient faults? Hardware Fault Detection
More informationARCHITECTURE DESIGN FOR SOFT ERRORS
ARCHITECTURE DESIGN FOR SOFT ERRORS Shubu Mukherjee ^ШВпШшр"* AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO T^"ТГПШГ SAN FRANCISCO SINGAPORE SYDNEY TOKYO ^ P f ^ ^ ELSEVIER Morgan
More informationDeterministic Scaling
Deterministic Scaling Gabriel Southern Madan Das Jose Renau Dept. of Computer Engineering, University of California Santa Cruz {gsouther, madandas, renau}@soe.ucsc.edu http://masc.soe.ucsc.edu ABSTRACT
More informationDeflating the hype: Embedded Virtualization in 3 steps
Deflating the hype: Embedded Virtualization in 3 steps Klaas van Gend MontaVista Software LLC For Embedded Linux Conference Europe 2010, Cambridge Agenda Why multicore made the topic more relevant Partitioning
More informationSCONE: Secure Linux Container Environments with Intel SGX
SCONE: Secure Linux Container Environments with Intel SGX S. Arnautov, B. Trach, F. Gregor, Thomas Knauth, and A. Martin, Technische Universität Dresden; C. Priebe, J. Lind, D. Muthukumaran, D. O'Keeffe,
More informationECE 588/688 Advanced Computer Architecture II
ECE 588/688 Advanced Computer Architecture II Instructor: Alaa Alameldeen alaa@ece.pdx.edu Fall 2009 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2009 1 When and Where? When:
More informationProviding Real-Time and Fault Tolerance for CORBA Applications
Providing Real-Time and Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS University Pittsburgh, PA 15213-3890 Sponsored in part by the CMU-NASA High Dependability Computing
More informationDistributed Systems Principles and Paradigms
Distributed Systems Principles and Paradigms Chapter 03 (version February 11, 2008) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.20.
More informationChapter 4: Multithreaded
Chapter 4: Multithreaded Programming Chapter 4: Multithreaded Programming Overview Multithreading Models Thread Libraries Threading Issues Operating-System Examples 2009/10/19 2 4.1 Overview A thread is
More informationMultiprocessor OS. COMP9242 Advanced Operating Systems S2/2011 Week 7: Multiprocessors Part 2. Scalability of Multiprocessor OS.
Multiprocessor OS COMP9242 Advanced Operating Systems S2/2011 Week 7: Multiprocessors Part 2 2 Key design challenges: Correctness of (shared) data structures Scalability Scalability of Multiprocessor OS
More informationSoftware-based Fault Tolerance Mission (Im)possible?
Software-based Fault Tolerance Mission Im)possible? Peter Ulbrich The 29th CREST Open Workshop on Software Redundancy November 18, 2013 System Software Group http://www4.cs.fau.de Embedded Systems Initiative
More informationPriya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA
OMG Real-Time and Distributed Object Computing Workshop, July 2002, Arlington, VA Providing Real-Time and Fault Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS Carnegie
More informationComputer Architecture: Multithreading (III) Prof. Onur Mutlu Carnegie Mellon University
Computer Architecture: Multithreading (III) Prof. Onur Mutlu Carnegie Mellon University A Note on This Lecture These slides are partly from 18-742 Fall 2012, Parallel Computer Architecture, Lecture 13:
More informationMICROKERNELS: MACH AND L4
1 MICROKERNELS: MACH AND L4 CS6410 Hakim Weatherspoon Introduction to Kernels Different Types of Kernel Designs Monolithic kernel Microkernel Hybrid Kernel Exokernel Virtual Machines? Monolithic Kernels
More informationDMP Deterministic Shared Memory Multiprocessing
DMP Deterministic Shared Memory Multiprocessing University of Washington Joe Devietti, Brandon Lucia, Luis Ceze, Mark Oskin A multithreaded voting machine 2 thread 0 thread 1 while (more_votes) { load
More informationReplacement policies for shared caches on symmetric multicores : a programmer-centric point of view
1 Replacement policies for shared caches on symmetric multicores : a programmer-centric point of view Pierre Michaud INRIA HiPEAC 11, January 26, 2011 2 Outline Self-performance contract Proposition for
More informationTEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT
TEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT Nosayba El-Sayed, Ioan Stefanovici, George Amvrosiadis, Andy A. Hwang, Bianca Schroeder {nosayba, ioan, gamvrosi, hwang, bianca}@cs.toronto.edu
More informationCSCE 410/611: Virtualization
CSCE 410/611: Virtualization Definitions, Terminology Why Virtual Machines? Mechanics of Virtualization Virtualization of Resources (Memory) Some slides made available Courtesy of Gernot Heiser, UNSW.
More informationDISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 3 Processes
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 3 Processes Context Switching Processor context: The minimal collection of values stored in the
More informationElzar Triple Modular Redundancy using Intel AVX
Elzar Triple Modular Redundancy using Intel AVX Dmitrii Kuvaiskii Oleksii Oleksenko Pramod Bhatotia Christof Fetzer Pascal Felber Hardware Errors in the Wild Online services run in huge data centers 1
More informationFailures in Large Systems. Hive: Fault Containment for Shared- Memory Multiprocessors. Why failures? Component Reliability
Hive: Fault Containment for Shared- Memory Multiprocessors. Chapin95: John Chapin, Mendel Rosenblum, Scott Devine, Tirthankar Lahiri, Dan Teodosiu, Anoop Gupta, ACM Symp. on Operating Systems Principles,
More informationFakultät Informatik Institut für Systemarchitektur, Betriebssysteme THE NOVA KERNEL API. Julian Stecklina
Fakultät Informatik Institut für Systemarchitektur, Betriebssysteme THE NOVA KERNEL API Julian Stecklina (jsteckli@os.inf.tu-dresden.de) Dresden, 5.2.2012 00 Disclaimer This is not about OpenStack Compute.
More informationReal-Time Component Software. slide credits: H. Kopetz, P. Puschner
Real-Time Component Software slide credits: H. Kopetz, P. Puschner Overview OS services Task Structure Task Interaction Input/Output Error Detection 2 Operating System and Middleware Application Software
More informationLecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)"
Lecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)" 1" Processor components" Multicore processors and programming" Processor comparison" CSE 30321 - Lecture 01 - vs." Goal: Explain
More informationA Userspace Packet Switch for Virtual Machines
SHRINKING THE HYPERVISOR ONE SUBSYSTEM AT A TIME A Userspace Packet Switch for Virtual Machines Julian Stecklina OS Group, TU Dresden jsteckli@os.inf.tu-dresden.de VEE 2014, Salt Lake City 1 Motivation
More informationA Multi-Kernel Survey for High-Performance Computing
A Multi-Kernel Survey for High-Performance Computing Balazs Gerofi, Yutaka Ishikawa, Rolf Riesen, Robert W. Wisniewski, Yoonho Park, Bryan Rosenburg RIKEN Advanced Institute for Computational Science,
More informationBounding Error Detection Latencies for Replicated Execution
Bachelor Thesis Bounding Error Detection Latencies for Replicated Execution Martin Kriegel June 27, 2013 Technische Universität Dresden Fakultät Informatik Institut für Systemarchitektur Professur Betriebssysteme
More information6.033 Spring Lecture #6. Monolithic kernels vs. Microkernels Virtual Machines spring 2018 Katrina LaCurts
6.033 Spring 2018 Lecture #6 Monolithic kernels vs. Microkernels Virtual Machines 1 operating systems enforce modularity on a single machine using virtualization in order to enforce modularity + build
More informationReplication of Concurrent Applications in a Shared Memory Multikernel
Replication of Concurrent Applications in a Shared Memory Multikernel Yuzhong Wen Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the
More informationDesigning and Evaluating a Distributed Computing Language Runtime. Christopher Meiklejohn Université catholique de Louvain, Belgium
Designing and Evaluating a Distributed Computing Language Runtime Christopher Meiklejohn (@cmeik) Université catholique de Louvain, Belgium R A R B R A set() R B R A set() set(2) 2 R B set(3) 3 set() set(2)
More informationTERN: Stable Deterministic Multithreading through Schedule Memoization
TERN: Stable Deterministic Multithreading through Schedule Memoization Heming Cui, Jingyue Wu, Chia-che Tsai, Junfeng Yang Columbia University Appeared in OSDI 10 Nondeterministic Execution One input many
More informationProcess Description and Control
Process Description and Control 1 Process:the concept Process = a program in execution Example processes: OS kernel OS shell Program executing after compilation www-browser Process management by OS : Allocate
More informationKESO Functional Safety and the Use of Java in Embedded Systems
KESO Functional Safety and the Use of Java in Embedded Systems Isabella S1lkerich, Bernhard Sechser Embedded Systems Engineering Kongress 05.12.2012 Lehrstuhl für Informa1k 4 Verteilte Systeme und Betriebssysteme
More informationCS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019
CS 31: Introduction to Computer Systems 22-23: Threads & Synchronization April 16-18, 2019 Making Programs Run Faster We all like how fast computers are In the old days (1980 s - 2005): Algorithm too slow?
More informationMicrokernel Construction
Introduction SS2013 Class Goals Provide deeper understanding of OS mechanisms Introduce L4 principles and concepts Make you become enthusiastic L4 hackers Propaganda for OS research at 2 Administration
More informationStatus Update About COLO (COLO: COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
Status Update About COLO (COLO: COarse-grain LOck-stepping Virtual Machines for Non-stop Service) eddie.dong@intel.com arei.gonglei@huawei.com yanghy@cn.fujitsu.com Agenda Background Introduction Of COLO
More informationTiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture
Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, Onur Mutlu Carnegie Mellon University HPCA - 2013 Executive
More informationBackground: I/O Concurrency
Background: I/O Concurrency Brad Karp UCL Computer Science CS GZ03 / M030 5 th October 2011 Outline Worse Is Better and Distributed Systems Problem: Naïve single-process server leaves system resources
More informationDepartment of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW
Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW 2 SO FAR talked about in-kernel building blocks: threads memory IPC drivers
More informationTolerating Malicious Drivers in Linux. Silas Boyd-Wickizer and Nickolai Zeldovich
XXX Tolerating Malicious Drivers in Linux Silas Boyd-Wickizer and Nickolai Zeldovich How could a device driver be malicious? Today's device drivers are highly privileged Write kernel memory, allocate memory,...
More informationCS420: Operating Systems
Threads James Moscola Department of Physical Sciences York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Threads A thread is a basic unit of processing
More informationTransparent TCP Recovery
Transparent Recovery with Chain Replication Robert Burgess Ken Birman Robert Broberg Rick Payne Robbert van Renesse October 26, 2009 Motivation Us: Motivation Them: Client Motivation There is a connection...
More informationRuntime Integrity Checking for Exploit Mitigation on Embedded Devices
Runtime Integrity Checking for Exploit Mitigation on Embedded Devices Matthias Neugschwandtner IBM Research, Zurich eug@zurich.ibm.com Collin Mulliner Northeastern University, Boston collin@mulliner.org
More informationFrom eventual to strong consistency. Primary-Backup Replication. Primary-Backup Replication. Replication State Machines via Primary-Backup
From eventual to strong consistency Replication s via - Eventual consistency Multi-master: Any node can accept operation Asynchronously, nodes synchronize state COS 418: Distributed Systems Lecture 10
More informationAgenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2
Lecture 3: Processes Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Process in General 3.3 Process Concept Process is an active program in execution; process
More informationTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi University of Kansas 1 Why? High-Performance Multicores for Real-Time Systems
More information4. Jump to *RA 4. StackGuard 5. Execute code 5. Instruction Set Randomization 6. Make system call 6. System call Randomization
04/04/06 Lecture Notes Untrusted Beili Wang Stages of Static Overflow Solution 1. Find bug in 1. Static Analysis 2. Send overflowing input 2. CCured 3. Overwrite return address 3. Address Space Randomization
More informationCSE 451 Midterm 1. Name:
CSE 451 Midterm 1 Name: 1. [2 points] Imagine that a new CPU were built that contained multiple, complete sets of registers each set contains a PC plus all the other registers available to user programs.
More informationECC Protection in Software
Center for RC eliable omputing ECC Protection in Software by Philip P Shirvani RATS June 8, 1999 Outline l Motivation l Requirements l Coding Schemes l Multiple Error Handling l Implementation in ARGOS
More informationFoundations of the C++ Concurrency Memory Model
Foundations of the C++ Concurrency Memory Model John Mellor-Crummey and Karthik Murthy Department of Computer Science Rice University johnmc@rice.edu COMP 522 27 September 2016 Before C++ Memory Model
More informationOverview of Operating Systems
Lecture Outline Overview of Operating Systems Instructor: Dr. Tongping Liu Thank Dr. Dakai Zhu and Dr. Palden Lama for providing their slides. 1 2 Lecture Outline Von Neumann Architecture 3 This describes
More informationMultiprocessors 2007/2008
Multiprocessors 2007/2008 Abstractions of parallel machines Johan Lukkien 1 Overview Problem context Abstraction Operating system support Language / middleware support 2 Parallel processing Scope: several
More informationMultiprocessor Systems Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message pas
Multiple processor systems 1 Multiprocessor Systems Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message passing multiprocessor, access
More informationFall 2014:: CSE 506:: Section 2 (PhD) Threading. Nima Honarmand (Based on slides by Don Porter and Mike Ferdman)
Threading Nima Honarmand (Based on slides by Don Porter and Mike Ferdman) Threading Review Multiple threads of execution in one address space Why? Exploits multiple processors Separate execution stream
More informationAR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors Computer Sciences Department University of Wisconsin Madison http://www.cs.wisc.edu/~ericro/ericro.html ericro@cs.wisc.edu High-Performance
More information殷亚凤. Processes. Distributed Systems [3]
Processes Distributed Systems [3] 殷亚凤 Email: yafeng@nju.edu.cn Homepage: http://cs.nju.edu.cn/yafeng/ Room 301, Building of Computer Science and Technology Review Architectural Styles: Layered style, Object-based,
More informationFlexSC. Flexible System Call Scheduling with Exception-Less System Calls. Livio Soares and Michael Stumm. University of Toronto
FlexSC Flexible System Call Scheduling with Exception-Less System Calls Livio Soares and Michael Stumm University of Toronto Motivation The synchronous system call interface is a legacy from the single
More informationCS377P Programming for Performance Multicore Performance Multithreading
CS377P Programming for Performance Multicore Performance Multithreading Sreepathi Pai UTCS October 14, 2015 Outline 1 Multiprocessor Systems 2 Programming Models for Multicore 3 Multithreading and POSIX
More informationOperating Systems, Fall Lecture 9, Tiina Niklander 1
Multiprocessor Systems Multiple processor systems Ch 8.1 8.3 1 Continuous need for faster computers Multiprocessors: shared memory model, access time nanosec (ns) Multicomputers: message passing multiprocessor,
More informationECE 588/688 Advanced Computer Architecture II
ECE 588/688 Advanced Computer Architecture II Instructor: Alaa Alameldeen alaa@ece.pdx.edu Winter 2018 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2018 1 When and Where? When:
More informationCSCE 410/611: Virtualization!
CSCE 410/611: Virtualization! Definitions, Terminology! Why Virtual Machines?! Mechanics of Virtualization! Virtualization of Resources (Memory)! Some slides made available Courtesy of Gernot Heiser, UNSW.!
More informationTolerating Hardware Device Failures in Software. Asim Kadav, Matthew J. Renzelmann, Michael M. Swift University of Wisconsin Madison
Tolerating Hardware Device Failures in Software Asim Kadav, Matthew J. Renzelmann, Michael M. Swift University of Wisconsin Madison Current state of OS hardware interaction Many device drivers assume device
More informationChapter 10 DISTRIBUTED OBJECT-BASED SYSTEMS
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 10 DISTRIBUTED OBJECT-BASED SYSTEMS Distributed Objects Figure 10-1. Common organization of a remote
More informationFault Isolation for Device Drivers
Fault Isolation for Device Drivers 39 th International Conference on Dependable Systems and Networks, 30 June 2009, Estoril Lisbon, Portugal Jorrit N. Herder Vrije Universiteit Amsterdam ~26% of Windows
More informationCPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.
CPSC/ECE 3220 Fall 2017 Exam 1 Name: 1. Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.) Referee / Illusionist / Glue. Circle only one of R, I, or G.
More informationNUMA replicated pagecache for Linux
NUMA replicated pagecache for Linux Nick Piggin SuSE Labs January 27, 2008 0-0 Talk outline I will cover the following areas: Give some NUMA background information Introduce some of Linux s NUMA optimisations
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Memory Hierarchy & Caches Motivation 10000 Performance 1000 100 10 Processor Memory 1 1985 1990 1995 2000 2005 2010 Want memory to appear: As fast as CPU As large as required
More informationExokernel: An Operating System Architecture for Application Level Resource Management
Exokernel: An Operating System Architecture for Application Level Resource Management Dawson R. Engler, M. Frans Kaashoek, and James O'Tool Jr. M.I.T Laboratory for Computer Science Cambridge, MA 02139,
More informationDistributed File Systems Issues. NFS (Network File System) AFS: Namespace. The Andrew File System (AFS) Operating Systems 11/19/2012 CSC 256/456 1
Distributed File Systems Issues NFS (Network File System) Naming and transparency (location transparency versus location independence) Host:local-name Attach remote directories (mount) Single global name
More informationLoad-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise)
Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise) Application vs. Pure Workloads Benchmarks that reproduce application workloads Assist in
More informationMicrokernel-based Operating Systems - Introduction
Faculty of Computer Science Institute for System Architecture, Operating Systems Group Microkernel-based Operating Systems - Introduction Nils Asmussen Dresden, Oct 09 2018 Lecture Goals Provide deeper
More informationCrossCheck: A Holistic Approach for Tolerating Crash-Faults and Arbitrary Failures
CrossCheck: A Holistic Approach for Tolerating Crash-Faults and Arbitrary Failures Arthur Martens 1, Christoph Borchert 2, Manuel Nieke 1, Olaf Spinczyk 2 and Rüdiger Kapitza 1 1 TU Braunschweig 2 TU Dortmund
More informationOutline. Threads. Single and Multithreaded Processes. Benefits of Threads. Eike Ritter 1. Modified: October 16, 2012
Eike Ritter 1 Modified: October 16, 2012 Lecture 8: Operating Systems with C/C++ School of Computer Science, University of Birmingham, UK 1 Based on material by Matt Smart and Nick Blundell Outline 1 Concurrent
More informationChapter 4: Multi-Threaded Programming
Chapter 4: Multi-Threaded Programming Chapter 4: Threads 4.1 Overview 4.2 Multicore Programming 4.3 Multithreading Models 4.4 Thread Libraries Pthreads Win32 Threads Java Threads 4.5 Implicit Threading
More informationScheduler Support for Video-oriented Multimedia on Client-side Virtualization
Scheduler Support for Video-oriented Multimedia on Client-side Virtualization Hwanju Kim 1, Jinkyu Jeong 1, Jaeho Hwang 1, Joonwon Lee 2, and Seungryoul Maeng 1 Korea Advanced Institute of Science and
More informationDistributed OS and Algorithms
Distributed OS and Algorithms Fundamental concepts OS definition in general: OS is a collection of software modules to an extended machine for the users viewpoint, and it is a resource manager from the
More informationTU Wien. Fault Isolation and Error Containment in the TT-SoC. H. Kopetz. TU Wien. July 2007
TU Wien 1 Fault Isolation and Error Containment in the TT-SoC H. Kopetz TU Wien July 2007 This is joint work with C. El.Salloum, B.Huber and R.Obermaisser Outline 2 Introduction The Concept of a Distributed
More informationcsci3411: Operating Systems
csci3411: Operating Systems Lecture 3: System structure and Processes Gabriel Parmer Some slide material from Silberschatz and West System Structure System Structure How different parts of software 1)
More informationDistributed Systems 24. Fault Tolerance
Distributed Systems 24. Fault Tolerance Paul Krzyzanowski pxk@cs.rutgers.edu 1 Faults Deviation from expected behavior Due to a variety of factors: Hardware failure Software bugs Operator errors Network
More informationCS Lecture 3! Threads! George Mason University! Spring 2010!
CS 571 - Lecture 3! Threads! George Mason University! Spring 2010! Threads! Overview! Multithreading! Example Applications! User-level Threads! Kernel-level Threads! Hybrid Implementation! Observing Threads!
More informationFlexible Architecture Research Machine (FARM)
Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense
More information