CS510 Concurrent Systems Class 2. A Lock-Free Multiprocessor OS Kernel

The Synthesis kernel
- A research project at Columbia University
- Synthesis V.0
  - 68020 uniprocessor (Motorola)
  - No virtual memory
- 1991 - Synthesis V.1
  - Dual 68030s
  - Virtual memory, threads, etc.
  - Lock-free kernel

CS510 - Concurrent Systems

Locking
- Why do kernels normally use locks?
- Locks support a concurrent programming style based on mutual exclusion
  - Acquire lock on entry to critical sections
  - Release lock on exit
  - Block or spin if lock is held
  - Only one thread at a time executes the critical section
- Locks prevent concurrent access and enable sequential reasoning about critical section code

So why not use locking?
- Granularity decisions
  - Simplicity vs performance
  - Increasingly poor performance (superscalar CPUs)
- Complicates composition
  - Need to know the locks I'm holding before calling a function
  - Need to know if it's safe to call while holding those locks
- Risk of deadlock
- Propagates thread failures to other threads
  - What if I crash while holding a lock?

Is there an alternative?
- Use lock-free, optimistic synchronization
  - Execute the critical section unconstrained, and check at the end to see if you were the only one
  - If so, continue. If not, roll back and retry
- Synthesis uses no locks at all!
- Goal: show that lock-free synchronization is...
  - Sufficient for all OS synchronization needs
  - Practical
  - High performance

Locking is pessimistic
- Murphy's law: If it can go wrong, it will...
- In concurrent programming:
  - If we can have a race condition, we will...
  - If another thread could mess us up, it will...
- Solution:
  - Hide the resources behind locked doors
  - Make everyone wait until we're done
  - That is... if there was anyone at all
  - We pay the same cost either way

Optimistic synchronization
- The common case is often little or no contention
  - Or at least it should be!
  - Do we really need to shut out the whole world?
  - Why not proceed optimistically and only incur cost if we encounter contention?
- If there's little contention, there's no starvation
  - So we don't need to be wait-free, which guarantees no starvation
  - Lock-free is easier and cheaper than wait-free
- Small critical sections really help performance

How does it work?
- Copy
  - Write down any state we need in order to retry
- Do the work
  - Perform the computation
- Atomically "test and commit" or retry
  - Compare saved assumptions with the actual state of the world
  - If different, undo work, and start over with new state
  - If preconditions still hold, commit the results and continue
  - This is where the work becomes visible to the world (ideally)
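The three steps above can be sketched with C11 atomics. This is only an illustration, not Synthesis code: the shared variable `high_water` and the functions `record_sample` and `current_high_water` are hypothetical names, and `atomic_compare_exchange_strong` stands in for the hardware CAS instruction.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared state; the name is an assumption made for this sketch. */
_Atomic int high_water = 0;

/* Raise high_water to at least `sample`. Returns how many times the
   commit step had to be retried. */
int record_sample(int sample)
{
    int retries = 0;

    /* Copy: save the state that our computation depends on. */
    int seen = atomic_load(&high_water);

    for (;;) {
        /* Do the work: compute the result from the private copy only. */
        int wanted = sample > seen ? sample : seen;

        /* Test and commit: succeeds only if high_water still equals
           `seen`. On failure, `seen` is overwritten with the current
           value, which gives us the new state to start over from. */
        if (atomic_compare_exchange_strong(&high_water, &seen, wanted))
            return retries;

        retries++;  /* another thread got there first: redo the work */
    }
}

int current_high_water(void)
{
    return atomic_load(&high_water);
}
```

Nothing is visible to other threads until the compare-exchange succeeds, so "undoing" the work here is just discarding the local `wanted`.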

Example: stack pop

    Pop() {
    retry:
        old_sp = SP;
        new_sp = old_sp + 1;
        elem = *old_sp;
        if (CAS(old_sp, new_sp, &SP) == FAIL)
            goto retry;
        return elem;
    }

Example: stack pop

    Pop() {
    retry:                          /* loop: retry until the CAS succeeds */
        old_sp = SP;                /* SP is global - may change any time! */
        new_sp = old_sp + 1;        /* old_sp, new_sp, elem are locals - won't change! */
        elem = *old_sp;
        if (CAS(old_sp, new_sp, &SP) == FAIL)   /* atomic read-modify-write instruction */
            goto retry;
        return elem;
    }

CAS
- CAS: single-word Compare and Swap
- An atomic read-modify-write instruction
- Semantics of the single atomic instruction are:

    CAS(copy, update, mem_addr) {
        if (*mem_addr == copy) {
            *mem_addr = update;
            return SUCCESS;
        } else
            return FAIL;
    }
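In portable C11 the closest standard equivalent is `atomic_compare_exchange_strong` from `<stdatomic.h>`. The wrapper below is a sketch that keeps the slide's argument order; the `FAIL`/`SUCCESS` constants and the choice of `uintptr_t` as the word type are assumptions made here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { FAIL = 0, SUCCESS = 1 };

/* Same argument order as the slide: expected copy, new value, address. */
int CAS(uintptr_t copy, uintptr_t update, _Atomic uintptr_t *mem_addr)
{
    /* The C11 call overwrites its "expected" argument when it fails.
       `copy` is a by-value parameter, so the caller never sees that. */
    return atomic_compare_exchange_strong(mem_addr, &copy, update)
               ? SUCCESS
               : FAIL;
}
```

On a 68030 the compiler would be expected to lower this to the CAS instruction; on other targets it may become a load-linked/store-conditional loop, but the observable semantics are the same.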

Example: stack pop

    Pop() {
    retry:
        old_sp = SP;                /* Copy */
        new_sp = old_sp + 1;        /* Do Work */
        elem = *old_sp;             /* Do Work */
        if (CAS(old_sp, new_sp, &SP) == FAIL)   /* Test and commit, or retry */
            goto retry;
        return elem;
    }

What made it work?
- It works because we can atomically commit the new stack pointer value and compare the old stack pointer with the one at commit time
- This allows us to verify no other thread has accessed the stack concurrently with our operation
  - i.e. since we took the copy
  - Well, at least we know the address in the stack pointer is the same as it was when we started
  - Does this guarantee there was no concurrent activity?
  - Does it matter?
  - We have to be careful!
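The caveat can be made concrete with a single-threaded simulation. The sketch below assumes an array stack addressed by index (`slots`, `SP`, and the `*_unsync` helpers are hypothetical names) and plays the role of the "other thread" by hand between the copy and the commit.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEPTH 8
int slots[DEPTH];
/* SP indexes the top element; the stack grows toward lower indices,
   as in the slides. SP == DEPTH means the stack is empty. */
_Atomic int SP = DEPTH;

/* Unsynchronized helpers, used only to script the interleaving. */
void push_unsync(int v)
{
    int sp = atomic_load(&SP) - 1;
    slots[sp] = v;
    atomic_store(&SP, sp);
}

int pop_unsync(void)
{
    int sp = atomic_load(&SP);
    atomic_store(&SP, sp + 1);
    return slots[sp];
}

/* Runs one Pop() with interference injected before the commit.
   Returns the element Pop() hands back, or -1 if the CAS failed. */
int interleaved_pop(void)
{
    push_unsync(1);                     /* stack holds: 1 */

    int old_sp = atomic_load(&SP);      /* Copy */
    int new_sp = old_sp + 1;            /* Do work */
    int elem   = slots[old_sp];         /* reads 1 */

    /* Simulated "other thread": pop 1, then push 2 into the same slot. */
    pop_unsync();
    push_unsync(2);                     /* SP equals old_sp again */

    /* Commit: SP has the same value as before, so the CAS succeeds. */
    int ok = atomic_compare_exchange_strong(&SP, &old_sp, new_sp);
    return ok ? elem : -1;
}
```

In this scripted run the CAS succeeds and the pop returns the stale value 1, while the 2 that was actually on top is silently dropped. An unchanged stack pointer shows only that the address is the same, not that nothing happened in between.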

Stack push

    Push(elem) {
    retry:
        old_sp = SP;
        new_sp = old_sp - 1;
        old_val = *new_sp;
        if (CAS2(old_sp, old_val, new_sp, elem, &SP, new_sp) == FAIL)
            goto retry;
    }

Stack push

    Push(elem) {
    retry:
        old_sp = SP;                /* Copy */
        new_sp = old_sp - 1;        /* Do Work */
        old_val = *new_sp;
        if (CAS2(old_sp, old_val, new_sp, elem, &SP, new_sp) == FAIL)   /* Test and commit, or retry */
            goto retry;
    }

Stack push

    Push(elem) {
    retry:
        old_sp = SP;
        new_sp = old_sp - 1;
        old_val = *new_sp;          /* unnecessary compare */
        if (CAS2(old_sp, old_val, new_sp, elem, &SP, new_sp) == FAIL)
            goto retry;
    }

- Note: this is a double compare and swap!
  - It's needed to atomically update both the new item and the new stack pointer

CAS2
- CAS2 = double compare and swap
  - Sometimes referred to as DCAS

    CAS2(copy1, copy2, update1, update2, addr1, addr2) {
        if (*addr1 == copy1 && *addr2 == copy2) {
            *addr1 = update1;
            *addr2 = update2;
            return SUCCESS;
        } else
            return FAIL;
    }
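Standard C has no two-address compare-and-swap, and most current processors lack one, so the sketch below only emulates the CAS2 semantics. It assumes a global `atomic_flag` (`cas2_guard`, a hypothetical name) as a stand-in for the hardware's atomicity, which means this emulation is not itself lock-free. `Push` follows the slide's structure on an index-based array stack and, as an addition here, reports a full stack instead of looping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { FAIL = 0, SUCCESS = 1 };

/* Assumption: one global flag makes the two-word update indivisible. */
atomic_flag cas2_guard = ATOMIC_FLAG_INIT;

int CAS2(intptr_t copy1, intptr_t copy2,
         intptr_t update1, intptr_t update2,
         _Atomic intptr_t *addr1, _Atomic intptr_t *addr2)
{
    int result = FAIL;

    while (atomic_flag_test_and_set(&cas2_guard))
        ;                               /* spin until we own the guard */

    /* Compare the contents of both words, not the addresses. */
    if (atomic_load(addr1) == copy1 && atomic_load(addr2) == copy2) {
        atomic_store(addr1, update1);
        atomic_store(addr2, update2);
        result = SUCCESS;
    }

    atomic_flag_clear(&cas2_guard);
    return result;
}

#define DEPTH 8
_Atomic intptr_t slots[DEPTH];
_Atomic intptr_t SP = DEPTH;            /* index of top; DEPTH = empty */

int Push(intptr_t elem)
{
    for (;;) {
        intptr_t old_sp = atomic_load(&SP);         /* copy */
        if (old_sp == 0)
            return FAIL;                            /* stack is full */
        intptr_t new_sp  = old_sp - 1;              /* do work */
        intptr_t old_val = atomic_load(&slots[new_sp]);

        /* Commit the new element and the new stack pointer together. */
        if (CAS2(old_sp, old_val, new_sp, elem,
                 &SP, &slots[new_sp]) == SUCCESS)
            return SUCCESS;
    }
}
```

The emulation is race-free only if every write to `SP` and `slots` goes through `CAS2`.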

Optimistic synchronization in Synthesis
- Saved state is only one or two words
- Commit is done via Compare-and-Swap (CAS), or Double-Compare-and-Swap (CAS2 or DCAS)
- Can we really do everything in only two words?
  - Every synchronization problem in the Synthesis kernel is reduced to only needing to atomically touch two words at a time!
  - Requires some very clever kernel architecture

Approach
- Build data structures that work concurrently
  - Stacks
  - Queues (array-based to avoid allocations)
  - Linked lists
- Then build the OS around these data structures
- Concurrency is a first-class concern
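As one illustration of an array-based queue that needs no allocation, here is a sketch of a bounded ring buffer in C11. It is not the Synthesis queue: it assumes exactly one producer thread and one consumer thread, which is a simpler case than the multi-producer queues a kernel needs, and all names (`spsc_queue`, `q_init`, `q_put`, `q_get`, `QSIZE`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define QSIZE 4   /* must be a power of two so index wraparound is safe */

typedef struct {
    int items[QSIZE];
    _Atomic unsigned head;  /* next slot to read; written by consumer only */
    _Atomic unsigned tail;  /* next slot to write; written by producer only */
} spsc_queue;

void q_init(spsc_queue *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false when the queue is full. */
bool q_put(spsc_queue *q, int v)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QSIZE)
        return false;
    q->items[tail % QSIZE] = v;
    /* Release: the item is written before the new tail becomes visible. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the queue is empty. */
bool q_get(spsc_queue *q, int *out)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->items[head % QSIZE];
    /* Release: the slot is read before it is handed back to the producer. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

Because each index has a single writer, no CAS is needed in this restricted case; with several producers the tail update would have to become a CAS retry loop of the kind shown for the stack.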

Why is this trickier than it seems?
- List operations show insert and delete at the head
  - This is the easy case
  - What about insert and delete of interior nodes?
- Next pointers of deletable nodes are not safe to traverse, even the first time!
  - Need reference counts and DCAS to atomically compare and update the count and pointer values
  - This is expensive, so we may choose to defer deletes instead (more on this later in the course)
- Specialized list and queue implementations can reduce the overheads

The fall-back position
- If you can't reduce the work such that it requires atomic updates to two or fewer words:
  - Create a single server thread and do the work sequentially on a single CPU
  - Why is this faster than letting multiple CPUs try to do it concurrently?
- Callers pack the requested operation into a message
  - Send it to the server (using lock-free queues!)
  - Wait for a response/callback/...
  - The queue effectively serializes the operations

Lock vs lock-free critical sections

Conclusions
- This is really intriguing!
- It's possible to build an entire OS without locks!
- But do you really want to?
  - Does it add or remove complexity?
  - What if hardware only gives you CAS and no DCAS?
  - What if critical sections are large or long lived?
  - What if contention is high?
  - What if we can't undo the work?