AC: COMPOSABLE ASYNCHRONOUS IO FOR NATIVE LANGUAGES. Tim Harris, Martín Abadi, Rebecca Isaacs & Ross McIlroy

Size: px
Start display at page:

Download "AC: COMPOSABLE ASYNCHRONOUS IO FOR NATIVE LANGUAGES. Tim Harris, Martín Abadi, Rebecca Isaacs & Ross McIlroy"

Transcription

1 AC: COMPOSABLE ASYNCHRONOUS IO FOR NATIVE LANGUAGES Tim Harris, Martín Abadi, Rebecca Isaacs & Ross McIlroy

2 Synchronous IO in the Windows API Read the contents of h, and compute a result BOOL ProcessFile(HANDLE h, int *output) { BOOL ok = TRUE; DWORD size = FileSize(h); LPVOID buf = malloc(size); for (int i = 0; ok && i<(size/1024); i++) { ok &= ReadFile(h, buf+(i*1024), 1024, NULL, NULL); if (ok) { *output = free(buf); return ok; It exposes reads one by one to the IO system: no pipelining, poor performance If we want to process multiple files Implemented using synchronous calls to we must either (i) handle them ReadFile to read each block in turn serially, (ii) introduce threading 15/11/2011 2

3 Asynchronous IO static OVERLAPPED o[pd]; BOOL ProcessFile(HANDLE h, int *output) { DWORD size = FileSize(h); LPVOID buf = malloc(size); HANDLE cp = CreateIoCompletionPort(h, NULL, 0, 0); BOOL ok = TRUE; DWORD idx; LPOVERLAPPED *op; for (int i = 0; i<pd; i++) { PostQueuedCompletionStatus(cp, 0, i, &o[i]); for (int i = 0; i<size/1024; i++) { DWORD idx; LPOVERLAPPED *op; GetQueuedCompletionStatus(cp, NULL, &idx, &op, INFINITE); op >offset = i*1024; ReadFile(h, buf+(i*1024), 1024, NULL, op); for (int i = 0; i<pd; i++) { GetQueuedCompletionStatus(cp, NULL, &idx, &op, INFINITE); if (ok) { *output =... CloseHandle(cp); free(buf); return ok; Create completion port to synchronize with IOs Main loop, issuing IO operations Drain the completion port, waiting for all IOs to be done and asynchronous IO is not composable: ProcessFile is just a normal synchronous function (And this is with a lot of error handling code omitted) 15/11/2011 3

4 AC: asynchronous C Synchronous IO Easy to use Poor perf AC: similar programming model to synchronous, similar performance to async Async IO 1. Focus on performance and ability for use in low level code (e.g., monitor process in Barrelfish research OS) Difficult to use 2. Small number of constructs: starting async work, synchronize with async work, stop Good waiting perf for async work 3. Enable higher level features to be built over this join patterns, futures, 15/11/2011 4

5 AC: almost the same as synchronous BOOL ProcessFileAC(HANDLE h, int *output) { BOOL ok = TRUE; DWORD size = FileSize(h); LPVOID buf = malloc(size); do { for (int i = 0; ok && i<n; i++) { async { ok &= ReadFileAC(h, i*1024, buf+(i*1024), 1024); finish; if (ok) { *output = free(buf); return ok; 1. Use the AC primitives for data access 2. Use async to identify work that can continue while waiting: { async X ; Y start X and, if it blocks, continue with Y 3. Add finish to identify synchronization: do { X finish don t go past finish until all the async work in X is done 15/11/2011 5

6 AC is composable We can build layered abstractions using AC: user code can be called asynchronously, not just primitives: BOOL ProcessTwoFilesAC(HANDLE h1, HANDLE h2, int *output) { int o1, o2; BOOL ok = TRUE; do { async { ok &= ProcessFileAC(h1, &o1); async { ok &= ProcessFileAC(h2, &o2); finish; if (ok) *output = return ok; 15/11/2011 6

7 Making multiple requests // Counting semaphore to limit requests SemAC_t sem; SemACInit(&sem, 100); // Loop potentially starting lots of work do { for (int i = 0; i < ; i++) { P_SemAC(&sem); async { V_SemAC(&sem); finish; // Do work 15/11/2011 7

8 Cancellation Suppose we want to add a timeout 2. It marks the named do..finish as cancelled, and pokes blocked BOOL TryProcessFileAC(HANDLE h, int timeout, 1. Suppose int *result) we execute operations dynamically within it { BOOL ok = TRUE; this cancel statement with_timoeut: do { async { ok = ProcessFileAC(h, result); cancel with_timeout; async { SleepAC(result); cancel with_timeout; finish; return ok; 5. The finish works as before: wait until both asyncs are done 3. SleepAC is woken and terminates early with FALSE 4. This second cancellation has no effect 15/11/2011 8

9 Cancellation BOOL ProcessFileAC(HANDLE h, int *output) { BOOL ok = TRUE; DWORD size = FileSize(h); 2. It pokes blocked LPVOID buf = malloc(size); operations in the do { 3. Cancellation BOOL TryProcessFileAC(HANDLE named do finish for (int i = h, 0; int ok propagates timeout, && i<n; i++) into *result) { { BOOL ok = TRUE; async { ProcessFileAC with_timoeut: do { ok &= ReadFileAC(h, i*1024, buf+(i*1024), 1024); async { ok = ProcessFileAC(h, result); cancel with_timeout; async { SleepAC(result); finish; cancel with_timeout; finish; if (ok) { return ok; *output = Suppose we want to add a timeout 4. The result of ProcessFileAC free(buf); forms the overall return result ok; 1. Suppose we execute this cancel statement 15/11/2011 9

10 Exact cancellation A convention: provided by built primitives, recommended for user code An *AC function either Returns TRUE if it completes successfully without cancellation Returns FALSE and undoes its partial side effects if it is cancelled Easier said than done 1. Remember cancellation is co operative 2. *NAC non cancellable primitives 3. For responsiveness, push clean up work to a thread pool 10

11 Implementation Thread Run queue Current FB Finish block (FB) ACTV Enclosing FB Count = 0 Completion AWI Stack main First Last Techniques: Per thread run queue of atomic work items ( AWIs ) Wait for next IO completion when AWI queue empty Book keeping data all stack allocated Cactus stack implementation, creating a new stack lazily when blocking (rather than eagerly on entering an async) 15/11/

12 Performance Function call latency (cycles) Direct (normal function call) async X() (X does not block) async X() (X blocks) Do not fear async Think about correctness: if the callee doesn t block then perf is basically unchanged 15/11/

13 Performance: Barrelfish ping pong test Using ring-buffer channel directly Using portable event-based interface Using AC (one side only) Using AC (both sides) MPI (Visual Studio HPC-Pack 2008 SDK) Latency (cycles) Ping pong test Minimum sized messages AMD 4 * 4-core machine Using cores sharing L3 cache 15/11/

14 2 phase commit between cores AC Async Sync 19 event handlers, 5 temporary structures 15/11/

15 Software RAID0 model, 64KB transfers 2 * Intel X25 M SSDs Each 250MB/s AC Async Sync 15/11/

16 Performance (embedded web server) Apache benchmark workload driver Running across WAN between CA and UK 15/11/

17 Summary AC: composable asynchronous IO for native languages Similar programming style to synchronous IO APIs Similar performance for asynchronous IO APIs Also in the paper: Operational semantics for core language Integration of IO with runtime system Available as part of the Barrelfish research OS We re hiring: 15/11/

18 Microsoft Corporation. All rights reserved. This material is provided for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft is a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries.

Extensions to Barrelfish Asynchronous C

Extensions to Barrelfish Asynchronous C Extensions to Barrelfish Asynchronous C Michael Quigley michaelforrquigley@gmail.com School of Computing, University of Utah October 27, 2016 1 Abstract The intent of the Microsoft Barrelfish Asynchronous

More information

AC: Composable Asynchronous IO for Native Languages

AC: Composable Asynchronous IO for Native Languages AC: Composable Asynchronous IO for Native Languages Tim Harris Martín Abadi Rebecca Isaacs Ross McIlroy Microsoft Research, Cambridge Microsoft Research, Silicon Valley University of California, Santa

More information

CS 153 Design of Operating Systems Winter 2016

CS 153 Design of Operating Systems Winter 2016 CS 153 Design of Operating Systems Winter 2016 Lecture 7: Synchronization Administrivia Homework 1 Due today by the end of day Hopefully you have started on project 1 by now? Kernel-level threads (preemptable

More information

INTERFACING REAL-TIME AUDIO AND FILE I/O

INTERFACING REAL-TIME AUDIO AND FILE I/O INTERFACING REAL-TIME AUDIO AND FILE I/O Ross Bencina portaudio.com rossbencina.com rossb@audiomulch.com @RossBencina Example source code: https://github.com/rossbencina/realtimefilestreaming Q: How to

More information

Remote Procedure Call (RPC) and Transparency

Remote Procedure Call (RPC) and Transparency Remote Procedure Call (RPC) and Transparency Brad Karp UCL Computer Science CS GZ03 / M030 10 th October 2014 Transparency in Distributed Systems Programmers accustomed to writing code for a single box

More information

Ben Walker Data Center Group Intel Corporation

Ben Walker Data Center Group Intel Corporation Ben Walker Data Center Group Intel Corporation Notices and Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service activation.

More information

CSE 153 Design of Operating Systems

CSE 153 Design of Operating Systems CSE 153 Design of Operating Systems Winter 19 Lecture 7/8: Synchronization (1) Administrivia How is Lab going? Be prepared with questions for this weeks Lab My impression from TAs is that you are on track

More information

GPUfs: Integrating a file system with GPUs

GPUfs: Integrating a file system with GPUs GPUfs: Integrating a file system with GPUs Mark Silberstein (UT Austin/Technion) Bryan Ford (Yale), Idit Keidar (Technion) Emmett Witchel (UT Austin) 1 Traditional System Architecture Applications OS CPU

More information

CSE 153 Design of Operating Systems Fall 2018

CSE 153 Design of Operating Systems Fall 2018 CSE 153 Design of Operating Systems Fall 2018 Lecture 5: Threads/Synchronization Implementing threads l Kernel Level Threads l u u All thread operations are implemented in the kernel The OS schedules all

More information

Lecture 5: Synchronization w/locks

Lecture 5: Synchronization w/locks Lecture 5: Synchronization w/locks CSE 120: Principles of Operating Systems Alex C. Snoeren Lab 1 Due 10/19 Threads Are Made to Share Global variables and static objects are shared Stored in the static

More information

RDMA-like VirtIO Network Device for Palacios Virtual Machines

RDMA-like VirtIO Network Device for Palacios Virtual Machines RDMA-like VirtIO Network Device for Palacios Virtual Machines Kevin Pedretti UNM ID: 101511969 CS-591 Special Topics in Virtualization May 10, 2012 Abstract This project developed an RDMA-like VirtIO network

More information

GPUfs: Integrating a file system with GPUs

GPUfs: Integrating a file system with GPUs GPUfs: Integrating a file system with GPUs Mark Silberstein (UT Austin/Technion) Bryan Ford (Yale), Idit Keidar (Technion) Emmett Witchel (UT Austin) 1 Building systems with GPUs is hard. Why? 2 Goal of

More information

GPUfs: Integrating a file system with GPUs

GPUfs: Integrating a file system with GPUs ASPLOS 2013 GPUfs: Integrating a file system with GPUs Mark Silberstein (UT Austin/Technion) Bryan Ford (Yale), Idit Keidar (Technion) Emmett Witchel (UT Austin) 1 Traditional System Architecture Applications

More information

Operating System Design Issues. I/O Management

Operating System Design Issues. I/O Management I/O Management Chapter 5 Operating System Design Issues Efficiency Most I/O devices slow compared to main memory (and the CPU) Use of multiprogramming allows for some processes to be waiting on I/O while

More information

1. With respect to space, linked lists are and arrays are.

1. With respect to space, linked lists are and arrays are. 1. With respect to space, linked lists are and arrays are. 1. With respect to space, linked lists are good and arrays are bad. 2. With respect to time, linked lists are and arrays are. 2. With respect

More information

Beyond Block I/O: Rethinking

Beyond Block I/O: Rethinking Beyond Block I/O: Rethinking Traditional Storage Primitives Xiangyong Ouyang *, David Nellans, Robert Wipfel, David idflynn, D. K. Panda * * The Ohio State University Fusion io Agenda Introduction and

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 22: Remote Procedure Call (RPC)

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 22: Remote Procedure Call (RPC) CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2002 Lecture 22: Remote Procedure Call (RPC) 22.0 Main Point Send/receive One vs. two-way communication Remote Procedure

More information

Chapter 3 Process Description and Control

Chapter 3 Process Description and Control Operating Systems: Internals and Design Principles Chapter 3 Process Description and Control Seventh Edition By William Stallings Example of Standard API Consider the ReadFile() function in the Win32 API

More information

Scalable Software Transactional Memory for Chapel High-Productivity Language

Scalable Software Transactional Memory for Chapel High-Productivity Language Scalable Software Transactional Memory for Chapel High-Productivity Language Srinivas Sridharan and Peter Kogge, U. Notre Dame Brad Chamberlain, Cray Inc Jeffrey Vetter, Future Technologies Group, ORNL

More information

Systèmes d Exploitation Avancés

Systèmes d Exploitation Avancés Systèmes d Exploitation Avancés Instructor: Pablo Oliveira ISTY Instructor: Pablo Oliveira (ISTY) Systèmes d Exploitation Avancés 1 / 32 Review : Thread package API tid thread create (void (*fn) (void

More information

Predictable Programming on a Precision Timed Architecture

Predictable Programming on a Precision Timed Architecture Predictable Programming on a Precision Timed Architecture Ben Lickly, Isaac Liu, Hiren Patel, Edward Lee, University of California, Berkeley Sungjun Kim, Stephen Edwards, Columbia University, New York

More information

LIQUIDVR TODAY AND TOMORROW GUENNADI RIGUER, SOFTWARE ARCHITECT

LIQUIDVR TODAY AND TOMORROW GUENNADI RIGUER, SOFTWARE ARCHITECT LIQUIDVR TODAY AND TOMORROW GUENNADI RIGUER, SOFTWARE ARCHITECT Bootstrapping the industry for better VR experience Complimentary to HMD SDKs It s all about giving developers the tools they want! AMD LIQUIDVR

More information

Breaking Down Barriers: An Intro to GPU Synchronization. Matt Pettineo Lead Engine Programmer Ready At Dawn Studios

Breaking Down Barriers: An Intro to GPU Synchronization. Matt Pettineo Lead Engine Programmer Ready At Dawn Studios Breaking Down Barriers: An Intro to GPU Synchronization Matt Pettineo Lead Engine Programmer Ready At Dawn Studios Who am I? Ready At Dawn for 9 years Lead Engine Programmer for 5 I like GPUs and APIs!

More information

Control Abstraction. Hwansoo Han

Control Abstraction. Hwansoo Han Control Abstraction Hwansoo Han Review of Static Allocation Static allocation strategies Code Global variables Own variables (live within an encapsulation - static in C) Explicit constants (including strings,

More information

CODE TIME TECHNOLOGIES. Abassi RTOS. CMSIS Version 3.0 RTOS API

CODE TIME TECHNOLOGIES. Abassi RTOS. CMSIS Version 3.0 RTOS API CODE TIME TECHNOLOGIES Abassi RTOS CMSIS Version 3.0 RTOS API Copyright Information This document is copyright Code Time Technologies Inc. 2011-2013. All rights reserved. No part of this document may be

More information

Exception-Less System Calls for Event-Driven Servers

Exception-Less System Calls for Event-Driven Servers Exception-Less System Calls for Event-Driven Servers Livio Soares and Michael Stumm University of Toronto Talk overview At OSDI'10: exception-less system calls Technique targeted at highly threaded servers

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 4, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

Multi-core Architecture and Programming

Multi-core Architecture and Programming Multi-core Architecture and Programming Yang Quansheng( 杨全胜 ) http://www.njyangqs.com School of Computer Science & Engineering 1 http://www.njyangqs.com Process, threads and Parallel Programming Content

More information

Operating System Services

Operating System Services CSE325 Principles of Operating Systems Operating System Services David Duggan dduggan@sandia.gov January 22, 2013 Reading Assignment 3 Chapter 3, due 01/29 1/23/13 CSE325 - OS Services 2 What Categories

More information

Flash: an efficient and portable web server

Flash: an efficient and portable web server Flash: an efficient and portable web server High Level Ideas Server performance has several dimensions Lots of different choices on how to express and effect concurrency in a program Paper argues that

More information

Xen and the Art of Virtualization. CSE-291 (Cloud Computing) Fall 2016

Xen and the Art of Virtualization. CSE-291 (Cloud Computing) Fall 2016 Xen and the Art of Virtualization CSE-291 (Cloud Computing) Fall 2016 Why Virtualization? Share resources among many uses Allow heterogeneity in environments Allow differences in host and guest Provide

More information

Lightweight Concurrency Primitives for GHC. Peng Li Simon Peyton Jones Andrew Tolmach Simon Marlow

Lightweight Concurrency Primitives for GHC. Peng Li Simon Peyton Jones Andrew Tolmach Simon Marlow Lightweight Concurrency Primitives for GHC Peng Li Simon Peyton Jones Andrew Tolmach Simon Marlow The Problem GHC has rich support for concurrency & parallelism: Lightweight threads (fast) Transparent

More information

Distributed Systems. Distributed Shared Memory. Paul Krzyzanowski

Distributed Systems. Distributed Shared Memory. Paul Krzyzanowski Distributed Systems Distributed Shared Memory Paul Krzyzanowski pxk@cs.rutgers.edu Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.

More information

JANUARY 28, 2014, SAN JOSE, CA. Microsoft Lead Partner Architect OS Vendors: What NVM Means to Them

JANUARY 28, 2014, SAN JOSE, CA. Microsoft Lead Partner Architect OS Vendors: What NVM Means to Them JANUARY 28, 2014, SAN JOSE, CA PRESENTATION James TITLE Pinkerton GOES HERE Microsoft Lead Partner Architect OS Vendors: What NVM Means to Them Why should NVM be Interesting to OS Vendors? New levels of

More information

Disruptor Using High Performance, Low Latency Technology in the CERN Control System

Disruptor Using High Performance, Low Latency Technology in the CERN Control System Disruptor Using High Performance, Low Latency Technology in the CERN Control System ICALEPCS 2015 21/10/2015 2 The problem at hand 21/10/2015 WEB3O03 3 The problem at hand CESAR is used to control the

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604

More information

Caching and Demand-Paged Virtual Memory

Caching and Demand-Paged Virtual Memory Caching and Demand-Paged Virtual Memory Definitions Cache Copy of data that is faster to access than the original Hit: if cache has copy Miss: if cache does not have copy Cache block Unit of cache storage

More information

Operating System: Chap2 OS Structure. National Tsing-Hua University 2016, Fall Semester

Operating System: Chap2 OS Structure. National Tsing-Hua University 2016, Fall Semester Operating System: Chap2 OS Structure National Tsing-Hua University 2016, Fall Semester Outline OS Services OS-Application Interface OS Structure Chapter2 OS-Structure Operating System Concepts NTHU LSA

More information

Composable Shared Memory Transactions Lecture 20-2

Composable Shared Memory Transactions Lecture 20-2 Composable Shared Memory Transactions Lecture 20-2 April 3, 2008 This was actually the 21st lecture of the class, but I messed up the naming of subsequent notes files so I ll just call this one 20-2. This

More information

JavaScript. The Bad Parts. Patrick Behr

JavaScript. The Bad Parts. Patrick Behr JavaScript The Bad Parts Patrick Behr History Created in 1995 by Netscape Originally called Mocha, then LiveScript, then JavaScript It s not related to Java ECMAScript is the official name Many implementations

More information

Parallel Programming in Distributed Systems Or Distributed Systems in Parallel Programming

Parallel Programming in Distributed Systems Or Distributed Systems in Parallel Programming Parallel Programming in Distributed Systems Or Distributed Systems in Parallel Programming Philippas Tsigas Chalmers University of Technology Computer Science and Engineering Department Philippas Tsigas

More information

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012) Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues

More information

Linux Storage System Analysis for e.mmc With Command Queuing

Linux Storage System Analysis for e.mmc With Command Queuing Linux Storage System Analysis for e.mmc With Command Queuing Linux is a widely used embedded OS that also manages block devices such as e.mmc, UFS and SSD. Traditionally, advanced embedded systems have

More information

CSL373/CSL633 Major Exam Solutions Operating Systems Sem II, May 6, 2013 Answer all 8 questions Max. Marks: 56

CSL373/CSL633 Major Exam Solutions Operating Systems Sem II, May 6, 2013 Answer all 8 questions Max. Marks: 56 CSL373/CSL633 Major Exam Solutions Operating Systems Sem II, 2012 13 May 6, 2013 Answer all 8 questions Max. Marks: 56 1. True or False. Give reasons and/or brief explanation. No marks for incomplete/wrong

More information

COS 318: Operating Systems. Virtual Machine Monitors

COS 318: Operating Systems. Virtual Machine Monitors COS 318: Operating Systems Virtual Machine Monitors Prof. Margaret Martonosi Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall11/cos318/ Announcements Project

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (6) Improving Cache Performance. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (6) Improving Cache Performance [Adapted from Mary Jane Irwin s slides (PSU)] Measuring Cache Performance Assuming cache hit costs are included as part of the normal CPU execution cycle, then CPU

More information

Swapping. Operating Systems I. Swapping. Motivation. Paging Implementation. Demand Paging. Active processes use more physical memory than system has

Swapping. Operating Systems I. Swapping. Motivation. Paging Implementation. Demand Paging. Active processes use more physical memory than system has Swapping Active processes use more physical memory than system has Operating Systems I Address Binding can be fixed or relocatable at runtime Swap out P P Virtual Memory OS Backing Store (Swap Space) Main

More information

CALIFORNIA SOFTWARE LABS

CALIFORNIA SOFTWARE LABS Using the JetSend SDK CALIFORNIA SOFTWARE LABS R E A L I Z E Y O U R I D E A S California Software Labs 6800 Koll Center Parkway, Suite 100 Pleasanton CA 94566, USA. Phone (925) 249 3000 Fax (925) 426

More information

Final Exam Solutions May 11, 2012 CS162 Operating Systems

Final Exam Solutions May 11, 2012 CS162 Operating Systems University of California, Berkeley College of Engineering Computer Science Division EECS Spring 2012 Anthony D. Joseph and Ion Stoica Final Exam May 11, 2012 CS162 Operating Systems Your Name: SID AND

More information

AD910A M.2 (NGFF) to SATA III Converter Card

AD910A M.2 (NGFF) to SATA III Converter Card MINERVA AD910A M.2 (NGFF) to SATA III Converter Card Performance & Burn In Test Rev. 1.0 Table of Contents 1. Overview 2. Performance Measurement Tools and Results 2.1 Test Platform 2.2 Test target and

More information

Order Is A Lie. Are you sure you know how your code runs?

Order Is A Lie. Are you sure you know how your code runs? Order Is A Lie Are you sure you know how your code runs? Order in code is not respected by Compilers Processors (out-of-order execution) SMP Cache Management Understanding execution order in a multithreaded

More information

Vulkan (including Vulkan Fast Paths)

Vulkan (including Vulkan Fast Paths) Vulkan (including Vulkan Fast Paths) Łukasz Migas Software Development Engineer WS Graphics Let s talk about OpenGL (a bit) History 1.0-1992 1.3-2001 multitexturing 1.5-2003 vertex buffer object 2.0-2004

More information

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015 Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Marble House The Knife (Silent Shout) Before starting The Knife, we were working

More information

Operating Systems. Synchronization

Operating Systems. Synchronization Operating Systems Fall 2014 Synchronization Myungjin Lee myungjin.lee@ed.ac.uk 1 Temporal relations Instructions executed by a single thread are totally ordered A < B < C < Absent synchronization, instructions

More information

Lecture 4: Threads; weaving control flow

Lecture 4: Threads; weaving control flow Lecture 4: Threads; weaving control flow CSE 120: Principles of Operating Systems Alex C. Snoeren HW 1 Due NOW Announcements Homework #1 due now Project 0 due tonight Project groups Please send project

More information

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 17 November 2017

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 17 November 2017 NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 17 November 2017 Lecture 7 Linearizability Lock-free progress properties Hashtables and skip-lists Queues Reducing contention Explicit

More information

Java Internals. Frank Yellin Tim Lindholm JavaSoft

Java Internals. Frank Yellin Tim Lindholm JavaSoft Java Internals Frank Yellin Tim Lindholm JavaSoft About This Talk The JavaSoft implementation of the Java Virtual Machine (JDK 1.0.2) Some companies have tweaked our implementation Alternative implementations

More information

PCI-EK01 Driver Level Programming Guide

PCI-EK01 Driver Level Programming Guide PCI-EK01 Driver Level Programming Guide Windows, Windows2000, Windows NT and Windows XP are trademarks of Microsoft. We acknowledge that the trademarks or service names of all other organizations mentioned

More information

Asynchronous OSGi: Promises for the masses. Tim Ward.

Asynchronous OSGi: Promises for the masses. Tim Ward. Asynchronous OSGi: Promises for the masses Tim Ward http://www.paremus.com info@paremus.com Who is Tim Ward? @TimothyWard Senior Consulting Engineer, Trainer and Architect at Paremus 5 years at IBM developing

More information

Design Principles for End-to-End Multicore Schedulers

Design Principles for End-to-End Multicore Schedulers c Systems Group Department of Computer Science ETH Zürich HotPar 10 Design Principles for End-to-End Multicore Schedulers Simon Peter Adrian Schüpbach Paul Barham Andrew Baumann Rebecca Isaacs Tim Harris

More information

Threads Implementation. Jo, Heeseung

Threads Implementation. Jo, Heeseung Threads Implementation Jo, Heeseung Today's Topics How to implement threads? User-level threads Kernel-level threads Threading models 2 Kernel/User-level Threads Who is responsible for creating/managing

More information

Chapter 6: Process Synchronization. Module 6: Process Synchronization

Chapter 6: Process Synchronization. Module 6: Process Synchronization Chapter 6: Process Synchronization Module 6: Process Synchronization Background The Critical-Section Problem Peterson s Solution Synchronization Hardware Semaphores Classic Problems of Synchronization

More information

Minerva. Performance & Burn In Test Rev AD903A/AD903D Converter Card. Table of Contents. 1. Overview

Minerva. Performance & Burn In Test Rev AD903A/AD903D Converter Card. Table of Contents. 1. Overview Minerva AD903A/AD903D Converter Card Performance & Burn In Test Rev. 1.0 Table of Contents 1. Overview 2. Performance Measurement Tools and Results 2.1 Test Platform 2.2 Test target and Used SATA III SSD

More information

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona MySQL Performance Optimization and Troubleshooting with PMM Peter Zaitsev, CEO, Percona In the Presentation Practical approach to deal with some of the common MySQL Issues 2 Assumptions You re looking

More information

CS 167 Final Exam Solutions

CS 167 Final Exam Solutions CS 167 Final Exam Solutions Spring 2018 Do all questions. 1. [20%] This question concerns a system employing a single (single-core) processor running a Unix-like operating system, in which interrupts are

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

ParalleX. A Cure for Scaling Impaired Parallel Applications. Hartmut Kaiser

ParalleX. A Cure for Scaling Impaired Parallel Applications. Hartmut Kaiser ParalleX A Cure for Scaling Impaired Parallel Applications Hartmut Kaiser (hkaiser@cct.lsu.edu) 2 Tianhe-1A 2.566 Petaflops Rmax Heterogeneous Architecture: 14,336 Intel Xeon CPUs 7,168 Nvidia Tesla M2050

More information

3/7/2018. Sometimes, Knowing Which Thing is Enough. ECE 220: Computer Systems & Programming. Often Want to Group Data Together Conceptually

3/7/2018. Sometimes, Knowing Which Thing is Enough. ECE 220: Computer Systems & Programming. Often Want to Group Data Together Conceptually University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering ECE 220: Computer Systems & Programming Structured Data in C Sometimes, Knowing Which Thing is Enough In MP6, we

More information

Dell EMC CIFS-ECS Tool

Dell EMC CIFS-ECS Tool Dell EMC CIFS-ECS Tool Architecture Overview, Performance and Best Practices March 2018 A Dell EMC Technical Whitepaper Revisions Date May 2016 September 2016 Description Initial release Renaming of tool

More information

Concurrent & Distributed Systems Supervision Exercises

Concurrent & Distributed Systems Supervision Exercises Concurrent & Distributed Systems Supervision Exercises Stephen Kell Stephen.Kell@cl.cam.ac.uk November 9, 2009 These exercises are intended to cover all the main points of understanding in the lecture

More information

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1 Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 CPU management Roadmap Process, thread, synchronization, scheduling Memory management Virtual memory Disk

More information

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 21 November 2014

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 21 November 2014 NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 21 November 2014 Lecture 7 Linearizability Lock-free progress properties Queues Reducing contention Explicit memory management Linearizability

More information

Simplify Software Integration for FPGA Accelerators with OPAE

Simplify Software Integration for FPGA Accelerators with OPAE white paper Intel FPGA Simplify Software Integration for FPGA Accelerators with OPAE Cross-Platform FPGA Programming Layer for Application Developers Authors Enno Luebbers Senior Software Engineer Intel

More information

Final Exam Preparation Questions

Final Exam Preparation Questions EECS 678 Spring 2013 Final Exam Preparation Questions 1 Chapter 6 1. What is a critical section? What are the three conditions to be ensured by any solution to the critical section problem? 2. The following

More information

iseries Job Attributes

iseries Job Attributes iseries Job Attributes iseries Job Attributes Copyright ternational Business Machines Corporation 5. All rights reserved. US Government Users Restricted Rights Use, duplication or disclosure restricted

More information

ECE 574 Cluster Computing Lecture 8

ECE 574 Cluster Computing Lecture 8 ECE 574 Cluster Computing Lecture 8 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 16 February 2017 Announcements Too many snow days Posted a video with HW#4 Review HW#5 will

More information

Memory Management Outline. Operating Systems. Motivation. Paging Implementation. Accessing Invalid Pages. Performance of Demand Paging

Memory Management Outline. Operating Systems. Motivation. Paging Implementation. Accessing Invalid Pages. Performance of Demand Paging Memory Management Outline Operating Systems Processes (done) Memory Management Basic (done) Paging (done) Virtual memory Virtual Memory (Chapter.) Motivation Logical address space larger than physical

More information

Computer Science 161

Computer Science 161 Computer Science 161 150 minutes/150 points Fill in your name, logname, and TF s name below. Name Logname The points allocated to each problem correspond to how much time we think the problem should take.

More information

CPS 310 first midterm exam, 10/6/2014

CPS 310 first midterm exam, 10/6/2014 CPS 310 first midterm exam, 10/6/2014 Your name please: Part 1. More fun with fork and exec* What is the output generated by this program? Please assume that each executed print statement completes, e.g.,

More information

CSL373/CSL633 Major Exam Operating Systems Sem II, May 6, 2013 Answer all 8 questions Max. Marks: 56

CSL373/CSL633 Major Exam Operating Systems Sem II, May 6, 2013 Answer all 8 questions Max. Marks: 56 CSL373/CSL633 Major Exam Operating Systems Sem II, 2012 13 May 6, 2013 Answer all 8 questions Max. Marks: 56 1. True or False. Give reasons and/or brief explanation. No marks for incomplete/wrong explanation.

More information

CS 134: Operating Systems

CS 134: Operating Systems CS 134: Operating Systems Locks and Low-Level Synchronization CS 134: Operating Systems Locks and Low-Level Synchronization 1 / 25 2 / 25 Overview Overview Overview Low-Level Synchronization Non-Blocking

More information

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based? Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based

More information

SPDK China Summit Ziye Yang. Senior Software Engineer. Network Platforms Group, Intel Corporation

SPDK China Summit Ziye Yang. Senior Software Engineer. Network Platforms Group, Intel Corporation SPDK China Summit 2018 Ziye Yang Senior Software Engineer Network Platforms Group, Intel Corporation Agenda SPDK programming framework Accelerated NVMe-oF via SPDK Conclusion 2 Agenda SPDK programming

More information

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts. CPSC/ECE 3220 Fall 2017 Exam 1 Name: 1. Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.) Referee / Illusionist / Glue. Circle only one of R, I, or G.

More information

Myths and reality of communication/computation overlap in MPI applications

Myths and reality of communication/computation overlap in MPI applications Myths and reality of communication/computation overlap in MPI applications Alessandro Fanfarillo National Center for Atmospheric Research Boulder, Colorado, USA elfanfa@ucar.edu Oct 12th, 2017 (elfanfa@ucar.edu)

More information

EbbRT: A Framework for Building Per-Application Library Operating Systems

EbbRT: A Framework for Building Per-Application Library Operating Systems EbbRT: A Framework for Building Per-Application Library Operating Systems Overview Motivation Objectives System design Implementation Evaluation Conclusion Motivation Emphasis on CPU performance and software

More information

10/17/ Gribble, Lazowska, Levy, Zahorjan 2. 10/17/ Gribble, Lazowska, Levy, Zahorjan 4

10/17/ Gribble, Lazowska, Levy, Zahorjan 2. 10/17/ Gribble, Lazowska, Levy, Zahorjan 4 Temporal relations CSE 451: Operating Systems Autumn 2010 Module 7 Synchronization Instructions executed by a single thread are totally ordered A < B < C < Absent synchronization, instructions executed

More information

MINERVA. Performance & Burn In Test Rev AD912A Interposer Card. Table of Contents. 1. Overview

MINERVA. Performance & Burn In Test Rev AD912A Interposer Card. Table of Contents. 1. Overview MINERVA AD912A Interposer Card Performance & Burn In Test Rev. 1.0 Table of Contents 1. Overview 2. Performance Measurement Tools and Results 2.1 Test Platform 2.2 Test target and Used msata III SSD 2.3

More information

Lustre * Features In Development Fan Yong High Performance Data Division, Intel CLUG

Lustre * Features In Development Fan Yong High Performance Data Division, Intel CLUG Lustre * Features In Development Fan Yong High Performance Data Division, Intel CLUG 2017 @Beijing Outline LNet reliability DNE improvements Small file performance File Level Redundancy Miscellaneous improvements

More information

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant

More information

Programming with MPI

Programming with MPI Programming with MPI p. 1/?? Programming with MPI One-sided Communication Nick Maclaren nmm1@cam.ac.uk October 2010 Programming with MPI p. 2/?? What Is It? This corresponds to what is often called RDMA

More information

Threads in Java. Threads (part 2) 1/18

Threads in Java. Threads (part 2) 1/18 in Java are part of the Java language. There are two ways to create a new thread of execution. Declare a class to be a subclass of Thread. This subclass should override the run method of class Thread.

More information

Threads, Synchronization, and Scheduling. Eric Wu

Threads, Synchronization, and Scheduling. Eric Wu Threads, Synchronization, and Scheduling Eric Wu (ericwu@cs) Topics for Today Project 2 Due tomorrow! Project 3 Due Feb. 17 th! Threads Synchronization Scheduling Project 2 Troubleshooting: Stock kernel

More information

Processors, Performance, and Profiling

Processors, Performance, and Profiling Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode

More information

Non-Blocking Writes to Files

Non-Blocking Writes to Files Non-Blocking Writes to Files Daniel Campello, Hector Lopez, Luis Useche 1, Ricardo Koller 2, and Raju Rangaswami 1 Google, Inc. 2 IBM TJ Watson Memory Memory Synchrony vs Asynchrony Applications have different

More information

Design Patterns for Real-Time Computer Music Systems

Design Patterns for Real-Time Computer Music Systems Design Patterns for Real-Time Computer Music Systems Roger B. Dannenberg and Ross Bencina 4 September 2005 This document contains a set of design patterns for real time systems, particularly for computer

More information

CS510 Advanced Topics in Concurrency. Jonathan Walpole

CS510 Advanced Topics in Concurrency. Jonathan Walpole CS510 Advanced Topics in Concurrency Jonathan Walpole Threads Cannot Be Implemented as a Library Reasoning About Programs What are the valid outcomes for this program? Is it valid for both r1 and r2 to

More information

MPI: A Message-Passing Interface Standard

MPI: A Message-Passing Interface Standard MPI: A Message-Passing Interface Standard Version 2.1 Message Passing Interface Forum June 23, 2008 Contents Acknowledgments xvl1 1 Introduction to MPI 1 1.1 Overview and Goals 1 1.2 Background of MPI-1.0

More information

Windows 7 Overview. Windows 7. Objectives. The History of Windows. CS140M Fall Lake 1

Windows 7 Overview. Windows 7. Objectives. The History of Windows. CS140M Fall Lake 1 Windows 7 Overview Windows 7 Overview By Al Lake History Design Principles System Components Environmental Subsystems File system Networking Programmer Interface Lake 2 Objectives To explore the principles

More information