JCTools: Pit Of Performance Nitsan Copyright - TTNR Labs Limited 2018
|
|
- Noah McDowell
- 5 years ago
- Views:
Transcription
1 JCTools: Pit Of Performance Nitsan 1
2 You Look Familiar... 2
3 Why me? Main JCTools contributor Performance Eng. Find me on: GitHub : nitsanw Blog : psy-lob-saw.blogspot.com Twitter : nitsanw Also: Cape Town JUG Organizer 3
4 JCTools? Concurrent Queues: SPSC/MPSC/SPMC/MPMC Linked/Array/LinkedArray/Compound And MOAR!!! 4
5 What is this about? Optimizations as found in the code Design pressures/choices Novel algorithm: MPSC linked queues 5
6 From the codebase... Layout Unsafe Concurrency specialization Bits and bobs 6
7 Layout abstract class...l1pad<e> extends...coldfield<e> { long p,p1,p2,p3,p4,p5,p6,p7, p8,p9,p1,p11,p12,p13,p14; } abstract class...producerindexfields<e> extends...l1pad<e> { private volatile long producerindex; protected long producerlimit; } abstract class...l2pad<e> extends...producerindexfields<e> { long p,p1,p2,p3,p4,p5,p6,p7, p8,p9,p1,p11,p12,p13,p14; } 7
8 Layout abstract class...l1pad<e> extends...coldfield<e> { } long p,p1,p2,p3,p4,p5,p6,p7, p8,p9,p1,p11,p12,p13,p14; abstract class...producerindexfields<e> extends...l1pad<e> { private volatile long producerindex; protected long producerlimit; } abstract class...l2pad<e> extends...producerindexfields<e> { } long p,p1,p2,p3,p4,p5,p6,p7, p8,p9,p1,p11,p12,p13,p14; 8
9 Measure with/with out: With padding: ~95 ns/op Sans padding: ~5 ns/op *QueueBurst, 1 messages benchmark 9
10 Java Object Layout 8b aligned Object header (8/12/16b) Sub-classes fields after parent fields Field order within class can change 1
11 Let s use JOL!!! # Running 64-bit HotSpot VM. # Using compressed oop with 3-bit shift. # Using compressed klass with 3-bit shift. # WARNING Compressed references base/shifts are guessed by the experiment! # WARNING Therefore, computed addresses are just guesses, and ARE NOT RELIABLE. # WARNING Make sure to attach Serviceability Agent to get the reliable addresses. # Objects are 8 bytes aligned. # Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes] # Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes] Instantiated the sample instance via public org.jctools.queues.ffbuffer(int) org.jctools.queues.spscarrayqueue object internals: OFFSET SIZE TYPE DESCRIPTION VALUE 4 (object header) 1 (1 ) (1) 4 4 (object header) ( ) () 8 4 (object header) f8 ( ) ( ) 12 4 (alignment/padding gap) 16 8 long ConcurrentCircularArrayQueueLPad.p 24 8 long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p2 4 8 long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p7 8 8 long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueueLPad.p long ConcurrentCircularArrayQueue.mask java.lang.object[] ConcurrentCircularArrayQueue.buffer [null, null, null, null] int SpscArrayQueueColdField.lookAheadStep long SpscArrayQueueL1Pad.p 16 8 long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p5 2 8 long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueL1Pad.p long SpscArrayQueueProducerIndexFields.producerIndex 28 8 long SpscArrayQueueProducerIndexFields.producerLimit long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueL2Pad.p long SpscArrayQueueConsumerIndexField.consumerIndex long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p long SpscArrayQueueL3Pad.p14 Instance size: 536 bytes 11
12 Let s use JOL!!! # Running 64-bit HotSpot VM. # Using compressed oop with 3-bit shift. # Using compressed klass with 3-bit shift. # Objects are 8 bytes aligned. # Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes] # Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes] 12
13 Let s use JOL!!! org.jctools.queues.spscarrayqueue object internals: OFFSET SIZE TYPE DESCRIPTION 12 (object header) 12 4 (alignment/padding gap) long ConcurrentCircularArrayQueueLPad.p [... p14] long ConcurrentCircularArrayQueue.mask java.lang.object[] ConcurrentCircularArrayQueue.buffer int SpscArrayQueueColdField.lookAheadStep long SpscArrayQueueL1Pad.p [... p14] long SpscArrayQueueProducerIndexFields.producerIndex 28 8 long SpscArrayQueueProducerIndexFields.producerLimit long SpscArrayQueueL2Pad.p [... p14] 48 8 long SpscArrayQueueConsumerIndexField.consumerIndex long SpscArrayQueueL3Pad.p [... p14] Instance size: 536 bytes 13
14 Let s use JOL!!! org.jctools.queues.spscarrayqueue object internals: OFFSET SIZE TYPE DESCRIPTION (object header) (alignment/padding gap) 12 long ConcurrentCircularArrayQueueLPad.p [... p14] 8 long ConcurrentCircularArrayQueue.mask 4 java.lang.object[] ConcurrentCircularArrayQueue.buffer 4 int SpscArrayQueueColdField.lookAheadStep 12 long SpscArrayQueueL1Pad.p [... p14] 8 long SpscArrayQueueProducerIndexFields.producerIndex 8 long SpscArrayQueueProducerIndexFields.producerLimit 12 long SpscArrayQueueL2Pad.p [... p14] 8 long SpscArrayQueueConsumerIndexField.consumerIndex 12 long SpscArrayQueueL3Pad.p [... p14] Instance size: 536 bytes 14
15 Let s use JOL!!! org.jctools.queues.spscarrayqueue object internals: OFFSET SIZE TYPE DESCRIPTION 12 (object header) 12 4 (alignment/padding gap) long ConcurrentCircularArrayQueue.mask java.lang.object[] ConcurrentCircularArrayQueue.buffer int SpscArrayQueueColdField.lookAheadStep long ConcurrentQueueLPad.p [... p14] long SpscArrayQueueProducerIndexFields.producerIndex long SpscArrayQueueProducerIndexFields.producerLimit 12 8 long SpscArrayQueueL1Pad.p [... p14] 12 long SpscArrayQueueL2Pad.p [... p14] long SpscArrayQueueConsumerIndexField.consumerIndex long SpscArrayQueueL3Pad.p [... p14] Instance size: 536 bytes 15
16 Let s use JOL!!! org.jctools.queues.spscarrayqueue object internals: OFFSET SIZE TYPE DESCRIPTION 12 (object header) 12 4 (alignment/padding gap) long ConcurrentCircularArrayQueueLPad.p [... p14] long ConcurrentCircularArrayQueue.mask java.lang.object[] ConcurrentCircularArrayQueue.buffer int SpscArrayQueueColdField.lookAheadStep long SpscArrayQueueL1Pad.p [... p14] long SpscArrayQueueProducerIndexFields.producerIndex 28 8 long SpscArrayQueueProducerIndexFields.producerLimit long SpscArrayQueueL2Pad.p [... p14] 48 8 long SpscArrayQueueConsumerIndexField.consumerIndex long SpscArrayQueueL3Pad.p [... p14] Instance size: 536 bytes 16
17 Why bother with layout? False sharing Minimize load misses Offheap? Atomicity Alignment requirements (line/page/sector) Different method (see Aeron) 17
18 Sharing? header header pindex cindex mask buffer 18
19 False Sharing header header pindex cindex Mask buffer 19
20 False Sharing header header pindex cindex Mask buffer 2
21 Pad to resolve...??? header header pad12 pad13 pad14 pad15 pad16 pad17 pad2 pad21 pad22 pad23 pad24 pad25 pad26 pindex pmask pbuffer pad1 pad11 pad12 pad13 pad14 pad15 pad16 pad17 pad2 pad21 pad22 pad23 pad24 pad25 pad26 pad27 cindex cmask cbuffer pad1 pad11 21
22 Alternatives To Method? Rely on field (field/class level, -XX:-Restrict) Go Offheap 22
23 What would be // state alignment preference Class X { long field1; // still type aligned byte[12] pad; // allocate presized array inline long field2; } 23
24 Unsafe 24
25 Looks safe enough... abstract class...producerindexfields<e> extends SpscArrayQueueL1Pad<E> { final static long P_INDEX_OFFSET = fieldoffset(...fields.class, "producerindex"); private volatile long producerindex; protected long producerlimit; long lvproducerindex() { return producerindex; } long lpproducerindex() { return } UNSAFE.getLong(this, P_INDEX_OFFSET); void soproducerindex(long newvalue) { } } UNSAFE.putOrderedLong(this, P_INDEX_OFFSET, newvalue); 25
26 Why do we need Unsafe? Better than a poke in the eye with a sharp stick? Performance? Compatibility????!??!?!?!? 26
27 Unsafe for Compatibility? Just Works! (JDK6,7,8,11) Shouldn t we use VarHandle? 27
28 Unsafe vs. A*FU (pre-8u11) With Unsafe : ~95 ns/op With A*FU@7 : ~135 ns/op * Tested with JDK 7u8 as an example 28
29 Unsafe vs. A*FU (post 8u11/11) With Unsafe : ~95 ns/op With A*FU@8 : ~15 ns/op With A*FU@11 : ~115 ns/op See: 29
30 Unsafe vs VarHandles (>=11) With Unsafe : ~95 ns/op With VarHandle : ~115 ns/op 3
31 Unsafe for performance? With : ~95 ns/op With : ~13 ns/op With : ~15 ns/op With : ~115 ns/op With : ~115 ns/op 31
32 Unsafe vs Alternatives Pre-JDK11? Volatile AtomicBoolean/Integer/Long/*Array Atomic*FieldUpdater JDK11+? Should move to VarHandle? Multi-version jars Offheap? GO UNSAFE!!! 32
33 Concurrency Specialization 33
34 Generic vs. Specialized JDK? MPMC Lock based (ArrayBlockingQ/LinkedBlockingQ) Lock-less (ConcurrentLinkedQ) Allocation? JCTools? MPMC/SPMC/MPSC/SPSC Array backed/linked/both! Lock-less/lock-free/wait-free Zero allocation options 34
35 Generic vs. Specialized Load: plain, opaque, volatile Store: plain, ordered, volatile compareandswap getandadd 35
36 Single Writer Load: plain/volatile Store: ordered 36
37 Multi-Writer Load: plain/volatile Store: getandadd/compareandswap 37
38 Generic vs. Specialized SPSC 95 ns/op MPSC 65 ns/op MPMC 55 ns/op CLQ 138 ns/op ABQ 36 ns/op 38
39 Lock less/lock free/wait free SPSC 95 ns/op WF MPSC 5/65 ns/op LF/LL MPMC 5/55 ns/op LF/LL CLQ 138 ns/op LL ABQ 36 ns/op Locks... 39
40 Allocations CLQ 24b per op(1 messages) LBQ ~3b ABQ? 4
41 Allocations CLQ 24b per op(1 messages) LBQ ~3b ABQ ~15b!!! JCTools array backed - b 41
42 Helping the Compiler 42
43 Const vs Final private final long size = <some power of 2>; long offset = REF_ARRAY_BASE + ((consumerindex % size) * REF_ELEMENT_SCALE); long offset = REF_ARRAY_BASE + ((consumerindex & (size-1)) * REF_ELEMENT_SCALE); 43
44 Const vs Final long offset = REF_ARRAY_BASE + ((consumerindex & (size-1)) * REF_ELEMENT_SCALE); long offset = REF_ARRAY_BASE + << REF_ELEMENT_SHIFT); ((consumerindex & mask) 44
45 Const vs Final long offset = REF_ARRAY_BASE + ((consumerindex & mask) << REF_ELEMENT_SHIFT); long offset = calcelementoffset(consumerindex, mask); 45
46 Loads Across an Volatile Load public E poll() { long consumerindex = lpconsumerindex(); buffer buffer E[] = this. ; long offset = calcelementoffset(consumerindex, mask); E e = (E) UNSAFE.getObjectVolatile( if (null == e) return null; // EMPTY UNSAFE.putOrderedObject( } buffer, buffer, offset); offset, null); soconsumerindex(consumerindex + 1); return e; 46
47 A Novel Algorithm 47
48 Typical Offerings Array backed Pre-allocated, fixed sized Linked Allocate cell per element No pre-allocation Bounded/Unbounded 48
49 Actor Use Case Lots of queues Some full & busy Most empty MPSC 49
50 Linked Array Queues? Middle Ground? Allocate a multi-element cell/chunk Stay in cell/chunk? no allocation Resizing on the fly 5
51 Challenges? Lock less algo? Linking new array Sizing 51
52 Hanging on a Bit Steal producerindex parity bit Fixup code to match CAS parity to block ALL other producers Let them spin 52
53 JUMP indicator Notify consumer needs to jump Use extra cell in array as next pointer 53
54 Performance? Slightly slower than array backed MPSC Very similar throughput Allocates as needed: Stay in chunk no allocation Growable allocate until capacity Chunked bounded, fixed chunks Unbounded blow yer heap, in fixed increments! 54
55 In Closing... 55
56 Thanks! 56
A Quest for Predictable Latency Adventures in Java Concurrency. Martin Thompson
A Quest for Predictable Latency Adventures in Java Concurrency Martin Thompson - @mjpt777 If a system does not respond in a timely manner then it is effectively unavailable 1. It s all about the Blocking
More informationSafely Shoot Yourself in the Foot with Java 9
1 Safely Shoot Yourself in the Foot with Java 9 Dr Heinz M. Kabutz Last Updated 2017-11-04 2 Project Jigsaw: Primary Goals (Reinhold)! Make the Java SE Platform, and the JDK, more easily scalable down
More informationSafely Shoot Yourself in the Foot with Java 9 Dr Heinz M. Kabutz
Safely Shoot Yourself in the Foot with Java 9 Dr Heinz M. Kabutz Last Updated 2017-11-08 Project Jigsaw: Primary Goals (Reinhold) Make the Java SE Platform, and the JDK, more easily scalable down to small
More informationAn efficient Unbounded Lock-Free Queue for Multi-Core Systems
An efficient Unbounded Lock-Free Queue for Multi-Core Systems Authors: Marco Aldinucci 1, Marco Danelutto 2, Peter Kilpatrick 3, Massimiliano Meneghin 4 and Massimo Torquati 2 1 Computer Science Dept.
More informationComplex, concurrent software. Precision (no false positives) Find real bugs in real executions
Harry Xu May 2012 Complex, concurrent software Precision (no false positives) Find real bugs in real executions Need to modify JVM (e.g., object layout, GC, or ISA-level code) Need to demonstrate realism
More informationProcess s Address Space. Dynamic Memory. Backing the Heap. Dynamic memory allocation 3/29/2013. When a process starts the heap is empty
/9/01 Process s Address Space Dynamic Memory 0x7fffffff Stack Data (Heap) Data (Heap) 0 Text (Code) Backing the Heap When a process starts the heap is empty The process is responsible for requesting memory
More informationINTERFACING REAL-TIME AUDIO AND FILE I/O
INTERFACING REAL-TIME AUDIO AND FILE I/O Ross Bencina portaudio.com rossbencina.com rossb@audiomulch.com @RossBencina Example source code: https://github.com/rossbencina/realtimefilestreaming Q: How to
More informationA program execution is memory safe so long as memory access errors never occur:
A program execution is memory safe so long as memory access errors never occur: Buffer overflows, null pointer dereference, use after free, use of uninitialized memory, illegal free Memory safety categories
More informationMultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction
: A Distributed Shared Memory System Based on Multiple Java Virtual Machines X. Chen and V.H. Allan Computer Science Department, Utah State University 1998 : Introduction Built on concurrency supported
More informationProblems with Concurrency. February 19, 2014
with Concurrency February 19, 2014 s with concurrency interleavings race conditions dead GUI source of s non-determinism deterministic execution model 2 / 30 General ideas Shared variable Access interleavings
More informationFine-grained synchronization & lock-free data structures
Lecture 19: Fine-grained synchronization & lock-free data structures Parallel Computer Architecture and Programming Redo Exam statistics Example: a sorted linked list struct Node { int value; Node* next;
More informationThe Art and Science of Memory Allocation
Logical Diagram The Art and Science of Memory Allocation Don Porter CSE 506 Binary Formats RCU Memory Management Memory Allocators CPU Scheduler User System Calls Kernel Today s Lecture File System Networking
More informationOperating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017
Operating Systems Lecture 4 - Concurrency and Synchronization Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Mutual exclusion Hardware solutions Semaphores IPC: Message passing
More informationAdvanced Topic: Efficient Synchronization
Advanced Topic: Efficient Synchronization Multi-Object Programs What happens when we try to synchronize across multiple objects in a large program? Each object with its own lock, condition variables Is
More informationCPS 310 first midterm exam, 10/6/2014
CPS 310 first midterm exam, 10/6/2014 Your name please: Part 1. More fun with fork and exec* What is the output generated by this program? Please assume that each executed print statement completes, e.g.,
More informationDRAFT. String Density: Performance and Footprint. Xueming Shen. Aleksey Shipilev Oracle Brent Christian.
String Density: Performance and Footprint Aleksey Shipilev Oracle aleksey.shipilev@oracle.com Brent Christian Oracle brent.christian@oracle.com December 2, 214 Xueming Shen Oracle xueming.shen@oracle.com
More informationHeap Management portion of the store lives indefinitely until the program explicitly deletes it C++ and Java new Such objects are stored on a heap
Heap Management The heap is the portion of the store that is used for data that lives indefinitely, or until the program explicitly deletes it. While local variables typically become inaccessible when
More informationMarcus Gründler aixigo AG. Financial Portfolio Management on Steroids
Marcus Gründler aixigo AG Financial Portfolio Management on Steroids Marcus Gründler @marcusgruendler About me Head of Portfolio Management Systems Architect at aixigo AG, Germany www.aixigo.de JAX Finance
More informationMy malloc: mylloc and mhysa. Johan Montelius HT2016
1 Introduction My malloc: mylloc and mhysa Johan Montelius HT2016 So this is an experiment where we will implement our own malloc. We will not implement the world s fastest allocator, but it will work
More informationConcurrency and High Performance Reloaded
Concurrency and High Performance Reloaded Disclaimer Any performance tuning advice provided in this presentation... will be wrong! 2 www.kodewerk.com Me Work as independent (a.k.a. freelancer) performance
More informationA brief introduction to C++
A brief introduction to C++ Rupert Nash r.nash@epcc.ed.ac.uk 13 June 2018 1 References Bjarne Stroustrup, Programming: Principles and Practice Using C++ (2nd Ed.). Assumes very little but it s long Bjarne
More informationQuestion 1. Notes on the Exam. Today. Comp 104: Operating Systems Concepts 11/05/2015. Revision Lectures
Comp 104: Operating Systems Concepts Revision Lectures Today Here are a sample of questions that could appear in the exam Please LET ME KNOW if there are particular subjects you want to know about??? 1
More informationMATRIX:DJLSYS EXPLORING RESOURCE ALLOCATION TECHNIQUES FOR DISTRIBUTED JOB LAUNCH UNDER HIGH SYSTEM UTILIZATION
MATRIX:DJLSYS EXPLORING RESOURCE ALLOCATION TECHNIQUES FOR DISTRIBUTED JOB LAUNCH UNDER HIGH SYSTEM UTILIZATION XIAOBING ZHOU(xzhou40@hawk.iit.edu) HAO CHEN (hchen71@hawk.iit.edu) Contents Introduction
More information18-642: Code Style for Compilers
18-642: Code Style for Compilers 9/25/2017 1 Anti-Patterns: Coding Style: Language Use Code compiles with warnings Warnings are turned off or over-ridden Insufficient warning level set Language safety
More informationFrom Java Code to Java Heap Understanding the Memory Usage of Your Application
Chris Bailey IBM Java Service Architect 3 rd October 2012 From Java Code to Java Heap Understanding the Memory Usage of Your Application 2012 IBM Corporation Important Disclaimers THE INFORMATION CONTAINED
More informationECE 598 Advanced Operating Systems Lecture 10
ECE 598 Advanced Operating Systems Lecture 10 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 22 February 2018 Announcements Homework #5 will be posted 1 Blocking vs Nonblocking
More informationCOMP31212: Concurrency A Review of Java Concurrency. Giles Reger
COMP31212: Concurrency A Review of Java Concurrency Giles Reger Outline What are Java Threads? In Java, concurrency is achieved by Threads A Java Thread object is just an object on the heap, like any other
More informationScaling the OpenJDK. Claes Redestad Java SE Performance Team Oracle. Copyright 2017, Oracle and/or its afliates. All rights reserved.
Scaling the OpenJDK Claes Redestad Java SE Performance Team Oracle Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only,
More informationNotes on the Exam. Question 1. Today. Comp 104:Operating Systems Concepts 11/05/2015. Revision Lectures (separate questions and answers)
Comp 104:Operating Systems Concepts Revision Lectures (separate questions and answers) Today Here are a sample of questions that could appear in the exam Please LET ME KNOW if there are particular subjects
More informationHigh Performance Managed Languages. Martin Thompson
High Performance Managed Languages Martin Thompson - @mjpt777 Really, what is your preferred platform for building HFT applications? Why do you build low-latency applications on a GC ed platform? Agenda
More informationUsing Java 8 Lambdas And Stampedlock To Manage Thread Safety
1 Using Java 8 Lambdas And Stampedlock To Manage Thread Safety Dr Heinz M. Kabutz heinz@javaspecialists.eu Last updated 2017-05-09 2013-2017 Heinz Kabutz All Rights Reserved 2 Why Should You Care About
More informationFine-grained synchronization & lock-free programming
Lecture 17: Fine-grained synchronization & lock-free programming Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Tunes Minnie the Moocher Robbie Williams (Swings Both Ways)
More informationComp 204: Computer Systems and Their Implementation. Lecture 25a: Revision Lectures (separate questions and answers)
Comp 204: Computer Systems and Their Implementation Lecture 25a: Revision Lectures (separate questions and answers) 1 Today Here are a sample of questions that could appear in the exam Please LET ME KNOW
More informationTypes II. Hwansoo Han
Types II Hwansoo Han Arrays Most common and important composite data types Homogeneous elements, unlike records Fortran77 requires element type be scalar Elements can be any type (Fortran90, etc.) A mapping
More informationOperating Systems Comprehensive Exam. Spring Student ID # 3/16/2006
Operating Systems Comprehensive Exam Spring 2006 Student ID # 3/16/2006 You must complete all of part I (60%) You must complete two of the three sections in part II (20% each) In Part I, circle or select
More informationC Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:
C Programming Code: MBD101 Duration: 10 Hours Prerequisites: You are a computer science Professional/ graduate student You can execute Linux/UNIX commands You know how to use a text-editing tool You should
More informationVirtual Machine Design
Virtual Machine Design Lecture 4: Multithreading and Synchronization Antero Taivalsaari September 2003 Session #2026: J2MEPlatform, Connected Limited Device Configuration (CLDC) Lecture Goals Give an overview
More informationMethodHandlesArrayElementGetterBench.testCreate Analysis. Copyright 2016, Oracle and/or its affiliates. All rights reserved.
MethodHandlesArrayElementGetterBench.testCreate Analysis Overview Benchmark : nom.indy.methodhandlesarrayelementgetterbench.testcreate Results with JDK8 (ops/us) JDK8 Intel 234 T8 T8 with -XX:FreqInlineSize=325
More informationAtomicity CS 2110 Fall 2017
Atomicity CS 2110 Fall 2017 Parallel Programming Thus Far Parallel programs can be faster and more efficient Problem: race conditions Solution: synchronization Are there more efficient ways to ensure the
More informationShenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat
Shenandoah: Theory and Practice Christine Flood Roman Kennke Principal Software Engineers Red Hat 1 Shenandoah Christine Flood Roman Kennke Principal Software Engineers Red Hat 2 Shenandoah Why do we need
More informationLecture 10: Multi-Object Synchronization
CS 422/522 Design & Implementation of Operating Systems Lecture 10: Multi-Object Synchronization Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous
More informationA JVM Does What? Eva Andreasson Product Manager, Azul Systems
A JVM Does What? Eva Andreasson Product Manager, Azul Systems Presenter Eva Andreasson Innovator & Problem solver Implemented the Deterministic GC of JRockit Real Time Awarded patents on GC heuristics
More informationIn Java we have the keyword null, which is the value of an uninitialized reference type
+ More on Pointers + Null pointers In Java we have the keyword null, which is the value of an uninitialized reference type In C we sometimes use NULL, but its just a macro for the integer 0 Pointers are
More informationEfficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed
Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed Marco Aldinucci Computer Science Dept. - University of Torino - Italy Marco Danelutto, Massimiliano Meneghin,
More informationCS 326: Operating Systems. Process Execution. Lecture 5
CS 326: Operating Systems Process Execution Lecture 5 Today s Schedule Process Creation Threads Limited Direct Execution Basic Scheduling 2/5/18 CS 326: Operating Systems 2 Today s Schedule Process Creation
More informationNON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 17 November 2017
NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 17 November 2017 Lecture 7 Linearizability Lock-free progress properties Hashtables and skip-lists Queues Reducing contention Explicit
More informationCS 33. Storage Allocation. CS33 Intro to Computer Systems XXVII 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 33 Storage Allocation CS33 Intro to Computer Systems XXVII 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. The Unix Address Space stack dynamic bss program break data text CS33 Intro to Computer
More informationDynamic Memory Allocation I Nov 5, 2002
15-213 The course that gives CMU its Zip! Dynamic Memory Allocation I Nov 5, 2002 Topics Simple explicit allocators Data structures Mechanisms Policies class21.ppt Harsh Reality Memory is not unbounded
More informationArmide Documentation. Release Kyle Mayes
Armide Documentation Release 0.3.1 Kyle Mayes December 19, 2014 Contents 1 Introduction 1 1.1 Features.................................................. 1 1.2 License..................................................
More informationReal-Time and Concurrent Programming Lecture 4 (F4): Monitors: synchronized, wait and notify
http://cs.lth.se/eda040 Real-Time and Concurrent Programming Lecture 4 (F4): Monitors: synchronized, wait and notify Klas Nilsson 2016-09-20 http://cs.lth.se/eda040 F4: Monitors: synchronized, wait and
More informationMemory Management. q Basic memory management q Swapping q Kernel memory allocation q Next Time: Virtual memory
Memory Management q Basic memory management q Swapping q Kernel memory allocation q Next Time: Virtual memory Memory management Ideal memory for a programmer large, fast, nonvolatile and cheap not an option
More informationLinuxCon 2010 Tracing Mini-Summit
LinuxCon 2010 Tracing Mini-Summit A new unified Lockless Ring Buffer library for efficient kernel tracing Presentation at: http://www.efficios.com/linuxcon2010-tracingsummit E-mail: mathieu.desnoyers@efficios.com
More informationDynamic Memory Allocation: Basic Concepts
Dynamic Memory Allocation: Basic Concepts CSE 238/2038/2138: Systems Programming Instructor: Fatma CORUT ERGİN Slides adapted from Bryant & O Hallaron s slides 1 Today Basic concepts Implicit free lists
More informationCOL106: Data Structures and Algorithms. Ragesh Jaiswal, IIT Delhi
Stack and Queue How do we implement a Queue using Array? : A collection of nodes with linear ordering defined on them. Each node holds an element and points to the next node in the order. The first node
More informationDynamic Memory Allocation I
Dynamic Memory Allocation I William J. Taffe Plymouth State University Using the Slides of Randall E. Bryant Carnegie Mellon University Topics Simple explicit allocators Data structures Mechanisms Policies
More informationCS 31: Intro to Systems Pointers and Memory. Kevin Webb Swarthmore College October 2, 2018
CS 31: Intro to Systems Pointers and Memory Kevin Webb Swarthmore College October 2, 2018 Overview How to reference the location of a variable in memory Where variables are placed in memory How to make
More informationFavoring Isolated Mutability The Actor Model of Concurrency. CSCI 5828: Foundations of Software Engineering Lecture 24 04/11/2012
Favoring Isolated Mutability The Actor Model of Concurrency CSCI 5828: Foundations of Software Engineering Lecture 24 04/11/2012 1 Goals Review the material in Chapter 8 of the Concurrency textbook that
More informationPatterns: Working with Arrays
Steven Zeil October 14, 2013 Outline 1 Static & Dynamic Allocation Static Allocation Dynamic Allocation 2 Partially Filled Arrays Adding Elements Searching for Elements Removing Elements 3 Arrays and Templates
More informationShort Notes of CS201
#includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system
More informationAn Almost Non- Blocking Stack
An Almost Non- Blocking Stack Hans-J. Boehm HP Labs 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Motivation Update data structures
More informationDynamic Memory Allocation. Gerson Robboy Portland State University. class20.ppt
Dynamic Memory Allocation Gerson Robboy Portland State University class20.ppt Harsh Reality Memory is not unbounded It must be allocated and managed Many applications are memory dominated Especially those
More informationApache Lucene and Java 9+ Opportunities and Challenges for Apache Solr and Elasticsearch
Apache Lucene and Java 9+ Opportunities and Challenges for Apache Solr and Elasticsearch Uwe Schindler SD DataSolutions GmbH / Apache Software Foundation thetaph1 https://www.thetaphi.de My Background
More informationParallel GC. (Chapter 14) Eleanor Ainy December 16 th 2014
GC (Chapter 14) Eleanor Ainy December 16 th 2014 1 Outline of Today s Talk How to use parallelism in each of the 4 components of tracing GC: Marking Copying Sweeping Compaction 2 Introduction Till now
More informationCS201 - Introduction to Programming Glossary By
CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with
More informationCS 261 Fall Mike Lam, Professor. Structs and I/O
CS 261 Fall 2018 Mike Lam, Professor Structs and I/O Typedefs A typedef is a way to create a new type name Basically a synonym for another type Useful for shortening long types or providing more meaningful
More informationThe G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017
The G1 GC in JDK 9 Erik Duveblad Senior Member of Technical Staf racle JVM GC Team ctober, 2017 Copyright 2017, racle and/or its affiliates. All rights reserved. 3 Safe Harbor Statement The following is
More informationDynamic Memory Allocation. Zhaoguo Wang
Dynamic Memory Allocation Zhaoguo Wang Why dynamic memory allocation? Do not know the size until the program runs (at runtime). #define MAXN 15213 int array[maxn]; int main(void) { int i, n; scanf("%d",
More informationChapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes
Chapter 2 Instructions: Language of the Computer Adapted by Paulo Lopes Instruction Set The repertoire of instructions of a computer Different computers have different instruction sets But with many aspects
More informationCSE 509: Computer Security
CSE 509: Computer Security Date: 2.16.2009 BUFFER OVERFLOWS: input data Server running a daemon Attacker Code The attacker sends data to the daemon process running at the server side and could thus trigger
More informationHigh Performance Managed Languages. Martin Thompson
High Performance Managed Languages Martin Thompson - @mjpt777 Really, what s your preferred platform for building HFT applications? Why would you build low-latency applications on a GC ed platform? Some
More informationWhatever can go wrong will go wrong. attributed to Edward A. Murphy. Murphy was an optimist. authors of lock-free programs 3.
Whatever can go wrong will go wrong. attributed to Edward A. Murphy Murphy was an optimist. authors of lock-free programs 3. LOCK FREE KERNEL 309 Literature Maurice Herlihy and Nir Shavit. The Art of Multiprocessor
More information15 418/618 Project Final Report Concurrent Lock free BST
15 418/618 Project Final Report Concurrent Lock free BST Names: Swapnil Pimpale, Romit Kudtarkar AndrewID: spimpale, rkudtark 1.0 SUMMARY We implemented two concurrent binary search trees (BSTs): a fine
More informationMultiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems
Multiprocessors II: CC-NUMA DSM DSM cache coherence the hardware stuff Today s topics: what happens when we lose snooping new issues: global vs. local cache line state enter the directory issues of increasing
More informationThread-Local. Lecture 27: Concurrency 3. Dealing with the Rest. Immutable. Whenever possible, don t share resources
Thread-Local Lecture 27: Concurrency 3 CS 62 Fall 2016 Kim Bruce & Peter Mawhorter Some slides based on those from Dan Grossman, U. of Washington Whenever possible, don t share resources Easier to have
More informationCS11 Java. Fall Lecture 7
CS11 Java Fall 2006-2007 Lecture 7 Today s Topics All about Java Threads Some Lab 7 tips Java Threading Recap A program can use multiple threads to do several things at once A thread can have local (non-shared)
More informationCPSC 421 Database Management Systems. Lecture 11: Storage and File Organization
CPSC 421 Database Management Systems Lecture 11: Storage and File Organization * Some material adapted from R. Ramakrishnan, L. Delcambre, and B. Ludaescher Today s Agenda Start on Database Internals:
More informationNon-Blocking Inter-Partition Communication with Wait-Free Pair Transactions
Non-Blocking Inter-Partition Communication with Wait-Free Pair Transactions Ethan Blanton and Lukasz Ziarek Fiji Systems, Inc. October 10 th, 2013 WFPT Overview Wait-Free Pair Transactions A communication
More informationCS 220: Introduction to Parallel Computing. Arrays. Lecture 4
CS 220: Introduction to Parallel Computing Arrays Lecture 4 Note: Windows I updated the VM image on the website It now includes: Sublime text Gitkraken (a nice git GUI) And the git command line tools 1/30/18
More informationModels of concurrency & synchronization algorithms
Models of concurrency & synchronization algorithms Lecture 3 of TDA383/DIT390 (Concurrent Programming) Carlo A. Furia Chalmers University of Technology University of Gothenburg SP3 2016/2017 Today s menu
More informationDynamic Memory Allocation: Basic Concepts
Dynamic Memory Allocation: Basic Concepts 15-213: Introduction to Computer Systems 19 th Lecture, March 30, 2017 Instructor: Franz Franchetti & Seth Copen Goldstein 1 Today Basic concepts Implicit free
More informationMotivation & examples Threads, shared memory, & synchronization
1 Motivation & examples Threads, shared memory, & synchronization How do locks work? Data races (a lower level property) How do data race detectors work? Atomicity (a higher level property) Concurrency
More informationDealing with Issues for Interprocess Communication
Dealing with Issues for Interprocess Communication Ref Section 2.3 Tanenbaum 7.1 Overview Processes frequently need to communicate with other processes. In a shell pipe the o/p of one process is passed
More information[0569] p 0318 garbage
A Pointer is a variable which contains the address of another variable. Declaration syntax: Pointer_type *pointer_name; This declaration will create a pointer of the pointer_name which will point to the
More informationIntroduction to Computer Systems /18 243, fall th Lecture, Oct. 22 th
Introduction to Computer Systems 15 213/18 243, fall 2009 16 th Lecture, Oct. 22 th Instructors: Gregory Kesden and Markus Püschel Today Dynamic memory allocation Process Memory Image %esp kernel virtual
More informationLock-Free, Wait-Free and Multi-core Programming. Roger Deran boilerbay.com Fast, Efficient Concurrent Maps AirMap
Lock-Free, Wait-Free and Multi-core Programming Roger Deran boilerbay.com Fast, Efficient Concurrent Maps AirMap Lock-Free and Wait-Free Data Structures Overview The Java Maps that use Lock-Free techniques
More informationPerformance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs
Performance of Non-Moving Garbage Collectors Hans-J. Boehm HP Labs Why Use (Tracing) Garbage Collection to Reclaim Program Memory? Increasingly common Java, C#, Scheme, Python, ML,... gcc, w3m, emacs,
More informationCSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1
CSE P 501 Compilers Java Implementation JVMs, JITs &c Hal Perkins Winter 2008 3/11/2008 2002-08 Hal Perkins & UW CSE V-1 Agenda Java virtual machine architecture.class files Class loading Execution engines
More informationCMPSCI 230 Discussion 1. Virtual Box and HW0
CMPSCI 230 Discussion 1 Virtual Box and HW0 1 Contact Info Name: Kaituo Li Communicate through Piazza questions/posts to entire class or private messages to instructors Under extreme circumstances contact
More informationMemory Management. To do. q Basic memory management q Swapping q Kernel memory allocation q Next Time: Virtual memory
Memory Management To do q Basic memory management q Swapping q Kernel memory allocation q Next Time: Virtual memory Memory management Ideal memory for a programmer large, fast, nonvolatile and cheap not
More informationMcRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime
McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs
More informationCPS 310 midterm exam #1, 2/17/2017
CPS 310 midterm exam #1, 2/17/2017 Your name please: NetID: Sign for your honor: Answer all questions. Please attempt to confine your answers to the boxes provided. If you don t know the answer to a question,
More informationRace Conditions & Synchronization
Race Conditions & Synchronization Lecture 25 Spring 2017 Gries office hours 2 A number of students have asked for time to talk over how to study for the prelim. Gries will hold office hours at the times
More informationHigh Performance Computing in C and C++
High Performance Computing in C and C++ Rita Borgo Computer Science Department, Swansea University Summary Introduction to C Writing a simple C program Compiling a simple C program Running a simple C program
More informationToday. Dynamic Memory Allocation: Basic Concepts. Dynamic Memory Allocation. Dynamic Memory Allocation. malloc Example. The malloc Package
Today Dynamic Memory Allocation: Basic Concepts Basic concepts Performance concerns Approach 1: implicit free lists CSci 01: Machine Architecture and Organization October 17th-nd, 018 Your instructor:
More informationC#: framework overview and in-the-small features
Chair of Software Engineering Carlo A. Furia, Marco Piccioni, Bertrand Meyer C#: framework overview and in-the-small features Chair of Software Engineering Carlo A. Furia, Marco Piccioni, Bertrand Meyer
More informationCondition Variables CS 241. Prof. Brighten Godfrey. March 16, University of Illinois
Condition Variables CS 241 Prof. Brighten Godfrey March 16, 2012 University of Illinois 1 Synchronization primitives Mutex locks Used for exclusive access to a shared resource (critical section) Operations:
More informationRACE CONDITIONS AND SYNCHRONIZATION
RACE CONDITIONS AND SYNCHRONIZATION Lecture 21 CS2110 Fall 2010 Reminder 2 A race condition arises if two threads try and share some data One updates it and the other reads it, or both update the data
More informationCS 31: Intro to Systems Pointers and Memory. Martin Gagne Swarthmore College February 16, 2016
CS 31: Intro to Systems Pointers and Memory Martin Gagne Swarthmore College February 16, 2016 So we declared a pointer How do we make it point to something? 1. Assign it the address of an existing variable
More informationConcurrency: Signaling and Condition Variables
Concurrency: Signaling and Condition Variables Questions Answered in this Lecture: How do we make fair locks perform better? How do we notify threads of that some condition has been met? Hale CS450 Thanks
More informationChapter 8 :: Composite Types
Chapter 8 :: Composite Types Programming Language Pragmatics, Fourth Edition Michael L. Scott Copyright 2016 Elsevier 1 Chapter08_Composite_Types_4e - Tue November 21, 2017 Records (Structures) and Variants
More information