
Effective Concurrent Java Brian Goetz Sr. Staff Engineer, Sun Microsystems brian.goetz@sun.com

The Big Picture Writing correct concurrent code is difficult, but not impossible. Using good object-oriented design techniques can make it easier.

About the speaker Professional software developer for 20 years Author of Java Concurrency in Practice Author of over 75 articles on Java development See http://www.briangoetz.com/pubs.html Member of JCP Expert Groups for JSRs 166 (Concurrency), 107 (Caching), and 305 (Safety annotations) Regular presenter at JavaOne, SDWest, OOPSLA, JavaPolis, and No Fluff, Just Stuff

Agenda Introduction Rules for writing thread-safe code Document thread-safety intent and implementation Encapsulate data and synchronization Prefer immutable objects Exploit effective immutability Rules for structuring concurrent applications Think tasks, not threads Build resource-management into your architecture Decouple identification of work from execution Rules for improving scalability Find and eliminate the serialization

Agenda Introduction Rules for writing thread-safe code Document thread-safety intent and implementation Encapsulate data and synchronization Prefer immutable objects Exploit effective immutability Rules for structuring concurrent applications Think tasks, not threads Build resource-management into your architecture Decouple identification of work from execution Rules for improving scalability Find and eliminate the serialization

Introduction This talk is about identifying patterns for concurrent code that are less fragile Conveniently, many are the good practices we already know Though sometimes we forget the basics Feel free to break (almost) all the rules here But be prepared to pay for it at maintenance time Remember the core language value: Reading code is more important than writing code

Agenda Introduction Rules for writing thread-safe code Document thread-safety intent and implementation Encapsulate data and synchronization Prefer immutable objects Exploit effective immutability Rules for structuring concurrent applications Think tasks, not threads Build resource-management into your architecture Decouple identification of work from execution Rules for improving scalability Find and eliminate the serialization

Document thread-safety One of the easiest ways to write thread-safe classes is to build on existing thread-safe classes But how do you know if a class is thread-safe? The documentation should say, but frequently doesn't Can be dangerous to guess Should assume not thread-safe unless otherwise specified Document thread-safety design intent Class annotations: @ThreadSafe, @NotThreadSafe @ThreadSafe public class ConcurrentHashMap {... With class-level thread-safety annotations: Clients will know whether the class is thread-safe Maintainers will know what promises must be kept Tools can help identify common mistakes
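A minimal sketch of the annotations in use, assuming the net.jcip.annotations package from the Java Concurrency in Practice annotations jar (the class names here are illustrative, not from the slides):

    import net.jcip.annotations.NotThreadSafe;
    import net.jcip.annotations.ThreadSafe;

    // Design intent: maintainers must preserve thread-safety,
    // and clients may freely share instances across threads
    @ThreadSafe
    public class HitCounter {
        private long count;                                  // guarded by this
        public synchronized void increment() { count++; }
        public synchronized long get() { return count; }
    }

    // Design intent: clients must confine instances to a single
    // thread or supply their own external synchronization
    @NotThreadSafe
    class RequestLog {
        private final StringBuilder log = new StringBuilder();
        public void append(String entry) { log.append(entry).append('\n'); }
        public String contents() { return log.toString(); }
    }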

Document thread-safety Should also document how a class gets its thread-safety This is your synchronization policy The Rule: When writing a variable that might next be read by another thread, or reading a variable that might last have been written by another thread, both threads must synchronize using a common lock Leads to design rules of the form "hold lock L when accessing variable V" We say V is guarded by L These rules form protocols for coordinating access to data Such as "Only the one holding the conch shell can speak" Only work if all participants follow the protocol If one party cheats, everyone loses

Document thread-safety Use @GuardedBy to document your locking protocols Annotating a field with @GuardedBy("this") means: Only access the field when holding the lock on this

    @ThreadSafe
    public class PositiveInteger {
        // INVARIANT: value > 0
        @GuardedBy("this") private int value = 1;

        public synchronized int getValue() { return value; }

        public void setValue(int value) {
            if (value <= 0)
                throw new IllegalArgumentException(...);
            synchronized (this) { this.value = value; }
        }
    }

Simplifies maintenance and avoids common mistakes Like adding a new code path and forgetting to synchronize Improper maintenance is a big source of concurrency bugs

Document thread-safety For primitive variables, @GuardedBy is straightforward But what about @GuardedBy("this") Set<Rock> knownRocks = new HashSet<Rock>(); There are three different types of potentially mutable state The knownRocks reference The internal data structures in the HashSet The elements of the collection Which types of state are we talking about? All of them? It varies, but we can often tell from context Are the elements owned by the class, or by clients? Are the elements thread-safe? Is the reference to the collection mutable? @GuardedBy("this") final Set<Rock> knownRocks =...

Document thread-safety For complicated data structures, draw a diagram identifying ownership and synchronization policies Color each state domain with its synchronization policy @ThreadSafe public class Rock {... @GuardedBy("this") final Set<Rock> knownRocks = new HashSet<Rock>(); Very effective for designing and reviewing code! Frequently identifies gaps or inconsistencies in synchronization policies [Diagram: the knownRocks reference, the HashSet<Rock> internals, and the Rock elements shown as separate state domains]

Summary: Document thread-safety Document classes as @ThreadSafe or @NotThreadSafe Saves your clients from guessing wrong Puts maintainers on notice to preserve thread-safety Document synchronization policy with @GuardedBy Helps you make sure you have a clear thread-safety strategy Helps maintainers keep promises made to clients Helps tools alert you to mistakes Use diagrams to verify thread-safety strategies for nontrivial data structures Inadequate documentation → fragility

Agenda Introduction Rules for writing thread-safe code Document thread-safety intent and implementation Encapsulate data and synchronization Prefer immutable objects Exploit effective immutability Rules for structuring concurrent applications Think tasks, not threads Build resource-management into your architecture Decouple identification of work from execution Rules for improving scalability Find and eliminate the serialization

Encapsulate data and synchronization Encapsulation promotes clear, maintainable code Reduces scope of effect of code changes Encapsulation similarly promotes thread safety Reduces how much code can access a variable And therefore how much must be examined to ensure that synchronization protocols are followed Thread safety is about coordinating access to shared mutable data Shared means it might be accessed by more than one thread Mutable means it might be modified by some thread Less code that accesses a variable means fewer opportunities for error

Encapsulate data and synchronization A body of code is thread-safe if: It is correct in a single-threaded environment, and It continues to be correct when called from multiple threads Regardless of interleaving of execution by the runtime Without additional coordination by callers Correct means conforms to its specification Often framed in terms of invariants and postconditions lower <= upper r² = x² + y² These are statements about state

Encapsulate data and synchronization Encapsulation makes it sensible to talk about individual classes being thread-safe We can't say a body of code guarantees an invariant unless no other code can modify the underlying state Thread-safety can only describe a body of code that manages all access to its mutable state Without encapsulation, that's the whole program

Encapsulate data and synchronization Is this code correct? Is it thread-safe?

    public class PositiveInteger {
        // INVARIANT: value > 0
        @GuardedBy("this") public int value = 1;

        public synchronized int getValue() { return value; }

        public synchronized void setValue(int value) {
            if (value <= 0)
                throw new IllegalArgumentException(...);
            this.value = value;
        }
    }

We can't say unless we examine all the code that accesses value Doesn't even enforce invariants in single-threaded case Difficult to reason about invariants when data can change at any time Can't ensure data is accessed with proper synchronization

Encapsulate data and synchronization Without encapsulation, we cannot determine thread-safety without reviewing the entire application Much easier to analyze one class than a whole program Harder to accidentally break thread safety if data and synchronization are encapsulated We can build thread-safe code without encapsulation But it's fragile Requires code all over the program to follow the protocol public final static Object lock = new Object(); @GuardedBy("lock") public final static Set<String> users = new HashSet<String>(); Imposing locking requirements on external code is asking for trouble Fragility increases with the distance between declaration and use

Encapsulate data and synchronization Sometimes we can push the encapsulation even deeper Manage state using thread-safe objects or volatile variables Even less fragile: can't forget to synchronize But only if the class imposes no additional invariants Can transform this

    public class Users {
        @GuardedBy("this") private final Set<User> users = new HashSet<User>();

        public synchronized void addUser(User u) { users.add(u); }
        ...
    }

Into this

    public class Users {
        private final Set<User> users =
            Collections.synchronizedSet(new HashSet<User>());

        public void addUser(User u) { users.add(u); }
        ...
    }

Encapsulate data and synchronization If a class imposes invariants on its state, it must also provide its own synchronization to protect these invariants Even if component classes are thread-safe! UserManager follows The Rule But still might not be thread-safe!

    public class UserManager {
        // Each known user is in exactly one of {active, inactive}
        private final Set<User> active =
            Collections.synchronizedSet(new HashSet<User>());
        private final Set<User> inactive =
            Collections.synchronizedSet(new HashSet<User>());

        // Constructor populates inactive set with known users

        public void activate(User u) {
            if (inactive.remove(u))
                active.add(u);
        }

        public boolean isKnownUser(User u) {
            return active.contains(u) || inactive.contains(u);
        }
    }

Encapsulate data and synchronization In UserManager, all data is accessed with synchronization But still possible to see a user as neither active nor inactive Therefore not thread-safe: it can violate its specification! Need to make compound operations atomic with respect to one another Solution: synchronize UserManager methods

    public class UserManager {
        // Each known user is in exactly one of {active, inactive}
        private final Set<User> active = Collections.synchronizedSet(...);
        private final Set<User> inactive = Collections.synchronizedSet(...);

        public synchronized void activate(User u) {
            if (inactive.remove(u))
                active.add(u);
        }

        public synchronized boolean isKnownUser(User u) {
            return active.contains(u) || inactive.contains(u);
        }

        public Set<User> getActiveUsers() {
            return Collections.unmodifiableSet(active);
        }
    }

Encapsulate data and synchronization The problem was that synchronization was specified at a different level than the invariants Result: atomicity failures (race conditions) Could fix with client-side locking, but that is fragile (see the sketch below) Instead, encapsulate enforcement of invariants All variables in an invariant should be guarded by the same lock Hold the lock for the duration of operations on related variables Always provide synchronization at the same level as the invariants When composing operations on thread-safe objects, you may end up with multiple layers of synchronization And that's OK!
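A hedged sketch of the client-side locking alternative mentioned above, in the style of the put-if-absent example from Java Concurrency in Practice (the ListHelper name is illustrative); it works only because it locks on the same object the synchronized wrapper locks on internally, which is exactly the kind of distant protocol dependence that makes client-side locking fragile:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class ListHelper<E> {
        public final List<E> list =
            Collections.synchronizedList(new ArrayList<E>());

        // Correct only because the lock is 'list' itself, the lock the
        // synchronized wrapper uses; locking on 'this' would compile and
        // run, but contains() and add() would no longer be atomic as a pair
        public boolean putIfAbsent(E x) {
            synchronized (list) {
                boolean absent = !list.contains(x);
                if (absent)
                    list.add(x);
                return absent;
            }
        }
    }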

Summary: Encapsulation A thread-safe class encapsulates its data and any needed synchronization Lack of encapsulation → fragility Without encapsulation, correctness and thread-safety can only describe the entire program, not a single class Wherever a class defines invariants on its state, it must provide synchronization to preserve those invariants Even if this means multiple layers of synchronization Where should the synchronization go? In the client: too fragile In the component classes: may not preserve invariants In the composite that defines invariants: just right

Agenda Introduction Rules for writing thread-safe code Document thread-safety intent and implementation Encapsulate data and synchronization Prefer immutable objects Exploit effective immutability Rules for structuring concurrent applications Think tasks, not threads Build resource-management into your architecture Decouple identification of work from execution Rules for improving scalability Find and eliminate the serialization

Prefer immutable objects An immutable object is one whose state cannot be changed after construction All fields are final Not optional: critical for thread-safety of immutable objects Immutable objects are automatically thread-safe! Simpler Can only ever be in one state, controlled by the constructor Safer Can be freely shared with unknown or malicious code, which cannot subvert their invariants More scalable No synchronization required when sharing! (See Effective Java Item #13 for more)
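A minimal sketch of such a class, assuming the net.jcip.annotations @Immutable annotation (the Point class itself is illustrative):

    import net.jcip.annotations.Immutable;

    // All fields final, state fixed by the constructor, no setters:
    // instances can be freely shared across threads without locking
    @Immutable
    public final class Point {
        private final int x;
        private final int y;

        public Point(int x, int y) { this.x = x; this.y = y; }

        public int getX() { return x; }
        public int getY() { return y; }

        // "Mutation" produces a new object instead of changing this one
        public Point translate(int dx, int dy) {
            return new Point(x + dx, y + dy);
        }
    }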

Prefer immutable objects Most concurrency hazards stem from the need to coordinate access to mutable state Race conditions and data races come from insufficient synchronization Many other problems (e.g., deadlock) are consequences of strategies for proper coordination No mutable state → no need for coordination No race conditions, data races, deadlocks, scalability bottlenecks Identify immutable objects with @Immutable @Immutable implies @ThreadSafe Don't worry about the cost of object creation Object lifecycle is generally cheap Immutable objects have some performance benefits too

Prefer immutable objects Even if immutability is not an option, less mutable state can still mean less coordination Benefits of immutability apply to individual variables as well as objects Final fields have special visibility guarantees Final fields are simpler than mutable fields Final is the new private Declare fields final wherever practical Worth doing extra work to avoid making fields nonfinal In synchronization policy diagrams, final variables provide a synchronization policy for references But not the referred-to object If you can't get away with full immutability, seek to limit mutable state as much as possible

Agenda Introduction Rules for writing thread-safe code Document thread-safety intent and implementation Encapsulate data and synchronization Prefer immutable objects Exploit effective immutability Rules for structuring concurrent applications Think tasks, not threads Build resource-management into your architecture Decouple identification of work from execution Rules for improving scalability Find and eliminate the serialization

Plan for resource management Sometimes we throw more work at an application than it has resources to handle We'd like applications to degrade gracefully under load If you don't place some sort of bound on total resource usage (threads, memory, file handles, etc.) the system will do it for you Probably less gracefully, like OutOfMemoryError Unbounded resource consumption makes applications unstable Threads are fairly heavy resources Watch out for unbounded thread creation!

    class UnstableWebServer {
        public static void main(String[] args) throws IOException {
            ServerSocket socket = new ServerSocket(80);
            while (true) {
                final Socket connection = socket.accept();
                Runnable r = new Runnable() {
                    public void run() { /* handle request */ }
                };
                new Thread(r).start();   // Don't do this!
            }
        }
    }

Plan for resource management The problem with UnstableWebServer is that if it takes one request too many, the whole server comes crashing down Poor resource management Instead, bound thread usage, and queue excess work for later

    class BetterWebServer {
        static Executor pool = Executors.newFixedThreadPool(10);

        public static void main(String[] args) throws IOException {
            ServerSocket socket = new ServerSocket(80);
            while (true) {
                final Socket connection = socket.accept();
                Runnable r = new Runnable() {
                    public void run() { /* handle request */ }
                };
                pool.execute(r);
            }
        }
    }

Plan for resource management BetterWebServer addresses unbounded thread creation But any unbounded resource usage can cause failure Like queue elements Use bounded queues when queueing work to avoid OOME Establish a policy for what to do when the queue fills up Executor can use bounded queues for its work queue

    class EvenBetterWebServer {
        static Executor pool = new ThreadPoolExecutor(
            10, 10, Long.MAX_VALUE, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(1000),
            new ThreadPoolExecutor.DiscardOldestPolicy());

        public static void main(String[] args) throws IOException {
            ServerSocket socket = new ServerSocket(80);
            while (true) {
                final Socket connection = socket.accept();
                Runnable r = new Runnable() {
                    public void run() { /* handle request */ }
                };
                pool.execute(r);
            }
        }
    }

Plan for resource management When acquiring any resource, place a bound on how much you will consume Thread count, queued work items, database connections, incoming connections, file handles Don't let request volume control resource consumption But avoid pooling fine-grained cheap resources: not worth it Pool threads using ThreadPoolExecutor Pool database connections Use bounded queues where appropriate Use Semaphore to implement other custom resource bounds (see the sketch below) Think about resource management early on Very common mistake to ignore resource exhaustion issues Harder to retrofit than to incorporate from the outset
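A hedged sketch of the Semaphore idiom mentioned above, following the bounded-executor pattern from Java Concurrency in Practice (the BoundedExecutor name and the bound are illustrative):

    import java.util.concurrent.Executor;
    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.Semaphore;

    public class BoundedExecutor {
        private final Executor exec;
        private final Semaphore semaphore;   // caps submitted-but-not-finished tasks

        public BoundedExecutor(Executor exec, int bound) {
            this.exec = exec;
            this.semaphore = new Semaphore(bound);
        }

        public void submitTask(final Runnable command) throws InterruptedException {
            semaphore.acquire();             // blocks callers once the bound is reached
            try {
                exec.execute(new Runnable() {
                    public void run() {
                        try { command.run(); }
                        finally { semaphore.release(); }
                    }
                });
            } catch (RejectedExecutionException e) {
                semaphore.release();         // task never started, return the permit
                throw e;
            }
        }
    }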

Agenda Introduction Rules for writing thread-safe code Document thread-safety intent and implementation Encapsulate data and synchronization Prefer immutable objects Exploit effective immutability Rules for structuring concurrent applications Think tasks, not threads Build resource-management into your architecture Decouple identification of work from execution Rules for improving scalability Find and eliminate the serialization

Find the serialization Performance is a measure of how fast work gets done Learning to work faster increases your performance Scalability is a measure of how much more work could be done with more resources Learning to delegate increases your scalability When problems get over a certain size, performance improvements won't get you there; you need to scale If a problem got ten times bigger, how much more resources would I need to solve it? If you can just buy ten times as many CPUs (or memory or disks), then we say the problem scales linearly or perfectly

Find the serialization Processor speeds flattened out around 2003 Moore's law now gives us more cores, not faster ones Increasing throughput means keeping more cores busy Can no longer just buy a faster box to get a speedup Must write programs that take advantage of additional CPUs Just adding more cores may not improve throughput Tasks must be amenable to parallelization [Graphic: CPU clock speeds flattening out, © 2006 Herb Sutter]

Find the serialization System throughput is governed by Amdahl's Law Divides work into serial and parallel portions Serial work cannot be sped up by adding resources Parallelizable work can be Most tasks have a mix of serial and parallel work Harvesting crops can be sped up with more workers But additional workers will not make them grow any faster Amdahl's Law says: Speedup ≤ 1 / (F + (1 - F) / N) where F is the fraction that must be executed serially and N is the number of available workers As N → infinity, speedup → 1/F With 50% serialization, can only speed up by a factor of two No matter how many processors
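A few worked numbers (illustrative only) to show how quickly the serial fraction dominates:

    Speedup ≤ 1 / (F + (1 - F)/N)
    F = 0.5, N → ∞  :  1 / 0.5              = 2.0
    F = 0.1, N = 16 :  1 / (0.1 + 0.9/16)   ≈ 6.4
    F = 0.1, N → ∞  :  1 / 0.1              = 10.0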

Find the serialization Every task has some sources of serialization You just have to know where to look The primary source of serialization is the exclusive lock The longer locks are held for, the worse it gets Even when tasks consist only of thread-local computation, there is still serialization inherent in task dispatching

    while (!shutdownRequested) {
        Task t = taskQueue.take();      // potential serialization
        Result r = t.doTask();
        resultSet.add(r);               // potential serialization
    }

Accessing the task queue and the results container invariably involves serialization

Find the serialization To improve scalability, you have to find the serialization and break it up Can reduce lock-induced serialization in several ways Hold locks for less time: get in, get out Move thread-local computation out of synchronized blocks But don't make them so small as to split atomic operations Replace synchronized counters with AtomicInteger (see the sketch below) Use lock splitting or lock striping to reduce lock contention Guards different state with different locks Reduces likelihood of lock contention Replace synchronized Map with ConcurrentHashMap
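A hedged before/after sketch of the counter replacement mentioned above (class names illustrative):

    import java.util.concurrent.atomic.AtomicInteger;

    // Before: every increment contends for the same intrinsic lock
    class SynchronizedCounter {
        private int count;
        public synchronized int increment() { return ++count; }
        public synchronized int get() { return count; }
    }

    // After: same semantics, but built on a lock-free compare-and-swap,
    // so contended increments do not block each other
    class AtomicCounter {
        private final AtomicInteger count = new AtomicInteger();
        public int increment() { return count.incrementAndGet(); }
        public int get() { return count.get(); }
    }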

Find the serialization Can eliminate locking entirely in some cases Replace mutable objects with immutable ones Replace shared objects with thread-local ones Confine objects to a specific thread (as in Swing) Consider ThreadLocal for heavyweight mutable objects that don't need to be shared (e.g., GregorianCalendar); see the sketch below Signs that a concurrent program is bound by locking and not by CPU resources Total CPU utilization < 100% High percentage of kernel CPU usage
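A minimal sketch of the ThreadLocal idiom suggested above, giving each thread its own GregorianCalendar (the holder class and method names are illustrative):

    import java.util.Calendar;
    import java.util.GregorianCalendar;

    public class CalendarHolder {
        // One GregorianCalendar per thread: nothing is shared, so no locking
        // is needed and the heavyweight object is built only once per thread
        private static final ThreadLocal<Calendar> CALENDAR =
            new ThreadLocal<Calendar>() {
                @Override protected Calendar initialValue() {
                    return new GregorianCalendar();
                }
            };

        public static Calendar current() {
            Calendar c = CALENDAR.get();
            c.setTimeInMillis(System.currentTimeMillis());   // refresh to "now"
            return c;
        }
    }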

Questions?