Fast Java Programs. Jacob

1 Fast Java Programs

2 Outline Intro The Stack Measuring Architecture Patterns & APIs Q&A

3 Outline Intro The Stack Measuring Architecture Patterns & APIs Q&A

4

5

6

7 Why?

8 Premature optimization (is not a free ticket to write dumb code)

9 Key points: Perform science; Mechanical sympathy; Performance comes from architecture; Build an arsenal of known unknowns

10 Outline Intro The Stack Measuring Architecture Patterns & APIs Q&A

11

12 The Java Language JVM OS Hardware

13 The Java Language JVM OS Hardware

14 CPU PU PU PU PU L/S L/S S S

15 long sum = 0; for( int i=0; i<vals.length; i++ ) { sum += vals[i]; } CPU 0.5ns L1 Cache ~64k cost: 128 loop iterations 5.0ns L2 Cache ~256k cost: 1280 loop iterations 15.0ns LLC 70.0ns Main Memory 20M cost: loop iterations

16 CPU L1 Cache L2 Cache LLC Main Memory

17 store x = 12 store y = 14 Can be eliminated store x = 10

18 CPU L1 Cache L2 Cache LLC Main Memory

19 CPU L1 Cache L2 Cache LLC Main Memory

20 64 B (16 Java ints) LLC
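The summing loop from slide 15 walks the array sequentially, so consecutive reads share a cache line. The sketch below is a hypothetical illustration, not code from the talk: it sums the same array once sequentially and once with a stride of one cache line, assuming the 64-byte line size stated on this slide. The class and method names (`ArrayWalk`, `sumSequential`, `sumStrided`) are invented for the example. Both walks return the same sum; how far apart the timings land depends on the hardware and the JIT, so measure rather than assume.

```java
import java.util.Arrays;

final class ArrayWalk {
    // Assumption: 64-byte cache lines, i.e. 16 four-byte ints per line.
    static final int INTS_PER_LINE = 64 / Integer.BYTES;

    /** Sequential walk: consecutive elements share cache lines. */
    static long sumSequential(int[] vals) {
        long sum = 0;
        for (int i = 0; i < vals.length; i++) {
            sum += vals[i];
        }
        return sum;
    }

    /** Strided walk: visits every element once, but jumps a full cache line between reads. */
    static long sumStrided(int[] vals) {
        long sum = 0;
        for (int offset = 0; offset < INTS_PER_LINE; offset++) {
            for (int i = offset; i < vals.length; i += INTS_PER_LINE) {
                sum += vals[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] vals = new int[1 << 24]; // 64 MB of ints, larger than a 20 MB LLC
        Arrays.fill(vals, 1);
        long t0 = System.nanoTime();
        long sequential = sumSequential(vals);
        long t1 = System.nanoTime();
        long strided = sumStrided(vals);
        long t2 = System.nanoTime();
        System.out.printf("sequential: sum=%d in %d ms%n", sequential, (t1 - t0) / 1_000_000);
        System.out.printf("strided:    sum=%d in %d ms%n", strided, (t2 - t1) / 1_000_000);
    }
}
```

A single un-warmed run like this is only a rough indication; the Measuring section explains why.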

21 CPU L1 Cache L2 Cache LLC Main Memory

22 CPU CPU L1 Cache L1 Cache L2 Cache L2 Cache LLC Main Memory

23 CPU CPU L1 Cache L1 Cache L2 Cache L2 Cache LLC Main Memory

24 CPU CPU CPU CPU L1 Cache L1 Cache L1 Cache L1 Cache L2 Cache L2 Cache L2 Cache L2 Cache LLC LLC Main Memory Main Memory

25 CPU CPU CPU CPU L1 Cache L1 Cache L1 Cache L1 Cache L2 Cache L2 Cache L2 Cache L2 Cache LLC LLC QPI Main Memory QPI Main Memory

26 CPU CPU CPU CPU L1 Cache L1 Cache L1 Cache L1 Cache L2 Cache L2 Cache L2 Cache L2 Cache LLC LLC QPI Main Memory SSD PCIe QPI Network Main Memory

27 L1 Cache L1 Cache L1 Cache L1 Cache L2 Cache L2 Cache L2 Cache L2 Cache LLC LLC QPI 70.0ns CPU CPU CPU CPU Main Memory SSD PCIe QPI Network Main Memory

28 CPU CPU CPU CPU L1 Cache L1 Cache L1 Cache L1 Cache L2 Cache L2 Cache L2 Cache L2 Cache LLC LLC QPI Main Memory SSD PCIe QPI Main Memory Network 300.0ns

29

30 Core

31 LLC 20MB

32 Memory Bus

33 Other stuff - PCIe, QPI(?) etc.

34 CPU CPU CPU CPU L1 Cache L1 Cache L1 Cache L1 Cache L2 Cache L2 Cache L2 Cache L2 Cache LLC LLC QPI Main Memory SSD PCIe QPI Network Main Memory

35

36 ns

37 CPU PU PU PU PU L/S L/S S S

38 CPU L1 Cache L2 Cache LLC Main Memory

39 CPU CPU L1 Cache L1 Cache L2 Cache L2 Cache LLC Main Memory

40 CPU CPU CPU CPU L1 Cache L1 Cache L1 Cache L1 Cache L2 Cache L2 Cache L2 Cache L2 Cache LLC LLC QPI Main Memory SSD PCIe QPI Network Main Memory

41

42 Key points: Modern hardware is fractal; Concurrent software is a distributed system; Memory is a major choke point

43 The Java Language JVM OS Hardware

44 Concurrency! Eat lunch Do the dishes Walk the dog Read a book Time

45 Concurrency! Time

46

47 ~ ns Time

48 Time

49 Lunch Dishes Dog Book

50 Key points: The cost of a mode switch is high; Thread scheduling is a key part of architecture; Java locks can cause context switches; Work in batches
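One way to read "work in batches": when a consumer wakes up, let it take everything that is already queued instead of paying for a lock (and possibly a context switch) per item. The sketch below is a hypothetical example using `BlockingQueue.drainTo` from the standard library; `BatchConsumer` and `takeBatch` are names invented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BatchConsumer {
    /** Blocks for one item, then drains whatever else is already queued, up to maxBatch items in total. */
    static List<Integer> takeBatch(BlockingQueue<Integer> queue, int maxBatch) throws InterruptedException {
        List<Integer> batch = new ArrayList<>(maxBatch);
        batch.add(queue.take());            // may park the thread if the queue is empty
        queue.drainTo(batch, maxBatch - 1); // non-blocking; takes the remaining items in one call
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);
        for (int i = 0; i < 10; i++) {
            queue.put(i);
        }
        System.out.println(takeBatch(queue, 8).size());
        System.out.println(takeBatch(queue, 8).size());
    }
}
```

The batch list is allocated per call to keep the sketch short; a hot path would more likely reuse one list.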

51 The Java Language JVM OS Hardware

52 Managed Memory (GC) VM/Compiler (HotSpot) OS Abstraction (StdLib)

53 Managed Memory (GC) VM/Compiler (HotSpot) OS Abstraction (StdLib)

54 Managed Memory (GC) VM/Compiler (HotSpot) OS Abstraction (StdLib)

55 Interpreter int add( int a, int b ) { return a + b; } javac Client JIT (C1) mov %rdx,%rax; add %edx,%eax; add $0x30,%rsp; pop %rbp Server JIT (C2) CPU

56 int doThing( int var ) { return stuff(var); } # int stuff( int ) mov %rdx,%rax; add %edx,%eax; jmp %idunno # void baz() pop %rbp; ..; jmp %idunno # int bar() mov %rdx,%rax; ..; jmp %idunno # void add( int ) mov %rdx,%rax; add %edx,%eax; add $0x30,%rsp; pop %rbp; jmp %idunno

57 Managed Memory (GC) VM/Compiler (HotSpot) OS Abstraction (StdLib)

58

59 new Object()

60 Key points: HotSpot is fantastic, help it help you; new is not free, not even close; The first time you meet a new JVM, turn on GC logging, and keep it on.
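GC logging itself is switched on with JVM flags: `-Xlog:gc*` on JDK 9 and later, `-XX:+PrintGCDetails` together with `-Xloggc:<file>` on JDK 8 and earlier. As a complement, the hypothetical sketch below reads the same collectors' counters from inside the process through the standard `java.lang.management` API. `GcStats` is a name invented for this example, and this is a rough in-process check, not a replacement for the log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    /** Sum of collection counts over all collectors; collectors reporting -1 (undefined) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```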

61 Outline Intro The Stack Measuring Architecture Patterns & APIs Q&A

62

63

64 There are.. Lies Damn lies Statistics Dynamically profiled code that rewrites itself while it executes

65 ops / ns public void bench() { Math.log(Math.PI); }

66 ops / ns public double bench() { return Math.log(Math.PI); }

67 ops / ns public int bench(int reps) { int s = 0; for(int i = 0; i < reps; i++) { s += (x + y); } return s; }

68

69 ops / @State( Scope.Thread ) class MyBenchmark { private double pi = Math.PI; public double bench() { return Math.log(pi); } }
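Slides 65 to 69 appear to walk through the classic benchmark traps: a discarded result can be removed as dead code, and a constant input can be folded at compile time. The `@State( Scope.Thread )` annotation on slide 69 is JMH's. The sketch below deliberately avoids the JMH dependency so that it stays self-contained; it is a hand-rolled approximation with invented names (`LogBench`, `run`), not a substitute for a real harness. It reads the input from a non-final field and folds every result into a returned value. Note the remaining weakness: the JIT may still hoist the loop-invariant call out of the loop, which is exactly the kind of problem a harness such as JMH exists to handle.

```java
final class LogBench {
    // Non-final instance field, mirroring slide 69: not a compile-time constant.
    private double pi = Math.PI;

    double bench() {
        return Math.log(pi);
    }

    /** Calls bench() reps times and folds every result into the return value so the work stays observable. */
    static double run(LogBench b, int reps) {
        double sink = 0;
        for (int i = 0; i < reps; i++) {
            sink += b.bench();
        }
        return sink;
    }

    public static void main(String[] args) {
        LogBench b = new LogBench();
        run(b, 1_000_000); // warm-up, giving the JIT a chance to compile the hot path
        int reps = 10_000_000;
        long start = System.nanoTime();
        double sink = run(b, reps);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%.2f ns/op (sink=%f)%n", (double) elapsed / reps, sink);
    }
}
```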

70

71 Angry Users Average
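The contrast between the average and the angry users can be made concrete with a few numbers. The sketch below is a hypothetical example (the sample data and the names `LatencyStats`, `mean`, `percentile` are invented here): 98 requests take 1 ms and 2 take 500 ms. The mean looks tolerable while the 99th percentile shows what the unlucky users actually saw.

```java
import java.util.Arrays;

final class LatencyStats {
    static double mean(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    /** Nearest-rank percentile for p in (0, 100]; sorts a copy so the input is left untouched. */
    static long percentile(long[] samples, int p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] latenciesMs = new long[100];
        Arrays.fill(latenciesMs, 1);
        latenciesMs[98] = 500;
        latenciesMs[99] = 500;
        System.out.println("mean = " + mean(latenciesMs) + " ms");
        System.out.println("p50  = " + percentile(latenciesMs, 50) + " ms");
        System.out.println("p99  = " + percentile(latenciesMs, 99) + " ms");
    }
}
```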

72

73 Credit: Azul Systems

74

75 Outline Intro The Stack Measuring Architecture Patterns & APIs Q&A

76 HTTP Parser Calls Router Processors Services Calls Endpoint

77 HTTP Parser Parsing Uses HTTP Parser Calls Router Routing Uses Router Calls Endpoint Endpoint Execution

78 Thread A HTTP Parser Parsing Calls Routing Uses HTTP Parser Uses Router Router Calls Endpoint Endpoint Execution Thread B

79 Thread A Owns Parsing HTTP Parser Reads Uses Router Owns Routing Uses Endpoint Execution RAM Thread B

80 interface Locks { void lock( long processId, long resourceId ); }

81 interface Locks { void lock( long processId, long resourceId ); } interface Locks { interface Client extends AutoCloseable { void lock( long resourceId ); } Client newClient(); }
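The second interface on slide 81 gives every caller its own `Client`, so per-client state has an obvious home and a clear lifetime. The sketch below is one hypothetical way to implement it: `InMemoryLocks`, `isLocked`, the fail-fast behaviour on contention and the narrowed `close()` signature are all assumptions made for illustration, not details from the talk.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

interface Locks {
    interface Client extends AutoCloseable {
        void lock(long resourceId);

        @Override
        void close(); // narrowed so callers need no checked-exception handling
    }

    Client newClient();
}

final class InMemoryLocks implements Locks {
    // resourceId -> owning client; the only state shared between clients
    private final Map<Long, Client> owners = new ConcurrentHashMap<>();

    @Override
    public Client newClient() {
        return new Client() {
            // Per-client bookkeeping, touched only by the thread using this client
            private final Set<Long> held = new HashSet<>();

            @Override
            public void lock(long resourceId) {
                Client owner = owners.putIfAbsent(resourceId, this);
                if (owner != null && owner != this) {
                    throw new IllegalStateException("resource " + resourceId + " is held by another client");
                }
                held.add(resourceId);
            }

            @Override
            public void close() {
                for (Long id : held) {
                    owners.remove(id, this); // release only what this client still owns
                }
                held.clear();
            }
        };
    }

    boolean isLocked(long resourceId) {
        return owners.containsKey(resourceId);
    }
}
```

Because `Client` is `AutoCloseable`, a caller can scope it with try-with-resources and every lock it took is released on exit. The boxed `Long` keys allocate, which a performance-sensitive version would want to avoid.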

82 Key points: Architecture > micro-optimizations; Design with processors and services; Single-writer principle; Make multi-client services explicit
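The single-writer principle can be sketched as a piece of state that only one thread ever mutates, with every other thread handing its updates to that thread. The example below is a hypothetical illustration built on a single-threaded executor; `SingleWriterCounter` and its methods are invented names. The executor's task queue allocates per update, so a production design would more likely use a preallocated ring buffer, but the ownership rule is the same.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutionException;

final class SingleWriterCounter {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private long total; // read and written only on the writer thread, so no lock is needed

    /** Any thread may call this; the mutation itself runs on the writer thread. */
    void add(long delta) {
        writer.execute(() -> total += delta);
    }

    /** Reads the value on the writer thread, after all previously submitted updates. */
    long get() {
        try {
            return writer.submit(() -> total).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void close() {
        writer.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        SingleWriterCounter counter = new SingleWriterCounter();
        Thread[] producers = new Thread[4];
        for (int i = 0; i < producers.length; i++) {
            producers[i] = new Thread(() -> {
                for (int n = 0; n < 1_000; n++) {
                    counter.add(1);
                }
            });
            producers[i].start();
        }
        for (Thread p : producers) {
            p.join();
        }
        System.out.println(counter.get());
        counter.close();
    }
}
```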

83 Outline Intro The Stack Measuring Architecture Patterns & APIs Q&A

84 Patterns to help GC

85 // Iterator interface MyEvents extends Iterator<Event> { boolean hasNext(); Event next(); } // Cursor interface MyEvents extends Event { boolean next(); }
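The cursor variant on slide 85 avoids handing out a fresh `Event` object per element: the cursor is the event, and `next()` just moves it. A minimal sketch, assuming an `Event` with two primitive fields stored in parallel arrays; the `Event` accessors and the names `EventCursor` and `ArrayEventCursor` are invented for the example.

```java
interface Event {
    long timestamp();

    int type();
}

interface EventCursor extends Event {
    /** Advances to the next event; returns false once the events are exhausted. */
    boolean next();
}

final class ArrayEventCursor implements EventCursor {
    private final long[] timestamps;
    private final int[] types;
    private int position = -1; // before the first event until next() is called

    ArrayEventCursor(long[] timestamps, int[] types) {
        if (timestamps.length != types.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        this.timestamps = timestamps;
        this.types = types;
    }

    @Override
    public boolean next() {
        if (position + 1 >= timestamps.length) {
            return false;
        }
        position++;
        return true;
    }

    @Override
    public long timestamp() {
        return timestamps[position];
    }

    @Override
    public int type() {
        return types[position];
    }
}
```

Iteration then reads `while (cursor.next()) { use(cursor.timestamp(), cursor.type()); }` and allocates nothing per element. The trade-off is that callers must not hold on to the cursor as if it were a stable event.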

86 // Regular heap new byte[..] // "Off-heap" ByteBuffer.allocateDirect(..); // malloc unsafe.allocateMemory(..); // realloc unsafe.reallocateMemory(..); // free unsafe.freeMemory(..);

87 Benefits of "raw" memory: - No impact on GC*. Remember, GC time is (usually) relative to heap size. - Significantly faster in some cases (1-100x, test!). - The Channels API passes a direct ByteBuffer straight through, no copying. *As long as you keep the memory around!
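Of the off-heap options on slide 86, `ByteBuffer.allocateDirect` is the one available through a public, supported API. The sketch below wraps a direct buffer as a fixed-size array of longs; `OffHeapLongs` is a name invented for the example, and whether this is faster than a plain `long[]` for a given workload is something to measure.

```java
import java.nio.ByteBuffer;

final class OffHeapLongs {
    private final ByteBuffer buffer; // the payload lives outside the Java heap

    OffHeapLongs(int capacity) {
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, Long.BYTES));
    }

    void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value); // absolute put, does not move the buffer position
    }

    long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }

    int capacity() {
        return buffer.capacity() / Long.BYTES;
    }

    boolean isDirect() {
        return buffer.isDirect();
    }
}
```

In line with the slide's footnote, the benefit depends on allocating the buffer once and keeping it around rather than creating direct buffers repeatedly.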

88 // Classic thing.handle( new MyEvent(.. ) ); // Reset-pattern thing.handle( event.reset(.. ) );

89 class Foo { private final Thing thing =..; public void sendEvent() { thing.handle( new Event(.. ) ); } }

90 class Foo { private final Thing thing =..; private final Event event = new Event(); public void sendEvent() { thing.handle( event.reset(.. ) ); } }
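Slides 88 to 90 replace a per-call `new Event(..)` with one long-lived instance that is overwritten before each use. A compilable sketch of that reset pattern follows; the field names and the types `MutableEvent`, `EventHandler` and `Sender` are invented for illustration.

```java
final class MutableEvent {
    private long id;
    private String payload;

    /** Overwrites all fields and returns this, so the call can be passed straight to a handler. */
    MutableEvent reset(long id, String payload) {
        this.id = id;
        this.payload = payload;
        return this;
    }

    long id() {
        return id;
    }

    String payload() {
        return payload;
    }
}

interface EventHandler {
    /** Implementations must not keep a reference to the event after returning. */
    void handle(MutableEvent event);
}

final class Sender {
    private final EventHandler handler;
    private final MutableEvent event = new MutableEvent(); // allocated once, reused for every send

    Sender(EventHandler handler) {
        this.handler = handler;
    }

    void sendEvent(long id, String payload) {
        handler.handle(event.reset(id, payload));
    }
}
```

The cost of the pattern is the contract on `handle`: a handler that stores the event will see it change under its feet, and a `Sender` shared between threads would need one event per thread.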

91 Pooling: Now you have two problems! Or?

92

93 Map<Integer, Object> map =..; int myKey = 12; // Ends up on stack ( = ~free ) map.put( 12,.. ); // Creates Object ( != free )
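The hidden cost on slide 93 is autoboxing: `map.put( 12, .. )` compiles to a call through `Integer.valueOf`. One nuance worth knowing is that `Integer.valueOf` is guaranteed to cache values from -128 to 127, so a key of 12 reuses a cached object, while keys outside that range typically allocate. The sketch below is a hypothetical illustration with invented names (`BoxingDemo`, `putBoxed`, `putDense`); it puts the boxed path next to an allocation-free alternative that only works when keys are small, dense and non-negative.

```java
import java.util.HashMap;
import java.util.Map;

final class BoxingDemo {
    /** Boxed path: the int key is converted with Integer.valueOf(key) on every call. */
    static void putBoxed(Map<Integer, Object> map, int key, Object value) {
        map.put(key, value);
    }

    /** Alternative for small, dense, non-negative keys: index an array directly, no boxing. */
    static void putDense(Object[] table, int key, Object value) {
        table[key] = value;
    }

    public static void main(String[] args) {
        Map<Integer, Object> map = new HashMap<>();
        putBoxed(map, 12, "boxed");
        Object[] table = new Object[64];
        putDense(table, 12, "dense");
        System.out.println(map.get(12) + " / " + table[12]);
    }
}
```

For sparse int keys, primitive-specialised collection libraries serve the same purpose; the talk's "Collection libraries" slide points in that direction.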

94 public void myHelpfulMethod( int... vals ) {.. } myHelpfulMethod( 12 ); // Creates array
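A common way around the varargs array is to add fixed-arity overloads for the frequent cases; Java's overload resolution prefers a fixed-arity match over a varargs one, so existing call sites pick them up without changes. A minimal sketch, with `Varargs` and `sum` as invented names:

```java
final class Varargs {
    /** General case: the caller's arguments are packed into a new int[] on every call. */
    static long sum(int... vals) {
        long total = 0;
        for (int v : vals) {
            total += v;
        }
        return total;
    }

    /** Chosen for sum(x): no array is created. */
    static long sum(int a) {
        return a;
    }

    /** Chosen for sum(x, y): no array is created. */
    static long sum(int a, int b) {
        return (long) a + b;
    }

    public static void main(String[] args) {
        System.out.println(sum(12));
        System.out.println(sum(1, 2));
        System.out.println(sum(1, 2, 3));
    }
}
```

The JIT's escape analysis can sometimes remove the array on its own, so the overloads are worth adding only where profiling shows the allocation matters.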

95 Other tips & tricks

96 int readByReaderThread; int writtenByWriterThread; // With writer thread running, reader performance: reads/ns // Without writer thread running reads/ns (!)

97 CPU X CPU Y X Cache Cache X Y Y LLC Main Memory

98

99 // Java int writtenByWriterThread; // Java 7 long p01, p02, p03, p04, p05, p06, p07, p08; int writtenByWriterThread;
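The padding trick on this slide keeps a hot field on a cache line of its own, so a writer on one core does not keep invalidating the line a reader on another core depends on. The sketch below is a hypothetical illustration assuming 64-byte cache lines; `PaddedCounter` and `FalseSharingDemo` are invented names. Two caveats apply. The JVM is free to reorder or drop unused fields, so manual padding is a best-effort technique. JDK 8 and later also provide a `@Contended` annotation for this purpose, but it is an internal API and application classes need `-XX:-RestrictContended` to use it.

```java
final class PaddedCounter {
    long p01, p02, p03, p04, p05, p06, p07; // padding before the hot field
    volatile long value;                    // written by exactly one thread
    long q01, q02, q03, q04, q05, q06, q07; // padding after the hot field
}

final class FalseSharingDemo {
    /** Runs two writer threads, each incrementing its own counter, and returns both final values. */
    static long[] run(int iterations) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread ta = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                a.value++; // safe: only this thread writes a.value
            }
        });
        Thread tb = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                b.value++; // safe: only this thread writes b.value
            }
        });
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long[] result = run(50_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(result[0] + " / " + result[1] + " in " + elapsedMs + " ms");
    }
}
```

To see the effect on a particular machine, time the same run with the padding fields removed and compare.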

100 The power of final & class

101 -XX:MinInliningThreshold -XX:MaxInlineSize

102 Collection libraries

103 Outline Intro The Stack Measuring Architecture Patterns & APIs Q&A

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder JAVA PERFORMANCE PR SW2 S18 Dr. Prähofer DI Leopoldseder OUTLINE 1. What is performance? 1. Benchmarking 2. What is Java performance? 1. Interpreter vs JIT 3. Tools to measure performance 4. Memory Performance

More information

Low latency & Mechanical Sympathy: Issues and solutions

Low latency & Mechanical Sympathy: Issues and solutions Low latency & Mechanical Sympathy: Issues and solutions Jean-Philippe BEMPEL Performance Architect @jpbempel http://jpbempel.blogspot.com ULLINK 2016 Low latency order router pure Java SE application FIX

More information

Memory Management: The Details

Memory Management: The Details Lecture 10 Memory Management: The Details Sizing Up Memory Primitive Data Types Complex Data Types byte: char: short: basic value (8 bits) 1 byte 2 bytes Pointer: platform dependent 4 bytes on 32 bit machine

More information

High Performance Managed Languages. Martin Thompson

High Performance Managed Languages. Martin Thompson High Performance Managed Languages Martin Thompson - @mjpt777 Really, what s your preferred platform for building HFT applications? Why would you build low-latency applications on a GC ed platform? Some

More information

High Performance Managed Languages. Martin Thompson

High Performance Managed Languages. Martin Thompson High Performance Managed Languages Martin Thompson - @mjpt777 Really, what is your preferred platform for building HFT applications? Why do you build low-latency applications on a GC ed platform? Agenda

More information

the gamedesigninitiative at cornell university Lecture 9 Memory Management

the gamedesigninitiative at cornell university Lecture 9 Memory Management Lecture 9 Gaming Memory Constraints Redux Wii-U Playstation 4 2GB of RAM 1GB dedicated to OS Shared with GPGPU 8GB of RAM Shared GPU, 8-core CPU OS footprint unknown 2 Two Main Concerns with Memory Getting

More information

Java Performance: The Definitive Guide

Java Performance: The Definitive Guide Java Performance: The Definitive Guide Scott Oaks Beijing Cambridge Farnham Kbln Sebastopol Tokyo O'REILLY Table of Contents Preface ix 1. Introduction 1 A Brief Outline 2 Platforms and Conventions 2 JVM

More information

Understanding Hardware Transactional Memory

Understanding Hardware Transactional Memory Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence

More information

StackVsHeap SPL/2010 SPL/20

StackVsHeap SPL/2010 SPL/20 StackVsHeap Objectives Memory management central shared resource in multiprocessing RTE memory models that are used in Java and C++ services for Java/C++ programmer from RTE (JVM / OS). Perspectives of

More information

Priming Java for Speed

Priming Java for Speed Priming Java for Speed Getting Fast & Staying Fast Gil Tene, CTO & co-founder, Azul Systems 2013 Azul Systems, Inc. High level agenda Intro Java realities at Load Start A whole bunch of compiler optimization

More information

A JVM Does What? Eva Andreasson Product Manager, Azul Systems

A JVM Does What? Eva Andreasson Product Manager, Azul Systems A JVM Does What? Eva Andreasson Product Manager, Azul Systems Presenter Eva Andreasson Innovator & Problem solver Implemented the Deterministic GC of JRockit Real Time Awarded patents on GC heuristics

More information

Taming the Java Virtual Machine. Li Haoyi, Chicago Scala Meetup, 19 Apr 2017

Taming the Java Virtual Machine. Li Haoyi, Chicago Scala Meetup, 19 Apr 2017 Taming the Java Virtual Machine Li Haoyi, Chicago Scala Meetup, 19 Apr 2017 Who Am I? Previously: Dropbox Engineering Currently: Bright Technology Services - Data Science, Scala consultancy Fluent Code

More information

Optimizing the Data Integration Service to Process Concurrent Web Services

Optimizing the Data Integration Service to Process Concurrent Web Services Optimizing the Data Integration Service to Process Concurrent Web Services 2012 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,

More information

Computer Memory. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

Computer Memory. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1 Computer Memory Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1 Warm Up public int sum1(int n, int m, int[][] table) { int output = 0; for (int i = 0; i < n; i++) { for (int j = 0; j

More information

Running class Timing on Java HotSpot VM, 1

Running class Timing on Java HotSpot VM, 1 Compiler construction 2009 Lecture 3. A first look at optimization: Peephole optimization. A simple example A Java class public class A { public static int f (int x) { int r = 3; int s = r + 5; return

More information

143A: Principles of Operating Systems. Lecture 6: Address translation. Anton Burtsev January, 2017

143A: Principles of Operating Systems. Lecture 6: Address translation. Anton Burtsev January, 2017 143A: Principles of Operating Systems Lecture 6: Address translation Anton Burtsev January, 2017 Address translation Segmentation Descriptor table Descriptor table Base address 0 4 GB Limit

More information

Heap Off Memory WTF?? Reducing Heap memory stress

Heap Off Memory WTF?? Reducing Heap memory stress Heap Off Memory WTF?? Reducing Heap memory stress Abstract * Java memory fundamental * heap off memory principles * heap-off cache with Apache DirectMemory /me Olivier Lamy * Open Source Architect @Talend

More information

Chapter 2 Processes and Threads

Chapter 2 Processes and Threads MODERN OPERATING SYSTEMS Third Edition ANDREW S. TANENBAUM Chapter 2 Processes and Threads The Process Model Figure 2-1. (a) Multiprogramming of four programs. (b) Conceptual model of four independent,

More information

Optimising Multicore JVMs. Khaled Alnowaiser

Optimising Multicore JVMs. Khaled Alnowaiser Optimising Multicore JVMs Khaled Alnowaiser Outline JVM structure and overhead analysis Multithreaded JVM services JVM on multicore An observational study Potential JVM optimisations Basic JVM Services

More information

Compiler construction 2009

Compiler construction 2009 Compiler construction 2009 Lecture 3 JVM and optimization. A first look at optimization: Peephole optimization. A simple example A Java class public class A { public static int f (int x) { int r = 3; int

More information

Hierarchical PLABs, CLABs, TLABs in Hotspot

Hierarchical PLABs, CLABs, TLABs in Hotspot Hierarchical s, CLABs, s in Hotspot Christoph M. Kirsch ck@cs.uni-salzburg.at Hannes Payer hpayer@cs.uni-salzburg.at Harald Röck hroeck@cs.uni-salzburg.at Abstract Thread-local allocation buffers (s) are

More information

Introduction to Java

Introduction to Java Introduction to Java Module 1: Getting started, Java Basics 22/01/2010 Prepared by Chris Panayiotou for EPL 233 1 Lab Objectives o Objective: Learn how to write, compile and execute HelloWorld.java Learn

More information

Virtual Machine. Part I: Stack Arithmetic. Building a Modern Computer From First Principles.

Virtual Machine. Part I: Stack Arithmetic. Building a Modern Computer From First Principles. Virtual Machine Part I: Stack Arithmetic Building a Modern Computer From First Principles www.nand2tetris.org Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 7:

More information

Necessar(il)y Evil dealing with benchmarks, ugh. Aleksey

Necessar(il)y Evil dealing with benchmarks, ugh. Aleksey Necessar(il)y Evil dealing with benchmarks, ugh Aleksey Shipilev aleksey.shipilev@oracle.com, @shipilev The following is intended to outline our general product direction. It is intended for information

More information

Compiling Techniques

Compiling Techniques Lecture 10: Introduction to 10 November 2015 Coursework: Block and Procedure Table of contents Introduction 1 Introduction Overview Java Virtual Machine Frames and Function Call 2 JVM Types and Mnemonics

More information

Zing Vision. Answering your toughest production Java performance questions

Zing Vision. Answering your toughest production Java performance questions Zing Vision Answering your toughest production Java performance questions Outline What is Zing Vision? Where does Zing Vision fit in your Java environment? Key features How it works Using ZVRobot Q & A

More information

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1 Agenda CSE P 501 Compilers Java Implementation JVMs, JITs &c Hal Perkins Summer 2004 Java virtual machine architecture.class files Class loading Execution engines Interpreters & JITs various strategies

More information

Why do we care about parallel?

Why do we care about parallel? Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The

More information

CSCI 136 Written Exam #1 Fundamentals of Computer Science II Spring 2013

CSCI 136 Written Exam #1 Fundamentals of Computer Science II Spring 2013 CSCI 136 Written Exam #1 Fundamentals of Computer Science II Spring 2013 Name: This exam consists of 5 problems on the following 6 pages. You may use your double-sided hand-written 8 ½ x 11 note sheet

More information

238P: Operating Systems. Lecture 5: Address translation. Anton Burtsev January, 2018

238P: Operating Systems. Lecture 5: Address translation. Anton Burtsev January, 2018 238P: Operating Systems Lecture 5: Address translation Anton Burtsev January, 2018 Two programs one memory Very much like car sharing What are we aiming for? Illusion of a private address space Identical

More information

Designing experiments Performing experiments in Java Intel s Manycore Testing Lab

Designing experiments Performing experiments in Java Intel s Manycore Testing Lab Designing experiments Performing experiments in Java Intel s Manycore Testing Lab High quality results that capture, e.g., How an algorithm scales Which of several algorithms performs best Pretty graphs

More information

C06: Memory Management

C06: Memory Management CISC 7310X C06: Memory Management Hui Chen Department of Computer & Information Science CUNY Brooklyn College 3/8/2018 CUNY Brooklyn College 1 Outline Recap & issues Project 1 feedback Memory management:

More information

Processes. CS 416: Operating Systems Design, Spring 2011 Department of Computer Science Rutgers University

Processes. CS 416: Operating Systems Design, Spring 2011 Department of Computer Science Rutgers University Processes Design, Spring 2011 Department of Computer Science Von Neuman Model Both text (program) and data reside in memory Execution cycle Fetch instruction Decode instruction Execute instruction CPU

More information

Shenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Shenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat Shenandoah: Theory and Practice Christine Flood Roman Kennke Principal Software Engineers Red Hat 1 Shenandoah Christine Flood Roman Kennke Principal Software Engineers Red Hat 2 Shenandoah Why do we need

More information

The X86 Assembly Language Instruction Nop Means

The X86 Assembly Language Instruction Nop Means The X86 Assembly Language Instruction Nop Means As little as 1 CPU cycle is "wasted" to execute a NOP instruction (the exact and other "assembly tricks", as explained also in this thread on Programmers.

More information

Don t Get Caught In the Cold, Warm-up Your JVM Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems

Don t Get Caught In the Cold, Warm-up Your JVM Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems Don t Get Caught In the Cold, Warm-up Your JVM Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems David Lion, Adrian Chiu, Hailong Sun*, Xin Zhuang, Nikola Grcevski, Ding Yuan University

More information

Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler

Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler , Compilation Technology Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler Daryl Maier, Pramod Ramarao, Mark Stoodley, Vijay Sundaresan TestaRossa JIT compiler

More information

Just In Time Compilation

Just In Time Compilation Just In Time Compilation JIT Compilation: What is it? Compilation done during execution of a program (at run time) rather than prior to execution Seen in today s JVMs and elsewhere Outline Traditional

More information

Java TM Introduction. Renaud Florquin Isabelle Leclercq. FloConsult SPRL.

Java TM Introduction. Renaud Florquin Isabelle Leclercq. FloConsult SPRL. Java TM Introduction Renaud Florquin Isabelle Leclercq FloConsult SPRL http://www.floconsult.be mailto:info@floconsult.be Java Technical Virtues Write once, run anywhere Get started quickly Write less

More information

NGN Progress Report. Table of Contents

NGN Progress Report. Table of Contents NGN Progress Report Title: Simulator Scalability Testing Prepared by: Richard Nelson Date: 08 February, 2006 Table of Contents Introduction...2 Simulators...2 Test Method...2 Simulation Model...2 CPU Utilisation...2

More information

Synchronization. CS 475, Spring 2018 Concurrent & Distributed Systems

Synchronization. CS 475, Spring 2018 Concurrent & Distributed Systems Synchronization CS 475, Spring 2018 Concurrent & Distributed Systems Review: Threads: Memory View code heap data files code heap data files stack stack stack stack m1 m1 a1 b1 m2 m2 a2 b2 m3 m3 a3 m4 m4

More information

Welcome to the session...

Welcome to the session... Welcome to the session... Copyright 2013, Oracle and/or its affiliates. All rights reserved. 02/22/2013 1 The following is intended to outline our general product direction. It is intended for information

More information

OS-caused Long JVM Pauses - Deep Dive and Solutions

OS-caused Long JVM Pauses - Deep Dive and Solutions OS-caused Long JVM Pauses - Deep Dive and Solutions Zhenyun Zhuang LinkedIn Corp., Mountain View, California, USA https://www.linkedin.com/in/zhenyun Zhenyun@gmail.com 2016-4-21 Outline q Introduction

More information

micro:bit runtime ARM mbed Nordic nrf51-sdk

micro:bit runtime ARM mbed Nordic nrf51-sdk Block Editor Touch Develop PXT Java Script C / C++ Python Microsoft Microsoft Microsoft Code Kingdoms ARM mbed PSF +friends micro:bit runtime ARM mbed Nordic nrf51-sdk runtime Applications Bluetooth Profile

More information

Advanced CUDA Optimizations. Umar Arshad ArrayFire

Advanced CUDA Optimizations. Umar Arshad ArrayFire Advanced CUDA Optimizations Umar Arshad (@arshad_umar) ArrayFire (@arrayfire) ArrayFire World s leading GPU experts In the industry since 2007 NVIDIA Partner Deep experience working with thousands of customers

More information

PennBench: A Benchmark Suite for Embedded Java

PennBench: A Benchmark Suite for Embedded Java WWC5 Austin, TX. Nov. 2002 PennBench: A Benchmark Suite for Embedded Java G. Chen, M. Kandemir, N. Vijaykrishnan, And M. J. Irwin Penn State University http://www.cse.psu.edu/~mdl Outline Introduction

More information

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace Project there are a couple of 3 person teams regroup or see me or forever hold your peace a new drop with new type checking is coming using it is optional 1 Compiler Architecture source code Now we jump

More information

What the CPU Sees Basic Flow Control Conditional Flow Control Structured Flow Control Functions and Scope. C Flow Control.

What the CPU Sees Basic Flow Control Conditional Flow Control Structured Flow Control Functions and Scope. C Flow Control. C Flow Control David Chisnall February 1, 2011 Outline What the CPU Sees Basic Flow Control Conditional Flow Control Structured Flow Control Functions and Scope Disclaimer! These slides contain a lot of

More information

Profiling & Optimization

Profiling & Optimization Lecture 11 Sources of Game Performance Issues? 2 Avoid Premature Optimization Novice developers rely on ad hoc optimization Make private data public Force function inlining Decrease code modularity removes

More information

Updates. Office B116E Office Hours : Friday 1 p.m. 2 p.m. Also by appointment Office Hours Purpose:

Updates. Office B116E Office Hours : Friday 1 p.m. 2 p.m. Also by appointment Office Hours Purpose: CS 180 Amit Gupta Updates Office B116E Office Hours : Friday 1 p.m. 2 p.m. Also by appointment gupta75@purdue.edu Office Hours Purpose: Course Material and Recitation No project questions Quizzes I will

More information

Synchronization COMPSCI 386

Synchronization COMPSCI 386 Synchronization COMPSCI 386 Obvious? // push an item onto the stack while (top == SIZE) ; stack[top++] = item; // pop an item off the stack while (top == 0) ; item = stack[top--]; PRODUCER CONSUMER Suppose

More information

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems Robert Grimm University of Washington Extensions Added to running system Interact through low-latency interfaces Form

More information

Compiler Design Spring 2017

Compiler Design Spring 2017 Compiler Design Spring 2017 6.0 Runtime system and object layout Dr. Zoltán Majó Compiler Group Java HotSpot Virtual Machine Oracle Corporation 1 Runtime system Some open issues from last time Handling

More information

CS 3305 Intro to Threads. Lecture 6

CS 3305 Intro to Threads. Lecture 6 CS 3305 Intro to Threads Lecture 6 Introduction Multiple applications run concurrently! This means that there are multiple processes running on a computer Introduction Applications often need to perform

More information

LMAX Disruptor 3.0.! Advanced Patterns and details (Making the fast,

LMAX Disruptor 3.0.! Advanced Patterns and details (Making the fast, LMAX Disruptor 3.0 Advanced Patterns and details (Making the fast, faster) @mikeb2701 Agenda Prove that memory access is key to software performance What is LMAX? What is LMAX? And why do we care about

More information

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java) COMP 412 FALL 2017 Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java) Copyright 2017, Keith D. Cooper & Zoran Budimlić, all rights reserved. Students enrolled in Comp 412 at Rice

More information

Operating Systems CMPSCI 377, Lec 2 Intro to C/C++ Prashant Shenoy University of Massachusetts Amherst

Operating Systems CMPSCI 377, Lec 2 Intro to C/C++ Prashant Shenoy University of Massachusetts Amherst Operating Systems CMPSCI 377, Lec 2 Intro to C/C++ Prashant Shenoy University of Massachusetts Amherst Department of Computer Science Why C? Low-level Direct access to memory WYSIWYG (more or less) Effectively

More information

Ordering Within Expressions. Control Flow. Side-effects. Side-effects. Order of Evaluation. Misbehaving Floating-Point Numbers.

Ordering Within Expressions. Control Flow. Side-effects. Side-effects. Order of Evaluation. Misbehaving Floating-Point Numbers. Control Flow COMS W4115 Prof. Stephen A. Edwards Spring 2003 Columbia University Department of Computer Science Control Flow Time is Nature s way of preventing everything from happening at once. Scott

More information

Control Flow COMS W4115. Prof. Stephen A. Edwards Fall 2006 Columbia University Department of Computer Science

Control Flow COMS W4115. Prof. Stephen A. Edwards Fall 2006 Columbia University Department of Computer Science Control Flow COMS W4115 Prof. Stephen A. Edwards Fall 2006 Columbia University Department of Computer Science Control Flow Time is Nature s way of preventing everything from happening at once. Scott identifies

More information

Project Loom Ron Pressler, Alan Bateman June 2018

Project Loom Ron Pressler, Alan Bateman June 2018 Project Loom Ron Pressler, Alan Bateman June 2018 Copyright 2018, Oracle and/or its affiliates. All rights reserved.!1 Safe Harbor Statement The following is intended to outline our general product direction.

More information

Chapter 6: Synchronization. Operating System Concepts 8 th Edition,

Chapter 6: Synchronization. Operating System Concepts 8 th Edition, Chapter 6: Synchronization, Silberschatz, Galvin and Gagne 2009 Outline Background The Critical-Section Problem Peterson s Solution Synchronization Hardware Semaphores Classic Problems of Synchronization

More information

Challenges in maintaing a high-performance Search-Engine written in Java

Challenges in maintaing a high-performance Search-Engine written in Java Challenges in maintaing a high-performance Search-Engine written in Java Simon Willnauer Apache Lucene Core Committer & PMC Chair simonw@apache.org / simon.willnauer@searchworkings.com 1 Who am I? Lucene

More information

WHO AM I.

WHO AM I. WHO AM I Christoph Engelbert (@noctarius2k) 8+ years of professional Java development Specialized to performance, GC, traffic topics Apache DirectMemory PMC Previous companies incl. Ubisoft and HRS Official

More information

Escape Analysis. Applications to ML and Java TM

Escape Analysis. Applications to ML and Java TM Escape Analysis. Applications to ML and Java TM Bruno Blanchet INRIA Rocquencourt Bruno.Blanchet@inria.fr December 2000 Overview 1. Introduction: escape analysis and applications. 2. Escape analysis 2.a

More information

COE318 Lecture Notes Week 3 (Week of Sept 17, 2012)

COE318 Lecture Notes Week 3 (Week of Sept 17, 2012) COE318 Lecture Notes: Week 3 1 of 8 COE318 Lecture Notes Week 3 (Week of Sept 17, 2012) Announcements Quiz (5% of total mark) on Wednesday, September 26, 2012. Covers weeks 1 3. This includes both the

More information

Profiling & Optimization

Profiling & Optimization Lecture 18 Sources of Game Performance Issues? 2 Avoid Premature Optimization Novice developers rely on ad hoc optimization Make private data public Force function inlining Decrease code modularity removes

More information
