Profiling & Optimization

Similar documents
Profiling & Optimization

Optimizing Your Android Applications

Vertical Profiling: Understanding the Behavior of Object-Oriented Applications

CSE 403 Lecture 18. Performance Testing. Reading: Code Complete, Ch (McConnell) slides created by Marty Stepp

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Using Intel VTune Amplifier XE and Inspector XE in.net environment

Low level security. Andrew Ruef

Lecture 9 Dynamic Compilation

CSE 120 Principles of Operating Systems

Operating Systems Design Fall 2010 Exam 1 Review. Paul Krzyzanowski

Running class Timing on Java HotSpot VM, 1

Chapter 2: Operating-System Structures. Operating System Concepts 9 th Edit9on

Introduction to Concurrent Software Systems. CSCI 5828: Foundations of Software Engineering Lecture 08 09/17/2015

Zing Vision. Answering your toughest production Java performance questions

Computer Organization: A Programmer's Perspective

Memory Management: The Details

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

Introduction to Concurrent Software Systems. CSCI 5828: Foundations of Software Engineering Lecture 12 09/29/2016

Performance Profiling

Compiler construction 2009

EECS 482 Introduction to Operating Systems

Performance analysis basics

Systems software design. Software build configurations; Debugging, profiling & Quality Assurance tools

Computer Systems A Programmer s Perspective 1 (Beta Draft)

Process- Concept &Process Scheduling OPERATING SYSTEMS

Chapter 2. Operating-System Structures

CSE 303: Concepts and Tools for Software Development

Programming with MPI

Code Profiling. CSE260, Computer Science B: Honors Stony Brook University

Dynamic Memory Management

Chapter 2: Operating-System Structures

Introduction to Parallel Performance Engineering

by Marina Cholakyan, Hyduke Noshadi, Sepehr Sahba and Young Cha

CS 160: Interactive Programming

o Code, executable, and process o Main memory vs. virtual memory

the gamedesigninitiative at cornell university Lecture 9 Memory Management

Chapter 2: Operating-System Structures. Operating System Concepts 9 th Edition

Operating Systems (2INC0) 2018/19. Introduction (01) Dr. Tanir Ozcelebi. Courtesy of Prof. Dr. Johan Lukkien. System Architecture and Networking Group

Why do we care about parallel?

Computation Abstractions. Processes vs. Threads. So, What Is a Thread? CMSC 433 Programming Language Technologies and Paradigms Spring 2007

CSC 1600 Memory Layout for Unix Processes"

Section 7: Wait/Exit, Address Translation

OPERATING SYSTEM PROJECT: SOS

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Lectures 5-6: Introduction to C

Dynamic Memory Management! Goals of this Lecture!

DNWSH - Version: 2.3..NET Performance and Debugging Workshop

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

A new Mono GC. Paolo Molaro October 25, 2006

Process Concepts. CSC400 - Operating Systems. 3. Process Concepts. J. Sumey

CERN IT Technical Forum

Dynamic Memory Management

CHAPTER 2: SYSTEM STRUCTURES. By I-Chen Lin Textbook: Operating System Concepts 9th Ed.

CS 475. Process = Address space + one thread of control Concurrent program = multiple threads of control

Review: Easy Piece 1

Recitation 2/18/2012

Operating Systems Design Exam 1 Review: Spring 2012

Debugging and Profiling

Managed runtimes & garbage collection

CMPSCI 377: Operating Systems Exam 1: Processes, Threads, CPU Scheduling and Synchronization. October 9, 2002

COSC Software Engineering. Lecture 16: Managing Memory Managers

Operating Systems Design Exam 2 Review: Spring 2011

CMSC 330: Organization of Programming Languages

Spring CS 170 Exercise Set 1 (Updated with Part III)

CS 416: Opera-ng Systems Design March 23, 2012

Timing programs with time

CS 31: Intro to Systems Threading & Parallel Applications. Kevin Webb Swarthmore College November 27, 2018

Designing experiments Performing experiments in Java Intel s Manycore Testing Lab

the gamedesigninitiative at cornell university Lecture 10 Memory Management

Learning from Bad Examples. CSCI 5828: Foundations of Software Engineering Lecture 25 11/18/2014

Is your profiler speaking the same language as you? Simon

Run-time Environments - 3

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions

DB2 10 Capturing Tuning and Trending for SQL Workloads - a resource and cost saving approach

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Parallel Programming: Background Information

Parallel Programming: Background Information

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection

The Art and Science of Memory Allocation

Operating Systems. Lecture Process Scheduling. Golestan University. Hossein Momeni

the gamedesigninitiative at cornell university Lecture 7 C++ Overview

Lecture 11 Code Optimization I: Machine Independent Optimizations. Optimizing Compilers. Limitations of Optimizing Compilers

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

CS 326: Operating Systems. Process Execution. Lecture 5

21. This is a screenshot of the Android Studio Debugger. It shows the current thread and the object tree for a certain variable.

OpenMP at Sun. EWOMP 2000, Edinburgh September 14-15, 2000 Larry Meadows Sun Microsystems

J2EE Development Best Practices: Improving Code Quality

Lecture Notes on Garbage Collection

COMP 202 Recursion. CONTENTS: Recursion. COMP Recursion 1

G52CON: Concepts of Concurrency

Heap Management. Heap Allocation

MultiThreading. Object Orientated Programming in Java. Benjamin Kenwright

CS193k, Stanford Handout #10. HW2b ThreadBank

Advanced Topic: Efficient Synchronization

Inspiring Creative Fun Ysbrydoledig Creadigol Hwyl. App Inventor Workbook

Jackson Marusarz Intel Corporation

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

Java Threads. Written by John Bell for CS 342, Spring 2018

ANITA S SUPER AWESOME RECITATION SLIDES

Exception Safe Coding

Transcription:

Lecture 11

Sources of Game Performance Issues? 2

Avoid Premature Optimization Novice developers rely on ad hoc optimization Make private data public Force function inlining Decrease code modularity removes function calls But this is a very bad idea Rarely gives significant performance benefits Non-modular code is very hard to maintain Write clean code first; optimize later 3

Performance Tuning Code follows an 80/20 rule (or even 90/10) 80% of run-time spent in 20% of code Optimizing or 80% provides little benefit Do nothing until you know what this 20% is Be careful in tuning performance Never overtune some inputs at expense of ors Always focus on overall algorithm first Think hard before making non-modular changes 4

What Can We Measure? Time Performance What code takes most time What is called most often How long I/O takes to finish Time to switch threads Time threads hold locks Time threads wait for locks Memory Performance Number of heap allocations Location of allocations Timing of allocations Location of releases Timing of releases (Location of memory leaks) 5

Static Analysis Analyze without running Relies on language features Major area of PL research Advantages Offline; no performance hit Can analyze deep properties Disadvantages Conservative; misses a lot Cannot capture user input Analysis Methods 6

Analysis Methods Profiling Analysis runs with program Record behavior of program Helps visualize this record Advantages More data than static anal. Can capture user input Disadvantages Hurts performance a lot May alter program behavior 7

Analysis Methods Static Analysis Analyze without running Relies on language features Major area of PL research Advantages Offline; no performance hit Can analyze deep properties Disadvantages Conservative; misses a lot Cannot capture user input Profiling Analysis runs with program Record behavior of program Helps visualize this record Advantages More data than static anal. Can capture user input Disadvantages Hurts performance a lot May alter program behavior 8

Static Analysis: Control Flow int sum = 0" boolean done = false;" for(int ii; ii<=5 &&!done;) {" }}" if (j >= 0) {" sum += j;" if (sum > 100) {" done = true;" } else {" }" i = i+1;" print(sum); p q q may be executed immediately after p F sum = 0 done = F i = 0 i <= 5 &&!done F T j >= 0 sum = sum + j sum > 100 done = T i = i+1 T endif print sum F T 9

Static Analysis: Flow Dependence int sum = 0" boolean done = false;" for(int ii; ii<=5 &&!done;) {" }}" if (j >= 0) {" sum += j;" if (sum > 100) {" done = true;" } else {" }" i = i+1;" print(sum); p q Value assigned at p is read at command q sum = 0 done = F i = 0 i <= 5 &&!done j >= 0 sum = sum + j sum > 100 done = T i = i+1 endif print sum 10

Model Checking Given a graph, logical formula ϕ ϕ expresses properties of graph Checker determines if is true Often applied to software Program as control-flow graph ϕ indicates acceptable paths F sum = 0 done = F i = 0 i <= 5 &&!done F T j >= 0 sum = sum + j sum > 100 done = T i = i+1 T endif print sum F T 11

Static Analysis: Applications Pointer analysis Look at pointer variables Determine possible values for variable at each place Can find memory leaks Deadlock detection Locks are flow dependency Determine possible owners of lock at each position Dead code analysis Heap X p 12

Example: Clang for iphone 13

Time Profiling 14

Time Profiling: Methods Software Code added to program Captures start of function Captures end of function Subtract to get time spent Calculate percentage at end Not completely accurate Changes actual program Also, how get time? Hardware Measurements in hardware Feature attached to CPU Does not change how program is run Simulate w/ hypervisors Virtual machine for Oss VM includes profiling measurement features Example: Xen Hypervisor 15

Time Profiling: Methods Time-Sampling Count at periodic intervals Wakes up from sleep Looks at parent function Adds that to count Relatively lower overhead Doesn t count everything Performance hit acceptable May miss small functions Instrumentation Count pre-specified places Specific function calls Hardware interrupts Different from sampling Still not getting everything But exact view of slice Used for targeted searches 16

Issues with Periodic Sampling Real Sampled 17

Issues with Periodic Sampling Real Sampled Modern profilers fix with random sampling 18

What Can We Measure? Time Performance What code takes most time What is called most often How long I/O takes to finish Time to switch threads Time threads hold locks Time threads wait for locks Memory Performance Number of heap allocations Location of allocations Timing of allocations Location of releases Timing of releases (Location of memory leaks)

What Can We Measure? Instrument? Time Performance What code takes most time What is called most often How long I/O takes to finish Time to switch threads Time threads hold locks Time threads wait for locks Memory Performance Number of heap allocations Location of allocations Timing of allocations Location of releases Timing of releases (Location of memory leaks) 20

Instrumentation: Memory Memory handled by malloc Basic C allocation method C++ new uses malloc Allocates raw bytes malloc can be instrumented Count number of mallocs Track malloc addresses Look for frees later on Finds memory leaks! p1 = malloc(4) p2 = malloc(5) p3 = malloc(6) free(p2) 21

Instrumentation: Memory 22

Profiling Tools General Java VisualVM (Built-in profiler from Sun/Oracle) Eclipse Test & Performance Tools Platform (TPTP) Android Dalvik Debug Monitor Server (DDMS) for traces TraceView helps visualize results of DDMS ios/x-code Instruments (wide variety of special tools) GNU gprof for sampled time profiling 23

Android Profiling // Non-profiled code Debug.startMethodTracing("profile"); // Profiled code Debug.stopMethodTracing(); // Non-profiled code captures everything Android App profile.trace Traceview 24

Android Profiling 25

Poor Man s Sampling Call Graph Create a hashtable Keys = pairs (a calls b) Values = time (time spent) Place code around call Code inside outer func. a Code before & after call b Records start and end time Put difference in hashtable Timing Use processor s timer Track time used by program System dependent function Java: System.nanoTime() Do not use wall clock Timer for whole system Includes or programs Java version: System.currentTimeMillis() 26

Poor Man s Sampling Call Graph Timing Create a hashtable Keys = pairs (a calls b) Values = time (time spent) Place code around call Code inside outer func. a Code before & after call b Records start and end time Put difference in hashtable Use processor s timer Track time used by program System dependent function Java: System.nanoTime() Do not use wall clock Useful in networked setting Timer for whole system Includes or programs Java version: System.currentTimeMillis() 27

Summary Premature optimization is bad Make code unmanageable for little gain Best to identify bottlenecks first Static analysis is useful in some cases Finding memory leaks and or issues Deadlock and resource analysis Profiling can find runtime performance issues But changes program and incurs overhead Sampling and instrumentation reduce overhead 28