Profiling & Optimization

Similar documents
Profiling & Optimization

Computer Organization: A Programmer's Perspective

21. This is a screenshot of the Android Studio Debugger. It shows the current thread and the object tree for a certain variable.

Optimizing Your Android Applications

Using Intel VTune Amplifier XE and Inspector XE in.net environment

Timing programs with time

CSE 303: Concepts and Tools for Software Development

Vertical Profiling: Understanding the Behavior of Object-Oriented Applications

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Operating Systems Design Fall 2010 Exam 1 Review. Paul Krzyzanowski

Zing Vision. Answering your toughest production Java performance questions

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

Performance Profiling

Memory Management: The Details

Lecture 9 Dynamic Compilation

CSE 120 Principles of Operating Systems

Running class Timing on Java HotSpot VM, 1

Introduction to Concurrent Software Systems. CSCI 5828: Foundations of Software Engineering Lecture 08 09/17/2015

Introduction to Concurrent Software Systems. CSCI 5828: Foundations of Software Engineering Lecture 12 09/29/2016

Programming with MPI

Low level security. Andrew Ruef

Compiler construction 2009

Dynamic Memory Management

Introduction to Parallel Performance Engineering

Performance analysis basics

CS 160: Interactive Programming

Computer Systems A Programmer s Perspective 1 (Beta Draft)

o Code, executable, and process o Main memory vs. virtual memory

Process- Concept &Process Scheduling OPERATING SYSTEMS

EECS 482 Introduction to Operating Systems

Operating Systems (2INC0) 2018/19. Introduction (01) Dr. Tanir Ozcelebi. Courtesy of Prof. Dr. Johan Lukkien. System Architecture and Networking Group

the gamedesigninitiative at cornell university Lecture 9 Memory Management

Computation Abstractions. Processes vs. Threads. So, What Is a Thread? CMSC 433 Programming Language Technologies and Paradigms Spring 2007

Is your profiler speaking the same language as you? Simon

Chapter 2. Operating-System Structures

Parallelization Primer. by Christian Bienia March 05, 2007

CSE 403 Lecture 18. Performance Testing. Reading: Code Complete, Ch (McConnell) slides created by Marty Stepp

Chapter 2: Operating-System Structures. Operating System Concepts 9 th Edit9on

CSC 1600 Memory Layout for Unix Processes"

Lecture 11 Code Optimization I: Machine Independent Optimizations. Optimizing Compilers. Limitations of Optimizing Compilers

OPERATING SYSTEM PROJECT: SOS

Section 7: Wait/Exit, Address Translation

Debugging and Profiling

Lectures 5-6: Introduction to C

Dynamic Memory Management! Goals of this Lecture!

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

DNWSH - Version: 2.3..NET Performance and Debugging Workshop

Systems software design. Software build configurations; Debugging, profiling & Quality Assurance tools

A new Mono GC. Paolo Molaro October 25, 2006

CERN IT Technical Forum

Dynamic Memory Management

CS 475. Process = Address space + one thread of control Concurrent program = multiple threads of control

by Marina Cholakyan, Hyduke Noshadi, Sepehr Sahba and Young Cha

Review: Easy Piece 1

Recitation 2/18/2012

Operating Systems Design Exam 1 Review: Spring 2012

Managed runtimes & garbage collection

CMPSCI 377: Operating Systems Exam 1: Processes, Threads, CPU Scheduling and Synchronization. October 9, 2002

Tools and techniques for optimization and debugging. Fabio Affinito October 2015

COSC Software Engineering. Lecture 16: Managing Memory Managers

Operating Systems Design Exam 2 Review: Spring 2011

CMSC 330: Organization of Programming Languages

Spring CS 170 Exercise Set 1 (Updated with Part III)

CS 416: Opera-ng Systems Design March 23, 2012

Code Optimization September 27, 2001

the gamedesigninitiative at cornell university Lecture 10 Memory Management

Learning from Bad Examples. CSCI 5828: Foundations of Software Engineering Lecture 25 11/18/2014

Run-time Environments - 3

Profiling for Performance in C++

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions

Great Reality #4. Code Optimization September 27, Optimizing Compilers. Limitations of Optimizing Compilers

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Parallel Programming: Background Information

Parallel Programming: Background Information

C Review. MaxMSP Developers Workshop Summer 2009 CNMAT

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection

The Art and Science of Memory Allocation

Why do we care about parallel?

the gamedesigninitiative at cornell university Lecture 7 C++ Overview

J2EE Development Best Practices: Improving Code Quality

CS 326: Operating Systems. Process Execution. Lecture 5

Chapter 2: Operating-System Structures

OpenMP at Sun. EWOMP 2000, Edinburgh September 14-15, 2000 Larry Meadows Sun Microsystems

COMP 202 Recursion. CONTENTS: Recursion. COMP Recursion 1

G52CON: Concepts of Concurrency

Midterm Sample Answer ECE 454F 2008: Computer Systems Programming Date: Tuesday, Oct 28, p.m. - 5 p.m.

Process Concepts. CSC400 - Operating Systems. 3. Process Concepts. J. Sumey

Heap Management. Heap Allocation

MultiThreading. Object Orientated Programming in Java. Benjamin Kenwright

CS193k, Stanford Handout #10. HW2b ThreadBank

Advanced Topic: Efficient Synchronization

Inspiring Creative Fun Ysbrydoledig Creadigol Hwyl. App Inventor Workbook

Code Profiling. CSE260, Computer Science B: Honors Stony Brook University

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

Kampala August, Agner Fog

Chapter 2: Operating-System Structures. Operating System Concepts 9 th Edition

Java Threads. Written by John Bell for CS 342, Spring 2018

ANITA S SUPER AWESOME RECITATION SLIDES

Exception Safe Coding

Transcription:

Lecture 18

Sources of Game Performance Issues? 2

Avoid Premature Optimization Novice developers rely on ad hoc optimization Make private data public Force function inlining Decrease code modularity removes function calls But this is a very bad idea Rarely gives significant performance benefits Non-modular code is very hard to maintain Write clean code first; optimize later 3

Performance Tuning Code follows an 80/20 rule (or even 90/10) 80% of run-time spent in 20% of code Optimizing or 80% provides little benefit Do nothing until you know what this 20% is Be careful in tuning performance Never overtune some inputs at expense of ors Always focus on overall algorithm first Think hard before making non-modular changes 4

What Can We Measure? Time Performance What code takes most time What is called most often How long I/O takes to finish Time to switch threads Time threads hold locks Time threads wait for locks Memory Performance Number of heap allocations Location of allocations Timing of allocations Location of releases Timing of releases (Location of memory leaks) 5

Static Analysis Analyze without running Relies on language features Major area of PL research Advantages Offline; no performance hit Can analyze deep properties Disadvantages Conservative; misses a lot Cannot capture user input Analysis Methods 6

Analysis Methods Profiling Analysis runs with program Record behavior of program Helps visualize this record Advantages More data than static anal. Can capture user input Disadvantages Hurts performance a lot May alter program behavior 7

Analysis Methods Static Analysis Analyze without running Relies on language features Major area of PL research Advantages Offline; no performance hit Can analyze deep properties Disadvantages Conservative; misses a lot Cannot capture user input Profiling Analysis runs with program Record behavior of program Helps visualize this record Advantages More data than static anal. Can capture user input Disadvantages Hurts performance a lot May alter program behavior 8

Static Analysis: Control Flow int sum = 0 boolean done = false; for(int ii; ii<=5 &&!done;) { }} if (j >= 0) { sum += j; if (sum > 100) { done = true; } else { i = i+1; } print(sum); p q q may be executed immediately after p F sum = 0 done = F i = 0 i <= 5 &&!done F T j >= 0 sum = sum + j sum > 100 done = T i = i+1 T endif print sum F T 9

Static Analysis: Flow Dependence int sum = 0 boolean done = false; for(int ii; ii<=5 &&!done;) { }} if (j >= 0) { sum += j; if (sum > 100) { done = true; } else { i = i+1; } print(sum); p q Value assigned at p is read at command q sum = 0 done = F i = 0 i <= 5 &&!done j >= 0 sum = sum + j sum > 100 done = T i = i+1 endif print sum 10

Model Checking Given a graph, logical formula j j expresses properties of graph Checker determines if is true Often applied to software Program as control-flow graph j indicates acceptable paths F sum = 0 done = F i = 0 i <= 5 &&!done F T j >= 0 sum = sum + j sum > 100 done = T i = i+1 T endif print sum F T 11

Static Analysis: Applications Pointer analysis Look at pointer variables Determine possible values for variable at each place Can find memory leaks Deadlock detection Locks are flow dependency Determine possible owners of lock at each position Dead code analysis Heap X p 12

Example: Analyze in X-Code 13

Time Profiling 14

Time Profiling: Methods Software Code added to program Captures start of function Captures end of function Subtract to get time spent Calculate percentage at end Not completely accurate Changes actual program Also, how get time? Hardware Measurements in hardware Feature attached to CPU Does not change how program is run Simulate w/ hypervisors Virtual machine for Oss VM includes profiling measurement features Example: Xen Hypervisor 15

Time Profiling: Methods Time-Sampling Count at periodic intervals Wakes up from sleep Looks at parent function Adds that to count Relatively lower overhead Doesn t count everything Performance hit acceptable May miss small functions Instrumentation Count pre-specified places Specific function calls Hardware interrupts Different from sampling Still not getting everything But exact view of slice Used for targeted searches 16

Issues with Periodic Sampling Real Sampled 17

Issues with Periodic Sampling Real Sampled 18

What Can We Measure? Time Performance What code takes most time What is called most often How long I/O takes to finish Time to switch threads Time threads hold locks Time threads wait for locks Memory Performance Number of heap allocations Location of allocations Timing of allocations Location of releases Timing of releases (Location of memory leaks)

What Can We Measure? Time Performance What code takes most time What is called most often How long I/O takes to finish Time to switch threads Time threads hold locks Time threads wait for locks Memory Performance Number of heap allocations Location of allocations Timing of allocations Location of releases Timing of releases (Location of memory leaks) 20

Instrumentation: Memory Memory handled by malloc Basic C allocation method C++ new uses malloc Allocates raw bytes p1 = malloc(4) p2 = malloc(5) malloc can be instrumented Count number of mallocs p3 = malloc(6) Track malloc addresses Look for frees later on Finds memory leaks! free(p2) 21

Instrumentation: Memory 22

Profiling Tools ios/x-code Instruments (wide variety of special tools) GNU gprof (Profile I) for sampled time Android (Java) Dalvik Debug Monitor Server (DDMS) for traces TraceView helps visualize results of DDMS Android (C++) Android NDK Profiler (3 rd party) GNU gprof visualizes results of gmon.out 23

Android NDK Profiling // Non-profiled code monstartup("your_lib.so"); // Profiled code moncleanup(); // Non-profiled code captures everything Android App gmon.out gprof 24

Android Profiling 25

Poor Man s Sampling Call Graph Create a hashtable Keys = pairs (a calls b) Values = time (time spent) Place code around call Code inside outer func. a Code before & after call b Records start and end time Put difference in hashtable Timing Use processor s timer Track time used by program System dependent function Java: System.nanoTime() Do not use wall clock Timer for whole system Includes or programs Java version: System.currentTimeMillis() 26

Poor Man s Sampling Call Graph Create a hashtable Keys = pairs (a calls b) Values = time (time spent) Place code around call Code inside outer func. a Code before & after call b Records start and end time Put difference in hashtable Timing Use processor s timer Track time used by program System dependent function Java: System.nanoTime() Do not use wall clock Timer for whole system Includes or programs Java version: System.currentTimeMillis() 27

Summary Premature optimization is bad Make code unmanageable for little gain Best to identify bottlenecks first Static analysis is useful in some cases Finding memory leaks and or issues Deadlock and resource analysis Profiling can find runtime performance issues But changes program and incurs overhead Sampling and instrumentation reduce overhead 28