Module I: Measuring Program Performance

Size: px
Start display at page:

Download "Module I: Measuring Program Performance"

Transcription

1 Performance Programming: Theory, Practice and Case Studies Module I: Measuring Program Performance 9

2 Outline 10 Measuring methodology and guidelines Measurement tools Timing Tools Profiling Tools Process monitoring and tracing tools System monitoring tools Hardware counter measurements Monitoring tools Code instrumentation Parallel performance measurements Guidelines and recommendations Tools for parallel monitoring Summary

3 Measurement Methodology 11 Quantifying performance is the first step in the application tuning process Important to set reasonable expectations for optimization Measurements should be made repeatedly to identify parts of the program that need to be optimized Proper choice of measurement characteristics suitable for a particular application Comparison of measurements to theoretical peak values

4 Timing measurements What to Measure Wall clock time for a single job (turnaround time) Wall clock time for multiple jobs (throughput measurements) Wall clock time for parallel runs (scalability measurements) Execution and computation rates MFLOPS (million floating point operations per second) MIPS (million instructions per second) IPC (instructions per cycle) Resource utilization Memory usage I/O utilization Network usage 12

5 Benchmarking Guidelines 13 Benchmark runs should adequately represent the use of the application Preferably only one parameter changing at a time Overhead of measurement should be considered Runs from tmpfs or from a locally mounted ufs System activities should be monitored The systems should not have any other computational jobs running during benchmarking System parameters and settings should be documented together with the results of the runs.

6 Functionality Timing tools Profiling tools Monitoring tools Measurement Tools Usage requirements Tools that can operate on optimized binaries Tools that require recompilation Tools that require source code instrumentation Parallel / serial measurement tools Tools measuring serial performance Tools measuring parallel performance 14

7 Timing Entire Program Measuring the elapsed (wall- clock) time that passes during the program execution Example: Solaris time, timex, and ptime 15

8 Timing Program Portions Fortran 77: etime, dtime (both not thread safe) C, C++, Fortran 90/95: gethrtime High resolution timer (nanoseconds) Can be called via a C wrapper from Fortran 77 Can be used for multithreaded applications Platform-specific tools and methods Solaris microstate accounting Fine-grain timing measurements by accessing UltraSPARC TICK register directly.inline readtick,1 rd %tick, %o1 stx %o1, [%o0].end 16

9 Measurement Overhead Computing overhead of gethrtime() call #include<sys/time.h> time_t start, end; int i, iters = ; for (i = 0; i < iters; i++) { start = gethrtime(); end = gethrtime(); (void)printf("%lld \n", (end - start)); } Distribution Call overhead (ns) (ns)

10 Program Profiling with gprof Application profiling Special form of timing measurements that shows which functions account for large parts of application runtimes Should be used on multiple and representative test cases gprof - standard UNIX profiling utility Can be used for profiling executalbes and shared libraries Based on Program Counter (PC) sampling at periodic intervals Requires recompilation with -pg (Linux, Solaris, Tru64) or -G (HP-UX) After the run the data is collected in gmon.out file Profiling results displayed with gprof command 18

11 Output includes gprof Output Absolute time spent in a function Percentage of total run time spent in a function Number of calls to the function Average time per call Functions can be sorted by time they consume together with their descendants (commulative or inclusive time) time spent executing the function itself (self or exclusive time) % cumulative self self total time seconds seconds calls ms/call ms/call name dmmch_ [4] dmake_ [8] dgemm_ [9]... 19

12 Profiling Using Coverage Analysis Coverage analysis tools annotate source code with the number of times each line was executed Basic block profiling Results can be accumulated for multiple runs Information about hot loops in the code and branches taken Code coverage for quality assurance DO 350 L = LL, LL+ LSEC > F11 = F11 + T1( L- LL+ 1, I- II+ 1 )* $ T2( L- LL+ 1, J- JJ+ 1 ) Available on UNIX platforms Linux/GNU: gcov Solaris: tcov IRIX: cvcov, cvxcov Tru64: pixie AIX: tprof 20

13 Advanced Profiling Tools Measurement parameters and features Measurements based on hardware counters Profiling by functions basic blocks lines of high level code assembly instructions Source code annotation Capabilities to work with parallel programs synchronization overhead, load balancing monitoring Available tools Tool Vendor Platforms 21 VTune Intel NT Analyzer Sun Solaris SpeedShop SGI IRIX DCPI DEC Compaq HP Tru64, NT

14 Example: Sun Performance Analyzer (1 of 3) Profiling by function and module (no recompilation) 22

15 Example: Sun Performance Analyzer (2 of 3) Annotated source (recompilation with -g) and disassembly (no recompilation) 23

16 Example: Sun Performance Analyzer (3 of 3) Hardware counter overflow profiling 24

17 Process Monitoring Tools Tracing tools Linux: strace (ltrace for dynamic library calls) Solaris: truss (sotruss for dynamic library calls) IRIX: par Tru64: atom -tool ptrace procfs-based tools pmap: prints the address space of the program pldd: lists the dynamic shared objects linked into the process (including ones explicitly attached using dlopen) pstack: prints a stack trace for each LWP in the process pflags: prints the /proc tracing flags ptree: process trees containing specified pids or users pwait: wait for specified processes to terminate pcred: prints the credentials (effective, real, saved UIDs and GIDs) 25

18 Example: profiling system calls truss on Solaris Reports the number of system calls for a process and associated time 26

19 System Monitoring Tools Tools for various UNIX platforms vmstat, vm_stat, memvis - virtual memory and CPU statistics mpstat, mpvis - parallel memory/cpu statistics netstat, nfsstat, nfsvis - network status and statistics iostat, dkvis - I/O statistics sar - system activity report top, prstat - list of most active processes systat - system activity stats lockstat - kernel lock statistics dkstat - file status information 27

20 vmstat - Virtual Memory Statistics Available on HP-UX, Tru64, Solaris, Linux, FreeBSD, etc. Example on Alpha/Tru64 Memory Usage Paging Activity CPU Usage Idle System 28

21 Hardware Counter Measurements Hardware performance counters allow for the runtime low-overhead measurements of various hardware events Cache references Cache misses Pipeline stalls Branch misprediction statistics D-TLB (Data Translation Lookaside Buffer) misses I-TLB (Instruction Translation Lookaside Buffer) Bus statistics including DMA and cache coherency transactions on a multiprocessor systems Others Only several events can be monitored at the same time 29

22 Code Instrumentation APIs can be used directly in the code High-resolution timing of performance-critical parts of the program Access to HW performance counters Example (Solaris) if (cpc_take_sample(&before) == -1) exit(-1); for (k = 0; k < N-1; k++) sum = sum + a[k]*b[k]; if (cpc_take_sample(&after) == -1) exit(-1); Counters specified by setting PERFEVENTS environment variable example% setenv PERFEVENTS pic0=load_use,pic1=load_use_raw Works on UltraSPARC CPUs 30

23 Parallel Measurement Methodology Same guidelines as in the serial case Parallel benchmarks should be representative of typical uses of applications Benchmarking must be performed to ensure repeatable and consistent results Probe effects and tool overheads should be minimized Specifics of parallel benchmarking Parallelism vs. Concurrency Dedicated mode of benchmarking Number of processors Choice of timer and time criterion Processor-set configuration Processor allocation in clusters 31

24 Timing a Parallel Threaded Program timex can be used for parallel timing Note that the real time decreases, but the user time representing combined CPU usage stays constant 32

25 Specific Parallel Timers Timing MPI programs time or timex timers can be used in combination with MPI submitting commands (mprun, mpirun, etc.) For timing portions of an MPI program, one can use the MPI_Wtime function available in Fortran, C and C++ bindings (typically highly accurate). Threaded applications can use gethrvtime (Solaris, Tru64 with Solaris Compatibility Library) Shows the user time on a per-thread basis Can be used in combination with gethrtime, which returns the elapsed real (wallclock) time on a per-thread basis 33

26 Parallel System Monitoring CPU ID mpstat - mutliprocessor monitoring Cross calls Interrupts Context switches Thread migrations Mutex info System calls CPU usage First snapshot: average since boot Sample measurements 34

27 Kernel Lock Statistics 35 Tools that report kernel lock statistics lockstat - Solaris, IRIX, AIX, Linux lockinfo - Tru64 Allows one to specify what events to monitor spin on adaptive mutex block on read access to rwlock due to waiting writers On some platforms generates gprof-like output # lockstat -IWk example_tnf Profiling interrupt: events in seconds (1164 events/sec) Count indv cuml rcnt nsec Hottest CPU+PIL Caller % 57% cpu[12] mutex_vector_enter % 66% cpu[9]+10 disp_getwork % 74% cpu[14] mutex_tryenter % 81% cpu[5] (usermode) % 82% cpu[1] splx % 84% cpu[5]+10 _resume_from_idle % 85% cpu[9]+10 disp % 85% cpu[15]+10 setfrontdq

28 Binding a Program To a Set of Processors Process monitoring can be difficult on multiprocessor systems due to process migration Single-threaded programs One can bind to a processor For multithreaded programs One can use processor sets Commands to set up and use processor sets psrset (HP-UX, Solaris) pset (IRIX) pset_create, pset_assign_cpu, pset_assign_pid, etc. (Tru64) 36

29 Summary Monitoring performance is essential to optimization If you cannot measure it you cannot improve it Important to select benchmarks carefully and identify parameters to measure Select tools suitable for the task System-wide or process-specific? Parallel or serial? Require recompilation or instrumentation? Need source-level information? Need hardware counter information? 37

Chapter1 Solaris Overview

Chapter1 Solaris Overview Chapter1 Solaris Overview Feature and architecture Huimei Lu blueboo@bit.edu.cn Outline Introduction to Solaris Solaris Kernel Features Solaris Kernel Architecture Solaris 10 Features Performance and Tracing

More information

Performance Analysis. HPC Fall 2007 Prof. Robert van Engelen

Performance Analysis. HPC Fall 2007 Prof. Robert van Engelen Performance Analysis HPC Fall 2007 Prof. Robert van Engelen Overview What to measure? Timers Benchmarking Profiling Finding hotspots Profile-guided compilation Messaging and network performance analysis

More information

Performance Profiling

Performance Profiling Performance Profiling Minsoo Ryu Real-Time Computing and Communications Lab. Hanyang University msryu@hanyang.ac.kr Outline History Understanding Profiling Understanding Performance Understanding Performance

More information

Performance analysis basics

Performance analysis basics Performance analysis basics Christian Iwainsky Iwainsky@rz.rwth-aachen.de 25.3.2010 1 Overview 1. Motivation 2. Performance analysis basics 3. Measurement Techniques 2 Why bother with performance analysis

More information

Computer Organization: A Programmer's Perspective

Computer Organization: A Programmer's Perspective Profiling Oren Kapah orenkapah.ac@gmail.com Profiling: Performance Analysis Performance Analysis ( Profiling ) Understanding the run-time behavior of programs What parts are executed, when, for how long

More information

Profiling: Understand Your Application

Profiling: Understand Your Application Profiling: Understand Your Application Michal Merta michal.merta@vsb.cz 1st of March 2018 Agenda Hardware events based sampling Some fundamental bottlenecks Overview of profiling tools perf tools Intel

More information

Intel VTune Amplifier XE

Intel VTune Amplifier XE Intel VTune Amplifier XE Vladimir Tsymbal Performance, Analysis and Threading Lab 1 Agenda Intel VTune Amplifier XE Overview Features Data collectors Analysis types Key Concepts Collecting performance

More information

CSE 141 Summer 2016 Homework 2

CSE 141 Summer 2016 Homework 2 CSE 141 Summer 2016 Homework 2 PID: Name: 1. A matrix multiplication program can spend 10% of its execution time in reading inputs from a disk, 10% of its execution time in parsing and creating arrays

More information

Profiling and Debugging Tools. Lars Koesterke University of Porto, Portugal May 28-29, 2009

Profiling and Debugging Tools. Lars Koesterke University of Porto, Portugal May 28-29, 2009 Profiling and Debugging Tools Lars Koesterke University of Porto, Portugal May 28-29, 2009 Outline General (Analysis Tools) Listings & Reports Timers Profilers (gprof, tprof, Tau) Hardware performance

More information

MPI Performance Tools

MPI Performance Tools Physics 244 31 May 2012 Outline 1 Introduction 2 Timing functions: MPI Wtime,etime,gettimeofday 3 Profiling tools time: gprof,tau hardware counters: PAPI,PerfSuite,TAU MPI communication: IPM,TAU 4 MPI

More information

Performance Measuring on Blue Horizon and Sun HPC Systems:

Performance Measuring on Blue Horizon and Sun HPC Systems: Performance Measuring on Blue Horizon and Sun HPC Systems: Timing, Profiling, and Reading Assembly Language NPACI Parallel Computing Institute 2000 Sean Peisert peisert@sdsc.edu Performance Programming

More information

Performance Analysis of KDD Applications using Hardware Event Counters. CAP Theme 2.

Performance Analysis of KDD Applications using Hardware Event Counters. CAP Theme 2. Performance Analysis of KDD Applications using Hardware Event Counters CAP Theme 2 http://cap.anu.edu.au/cap/projects/kddmemperf/ Peter Christen and Adam Czezowski Peter.Christen@anu.edu.au Adam.Czezowski@anu.edu.au

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information

Threads Implementation. Jo, Heeseung

Threads Implementation. Jo, Heeseung Threads Implementation Jo, Heeseung Today's Topics How to implement threads? User-level threads Kernel-level threads Threading models 2 Kernel/User-level Threads Who is responsible for creating/managing

More information

Tools and techniques for optimization and debugging. Fabio Affinito October 2015

Tools and techniques for optimization and debugging. Fabio Affinito October 2015 Tools and techniques for optimization and debugging Fabio Affinito October 2015 Profiling Why? Parallel or serial codes are usually quite complex and it is difficult to understand what is the most time

More information

OpenMP at Sun. EWOMP 2000, Edinburgh September 14-15, 2000 Larry Meadows Sun Microsystems

OpenMP at Sun. EWOMP 2000, Edinburgh September 14-15, 2000 Larry Meadows Sun Microsystems OpenMP at Sun EWOMP 2000, Edinburgh September 14-15, 2000 Larry Meadows Sun Microsystems Outline Sun and Parallelism Implementation Compiler Runtime Performance Analyzer Collection of data Data analysis

More information

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Lecture 3: Processes Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Process in General 3.3 Process Concept Process is an active program in execution; process

More information

CSE 4/521 Introduction to Operating Systems

CSE 4/521 Introduction to Operating Systems CSE 4/521 Introduction to Operating Systems Lecture 5 Threads (Overview, Multicore Programming, Multithreading Models, Thread Libraries, Implicit Threading, Operating- System Examples) Summer 2018 Overview

More information

Chapter 4: Multithreaded

Chapter 4: Multithreaded Chapter 4: Multithreaded Programming Chapter 4: Multithreaded Programming Overview Multithreading Models Thread Libraries Threading Issues Operating-System Examples 2009/10/19 2 4.1 Overview A thread is

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues 4.2 Silberschatz, Galvin

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Spring 2018 Lecture 15: Multicore Geoffrey M. Voelker Multicore Operating Systems We have generally discussed operating systems concepts independent of the number

More information

OPERATING SYSTEM. Chapter 4: Threads

OPERATING SYSTEM. Chapter 4: Threads OPERATING SYSTEM Chapter 4: Threads Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Operating System Examples Objectives To

More information

Oracle Developer Studio Performance Analyzer

Oracle Developer Studio Performance Analyzer Oracle Developer Studio Performance Analyzer The Oracle Developer Studio Performance Analyzer provides unparalleled insight into the behavior of your application, allowing you to identify bottlenecks and

More information

Threads. What is a thread? Motivation. Single and Multithreaded Processes. Benefits

Threads. What is a thread? Motivation. Single and Multithreaded Processes. Benefits CS307 What is a thread? Threads A thread is a basic unit of CPU utilization contains a thread ID, a program counter, a register set, and a stack shares with other threads belonging to the same process

More information

Tuning Parallel Code on Solaris Lessons Learned from HPC

Tuning Parallel Code on Solaris Lessons Learned from HPC Tuning Parallel Code on Solaris Lessons Learned from HPC Dani Flexer dani@daniflexer.com Presentation to the London OpenSolaris User Group Based on a Sun White Paper of the same name published 09/09 23/9/2009

More information

Method-Level Phase Behavior in Java Workloads

Method-Level Phase Behavior in Java Workloads Method-Level Phase Behavior in Java Workloads Andy Georges, Dries Buytaert, Lieven Eeckhout and Koen De Bosschere Ghent University Presented by Bruno Dufour dufour@cs.rutgers.edu Rutgers University DCS

More information

Chapter 4: Multithreaded Programming. Operating System Concepts 8 th Edition,

Chapter 4: Multithreaded Programming. Operating System Concepts 8 th Edition, Chapter 4: Multithreaded Programming, Silberschatz, Galvin and Gagne 2009 Chapter 4: Multithreaded Programming Overview Multithreading Models Thread Libraries Threading Issues 4.2 Silberschatz, Galvin

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 2

ECE 571 Advanced Microprocessor-Based Design Lecture 2 ECE 571 Advanced Microprocessor-Based Design Lecture 2 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 21 January 2016 Announcements HW#1 will be posted tomorrow I am handing out

More information

The PAPI Cross-Platform Interface to Hardware Performance Counters

The PAPI Cross-Platform Interface to Hardware Performance Counters The PAPI Cross-Platform Interface to Hardware Performance Counters Kevin London, Shirley Moore, Philip Mucci, and Keith Seymour University of Tennessee-Knoxville {london, shirley, mucci, seymour}@cs.utk.edu

More information

Introduction to Parallel Performance Engineering

Introduction to Parallel Performance Engineering Introduction to Parallel Performance Engineering Markus Geimer, Brian Wylie Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance:

More information

Overview. Timers. Profilers. HPM Toolkit

Overview. Timers. Profilers. HPM Toolkit Overview Timers Profilers HPM Toolkit 2 Timers Wide range of timers available on the HPCx system Varying precision portability language ease of use 3 Timers Timer Usage Wallclock/C PU Resolution Language

More information

Chapter 5: Processes & Process Concept. Objectives. Process Concept Process Scheduling Operations on Processes. Communication in Client-Server Systems

Chapter 5: Processes & Process Concept. Objectives. Process Concept Process Scheduling Operations on Processes. Communication in Client-Server Systems Chapter 5: Processes Chapter 5: Processes & Threads Process Concept Process Scheduling Operations on Processes Interprocess Communication Communication in Client-Server Systems, Silberschatz, Galvin and

More information

Kaisen Lin and Michael Conley

Kaisen Lin and Michael Conley Kaisen Lin and Michael Conley Simultaneous Multithreading Instructions from multiple threads run simultaneously on superscalar processor More instruction fetching and register state Commercialized! DEC

More information

Lecture 2 Process Management

Lecture 2 Process Management Lecture 2 Process Management Process Concept An operating system executes a variety of programs: Batch system jobs Time-shared systems user programs or tasks The terms job and process may be interchangeable

More information

Threads Implementation. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Threads Implementation. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Threads Implementation Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics How to implement threads? User-level threads Kernel-level

More information

CS420: Operating Systems

CS420: Operating Systems Threads James Moscola Department of Physical Sciences York College of Pennsylvania Based on Operating System Concepts, 9th Edition by Silberschatz, Galvin, Gagne Threads A thread is a basic unit of processing

More information

Performance Evaluation I. November 5, 1998

Performance Evaluation I. November 5, 1998 15-213 Performance Evaluation I November 5, 1998 Topics Performance measures (metrics) Timing Profiling Performance expressed as a time Absolute time measures (metrics) difference between start and finish

More information

KNL tools. Dr. Fabio Baruffa

KNL tools. Dr. Fabio Baruffa KNL tools Dr. Fabio Baruffa fabio.baruffa@lrz.de 2 Which tool do I use? A roadmap to optimization We will focus on tools developed by Intel, available to users of the LRZ systems. Again, we will skip the

More information

Chapter 4: Multi-Threaded Programming

Chapter 4: Multi-Threaded Programming Chapter 4: Multi-Threaded Programming Chapter 4: Threads 4.1 Overview 4.2 Multicore Programming 4.3 Multithreading Models 4.4 Thread Libraries Pthreads Win32 Threads Java Threads 4.5 Implicit Threading

More information

Performance analysis tools: Intel VTuneTM Amplifier and Advisor. Dr. Luigi Iapichino

Performance analysis tools: Intel VTuneTM Amplifier and Advisor. Dr. Luigi Iapichino Performance analysis tools: Intel VTuneTM Amplifier and Advisor Dr. Luigi Iapichino luigi.iapichino@lrz.de Which tool do I use in my project? A roadmap to optimisation After having considered the MPI layer,

More information

Intel profiling tools and roofline model. Dr. Luigi Iapichino

Intel profiling tools and roofline model. Dr. Luigi Iapichino Intel profiling tools and roofline model Dr. Luigi Iapichino luigi.iapichino@lrz.de Which tool do I use in my project? A roadmap to optimization (and to the next hour) We will focus on tools developed

More information

Stanislav Bratanov; Roman Belenov; Ludmila Pakhomova 4/27/2015

Stanislav Bratanov; Roman Belenov; Ludmila Pakhomova 4/27/2015 Stanislav Bratanov; Roman Belenov; Ludmila Pakhomova 4/27/2015 What is Intel Processor Trace? Intel Processor Trace (Intel PT) provides hardware a means to trace branching, transaction, and timing information

More information

Hardware and Software solutions for scaling highly threaded processors. Denis Sheahan Distinguished Engineer Sun Microsystems Inc.

Hardware and Software solutions for scaling highly threaded processors. Denis Sheahan Distinguished Engineer Sun Microsystems Inc. Hardware and Software solutions for scaling highly threaded processors Denis Sheahan Distinguished Engineer Sun Microsystems Inc. Agenda Chip Multi-threaded concepts Lessons learned from 6 years of CMT

More information

Debugging and Profiling

Debugging and Profiling Debugging and Profiling Dr. Axel Kohlmeyer Senior Scientific Computing Expert Information and Telecommunication Section The Abdus Salam International Centre for Theoretical Physics http://sites.google.com/site/akohlmey/

More information

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University 1. Introduction 2. System Structures 3. Process Concept 4. Multithreaded Programming

More information

CSCE 313 Introduction to Computer Systems. Instructor: Dezhen Song

CSCE 313 Introduction to Computer Systems. Instructor: Dezhen Song CSCE 313 Introduction to Computer Systems Instructor: Dezhen Song Programs, Processes, and Threads Programs and Processes Threads Programs, Processes, and Threads Programs and Processes Threads Processes

More information

Chapter 4: Threads. Chapter 4: Threads

Chapter 4: Threads. Chapter 4: Threads Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Operating System Examples

More information

CSCE 313: Intro to Computer Systems

CSCE 313: Intro to Computer Systems CSCE 313 Introduction to Computer Systems Instructor: Dr. Guofei Gu http://courses.cse.tamu.edu/guofei/csce313/ Programs, Processes, and Threads Programs and Processes Threads 1 Programs, Processes, and

More information

Jackson Marusarz Intel Corporation

Jackson Marusarz Intel Corporation Jackson Marusarz Intel Corporation Intel VTune Amplifier Quick Introduction Get the Data You Need Hotspot (Statistical call tree), Call counts (Statistical) Thread Profiling Concurrency and Lock & Waits

More information

Multiprocessors and Locking

Multiprocessors and Locking Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access

More information

Chapter 4: Multithreaded Programming

Chapter 4: Multithreaded Programming Chapter 4: Multithreaded Programming Silberschatz, Galvin and Gagne 2013! Chapter 4: Multithreaded Programming Overview Multicore Programming Multithreading Models Threading Issues Operating System Examples

More information

Profiling and Workflow

Profiling and Workflow Profiling and Workflow Preben N. Olsen University of Oslo and Simula Research Laboratory preben@simula.no September 13, 2013 1 / 34 Agenda 1 Introduction What? Why? How? 2 Profiling Tracing Performance

More information

Introduction to Performance Tuning & Optimization Tools

Introduction to Performance Tuning & Optimization Tools Introduction to Performance Tuning & Optimization Tools a[i] a[i+1] + a[i+2] a[i+3] b[i] b[i+1] b[i+2] b[i+3] = a[i]+b[i] a[i+1]+b[i+1] a[i+2]+b[i+2] a[i+3]+b[i+3] Ian A. Cosden, Ph.D. Manager, HPC Software

More information

Evaluating Performance Via Profiling

Evaluating Performance Via Profiling Performance Engineering of Software Systems September 21, 2010 Massachusetts Institute of Technology 6.172 Professors Saman Amarasinghe and Charles E. Leiserson Handout 6 Profiling Project 2-1 Evaluating

More information

Lecture 4: Threads; weaving control flow

Lecture 4: Threads; weaving control flow Lecture 4: Threads; weaving control flow CSE 120: Principles of Operating Systems Alex C. Snoeren HW 1 Due NOW Announcements Homework #1 due now Project 0 due tonight Project groups Please send project

More information

CSC Operating Systems Fall Lecture - II OS Structures. Tevfik Ko!ar. Louisiana State University. August 27 th, 2009.

CSC Operating Systems Fall Lecture - II OS Structures. Tevfik Ko!ar. Louisiana State University. August 27 th, 2009. CSC 4103 - Operating Systems Fall 2009 Lecture - II OS Structures Tevfik Ko!ar Louisiana State University August 27 th, 2009 1 Announcements TA Changed. New TA: Praveenkumar Kondikoppa Email: pkondi1@lsu.edu

More information

Announcements. Computer System Organization. Roadmap. Major OS Components. Processes. Tevfik Ko!ar. CSC Operating Systems Fall 2009

Announcements. Computer System Organization. Roadmap. Major OS Components. Processes. Tevfik Ko!ar. CSC Operating Systems Fall 2009 CSC 4103 - Operating Systems Fall 2009 Lecture - II OS Structures Tevfik Ko!ar TA Changed. New TA: Praveenkumar Kondikoppa Email: pkondi1@lsu.edu Announcements All of you should be now in the class mailing

More information

ECE 454 Computer Systems Programming Measuring and profiling

ECE 454 Computer Systems Programming Measuring and profiling ECE 454 Computer Systems Programming Measuring and profiling Ding Yuan ECE Dept., University of Toronto http://www.eecg.toronto.edu/~yuan It is a capital mistake to theorize before one has data. Insensibly

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads... OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.

More information

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured System Performance Analysis Introduction Performance Means many things to many people Important in any design Critical in real time systems 1 ns can mean the difference between system Doing job expected

More information

Performance analysis with Periscope

Performance analysis with Periscope Performance analysis with Periscope M. Gerndt, V. Petkov, Y. Oleynik, S. Benedict Technische Universität petkovve@in.tum.de March 2010 Outline Motivation Periscope (PSC) Periscope performance analysis

More information

2 TEST: A Tracer for Extracting Speculative Threads

2 TEST: A Tracer for Extracting Speculative Threads EE392C: Advanced Topics in Computer Architecture Lecture #11 Polymorphic Processors Stanford University Handout Date??? On-line Profiling Techniques Lecture #11: Tuesday, 6 May 2003 Lecturer: Shivnath

More information

CSI3131 Final Exam Review

CSI3131 Final Exam Review CSI3131 Final Exam Review Final Exam: When: April 24, 2015 2:00 PM Where: SMD 425 File Systems I/O Hard Drive Virtual Memory Swap Memory Storage and I/O Introduction CSI3131 Topics Process Computing Systems

More information

OS Design Approaches. Roadmap. OS Design Approaches. Tevfik Koşar. Operating System Design and Implementation

OS Design Approaches. Roadmap. OS Design Approaches. Tevfik Koşar. Operating System Design and Implementation CSE 421/521 - Operating Systems Fall 2012 Lecture - II OS Structures Roadmap OS Design and Implementation Different Design Approaches Major OS Components!! Memory management! CPU Scheduling! I/O Management

More information

Threads. Thread Concept Multithreading Models User & Kernel Threads Pthreads Threads in Solaris, Linux, Windows. 2/13/11 CSE325 - Threads 1

Threads. Thread Concept Multithreading Models User & Kernel Threads Pthreads Threads in Solaris, Linux, Windows. 2/13/11 CSE325 - Threads 1 Threads Thread Concept Multithreading Models User & Kernel Threads Pthreads Threads in Solaris, Linux, Windows 2/13/11 CSE325 - Threads 1 Threads The process concept incorporates two abstractions: a virtual

More information

Linux Performance Tuning

Linux Performance Tuning Page 1 of 5 close window Print Linux Performance Tuning Getting the most from your Linux investment February March 2007 by Jaqui Lynch This is the first article in a two-part series. The second installment

More information

6.1 Multiprocessor Computing Environment

6.1 Multiprocessor Computing Environment 6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,

More information

Comp 310 Computer Systems and Organization

Comp 310 Computer Systems and Organization Comp 310 Computer Systems and Organization Lecture #9 Process Management (CPU Scheduling) 1 Prof. Joseph Vybihal Announcements Oct 16 Midterm exam (in class) In class review Oct 14 (½ class review) Ass#2

More information

EE382 Processor Design. Processor Issues for MP

EE382 Processor Design. Processor Issues for MP EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors, Part I EE 382 Processor Design Winter 98/99 Michael Flynn 1 Processor Issues for MP Initialization Interrupts Virtual Memory TLB Coherency

More information

Chapter 4: Threads. Operating System Concepts 8 th Edition,

Chapter 4: Threads. Operating System Concepts 8 th Edition, Chapter 4: Threads, Silberschatz, Galvin and Gagne 2009 Chapter 4: Threads Overview Multithreading Models Thread Libraries 4.2 Silberschatz, Galvin and Gagne 2009 Objectives To introduce the notion of

More information

Performance Optimization: Simulation and Real Measurement

Performance Optimization: Simulation and Real Measurement Performance Optimization: Simulation and Real Measurement KDE Developer Conference, Introduction Agenda Performance Analysis Profiling Tools: Examples & Demo KCachegrind: Visualizing Results What s to

More information

CS370 Operating Systems Midterm Review

CS370 Operating Systems Midterm Review CS370 Operating Systems Midterm Review Yashwant K Malaiya Fall 2015 Slides based on Text by Silberschatz, Galvin, Gagne 1 1 What is an Operating System? An OS is a program that acts an intermediary between

More information

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)

I/O Systems. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic) I/O Systems Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) I/O Systems 1393/9/15 1 / 57 Motivation Amir H. Payberah (Tehran

More information

Barbara Chapman, Gabriele Jost, Ruud van der Pas

Barbara Chapman, Gabriele Jost, Ruud van der Pas Using OpenMP Portable Shared Memory Parallel Programming Barbara Chapman, Gabriele Jost, Ruud van der Pas The MIT Press Cambridge, Massachusetts London, England c 2008 Massachusetts Institute of Technology

More information

ΕΛΠ 605: Προχωρηµένη Αρχιτεκτονική Υπολογιστών. Εργαστήριο Αρ. 4. Linux Monitoring Utilities (perf,top,mpstat ps, free) and gdb dissasembler, gnuplot

ΕΛΠ 605: Προχωρηµένη Αρχιτεκτονική Υπολογιστών. Εργαστήριο Αρ. 4. Linux Monitoring Utilities (perf,top,mpstat ps, free) and gdb dissasembler, gnuplot ΕΛΠ 605: Προχωρηµένη Αρχιτεκτονική Υπολογιστών Εργαστήριο Αρ. 4 Linux Monitoring Utilities (perf,top,mpstat ps, free) and gdb dissasembler, gnuplot Lecturer: Zacharias Hadjilambrou Σελ. 1 Realtime monitoring

More information

QUESTION BANK UNIT I

QUESTION BANK UNIT I QUESTION BANK Subject Name: Operating Systems UNIT I 1) Differentiate between tightly coupled systems and loosely coupled systems. 2) Define OS 3) What are the differences between Batch OS and Multiprogramming?

More information

An Implementation Of Multiprocessor Linux

An Implementation Of Multiprocessor Linux An Implementation Of Multiprocessor Linux This document describes the implementation of a simple SMP Linux kernel extension and how to use this to develop SMP Linux kernels for architectures other than

More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems DM510-14 Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations STREAMS Performance 13.2 Objectives

More information

Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved

Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved C5 Micro-Kernel: Real-Time Services for Embedded and Linux Systems Copyright 2003- Jaluna SA. All rights reserved. JL/TR-03-31.0.1 1 Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview

More information

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer

More information

Intel VTune Performance Analyzer 9.1 for Windows* In-Depth

Intel VTune Performance Analyzer 9.1 for Windows* In-Depth Intel VTune Performance Analyzer 9.1 for Windows* In-Depth Contents Deliver Faster Code...................................... 3 Optimize Multicore Performance...3 Highlights...............................................

More information

The control of I/O devices is a major concern for OS designers

The control of I/O devices is a major concern for OS designers Lecture Overview I/O devices I/O hardware Interrupts Direct memory access Device dimensions Device drivers Kernel I/O subsystem Operating Systems - June 26, 2001 I/O Device Issues The control of I/O devices

More information

Zing Vision. Answering your toughest production Java performance questions

Zing Vision. Answering your toughest production Java performance questions Zing Vision Answering your toughest production Java performance questions Outline What is Zing Vision? Where does Zing Vision fit in your Java environment? Key features How it works Using ZVRobot Q & A

More information

Parallel Performance Analysis Using the Paraver Toolkit

Parallel Performance Analysis Using the Paraver Toolkit Parallel Performance Analysis Using the Paraver Toolkit Parallel Performance Analysis Using the Paraver Toolkit [16a] [16a] Slide 1 University of Stuttgart High-Performance Computing Center Stuttgart (HLRS)

More information

Threads. CS3026 Operating Systems Lecture 06

Threads. CS3026 Operating Systems Lecture 06 Threads CS3026 Operating Systems Lecture 06 Multithreading Multithreading is the ability of an operating system to support multiple threads of execution within a single process Processes have at least

More information

Performance Tools for Technical Computing

Performance Tools for Technical Computing Christian Terboven terboven@rz.rwth-aachen.de Center for Computing and Communication RWTH Aachen University Intel Software Conference 2010 April 13th, Barcelona, Spain Agenda o Motivation and Methodology

More information

Lightweight Remote Procedure Call

Lightweight Remote Procedure Call Lightweight Remote Procedure Call Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, Henry M. Levy ACM Transactions Vol. 8, No. 1, February 1990, pp. 37-55 presented by Ian Dees for PSU CS533, Jonathan

More information

Techno India Batanagar Department of Computer Science & Engineering. Model Questions. Multiple Choice Questions:

Techno India Batanagar Department of Computer Science & Engineering. Model Questions. Multiple Choice Questions: Techno India Batanagar Department of Computer Science & Engineering Model Questions Subject Name: Operating System Multiple Choice Questions: Subject Code: CS603 1) Shell is the exclusive feature of a)

More information

Course Syllabus. Operating Systems

Course Syllabus. Operating Systems Course Syllabus. Introduction - History; Views; Concepts; Structure 2. Process Management - Processes; State + Resources; Threads; Unix implementation of Processes 3. Scheduling Paradigms; Unix; Modeling

More information

Lecture Topics. Announcements. Today: Threads (Stallings, chapter , 4.6) Next: Concurrency (Stallings, chapter , 5.

Lecture Topics. Announcements. Today: Threads (Stallings, chapter , 4.6) Next: Concurrency (Stallings, chapter , 5. Lecture Topics Today: Threads (Stallings, chapter 4.1-4.3, 4.6) Next: Concurrency (Stallings, chapter 5.1-5.4, 5.7) 1 Announcements Make tutorial Self-Study Exercise #4 Project #2 (due 9/20) Project #3

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 2

ECE 571 Advanced Microprocessor-Based Design Lecture 2 ECE 571 Advanced Microprocessor-Based Design Lecture 2 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 4 September 2014 Announcements HW#1 delayed until Tuesday 1 Hardware Performance

More information

Processes and Threads

Processes and Threads TDDI04 Concurrent Programming, Operating Systems, and Real-time Operating Systems Processes and Threads [SGG7] Chapters 3 and 4 Copyright Notice: The lecture notes are mainly based on Silberschatz s, Galvin

More information

The Role of Performance

The Role of Performance Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture The Role of Performance What is performance? A set of metrics that allow us to compare two different hardware

More information

Overview. Thread Packages. Threads The Thread Model (1) The Thread Model (2) The Thread Model (3) Thread Usage (1)

Overview. Thread Packages. Threads The Thread Model (1) The Thread Model (2) The Thread Model (3) Thread Usage (1) Overview Thread Packages Thomas Plagemann With slides from O. Anshus, C. Griwodz, M. van Steen, and A. Tanenbaum What are threads? Why threads? Example: Da CaPo 1.0 Thread implementation User level level

More information

Processes and Threads

Processes and Threads COS 318: Operating Systems Processes and Threads Kai Li and Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall13/cos318 Today s Topics u Concurrency

More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance 13.2 Silberschatz, Galvin

More information

Chapter 4: Threads. Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Windows XP Threads Linux Threads

Chapter 4: Threads. Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Windows XP Threads Linux Threads Chapter 4: Threads Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Windows XP Threads Linux Threads Chapter 4: Threads Objectives To introduce the notion of a

More information

Operating System(16MCA24)

Operating System(16MCA24) PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore 560 100 Department of MCA COURSE INFORMATION SHEET Operating System(16MCA24) 1. GENERAL INFORMATION Academic Year: 2017 Semester(s):I

More information

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs CSE 451: Operating Systems Winter 2005 Lecture 7 Synchronization Steve Gribble Synchronization Threads cooperate in multithreaded programs to share resources, access shared data structures e.g., threads

More information