Performance analysis in large C++ projects

Florent D'HALLUIN <d-halluin@lrde.epita.fr>
EPITA Research and Development Laboratory
CSI Seminar, May 13th, 2009

Outline

1 Introduction to performance analysis

Introduction to performance analysis

Definitions. Performance indicators. Popular open-source profilers.

Definitions I

Profiling: Investigation of a program's behavior. Done as the program executes. Used to locate bottlenecks.
Profilers: Link functions to their resource usage. Work by instrumentation or by sampling. Output a flat profile or a call graph.

Definitions II

Optimization: Can be done at several levels (design, source code, compile time, run time). Should be selective: 10% of the code is responsible for 90% of the execution time.
Donald Knuth: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

Definitions III

Overhead: Cost of the measurement process. Low for sampling profilers. Higher for instrumenting profilers. Its impact on the measures can be estimated.
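One way to estimate the cost of a measurement is to time the measurement primitive itself. The following is a minimal sketch, assuming std::chrono::steady_clock stands in for the profiler's timer; the function name estimate_timer_overhead_ns is hypothetical and not taken from the talk.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: estimates the average cost, in nanoseconds, of one
// clock reading by performing `samples` back-to-back readings.
double estimate_timer_overhead_ns(std::size_t samples)
{
  using clock = std::chrono::steady_clock;
  if (samples == 0)
    return 0.0;

  const clock::time_point start = clock::now();
  clock::time_point last = start;
  for (std::size_t i = 0; i < samples; ++i)
    last = clock::now(); // the measurement being measured

  const std::chrono::duration<double, std::nano> elapsed = last - start;
  return elapsed.count() / static_cast<double>(samples);
}
```

Multiplying the estimate by the number of measurements taken during a run gives a rough figure to subtract from the reported times.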

Performance indicators

Wall time: Time as perceived by a human, i.e. measured on a wall clock.
CPU time: Time spent in CPU instructions. Divided into user time and system time.
Virtual memory: Amount of virtual memory allocated to a program.
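The three time indicators can be read from a running program. The sketch below assumes a POSIX system and uses gettimeofday() for wall time and getrusage() for user and system time; the TimeSample record and the helper names are hypothetical.

```cpp
#include <sys/resource.h>
#include <sys/time.h>

// Hypothetical record holding the three time indicators, in seconds.
struct TimeSample
{
  double wall;
  double user;
  double system;
};

// Converts a POSIX timeval (seconds + microseconds) to seconds.
static double to_seconds(const timeval& tv)
{
  return static_cast<double>(tv.tv_sec)
       + static_cast<double>(tv.tv_usec) / 1e6;
}

// Reads wall time with gettimeofday() and CPU time with getrusage().
TimeSample sample_times()
{
  timeval now = {};
  gettimeofday(&now, nullptr);

  struct rusage usage = {};
  getrusage(RUSAGE_SELF, &usage);

  return TimeSample{to_seconds(now),
                    to_seconds(usage.ru_utime),
                    to_seconds(usage.ru_stime)};
}

// Difference between two samples: the cost of the code run in between.
TimeSample elapsed(const TimeSample& begin, const TimeSample& end)
{
  return TimeSample{end.wall - begin.wall,
                    end.user - begin.user,
                    end.system - begin.system};
}
```

Taking one sample before and one after a region of code, CPU time is the sum of the user and system differences; a region that sleeps or waits on I/O shows a wall time much larger than its CPU time.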

Popular open-source profilers: TIME

Widely available shell command. Shows CPU time and wall time for a program execution.

Popular open-source profilers: GPROF

Written in [...]. Part of the GNU Binutils. Uses both instrumentation and sampling. Low overhead. Results are not always accurate.

Popular open-source profilers: CALLGRIND

CALLGRIND: Runtime-instrumentation profiler. Uses the VALGRIND framework. High overhead. Fairly complex output.
KCACHEGRIND: Powerful GUI. Displays flat profile and call graph from CALLGRIND results. Can display data from other sources.

Outline (section 2)

Features. Structure. Output. Under the hood.

Features

CBS: C++ Benchmarking Suite for VAUCANSON. Manual instrumentation profiler. Low overhead. Adaptable output.
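Manual instrumentation means that the programmer marks by hand the tasks to be measured. The sketch below illustrates the idea with a scope-based marker; Profile, TaskStats, ScopedTask and sum_to are hypothetical names chosen for illustration and are not the CBS API.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical flat profile: accumulated time and call count per task name.
struct TaskStats
{
  double total_seconds = 0.0;
  unsigned long calls = 0;
};

using Profile = std::map<std::string, TaskStats>;

// Hypothetical RAII marker: times the enclosing scope and records the
// result under `name` when the scope is left.
class ScopedTask
{
public:
  ScopedTask(Profile& profile, std::string name)
    : profile_(profile),
      name_(std::move(name)),
      start_(std::chrono::steady_clock::now())
  {}

  ~ScopedTask()
  {
    const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start_;
    TaskStats& stats = profile_[name_];
    stats.total_seconds += elapsed.count();
    ++stats.calls;
  }

  ScopedTask(const ScopedTask&) = delete;
  ScopedTask& operator=(const ScopedTask&) = delete;

private:
  Profile& profile_;
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};

// Example of an instrumented function: the marker is added by hand.
long sum_to(Profile& profile, long n)
{
  ScopedTask task(profile, "sum_to");
  long total = 0;
  for (long i = 1; i <= n; ++i)
    total += i;
  return total;
}
```

Only the marked scopes pay for the measurement, which is one reason a manual approach can keep overhead low. The time recorded here is inclusive: it covers any nested tasks as well.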

Structure

LIBBENCH: Profiling library. TIMER measures time and generates the call graph. MEMPLOT measures memory usage.
XML archiving: Own (simple) XML format. Stores benchmarks and statistics to be processed.
Plot helpers: Convenient scripts for GNUPLOT.

Output

3 output elements: Benchmark summary. Flat profile and call graph. Memory consumption.
3 output formats: XML. DOT. Text.
Several verbosity levels: Adaptable to one's needs.

Output: Summary (text)

Listing: Summary text output.

    Profiler comparison

    [Description:]
    A simple program that consumes time and memory. The parameter n defines
    the program complexity, i.e. the time and memory taken.
    [Infos:]
    Date: Mon May 4 14:43:
    [Parameters:]
    n: 20
    [Results:]
    memory peak:
    relative memory usage:
    time:
    time (system):
    time (user):
    time (wall):

Output: Flat profile (text)

Listing: Flat profile text output.

    [Task list:]
    Charge   id: <name>            total    self     calls   self avg.   total avg
         %   0: _program           s        s                s           s
     30.7%   3: file_io()          (C: 0)   s                ms          (C: 0)
     25.5%   2: new_integer()      (C: 0)   17.21s           ms          (C: 0)
     23.5%   4: list_push_back()   15.84s   15.84s           ms          37.73ms
     20.3%   1: parent()           67.44s   13.71s           s           67.44s

Output: Call graph (DOT)

Figure: DOT output example. Node labels:
_program: Calls: 1. Self time: 67.44s.
parent(): Calls: 1. Self time: 13.71s.
new_integer(): Calls: 420. Self time: 17.21s.
list_push_back(): Calls: 420. Self time: 15.84s.
file_io(): Calls: 420. Self time: 20.68s.
(C: 0): Incoming calls: 20. Internal calls: 820. Outgoing calls: 0. Self time: 37.88s.

Output: Summary (XML)

Listing: Main section of the XML output.

    <?xml version="1.0" encoding="UTF-8"?>
    <bench>
      <name>Profiler comparison</name>
      <date>mon May 4 14:43:</date>
      <time></time>
      <description>A simple program that consumes time and memory. The parameter n defines the program complexity, i.e. the time and memory taken.</description>
      <parameters>
        <parameter name="n" value="20" />
      </parameters>
      <results>
        <result name="memory peak" value="" />
        <result name="relative memory usage" value="" />
        <result name="time" value="" />
        <result name="time (system)" value="" />
        <result name="time (user)" value="" />
        <result name="time (wall)" value="" />
      </results>
    </bench>
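A summary in this layout can be produced with a few stream insertions. The sketch below reuses the element names of the listing (bench, name, description, parameters, parameter, results, result) and leaves out date and time for brevity; the Bench record and the functions xml_escape and to_xml are hypothetical and do not reflect how CBS itself writes its files.

```cpp
#include <map>
#include <sstream>
#include <string>

// Escapes the characters that are special in XML text and attributes.
std::string xml_escape(const std::string& text)
{
  std::string out;
  for (char c : text)
  {
    switch (c)
    {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '"': out += "&quot;"; break;
      default:  out += c;
    }
  }
  return out;
}

// Hypothetical in-memory benchmark record.
struct Bench
{
  std::string name;
  std::string description;
  std::map<std::string, std::string> parameters;
  std::map<std::string, std::string> results;
};

// Serializes a benchmark record using the element names of the listing.
std::string to_xml(const Bench& bench)
{
  std::ostringstream out;
  out << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      << "<bench>\n"
      << "  <name>" << xml_escape(bench.name) << "</name>\n"
      << "  <description>" << xml_escape(bench.description)
      << "</description>\n"
      << "  <parameters>\n";
  for (const auto& p : bench.parameters)
    out << "    <parameter name=\"" << xml_escape(p.first)
        << "\" value=\"" << xml_escape(p.second) << "\" />\n";
  out << "  </parameters>\n"
      << "  <results>\n";
  for (const auto& r : bench.results)
    out << "    <result name=\"" << xml_escape(r.first)
        << "\" value=\"" << xml_escape(r.second) << "\" />\n";
  out << "  </results>\n"
      << "</bench>\n";
  return out.str();
}
```

Storing parameters and results as name/value pairs keeps the format simple enough to be processed by scripts later on.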

Statistics

3000 lines of C++ code. 600 lines of demo scripts. 400 lines of documentation. 200 lines of Perl.

System calls

Time: CPU time: getrusage(). Wall time: gettimeofday().
Memory: Virtual memory: parse /proc/self/stat. Also: sbrk(0).
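Reading virtual memory from /proc/self/stat amounts to extracting one field from a single line of text. The sketch below assumes the Linux layout of that file, where field 23 (vsize) is the virtual memory size in bytes; the function names are hypothetical, and the sbrk(0) alternative is not covered.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Extracts the virtual memory size in bytes (field 23, "vsize") from the
// contents of a Linux /proc/<pid>/stat line. Returns 0 if parsing fails.
unsigned long long parse_vsize(const std::string& stat_line)
{
  // Field 2 (the command name) is parenthesized and may contain spaces,
  // so parsing resumes after the last ')'.
  const std::string::size_type close = stat_line.rfind(')');
  if (close == std::string::npos)
    return 0;

  std::istringstream fields(stat_line.substr(close + 1));

  // The first token after ')' is field 3, so fields 3 to 22 are the
  // twenty tokens to skip before reaching field 23.
  std::string skipped;
  for (int i = 0; i < 20; ++i)
    if (!(fields >> skipped))
      return 0;

  unsigned long long vsize = 0;
  if (!(fields >> vsize))
    return 0;
  return vsize;
}

// Reads the virtual memory size of the current process; returns 0 when
// /proc/self/stat is unavailable (for instance on non-Linux systems).
unsigned long long current_vsize()
{
  std::ifstream stat("/proc/self/stat");
  std::string line;
  if (!std::getline(stat, line))
    return 0;
  return parse_vsize(line);
}
```

Sampling current_vsize() at regular points of a run and keeping the maximum gives a memory peak of the kind reported in the benchmark summary.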

Call graph

Directed graph that represents calling relationships between tasks. Cycles (strongly connected components) show recursive tasks. Built in two steps: data collection and call graph computation. Implemented using boost::adjacency_list.
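Grouping tasks into strongly connected components is what reveals recursion in a call graph. The sketch below uses Tarjan's algorithm on a plain standard-library adjacency list rather than boost::adjacency_list, so that it has no external dependency; the CallGraph type, the SccFinder class and the task numbering are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical call graph: tasks are numbered 0..n-1 and graph[u] lists
// the tasks called by task u.
using CallGraph = std::vector<std::vector<std::size_t>>;

constexpr std::size_t unvisited = static_cast<std::size_t>(-1);

// Tarjan's algorithm: assigns to each task the index of its strongly
// connected component.
class SccFinder
{
public:
  explicit SccFinder(const CallGraph& graph)
    : graph_(graph),
      index_(graph.size(), unvisited),
      low_(graph.size(), 0),
      on_stack_(graph.size(), false),
      component_(graph.size(), 0)
  {
    for (std::size_t v = 0; v < graph_.size(); ++v)
      if (index_[v] == unvisited)
        visit(v);
  }

  const std::vector<std::size_t>& components() const { return component_; }
  std::size_t count() const { return count_; }

private:
  void visit(std::size_t v)
  {
    index_[v] = low_[v] = next_index_++;
    stack_.push_back(v);
    on_stack_[v] = true;

    for (std::size_t w : graph_[v])
    {
      if (index_[w] == unvisited)
      {
        visit(w);
        low_[v] = std::min(low_[v], low_[w]);
      }
      else if (on_stack_[w])
        low_[v] = std::min(low_[v], index_[w]);
    }

    if (low_[v] == index_[v]) // v is the root of a component
    {
      std::size_t w;
      do
      {
        w = stack_.back();
        stack_.pop_back();
        on_stack_[w] = false;
        component_[w] = count_;
      } while (w != v);
      ++count_;
    }
  }

  const CallGraph& graph_;
  std::vector<std::size_t> index_;
  std::vector<std::size_t> low_;
  std::vector<bool> on_stack_;
  std::vector<std::size_t> component_;
  std::vector<std::size_t> stack_;
  std::size_t next_index_ = 0;
  std::size_t count_ = 0;
};
```

Tasks that share a component index call each other, directly or indirectly. A task that only calls itself forms a component of size one, so detecting it also requires checking for a self-edge.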

Outline (section 3)

Performance trace. Demo. Optimizing algorithms.

VAUCANSON's performance trace

Goals: Give a performance overview across versions. Compare VAUCANSON to OPENFST. Compare implementations (LISTG and BMIG). Measure the impact of structural changes in the library.
Implementation: Benchmarking suite (make bench). 15 of the most used algorithms on common test cases. Output CBS XML files.

Some results

Figure: Graph implementation influence on determinize() (2.66GHz Intel Core 2 Quad Q9400, 4GB RAM). Time (ms) against automaton complexity. Series: CPU time, user and system, for bmig and listg.

Some results

Figure: Graph implementation influence on minimization-2n-moore() (2.66GHz Intel Core 2 Quad Q9400, 4GB RAM). Time (ms) against automaton complexity. Series: CPU time, user and system, for bmig and listg.

Optimizing algorithms

quotient(): One of the most important algorithms in VAUCANSON. CBS showed a 30-line bottleneck in a 900-line file. Selective optimization led to a 40% performance boost.
Figure: Base automaton for quotient() benchmark.

Quotient code before optimization

Listing: quotient() code excerpt, before optimization.

    bool compute_going_in_states(partition_t& p, letter_t a)
    {
      for_all_(going_in_t, s, going_in_)
        s = false;

      for_all_(partition_t, s, p)
      {
        for (rdelta_iterator t(input_.value(), s); !t.done(); t.next())
        {
          // Some code optimization is possible here
          monoid_elt_t w(input_.series_of(t).structure().monoid(), a);
          if (input_.series_of(t).get(w) != input_.series().semiring().wzero_)
          {
            // [...]
          }
        }
      }
      return !met_class_.empty();
    }

Quotient code after optimization

Listing: quotient() code excerpt, after optimization.

    bool compute_going_in_states(partition_t& p, letter_t a)
    {
      for_all_(going_in_t, s, going_in_)
        s = false;

      // Compute as much as possible outside loops.
      semiring_elt_t weight_zero = input_.series().semiring().wzero_;
      monoid_elt_t w(input_.series().monoid(), a);

      for_all_(partition_t, s, p)
      {
        for (rdelta_iterator t(input_.value(), s); !t.done(); t.next())
        {
          // Optimization led to a 40% performance boost on test cases.
          if (input_.series_of(t).get(w) != weight_zero)
          {
            // [...]
          }
        }
      }
      return !met_class_.empty();
    }
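The change between the two listings is a hoisting of loop-invariant computations. The self-contained sketch below reproduces the pattern outside VAUCANSON; make_zero_label is a hypothetical stand-in for an accessor chain whose result does not change across iterations, and both function names are illustrative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a costly accessor chain: it rebuilds the same
// value on every call.
std::string make_zero_label()
{
  return std::string(64, '0');
}

// Before: the invariant value is recomputed on every iteration.
std::size_t count_non_zero_before(const std::vector<std::string>& labels)
{
  std::size_t count = 0;
  for (const std::string& label : labels)
    if (label != make_zero_label())
      ++count;
  return count;
}

// After: the invariant value is computed once, outside the loop.
std::size_t count_non_zero_after(const std::vector<std::string>& labels)
{
  const std::string zero = make_zero_label();
  std::size_t count = 0;
  for (const std::string& label : labels)
    if (label != zero)
      ++count;
  return count;
}
```

Both functions return the same count; only the number of calls to make_zero_label() differs. A compiler cannot always perform this transformation itself when the hoisted call is opaque to it, which is why doing it by hand in a hot loop can pay off.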

Quotient benchmark results

Figure: Code optimization influence on quotient() (2GHz Intel Centrino, 1GB RAM). Time (ms) against product automaton complexity. Series: CPU time, user and system, before and after optimization.

Future works

VAUCANSON: Optimize algorithms. Expand benchmarking suite. Add OPENFST benchmarks.
CBS: Add a script front-end. Turn into a separate open-source project. Add support for Mac & Windows.

Questions

Figure: A dog. Picture from FreeDigitalPhotos.net.



Embedded SDR for Small Form Factor Systems

Embedded SDR for Small Form Factor Systems Embedded SDR for Small Form Factor Systems Philip Balister, Tom Tsou, and Jeff Reed MPRG Wireless @ Virginia Tech Blacksburg, VA 24060 balister@vt.edu Outline Embedded Software Defined Radio SDR Frameworks

More information

Devel::NYTProf. Perl Source Code Profiler. Tim Bunce - July 2009 Screencast available at

Devel::NYTProf. Perl Source Code Profiler. Tim Bunce - July 2009 Screencast available at Devel::NYTProf Perl Source Code Profiler Tim Bunce - July 2009 Screencast available at http://blog.timbunce.org/tag/nytprof/ Devel::DProf Oldest Perl Profiler 1995 Design flaws make it practically useless

More information

Detection and Analysis of Iterative Behavior in Parallel Applications

Detection and Analysis of Iterative Behavior in Parallel Applications Detection and Analysis of Iterative Behavior in Parallel Applications Karl Fürlinger and Shirley Moore Innovative Computing Laboratory, Department of Electrical Engineering and Computer Science, University

More information

A Case Study in Optimizing GNU Radio s ATSC Flowgraph

A Case Study in Optimizing GNU Radio s ATSC Flowgraph A Case Study in Optimizing GNU Radio s ATSC Flowgraph Presented by Greg Scallon and Kirby Cartwright GNU Radio Conference 2017 Thursday, September 14 th 10am ATSC FLOWGRAPH LOADING 3% 99% 76% 36% 10% 33%

More information

SCALING UP VS. SCALING OUT IN A QLIKVIEW ENVIRONMENT

SCALING UP VS. SCALING OUT IN A QLIKVIEW ENVIRONMENT SCALING UP VS. SCALING OUT IN A QLIKVIEW ENVIRONMENT QlikView Technical Brief February 2012 qlikview.com Introduction When it comes to the enterprise Business Discovery environments, the ability of the

More information

Making the Box Transparent: System Call Performance as a First-class Result. Yaoping Ruan, Vivek Pai Princeton University

Making the Box Transparent: System Call Performance as a First-class Result. Yaoping Ruan, Vivek Pai Princeton University Making the Box Transparent: System Call Performance as a First-class Result Yaoping Ruan, Vivek Pai Princeton University Outline n Motivation n Design & implementation n Case study n More results Motivation

More information

Potentials and Limitations for Energy Efficiency Auto-Tuning

Potentials and Limitations for Energy Efficiency Auto-Tuning Center for Information Services and High Performance Computing (ZIH) Potentials and Limitations for Energy Efficiency Auto-Tuning Parco Symposium Application Autotuning for HPC (Architectures) Robert Schöne

More information

Monitor Application for Panasonic TDA

Monitor Application for Panasonic TDA Monitor Application for Panasonic TDA MAP Demo Getting Started Version 1.0 G3 NOVA Communications SRL 28 Iacob Felix, Sector 1, Bucharest, ROMANIA Phone: +1 877 777 8753 www.g3novacommunications.com 2005

More information

Cache Performance Analysis with Callgrind and KCachegrind

Cache Performance Analysis with Callgrind and KCachegrind Cache Performance Analysis with Callgrind and KCachegrind VI-HPS Tuning Workshop 8 September 2011, Aachen Josef Weidendorfer Computer Architecture I-10, Department of Informatics Technische Universität

More information

Enemy Territory Traffic Analysis

Enemy Territory Traffic Analysis Enemy Territory Traffic Analysis Julie-Anne Bussiere *, Sebastian Zander Centre for Advanced Internet Architectures. Technical Report 00203A Swinburne University of Technology Melbourne, Australia julie-anne.bussiere@laposte.net,

More information

MPI Performance Tools

MPI Performance Tools Physics 244 31 May 2012 Outline 1 Introduction 2 Timing functions: MPI Wtime,etime,gettimeofday 3 Profiling tools time: gprof,tau hardware counters: PAPI,PerfSuite,TAU MPI communication: IPM,TAU 4 MPI

More information

Performance Tools. Tulin Kaman. Department of Applied Mathematics and Statistics

Performance Tools. Tulin Kaman. Department of Applied Mathematics and Statistics Performance Tools Tulin Kaman Department of Applied Mathematics and Statistics Stony Brook/BNL New York Center for Computational Science tkaman@ams.sunysb.edu Aug 23, 2012 Do you have information on exactly

More information

Lecture 3: Intro to parallel machines and models

Lecture 3: Intro to parallel machines and models Lecture 3: Intro to parallel machines and models David Bindel 1 Sep 2011 Logistics Remember: http://www.cs.cornell.edu/~bindel/class/cs5220-f11/ http://www.piazza.com/cornell/cs5220 Note: the entire class

More information

JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING

JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING Larson Hogstrom, Mukarram Tahir, Andres Hasfura Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 18.337/6.338

More information

Performance Analysis of Parallel Scientific Applications In Eclipse

Performance Analysis of Parallel Scientific Applications In Eclipse Performance Analysis of Parallel Scientific Applications In Eclipse EclipseCon 2015 Wyatt Spear, University of Oregon wspear@cs.uoregon.edu Supercomputing Big systems solving big problems Performance gains

More information

High Performance Computing and Programming. Performance Analysis

High Performance Computing and Programming. Performance Analysis High Performance Computing and Programming Performance Analysis What is performance? Performance is a total effectiveness of a program What are the measures of the performance? execution time FLOPS memory

More information

Balanced Trees. Nate Foster Spring 2019

Balanced Trees. Nate Foster Spring 2019 Balanced Trees Nate Foster Spring 2019 Today s music: Get the Balance Right by Depeche Mode Review Previously in 3110: Streams Today: Balanced trees Running example: Sets module type Set = sig type 'a

More information

Performance Improvement. The material for this lecture is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 7

Performance Improvement. The material for this lecture is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 7 Performance Improvement The material for this lecture is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 7 1 For Your Amusement Optimization hinders evolution. -- Alan Perlis

More information

Collecting and Exploiting Cache-Reuse Metrics

Collecting and Exploiting Cache-Reuse Metrics Collecting and Exploiting Cache-Reuse Metrics Josef Weidendorfer and Carsten Trinitis Technische Universität München, Germany {weidendo, trinitic}@cs.tum.edu Abstract. The increasing gap of processor and

More information

Performance measurements of computer systems: tools and analysis

Performance measurements of computer systems: tools and analysis Performance measurements of computer systems: tools and analysis M2R PDES Jean-Marc Vincent and Arnaud Legrand Laboratory LIG MESCAL Project Universities of Grenoble {Jean-Marc.Vincent,Arnaud.Legrand}@imag.fr

More information

Automatic trace analysis with the Scalasca Trace Tools

Automatic trace analysis with the Scalasca Trace Tools Automatic trace analysis with the Scalasca Trace Tools Ilya Zhukov Jülich Supercomputing Centre Property Automatic trace analysis Idea Automatic search for patterns of inefficient behaviour Classification

More information

Shared Memory Programming With OpenMP Exercise Instructions

Shared Memory Programming With OpenMP Exercise Instructions Shared Memory Programming With OpenMP Exercise Instructions John Burkardt Interdisciplinary Center for Applied Mathematics & Information Technology Department Virginia Tech... Advanced Computational Science

More information

ECE 454 Computer Systems Programming Measuring and profiling

ECE 454 Computer Systems Programming Measuring and profiling ECE 454 Computer Systems Programming Measuring and profiling Ding Yuan ECE Dept., University of Toronto http://www.eecg.toronto.edu/~yuan It is a capital mistake to theorize before one has data. Insensibly

More information

Efficient Clustering and Scheduling for Task-Graph based Parallelization

Efficient Clustering and Scheduling for Task-Graph based Parallelization Center for Information Services and High Performance Computing TU Dresden Efficient Clustering and Scheduling for Task-Graph based Parallelization Marc Hartung 02. February 2015 E-Mail: marc.hartung@tu-dresden.de

More information

Evaluation studies: From controlled to natural settings

Evaluation studies: From controlled to natural settings Chapter 14 Evaluation studies: From controlled to natural settings 1 The aims: Explain how to do usability testing Outline the basics of experimental design Describe how to do field studies 2 1. Usability

More information

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems Robert Grimm University of Washington Extensions Added to running system Interact through low-latency interfaces Form

More information

<Insert Picture Here> Boost Linux Performance with Enhancements from Oracle

<Insert Picture Here> Boost Linux Performance with Enhancements from Oracle Boost Linux Performance with Enhancements from Oracle Chris Mason Director of Linux Kernel Engineering Linux Performance on Large Systems Exadata Hardware How large systems are different

More information

Benchmark of a Cubieboard cluster

Benchmark of a Cubieboard cluster Benchmark of a Cubieboard cluster M J Schnepf, D Gudu, B Rische, M Fischer, C Jung and M Hardt Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany E-mail: matthias.schnepf@student.kit.edu,

More information

IBM High Performance Computing Toolkit

IBM High Performance Computing Toolkit IBM High Performance Computing Toolkit Pidad D'Souza (pidsouza@in.ibm.com) IBM, India Software Labs Top 500 : Application areas (November 2011) Systems Performance Source : http://www.top500.org/charts/list/34/apparea

More information

Distributed Two-way Trees for File Replication on Demand

Distributed Two-way Trees for File Replication on Demand Distributed Two-way Trees for File Replication on Demand Ramprasad Tamilselvan Department of Computer Science Golisano College of Computing and Information Sciences Rochester, NY 14586 rt7516@rit.edu Abstract

More information

New User Seminar: Part 2 (best practices)

New User Seminar: Part 2 (best practices) New User Seminar: Part 2 (best practices) General Interest Seminar January 2015 Hugh Merz merz@sharcnet.ca Session Outline Submitting Jobs Minimizing queue waits Investigating jobs Checkpointing Efficiency

More information

Chapter 2: Operating-System Structures. Operating System Concepts 9 th Edit9on

Chapter 2: Operating-System Structures. Operating System Concepts 9 th Edit9on Chapter 2: Operating-System Structures Operating System Concepts 9 th Edit9on Silberschatz, Galvin and Gagne 2013 Chapter 2: Operating-System Structures 1. Operating System Services 2. User Operating System

More information

Prof. Thomas Sterling

Prof. Thomas Sterling High Performance Computing: Concepts, Methods & Means Performance Measurement 1 Prof. Thomas Sterling Department of Computer Science Louisiana i State t University it February 13 th, 2007 News Alert! Intel

More information

Vertical Profiling: Understanding the Behavior of Object-Oriented Applications

Vertical Profiling: Understanding the Behavior of Object-Oriented Applications Vertical Profiling: Understanding the Behavior of Object-Oriented Applications Matthias Hauswirth, Amer Diwan University of Colorado at Boulder Peter F. Sweeney, Michael Hind IBM Thomas J. Watson Research

More information

Distributed and Parallel Technology

Distributed and Parallel Technology Distributed and Parallel Technology Parallel Performance Tuning Hans-Wolfgang Loidl http://www.macs.hw.ac.uk/~hwloidl School of Mathematical and Computer Sciences Heriot-Watt University, Edinburgh 0 No

More information

Intra Application Data Communication Characterization

Intra Application Data Communication Characterization Intra Application Data Communication Characterization Imran Ashraf, Vlad Mihai Sima, Koen Bertels Computer Engineering Lab, TU Delft, The Netherlands Trends Growing demand of processing Growing number

More information

Profiling and Debugging Tools. Lars Koesterke University of Porto, Portugal May 28-29, 2009

Profiling and Debugging Tools. Lars Koesterke University of Porto, Portugal May 28-29, 2009 Profiling and Debugging Tools Lars Koesterke University of Porto, Portugal May 28-29, 2009 Outline General (Analysis Tools) Listings & Reports Timers Profilers (gprof, tprof, Tau) Hardware performance

More information

Program Profiling. Xiangyu Zhang

Program Profiling. Xiangyu Zhang Program Profiling Xiangyu Zhang Outline What is profiling. Why profiling. Gprof. Efficient path profiling. Object equality profiling 2 What is Profiling Tracing is lossless, recording every detail of a

More information

Full file at

Full file at Chapter 2: Current Hardware and PC Operating Systems Chapter 2 Answers to Review Questions 1. An EPIC CPU design: a. evolved from the CISC processor b. was created in a joint project between Apple and

More information

Configuration Management for Component-based Systems

Configuration Management for Component-based Systems Configuration Management for Component-based Systems Magnus Larsson Ivica Crnkovic Development and Research Department of Computer Science ABB Automation Products AB Mälardalen University 721 59 Västerås,

More information

Performance Analysis. HPC Fall 2007 Prof. Robert van Engelen

Performance Analysis. HPC Fall 2007 Prof. Robert van Engelen Performance Analysis HPC Fall 2007 Prof. Robert van Engelen Overview What to measure? Timers Benchmarking Profiling Finding hotspots Profile-guided compilation Messaging and network performance analysis

More information

Solving Polynomial Systems with PHCpack and phcpy

Solving Polynomial Systems with PHCpack and phcpy Solving Polynomial Systems with PHCpack and phcpy Jan Verschelde University of Illinois at Chicago Department of Mathematics, Statistics, and Computer Science http://www.math.uic.edu/ jan jan@math.uic.edu

More information

Shared Memory Programming With OpenMP Computer Lab Exercises

Shared Memory Programming With OpenMP Computer Lab Exercises Shared Memory Programming With OpenMP Computer Lab Exercises Advanced Computational Science II John Burkardt Department of Scientific Computing Florida State University http://people.sc.fsu.edu/ jburkardt/presentations/fsu

More information

Parallel Merge Sort with Double Merging

Parallel Merge Sort with Double Merging Parallel with Double Merging Ahmet Uyar Department of Computer Engineering Meliksah University, Kayseri, Turkey auyar@meliksah.edu.tr Abstract ing is one of the fundamental problems in computer science.

More information

Accelerating Simulink Optimization, Code Generation & Test Automation Through Parallelization

Accelerating Simulink Optimization, Code Generation & Test Automation Through Parallelization Accelerating Simulink Optimization, Code Generation & Test Automation Through Parallelization Ryan Chladny Application Engineering May 13 th, 2014 2014 The MathWorks, Inc. 1 Design Challenge: Electric

More information

XML Description for Automata Manipulations

XML Description for Automata Manipulations XML Description for Automata Manipulations José Alves Nelma Moreira Rogério Reis {sobuy,nam,rvr@ncc.up.pt DCC-FC & LIACC, Universidade do Porto R. do Campo Alegre 1021/1055, 4169-007 Porto, Portugal Abstract.

More information

Lecture 2. Decidability and Verification

Lecture 2. Decidability and Verification Lecture 2. Decidability and Verification model temporal property Model Checker yes error-trace Advantages Automated formal verification, Effective debugging tool Moderate industrial success In-house groups:

More information

Performance Pack. Benchmarking with PlanetPress Connect and PReS Connect

Performance Pack. Benchmarking with PlanetPress Connect and PReS Connect Performance Pack Benchmarking with PlanetPress Connect and PReS Connect Contents 2 Introduction 4 Benchmarking results 5 First scenario: Print production on demand 5 Throughput vs. Output Speed 6 Second

More information

Oracle SQL Internals Handbook. Donald K. Burleson Joe Celko Dave Ensor Jonathan Lewis Dave Moore Vadim Tropashko John Weeg

Oracle SQL Internals Handbook. Donald K. Burleson Joe Celko Dave Ensor Jonathan Lewis Dave Moore Vadim Tropashko John Weeg Oracle SQL Internals Handbook Donald K. Burleson Joe Celko Dave Ensor Jonathan Lewis Dave Moore Vadim Tropashko John Weeg Oracle SQL Internals Handbook By Donald K. Burleson, Joe Celko, Dave Ensor, Jonathan

More information

AMSC/CMSC 662 Computer Organization and Programming for Scientific Computing Fall 2011 Operating Systems Dianne P. O Leary c 2011

AMSC/CMSC 662 Computer Organization and Programming for Scientific Computing Fall 2011 Operating Systems Dianne P. O Leary c 2011 AMSC/CMSC 662 Computer Organization and Programming for Scientific Computing Fall 2011 Operating Systems Dianne P. O Leary c 2011 1 Operating Systems Notes taken from How Operating Systems Work by Curt

More information

TDDD56 Multicore and GPU computing Lab 2: Non-blocking data structures

TDDD56 Multicore and GPU computing Lab 2: Non-blocking data structures TDDD56 Multicore and GPU computing Lab 2: Non-blocking data structures August Ernstsson, Nicolas Melot august.ernstsson@liu.se November 2, 2017 1 Introduction The protection of shared data structures against

More information

David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms.

David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms. Whitepaper Introduction A Library Based Approach to Threading for Performance David R. Mackay, Ph.D. Libraries play an important role in threading software to run faster on Intel multi-core platforms.

More information

HPC Fall 2007 Project 1 Fast Matrix Multiply

HPC Fall 2007 Project 1 Fast Matrix Multiply HPC Fall 2007 Project 1 Fast Matrix Multiply Robert van Engelen Due date: October 11, 2007 1 Introduction 1.1 Account and Login For this assignment you need an SCS account. The account gives you access

More information

Cray Performance Tools Enhancements for Next Generation Systems Heidi Poxon

Cray Performance Tools Enhancements for Next Generation Systems Heidi Poxon Cray Performance Tools Enhancements for Next Generation Systems Heidi Poxon Agenda Cray Performance Tools Overview Recent Enhancements Support for Cray systems with KNL 2 Cray Performance Analysis Tools

More information

Portable Power/Performance Benchmarking and Analysis with WattProf

Portable Power/Performance Benchmarking and Analysis with WattProf Portable Power/Performance Benchmarking and Analysis with WattProf Amir Farzad, Boyana Norris University of Oregon Mohammad Rashti RNET Technologies, Inc. Motivation Energy efficiency is becoming increasingly

More information

Improving the Performance of your LabVIEW Applications

Improving the Performance of your LabVIEW Applications Improving the Performance of your LabVIEW Applications 1 Improving Performance in LabVIEW Purpose of Optimization Profiling Tools Memory Optimization Execution Optimization 2 Optimization Cycle Benchmark

More information

J2EE Development Best Practices: Improving Code Quality

J2EE Development Best Practices: Improving Code Quality Session id: 40232 J2EE Development Best Practices: Improving Code Quality Stuart Malkin Senior Product Manager Oracle Corporation Agenda Why analyze and optimize code? Static Analysis Dynamic Analysis

More information

Intel VTune Performance Analyzer 9.1 for Windows* In-Depth

Intel VTune Performance Analyzer 9.1 for Windows* In-Depth Intel VTune Performance Analyzer 9.1 for Windows* In-Depth Contents Deliver Faster Code...................................... 3 Optimize Multicore Performance...3 Highlights...............................................

More information

Cornell Theory Center 1

Cornell Theory Center 1 Cornell Theory Center Cornell Theory Center (CTC) is a high-performance computing and interdisciplinary research center at Cornell University. Scientific and engineering research projects supported by

More information