Use Dynamic Analysis Tools on Linux
FTF-SDS-F0407
Gene Fortanely, Freescale Software Engineer
Catalin Udma, Software Engineer, Digital Networking
APR. 2014
External Use

Session Introduction
This session consists of a lecture and demo on the Perf and Valgrind Memcheck tools on Linux.
Perf is a performance analysis tool. It can statistically profile an entire system (both kernel and user code), a single CPU, or several threads.
Valgrind is a collection of dynamic analysis tools, including tools for memory debugging and memory leak detection. Memcheck is just one of the many tools available in Valgrind.

Session Objectives
After completing this session you will have a basic understanding of how to obtain, install, and use the Perf and Valgrind Memcheck tools.
Expert understanding is beyond the scope of this session.

Introductions
Presenter, Co-Presenter: Freescale Software Engineers
Engineers attending the session: name, business, current duties, experience with Perf/Valgrind Memcheck, and any learning goals for this session

Agenda
    Introduction, Objectives, Agenda (10 minutes)
    Intro to Perf (10 minutes)
    Perf Examples (10 minutes)
    Intro to Valgrind Memcheck (10 minutes)
    Valgrind Memcheck Examples (10 minutes)
    Benefits of using open source tools (5 minutes)
    Q&A (5 minutes)

Intro to Perf (1)
Perf is a performance analysis tool based on the perf_events interface, which is available in Linux kernel 2.6.31 and later.
Perf is a user-space utility that is part of the kernel source repository.
You'll need to install Perf on your Linux system. Try apt-get install perf or the equivalent for your distribution; the package name varies (for example, the linux-tools packages on Ubuntu).
The interface between the Perf utility and the kernel consists of one syscall, perf_event_open. The syscall returns a file descriptor, and event data is read through that descriptor or through a memory region mapped onto it with mmap (which maps a file into memory).

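The "one syscall and a file descriptor" description can be made concrete with a minimal sketch. It assumes a Linux system with kernel headers installed and permission to open a user-space hardware counter. The helper names count_user_instructions and busy_loop are hypothetical, and the sketch reads the counter with read() rather than through the mmap'ed region that the perf utility uses for sampling.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so call it through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical sample workload to measure. */
void busy_loop(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Hypothetical helper: counts user-space instructions retired while work()
 * runs on the calling thread. Returns -1 if the counter cannot be opened
 * (no permission, no hardware counters in a VM, and so on). */
long long count_user_instructions(void (*work)(void))
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* start stopped, enable explicitly below */
    attr.exclude_kernel = 1;  /* count user space only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this thread, on whichever CPU it runs. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0;
    ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    return n == (ssize_t)sizeof(count) ? (long long)count : -1;
}
```

Calling count_user_instructions(busy_loop) from main and printing the result gives a single-event, single-thread version of what perf stat reports; a return value of -1 usually means the counter is not available to the calling user.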
Intro to Perf (2)
Perf wiki: https://perf.wiki.kernel.org/index.php/main_page
The Perf command on a command-line interface:
    usage: perf [--version] [--help] COMMAND [ARGS]
Perf is used with several commands:
    'stat': obtain event counts.
    'top': see live event counts.
    'record': record events for later reporting.
    'report': break down events by process, function, etc.
    'annotate': annotate assembly or source code with event counts.
    'sched': tracing/measuring of scheduler actions and latencies.
    'list': list available events.

Intro to Perf (3)
Freescale's QorIQ Performance Analysis tools provide a user interface that hides much of the complexity. App Notes and User Manuals are provided too.
Search for PE_QORIQ_SCENT on www.freescale.com:
http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=pe_qoriq_scent

Perf Examples (1)
Performance Evaluation

f1:
    #define L 20000
    #define C 20000
    int a[L][C];

    /* Inner loop walks the rightmost index. */
    unsigned long long f1(void) {
        unsigned long long ret = 0;
        int x, y;
        for (x = 0; x < L; x++)
            for (y = 0; y < C; y++)
                ret += a[x][y];
        return ret;
    }

f2:
    #define L 20000
    #define C 20000
    int a[L][C];

    /* Inner loop walks the leftmost index. */
    unsigned long long f2(void) {
        unsigned long long ret = 0;
        int x, y;
        for (y = 0; y < C; y++)
            for (x = 0; x < L; x++)
                ret += a[x][y];
        return ret;
    }

The same algorithm, the same result. The same performance?
Recall that C stores arrays in row-major order: elements that differ only in the rightmost index are located next to each other in memory.

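The row-major layout can be checked directly. The following is a minimal sketch with hypothetical small dimensions (ROWS, COLS) and a hypothetical helper, element_offset, that reports how far an element lies from the start of the array.

```c
#include <stddef.h>

/* Hypothetical small dimensions so the layout is easy to follow. */
enum { ROWS = 4, COLS = 6 };

static int grid[ROWS][COLS];

/* Distance, in elements, from grid[0][0] to grid[x][y]. */
size_t element_offset(size_t x, size_t y)
{
    const char *base = (const char *)grid;
    const char *elem = (const char *)&grid[x][y];
    return (size_t)(elem - base) / sizeof(int);
}
```

Stepping the rightmost index moves one element (element_offset(0, 1) is 1), while stepping the leftmost index moves a whole row (element_offset(1, 0) is COLS). With C = 20000 and a 4-byte int, each iteration of the inner loop in f2 therefore jumps 80000 bytes.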
Perf Examples (2)
Performance Evaluation (continued)

    $ perf stat -e cache-misses ./f1
     Performance counter stats for './f1':
                  4007 cache-misses
           1.855771646 seconds time elapsed

    $ perf stat -e cache-misses ./f2
     Performance counter stats for './f2':
              11401273 cache-misses
           5.551568045 seconds time elapsed

f2 incurs about 2,800 times as many cache misses as f1 and takes about three times as long.

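The full-size array needs about 1.6 GB, so a reduced version may be easier to experiment with. This is a sketch under that assumption: N, table, fill_table, sum_row_order, and sum_column_order are hypothetical names, and the counts measured with it will differ from the figures on the slide.

```c
#include <stddef.h>

/* Hypothetical reduced size: 2048 x 2048 ints is 16 MiB with a 4-byte int. */
#define N 2048

static int table[N][N];

/* Give the array known contents so both traversals can be compared. */
void fill_table(void)
{
    for (size_t x = 0; x < N; x++)
        for (size_t y = 0; y < N; y++)
            table[x][y] = (int)((x + y) & 0xff);
}

/* Same access pattern as f1: consecutive reads are adjacent in memory. */
unsigned long long sum_row_order(void)
{
    unsigned long long ret = 0;
    for (size_t x = 0; x < N; x++)
        for (size_t y = 0; y < N; y++)
            ret += (unsigned long long)table[x][y];
    return ret;
}

/* Same access pattern as f2: consecutive reads are one full row apart. */
unsigned long long sum_column_order(void)
{
    unsigned long long ret = 0;
    for (size_t y = 0; y < N; y++)
        for (size_t x = 0; x < N; x++)
            ret += (unsigned long long)table[x][y];
    return ret;
}
```

Building one program whose main calls fill_table and sum_row_order, and a second whose main calls fill_table and sum_column_order, gives two binaries that can be compared with perf stat -e cache-misses in the same way as ./f1 and ./f2.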
Perf Examples (3)
Performance Evaluation (continued)
What do these perf arguments mean?
    $ perf stat -e cache-misses ./f1

    $ perf stat -h
     usage: perf stat [<options>] [<command>]
        -e, --event <event>   event selector. use 'perf list' to list available events
        -i, --no-inherit      child tasks do not inherit counters
        -p, --pid <n>         stat events on existing process id
        -t, --tid <n>         stat events on existing thread id
        -a, --all-cpus        system-wide collection from all CPUs
        -c, --scale           scale/normalize counters
        -v, --verbose         be more verbose (show counter open errors, etc)
        -r, --repeat <n>      repeat command and print average + stddev (max: 100)
        -n, --null            null run - dont start any counters
        -B, --big-num         print large numbers with thousands' separators

Perf Examples (4)
Performance Evaluation

    $ perf record -e cpu-clock test_perf
    !!!hello World!!!
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.009 MB perf.data

    $ perf report
    # Overhead  Command  Shared Object      Symbol
    #............
        76.67%  perf     perf               [.] function2
        16.67%  perf     perf               [.] function1
         6.67%  perf     [kernel.kallsyms]  [k] do_page_fault

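The source of test_perf is not shown in this session. As an illustration only, a program along the following lines would be expected to produce a report where one function dominates; the bodies of function1 and function2 here are assumptions, not the presenters' code.

```c
/* Hypothetical stand-ins for the functions named in the perf report.
 * volatile keeps the compiler from optimising the loops away. */
unsigned long function1(unsigned long n)
{
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc += i;
    return acc;
}

/* Does about five times the work of function1 for the same n. */
unsigned long function2(unsigned long n)
{
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < 5 * n; i++)
        acc += i;
    return acc;
}
```

When both functions are called with the same large n from main, a sampling profile such as perf record -e cpu-clock should attribute most samples to function2, because samples land in proportion to the time spent in each symbol.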
Perf Examples (5)
As with any open source tool, when you have a question, try to answer it by experimenting with the tool, reading the documentation, searching the Internet, and contacting other users.
If you are using the Freescale Processor Expert Scenarios Tool, or any other software supported by Freescale, you also have access to Freescale technical support.

Intro to Valgrind Memcheck (1)
Valgrind is an instrumentation framework for building dynamic analysis tools.
Valgrind is a collection of tools for dynamic analysis, including these and more:
    Memcheck: detects memory management problems
    Cachegrind: a cache profiler
    Massif: a heap profiler
    Helgrind: a thread debugger that finds data races in multithreaded programs
This session will focus on Valgrind Memcheck.

Intro to Valgrind Memcheck (2)
The Valgrind project is located at: http://valgrind.org/
You can install it from the Freescale SDK for your silicon product, or try apt-get install valgrind or equivalent.
The Valgrind Memcheck command on a command-line interface (Memcheck is the default tool):
    usage: valgrind [--version] [--help] [--tool=memcheck] foo [foo's args]
Memcheck can detect:
    Use of uninitialised memory
    Reading/writing memory after it has been freed
    Reading/writing off the end of malloc'ed blocks
    Reading/writing inappropriate areas on the stack
    Memory leaks, where pointers to malloc'ed blocks are lost forever
    Mismatched use of malloc/new/new [] vs. free/delete/delete []
    Overlapping src and dst pointers in memcpy() and related functions
    Some misuses of the POSIX pthreads API

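One item in the list above, overlapping src and dst pointers, is easy to introduce when shifting data within a single buffer. A minimal sketch of the safe form follows; drop_front is a hypothetical helper written for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Remove the first `count` elements of buf (length `len`) by sliding the
 * tail to the front. Source and destination overlap whenever the tail is
 * longer than `count`, so memmove is used; memcpy has undefined behaviour
 * for overlapping regions and is the case Memcheck reports. */
void drop_front(int *buf, size_t len, size_t count)
{
    if (count >= len)
        return;  /* nothing left to keep */
    memmove(buf, buf + count, (len - count) * sizeof *buf);
}
```

For a five-element buffer holding 1, 2, 3, 4, 5, drop_front(buf, 5, 1) leaves 2, 3, 4, 5 in the first four slots.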
Valgrind Memcheck Examples (1)
First prepare your executable for use with Valgrind Memcheck by compiling with the -g option to generate debugging information:
    gcc -g foo.c -o foo
Let's take a look at an example program, foo.c:
     1  #include <stdio.h>
     2  #include <stdlib.h>
     3
     4  int main(){
     5
     6      int i;
     7      int *a = malloc(sizeof(int) * 10);
     8
     9      if (!a) return -1; /*malloc failed*/
    10      for (i = 0; i < 11; i++){
    11          a[i] = i;
    12      }
    13      free(a);
    14      return 0;
    15  }

Valgrind Memcheck Examples (2)
Running valgrind, this was my output:

    valgrind --tool=memcheck ./a.out
    ==23224== Memcheck, a memory error detector
    ==23224== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
    ==23224== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
    ==23224== Command: ./a.out
    ==23224==
    ==23224== Invalid write of size 4
    ==23224==    at 0x8048454: main (foo.c:11)
    ==23224==  Address 0x41f4050 is 0 bytes after a block of size 40 alloc'd
    ==23224==    at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
    ==23224==    by 0x8048428: main (foo.c:7)
    ==23224==
    ==23224==
    ==23224== HEAP SUMMARY:
    ==23224==     in use at exit: 0 bytes in 0 blocks
    ==23224==   total heap usage: 1 allocs, 1 frees, 40 bytes allocated
    ==23224==
    ==23224== All heap blocks were freed -- no leaks are possible
    ==23224==
    ==23224== For counts of detected and suppressed errors, rerun with: -v
    ==23224== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The write at foo.c:11 lands just past the end of the 40-byte block allocated at foo.c:7: the loop runs up to i = 10, one element beyond the ten that were allocated.

Valgrind Memcheck Examples (3)
Now the array access problem is fixed. But I have commented out the free() in foo.c:

    #include <stdio.h>
    #include <stdlib.h>

    int main(){

        int i;
        int *a = malloc(sizeof(int) * 10);

        if (!a) return -1; /*malloc failed*/
        for (i = 0; i < 10; i++){
            a[i] = i;
        }
        /*free(a);*/
        return 0;
    }

Valgrind Memcheck Examples (4)
Recompiling with -g and running valgrind again, this was my output:

    valgrind --tool=memcheck ./a.out
    ==23394== Memcheck, a memory error detector
    ==23394== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
    ==23394== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
    ==23394== Command: ./a.out
    ==23394==
    ==23394==
    ==23394== HEAP SUMMARY:
    ==23394==     in use at exit: 40 bytes in 1 blocks
    ==23394==   total heap usage: 1 allocs, 0 frees, 40 bytes allocated
    ==23394==
    ==23394== LEAK SUMMARY:
    ==23394==    definitely lost: 40 bytes in 1 blocks
    ==23394==    indirectly lost: 0 bytes in 0 blocks
    ==23394==      possibly lost: 0 bytes in 0 blocks
    ==23394==    still reachable: 0 bytes in 0 blocks
    ==23394==         suppressed: 0 bytes in 0 blocks
    ==23394== Rerun with --leak-check=full to see details of leaked memory
    ==23394==
    ==23394== For counts of detected and suppressed errors, rerun with: -v
    ==23394== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Valgrind Memcheck Examples (5)
Some valgrind output showing you other messages:

    $ valgrind --tool=memcheck --leak-check=yes ./test_valgrind
    ==20699== Use of uninitialised value of size 4
    ==20699==    at 0x2B1BFB: _itoa_word (in /lib/libc-2.5.so)
    ==20699==    by 0x2B5390: vfprintf (in /lib/libc-2.5.so)
    ==20699==    by 0x2BCE42: printf (in /lib/libc-2.5.so)
    ==20699==    by 0x80483F0: main (test_valgrind.c:9)
    ==20699== Invalid read of size 4
    ==20699==    at 0x8048406: main (test_valgrind.c:12)
    ==20699==  Address 0x401608C is 4 bytes after a block of size 40 alloc'd
    ==20699==    at 0x40053C0: malloc (vg_replace_malloc.c:149)
    ==20699==    by 0x80483FC: main (test_valgrind.c:11)
    ==20699== 4 bytes in 1 blocks are definitely lost in loss record 1 of 2
    ==20699==    at 0x40053C0: malloc (vg_replace_malloc.c:149)
    ==20699==    by 0x80483D0: main (test_valgrind.c:7)

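The three messages above report an uninitialised value, an out-of-bounds read, and a leaked block. As a contrast, here is a minimal sketch of code that avoids each of them. sum_first_n is a hypothetical helper written for this illustration; it is not taken from test_valgrind.c, whose source is not shown.

```c
#include <stdlib.h>

/* Returns 0 + 1 + ... + (n - 1), or -1 if n is 0 or allocation fails. */
long sum_first_n(size_t n)
{
    if (n == 0)
        return -1;

    /* calloc zero-fills the block, so no element is ever read uninitialised. */
    int *values = calloc(n, sizeof *values);
    if (!values)
        return -1;

    /* Both loops use the same bound as the allocation, so every access
     * stays inside the block. */
    for (size_t i = 0; i < n; i++)
        values[i] = (int)i;

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];

    free(values);  /* every successful allocation is released before return */
    return total;
}
```

A program built around a function like this would be expected to run under valgrind --leak-check=yes with none of the three message types shown above.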
Benefits of Using Open Source Tools
Open source tools are tools whose source code is published and available to view, use, modify, and redistribute. The tool is maintained by a collaborative community.
These are some of the benefits of using open source tools:
    Free of charge
    Source code is available to view, use, modify, and redistribute
    Technical development through involvement with a community of experts
These are some of the costs of using open source tools:
    When something goes wrong, you can't call the vendor for support
    You have to consider licensing terms when distributing software
    You should contribute changes to the community

www.freescale.com 2014 Freescale Semiconductor, Inc. External Use