2017/12/07 21:49 1/6 PERF performance-counter for Odroid XU3/XU4

PERF performance-counter for Odroid XU3/XU4

perf is a Linux tool for performance measurement using hardware counters, tracepoints, software performance counters and dynamic probes. It is one of the two most commonly used performance-counter profiling tools on Linux, and it can be used to analyze bottlenecks from the CPU core internals right up to the driver level. Linux supports many other profiling tools as well, such as trace-cmd, blktrace, strace and OProfile.

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache misses suffered, or branches mispredicted. They form a basis for profiling applications, tracing dynamic control flow and identifying hotspots. perf provides rich, generalized abstractions over hardware-specific capabilities. Among other things, it provides per-task, per-CPU and per-workload counters, sampling on top of these counters, and source-code event annotation. With perf you can also monitor the performance of a device driver.

Build the perf tool

To build perf, install the following packages:

    $ sudo apt-get install flex bison libdw-dev libnewt-dev binutils-dev libaudit-dev libgtk2.0-dev libssl-dev python-dev systemtap-sdt-dev libiberty-dev libperl-dev liblzma-dev libpython-dev libunwind-* asciidoc xmlto

Check out the kernel source code and build the perf executable:

    $ git clone --depth 1 https://github.com/hardkernel/linux -b odroidxu4-4.14.y
    $ cd linux/tools/perf
    $ make
    $ sudo cp perf /usr/bin/perf

Note: PMU (Performance Monitoring Unit) support is already integrated in the kernel, so you only need to build the perf binary to start testing.

Check whether the kernel supports perf (kernel 4.14 or later is required):

    root@odroid:~# dmesg | grep PMU
    [    0.250870] EXYNOS5420 PMU initialized
    [    0.749038] hw perfevents: enabled with armv7_cortex_a7 PMU driver, 5 counters available
    [    0.750030] hw perfevents: enabled with armv7_cortex_a15 PMU driver, 7 counters available
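The check above can also be scripted. The following is a minimal sketch, not part of the original procedure: pmu_summary and check_perf_support are hypothetical helper names. It assumes the kernel log uses the "hw perfevents: enabled with ... PMU driver, N counters available" wording shown above, and that registered event sources appear under /sys/bus/event_source/devices (the same place perf list reads PMU names such as armv7_cortex_a15 from).

```shell
#!/bin/sh
# Hypothetical helpers (not from the original page) for checking perf support.

# pmu_summary: read kernel log text on stdin and print "<pmu-name> <counters>"
# for every hardware PMU the kernel reports as enabled, e.g. from
#   hw perfevents: enabled with armv7_cortex_a7 PMU driver, 5 counters available
# Lines that do not match (such as "EXYNOS5420 PMU initialized") are skipped.
pmu_summary() {
    sed -n 's/.*hw perfevents: enabled with \([A-Za-z0-9_]*\) PMU driver, \([0-9]*\) counters available.*/\1 \2/p'
}

# check_perf_support: report whether the running kernel exposes perf events
# and list the event sources (PMUs) registered in sysfs.
check_perf_support() {
    # This sysctl only exists when the kernel is built with perf events.
    paranoid=/proc/sys/kernel/perf_event_paranoid
    if [ ! -r "$paranoid" ]; then
        echo "perf events: not available in this kernel"
        return 1
    fi
    echo "perf events: available (perf_event_paranoid=$(cat "$paranoid"))"
    # Each directory here is one event source, e.g. armv7_cortex_a15.
    for pmu in /sys/bus/event_source/devices/*; do
        [ -e "$pmu" ] && echo "event source: ${pmu##*/}"
    done
    return 0
}
```

After sourcing the script, run `dmesg | pmu_summary` to get one line per hardware PMU (with the log shown above: armv7_cortex_a7 with 5 counters and armv7_cortex_a15 with 7), and `check_perf_support` to list the event sources perf can use.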
List the events that perf can monitor:

    root@odroid:~# perf list
    List of pre-defined events (to be used in -e):
      branch-instructions OR branches
      branch-misses
      bus-cycles
      cache-misses
      cache-references
      cpu-cycles OR cycles
      instructions
      alignment-faults
      bpf-output
      context-switches OR cs
      cpu-clock
      cpu-migrations OR migrations
      dummy
      emulation-faults
      major-faults
      minor-faults
      page-faults OR faults
      task-clock
      L1-dcache-load-misses
      L1-dcache-loads
      L1-dcache-store-misses
      L1-dcache-stores
      L1-icache-load-misses
      L1-icache-loads
      LLC-load-misses
      LLC-loads
      LLC-store-misses
      LLC-stores
      branch-load-misses
      branch-loads
      dtlb-load-misses
      dtlb-store-misses
      itlb-load-misses
      armv7_cortex_a15/br_immed_retired/
      armv7_cortex_a15/br_mis_pred/
      armv7_cortex_a15/br_pred/
      armv7_cortex_a15/br_return_retired/
      armv7_cortex_a15/bus_access/
      armv7_cortex_a15/bus_cycles/
      armv7_cortex_a15/cid_write_retired/
      armv7_cortex_a15/cpu_cycles/
      armv7_cortex_a15/exc_return/
      armv7_cortex_a15/exc_taken/
      armv7_cortex_a15/inst_retired/
      armv7_cortex_a15/inst_spec/
      armv7_cortex_a15/l1d_cache/
      armv7_cortex_a15/l1d_cache_refill/
      armv7_cortex_a15/l1d_cache_wb/
      armv7_cortex_a15/l1d_tlb_refill/
      armv7_cortex_a15/l1i_cache/
      armv7_cortex_a15/l1i_cache_refill/

Perf Examples

    root@odroid:~/perf-examples# perf stat -B dd if=/dev/zero of=/dev/null count=1000000
    1000000+0 records in
    1000000+0 records out
    512000000 bytes (512 MB, 488 MiB) copied, 0.840694 s, 609 MB/s

     Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

            842.111288      task-clock (msec)         #    0.996 CPUs utilized
                     1      context-switches          #    0.001 K/sec
                     0      cpu-migrations            #    0.000 K/sec
                    42      page-faults               #    0.050 K/sec
            1684203841      cycles                    #    2.000 GHz
            1435117503      instructions              #    0.85  insn per cycle
             311869004      branches                  #  370.342 M/sec
              11924108      branch-misses             #    3.82% of all branches

           0.845417981 seconds time elapsed

Note: the Exynos5422 is a big.LITTLE SoC, so the counters can be obtained separately for each cluster. CPUs 0-3 are Cortex-A7 (LITTLE) cores and CPUs 4-7 are Cortex-A15 (big) cores; taskset -c pins the command to a single CPU.

    root@odroid:~/perf-examples# perf stat -B taskset -c 0 dd if=/dev/zero of=/dev/null count=1000000
    1000000+0 records in
    1000000+0 records out
    512000000 bytes (512 MB, 488 MiB) copied, 1.65277 s, 310 MB/s

     Performance counter stats for 'taskset -c 0 dd if=/dev/zero of=/dev/null count=1000000':

           1655.839284      task-clock (msec)         #    0.999 CPUs utilized
                     7      context-switches          #    0.004 K/sec
                     1      cpu-migrations            #    0.001 K/sec
                    77      page-faults               #    0.047 K/sec
               1773536      cycles                    #    0.001 GHz
                444207      instructions              #    0.25  insn per cycle
                 93267      branches                  #    0.056 M/sec
                  9169      branch-misses             #    9.83% of all branches

           1.657392774 seconds time elapsed

    root@odroid:~/perf-examples# perf stat -B taskset -c 4 dd if=/dev/zero of=/dev/null count=1000000
    1000000+0 records in
    1000000+0 records out
    512000000 bytes (512 MB, 488 MiB) copied, 0.809315 s, 633 MB/s

     Performance counter stats for 'taskset -c 4 dd if=/dev/zero of=/dev/null count=1000000':

            811.520288      task-clock (msec)         #    0.998 CPUs utilized
                     6      context-switches          #    0.007 K/sec
                     1      cpu-migrations            #    0.001 K/sec
                    77      page-faults               #    0.095 K/sec
            1622986577      cycles                    #    2.000 GHz
            1435747079      instructions              #    0.88  insn per cycle
             311780313      branches                  #  384.193 M/sec
               8700181      branch-misses             #    2.79% of all branches

           0.812844283 seconds time elapsed

In the CPU 0 run, the cycles and instructions counts are far too small for 1.66 s of task-clock (0.001 GHz, compared with 2.000 GHz on CPU 4). This suggests that the generic hardware events were not counted while dd ran on the Cortex-A7 core, so those hardware counters should not be compared directly with the Cortex-A15 results.

perf record/report

perf record: by default, perf record uses the cycles event as its sampling event. This is a generic hardware event that the kernel maps to a hardware-specific PMU event.

perf report: samples collected by perf record are saved into a binary file, called perf.data by default. The perf report command reads this file and generates a concise execution profile. By default, samples are sorted by function, with the functions that have the most samples listed first. The sort order can be customized to view the data differently.
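The sort order mentioned above is set with perf report's --sort option. The following is a minimal sketch, assuming the perf binary built earlier is on the PATH; record_and_report is a hypothetical helper name, while the options it passes (record -a/-o, report -i/--stdio/--sort) and the sort keys comm, dso and symbol are standard perf options.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): sample the whole system for
# a given number of seconds, then print a text profile sorted by the given keys.
#   $1: duration in seconds (default 5)
#   $2: comma-separated perf report sort keys (default comm,dso)
#   $3: file to store the samples in (default perf.data)
record_and_report() {
    duration=${1:-5}
    sort_keys=${2:-comm,dso}
    data_file=${3:-perf.data}
    # -a samples all CPUs; the default sampling event is "cycles".
    perf record -a -o "$data_file" -- sleep "$duration" || return 1
    # --stdio prints plain text instead of the interactive TUI.
    perf report -i "$data_file" --stdio --sort "$sort_keys"
}
```

For example, `record_and_report 5 comm` aggregates the overhead per command, and `record_and_report 5 dso,symbol` groups samples by shared object and function. On the big.LITTLE Exynos5422 you may also want to restrict sampling to one cluster with perf record's -C option (for example -C 4-7 for the Cortex-A15 cores).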
    root@odroid:~/perf-examples# perf record -a sleep 5
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.103 MB perf.data (289 samples) ]
    root@odroid:~/perf-examples# perf report
    Samples: 289 of event 'cycles:ppp', Event count (approx.): 28006656
    Overhead  Command          Shared Object     Symbol
      40.33%  swapper          [kernel.vmlinux]  [k] arch_cpu_idle
       7.23%  swapper          [kernel.vmlinux]  [k] tick_nohz_idle_exit
       5.40%  swapper          [kernel.vmlinux]  [k] tick_nohz_idle_enter
       3.83%  swapper          [kernel.vmlinux]  [k] _raw_spin_unlock_irq
       3.27%  sleep            [kernel.vmlinux]  [k] filemap_map_pages
       3.25%  perf             [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
       2.14%  sleep            [kernel.vmlinux]  [k] page_remove_rmap
       1.82%  perf             [kernel.vmlinux]  [k] perf_event_ctx_lock_nested
       1.78%  swapper          [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
       1.70%  ksoftirqd/4      [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
       1.68%  sleep            libc-2.23.so      [.] 0x00050840
       1.67%  perf             [kernel.vmlinux]  [k] page_remove_rmap
       1.61%  perf             [kernel.vmlinux]  [k] remove_vma
       1.51%  kworker/u16:     [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
       1.48%  perf             [kernel.vmlinux]  [k] ext4_da_write_begin
       1.44%  kworker/u16:     [kernel.vmlinux]  [k] _find_opp_table_unlocked
       1.35%  swapper          [kernel.vmlinux]  [k] exception_text_end
       1.33%  perf             [kernel.vmlinux]  [k] alloc_set_pte
       1.23%  kworker/:1       [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
       1.22%  perf             [kernel.vmlinux]  [k] _test_and_set_bit
       1.06%  perf             [kernel.vmlinux]  [k] _raw_spin_lock
       1.03%  kworker/u16:     [kernel.vmlinux]  [k] update_devfreq_passive
       0.83%  kworker/u16:     [kernel.vmlinux]  [k] _raw_spin_unlock_irq
       0.80%  kworker/:1       [kernel.vmlinux]  [k] memchr_inv
       0.79%  rs:main Q:Reg    [kernel.vmlinux]  [k] balance_dirty_pages_ratelimited
       0.79%  rs:main Q:Reg    rsyslogd          [.] 0x0002c8ae
       0.70%  sleep            [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
       0.68%  systemd-journal  systemd-journald  [.] 0x00015f1c
       0.61%  rs:main Q:Reg    [kernel.vmlinux]  [k] kmap_atomic
       0.54%  systemd-journal  systemd-journald  [.] 0x0002aeac

External Links

You can find more information at the following links:

    https://perf.wiki.kernel.org/index.php/tutorial
    http://www.brendangregg.com/perf.html

From: http://wiki.odroid.com/ - ODROID Wiki
Permanent link: http://wiki.odroid.com/odroid-xu4/application_note/software/perf_perfomace_counter_for_odroid_xu3_xu4
Last update: 2017/11/21 08:28