PERF performance-counter for Odroid XU3/XU4

Linux hardware performance measurement using counters, trace-points, software performance counters, and dynamic probes.

Perf is one of the two most commonly used performance-counter profiling tools on Linux. It is used to analyse bottlenecks from the core internals right up to the driver level. Linux supports many profiling tools, such as perf, trace-cmd, blktrace, strace and oprofile.

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots. perf provides rich generalized abstractions over hardware-specific capabilities. Among others, it provides per-task, per-CPU and per-workload counters, sampling on top of these, and source code event annotation. Using perf, we can also monitor the performance of a device driver.

Build the perf tool

To build perf, install the following packages:

sudo apt-get install flex bison libdw-dev libnewt-dev binutils-dev libaudit-dev libgtk2.0-dev libssl-dev python-dev systemtap-sdt-dev libiberty-dev libperl-dev liblzma-dev libpython-dev libunwind-* asciidoc xmlto

Check out the kernel source code and build the perf executable:

$ git clone --depth 1 -b odroidxu y
$ cd linux/tools/perf
$ make
$ sudo cp perf /usr/bin/perf

Note: PMU register support for perf is integrated in the kernel, so only the perf binary needs to be built for testing.

Check whether the kernel supports perf (kernel 4.14 or higher is required):

root@odroid:~# dmesg | grep PMU
[ ] EXYNOS5420 PMU initialized
[ ] hw perfevents: enabled with armv7_cortex_a7 PMU driver, 5 counters available
[ ] hw perfevents: enabled with armv7_cortex_a15 PMU driver, 7 counters available
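The dmesg check above can be turned into a small helper that reports each PMU and its counter count. This is only a sketch: the function name list_pmus is hypothetical, and it assumes the kernel log uses exactly the "hw perfevents: enabled with ... PMU driver, N counters available" wording shown above. The sample lines in the demo are copied from that output.

```shell
#!/bin/sh
# list_pmus (hypothetical helper): read kernel log text on stdin and print
# "<pmu-name> <counter-count>" for every PMU the kernel reported.
# Assumes the message wording matches the dmesg output shown above.
list_pmus() {
    sed -n 's/.*hw perfevents: enabled with \([^ ]*\) PMU driver, \([0-9][0-9]*\) counters available.*/\1 \2/p'
}

# Demo on the log lines from this page; on a real board use: dmesg | list_pmus
list_pmus <<'EOF'
[ ] EXYNOS5420 PMU initialized
[ ] hw perfevents: enabled with armv7_cortex_a7 PMU driver, 5 counters available
[ ] hw perfevents: enabled with armv7_cortex_a15 PMU driver, 7 counters available
EOF
```

If the helper prints nothing on a running board, the kernel most likely did not register a PMU driver, and the hardware events would not be usable from perf.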

Check the list of perf events we can monitor:

root@odroid:~# perf list

List of pre-defined events (to be used in -e):

branch-instructions OR branches
branch-misses
bus-cycles
cache-misses
cache-references
cpu-cycles OR cycles
instructions
alignment-faults
bpf-output
context-switches OR cs
cpu-clock
cpu-migrations OR migrations
dummy
emulation-faults
major-faults
minor-faults
page-faults OR faults
task-clock
L1-dcache-load-misses
L1-dcache-loads
L1-dcache-store-misses
L1-dcache-stores
L1-icache-load-misses
L1-icache-loads
LLC-load-misses
LLC-loads
LLC-store-misses
LLC-stores
branch-load-misses
branch-loads
dtlb-load-misses
dtlb-store-misses
itlb-load-misses
armv7_cortex_a15/br_immed_retired/
armv7_cortex_a15/br_mis_pred/
armv7_cortex_a15/br_pred/
armv7_cortex_a15/br_return_retired/
armv7_cortex_a15/bus_access/
armv7_cortex_a15/bus_cycles/
armv7_cortex_a15/cid_write_retired/
armv7_cortex_a15/cpu_cycles/
armv7_cortex_a15/exc_return/
armv7_cortex_a15/exc_taken/
armv7_cortex_a15/inst_retired/
armv7_cortex_a15/inst_spec/
armv7_cortex_a15/l1d_cache/
armv7_cortex_a15/l1d_cache_refill/
armv7_cortex_a15/l1d_cache_wb/
armv7_cortex_a15/l1d_tlb_refill/
armv7_cortex_a15/l1i_cache/
armv7_cortex_a15/l1i_cache_refill/

Perf Examples

root@odroid:~/perf-examples# perf stat -B dd if=/dev/zero of=/dev/null count=
records in
records out
bytes (512 MB, 488 MiB) copied, s, 609 MB/s

Performance counter stats for 'dd if=/dev/zero of=/dev/null count= ':

task-clock (msec) # CPUs utilized
1 context-switches # K/sec
cpu-migrations # K/sec
42 page-faults # K/sec
cycles # GHz
instructions # 0.85 insn per cycle
branches # M/sec
branch-misses # 3.82% of all branches

seconds time elapsed

Note: The Exynos5422 is a big.LITTLE architecture, so we obtain the counters for each type of CPU separately by pinning the workload with taskset. CPU 0 is a Cortex-A7 (LITTLE) core and CPU 4 is a Cortex-A15 (big) core.

root@odroid:~/perf-examples# perf stat -B taskset -c 0 dd if=/dev/zero of=/dev/null count=
records in
records out
bytes (512 MB, 488 MiB) copied, s, 310 MB/s

Performance counter stats for 'taskset -c 0 dd if=/dev/zero of=/dev/null count= ':

task-clock (msec) # CPUs utilized
7 context-switches # K/sec
1 cpu-migrations # K/sec
77 page-faults # K/sec
cycles # GHz
instructions # 0.25 insn per cycle
branches # M/sec
9169 branch-misses # 9.83% of all branches

seconds time elapsed

root@odroid:~/perf-examples# perf stat -B taskset -c 4 dd if=/dev/zero of=/dev/null count=
records in
records out
bytes (512 MB, 488 MiB) copied, s, 633 MB/s

Performance counter stats for 'taskset -c 4 dd if=/dev/zero of=/dev/null count= ':

task-clock (msec) # CPUs utilized
6 context-switches # K/sec
1 cpu-migrations # K/sec
77 page-faults # K/sec
cycles # GHz
instructions # 0.88 insn per cycle
branches # M/sec
branch-misses # 2.79% of all branches

seconds time elapsed

perf record/report

perf record: By default, perf record uses the cycles event as the sampling event. This is a generic hardware event that the kernel maps to a hardware-specific PMU event.

perf report: Samples collected by perf record are saved into a binary file called, by default, perf.data. The perf report command reads this file and generates a concise execution profile. By default, samples are sorted by function, with the most samples first. The sorting order can be customized to view the data differently.

root@odroid:~/perf-examples# perf record -a sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote MB perf.data (289 samples) ]
root@odroid:~/perf-examples# perf report
Samples: 289 of event 'cycles:ppp', Event count (approx.):
Overhead  Command          Shared Object     Symbol
40.33%    swapper          [kernel.vmlinux]  [k] arch_cpu_idle
7.23%     swapper          [kernel.vmlinux]  [k] tick_nohz_idle_exit
5.40%     swapper          [kernel.vmlinux]  [k] tick_nohz_idle_enter
3.83%     swapper          [kernel.vmlinux]  [k] _raw_spin_unlock_irq
3.27%     sleep            [kernel.vmlinux]  [k] filemap_map_pages
3.25%     perf             [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
2.14%     sleep            [kernel.vmlinux]  [k] page_remove_rmap
1.82%     perf             [kernel.vmlinux]  [k] perf_event_ctx_lock_nested
1.78%     swapper          [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
1.70%     ksoftirqd/4      [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
1.68%     sleep            libc-2.23.so      [.] 0x
%         perf             [kernel.vmlinux]  [k] page_remove_rmap
1.61%     perf             [kernel.vmlinux]  [k] remove_vma
1.51%     kworker/u16:     [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
1.48%     perf             [kernel.vmlinux]  [k] ext4_da_write_begin
1.44%     kworker/u16:     [kernel.vmlinux]  [k] _find_opp_table_unlocked
1.35%     swapper          [kernel.vmlinux]  [k] exception_text_end
1.33%     perf             [kernel.vmlinux]  [k] alloc_set_pte
1.23%     kworker/:1       [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
1.22%     perf             [kernel.vmlinux]  [k] _test_and_set_bit
1.06%     perf             [kernel.vmlinux]  [k] _raw_spin_lock
1.03%     kworker/u16:     [kernel.vmlinux]  [k] update_devfreq_passive
0.83%     kworker/u16:     [kernel.vmlinux]  [k] _raw_spin_unlock_irq
0.80%     kworker/:1       [kernel.vmlinux]  [k] memchr_inv
0.79%     rs:main Q:Reg    [kernel.vmlinux]  [k] balance_dirty_pages_ratelimited
0.79%     rs:main Q:Reg    rsyslogd          [.] 0x0002c8ae
0.70%     sleep            [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
0.68%     systemd-journal  systemd-journald  [.] 0x00015f1c
0.61%     rs:main Q:Reg    [kernel.vmlinux]  [k] kmap_atomic
0.54%     systemd-journal  systemd-journald  [.] 0x0002aeac

External Links

You can find more at the following links.

From: ODROID Wiki
Last update: 2017/11/21 08:28
Printed on 2017/12/07 21:49
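The "insn per cycle" figure in the perf stat examples above is the instruction count divided by the cycle count. The sketch below recomputes it from machine-readable output. It assumes the CSV layout that perf stat produces with the -x option (counter value in field 1, event name in field 3) and that the events are reported under the plain names "cycles" and "instructions"; both may differ between perf versions. The function name ipc_from_csv and the sample counter values are hypothetical and are not taken from the measurements on this page.

```shell
#!/bin/sh
# ipc_from_csv (hypothetical helper): read `perf stat -x,` style CSV on stdin
# and print instructions per cycle with two decimals, or "n/a" if no usable
# cycle count was found (for example when the event was not counted).
ipc_from_csv() {
    awk -F, '
        $3 == "cycles"       { cycles = $1 }
        $3 == "instructions" { insns  = $1 }
        END {
            if (cycles + 0 > 0) printf "%.2f\n", insns / cycles
            else                print "n/a"
        }'
}

# Demo with made-up counter values (not measured on an ODROID).
ipc_from_csv <<'EOF'
1000000,,cycles,500000,100.00,,
850000,,instructions,500000,100.00,0.85,insn per cycle
EOF
```

On a board, one possible way to feed it is to write the statistics to a file with "perf stat -x, -o stats.csv -e cycles,instructions taskset -c 4 WORKLOAD" (WORKLOAD being a placeholder for the command under test) and then run "ipc_from_csv < stats.csv". Repeating this with -c 0 and -c 4 gives one figure per core type, as in the comparison above.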


More information

Tracing embedded heterogeneous systems

Tracing embedded heterogeneous systems Tracing embedded heterogeneous systems P R O G R E S S R E P O R T M E E T I N G, D E C E M B E R 2015 T H O M A S B E R T A U L D D I R E C T E D B Y M I C H E L D A G E N A I S December 10th 2015 TRACING

More information

Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service

Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service * Kshitij Sudan* Sadagopan Srinivasan Rajeev Balasubramonian* Ravi Iyer Executive Summary Goal: Co-schedule N applications

More information

Security-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat

Security-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat Security-Aware Processor Architecture Design CS 6501 Fall 2018 Ashish Venkat Agenda Common Processor Performance Metrics Identifying and Analyzing Bottlenecks Benchmarking and Workload Selection Performance

More information

Linux-Ready RV-GC AndesCore with Architecture Extensions Charlie Su, Ph.D. CTO and SVP 2018/05/09

Linux-Ready RV-GC AndesCore with Architecture Extensions Charlie Su, Ph.D. CTO and SVP 2018/05/09 Linux-Ready RV-GC AndesCore with Architecture Extensions Charlie Su, Ph.D. CTO and SVP 2018/05/09 WWW.ANDESTECH.COM Introduction to Andes Asia-based IPO Company 13 years in the pure-play CPU IP business

More information

Mid Term from Feb-2005 to Nov 2012 CS604- Operating System

Mid Term from Feb-2005 to Nov 2012 CS604- Operating System Mid Term from Feb-2005 to Nov 2012 CS604- Operating System Latest Solved from Mid term Papers Resource Person Hina 1-The problem with priority scheduling algorithm is. Deadlock Starvation (Page# 84) Aging

More information

Module I: Measuring Program Performance

Module I: Measuring Program Performance Performance Programming: Theory, Practice and Case Studies Module I: Measuring Program Performance 9 Outline 10 Measuring methodology and guidelines Measurement tools Timing Tools Profiling Tools Process

More information

CS3210: Operating Systems

CS3210: Operating Systems CS3210: Operating Systems Lab 1 Tutorial 1 / 39 Lab session general structure Session A - overview presentation (30 min) Concepts, tutorial, and demo Session B - group activity (30 min) Each student will

More information

Intel profiling tools and roofline model. Dr. Luigi Iapichino

Intel profiling tools and roofline model. Dr. Luigi Iapichino Intel profiling tools and roofline model Dr. Luigi Iapichino luigi.iapichino@lrz.de Which tool do I use in my project? A roadmap to optimization (and to the next hour) We will focus on tools developed

More information

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014 Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline

More information

Operating System. Hanyang University. Hyunmin Yoon Operating System Hanyang University

Operating System. Hanyang University. Hyunmin Yoon Operating System Hanyang University Hyunmin Yoon (fulcanelli86@gmail.com) 2 Linux development ENVIRONMENT 2 3 References ubuntu documentation Kernel/Compile https://help.ubuntu.com/community/kernel/compile 3 4 Tools $ software-properties-gtk

More information

DEVELOPMENT GUIDE VAB-630. Linux BSP v

DEVELOPMENT GUIDE VAB-630. Linux BSP v DEVELOPMENT GUIDE VAB-630 Linux BSP v1.0.1 100-09182017-114400 Copyright Copyright 2017 VIA Technologies Incorporated. All rights reserved. No part of this document may be reproduced, transmitted, transcribed,

More information

Exercise Session 5. Data Processing on Modern Hardware L Fall Semester Cagri Balkesen

Exercise Session 5. Data Processing on Modern Hardware L Fall Semester Cagri Balkesen Cagri Balkesen Data Processing on Modern Hardware Exercises Fall 2012 1 Exercise Session 5 Data Processing on Modern Hardware 263-3502-00L Fall Semester 2012 Cagri Balkesen cagri.balkesen@inf.ethz.ch Department

More information

Potentials and Limitations for Energy Efficiency Auto-Tuning

Potentials and Limitations for Energy Efficiency Auto-Tuning Center for Information Services and High Performance Computing (ZIH) Potentials and Limitations for Energy Efficiency Auto-Tuning Parco Symposium Application Autotuning for HPC (Architectures) Robert Schöne

More information

Question 1 (5 points) Consider a cache with the following specifications Address space is 1024 words. The memory is word addressable The size of the

Question 1 (5 points) Consider a cache with the following specifications Address space is 1024 words. The memory is word addressable The size of the Question 1 (5 points) Consider a cache with the following specifications Address space is 1024 words. he memory is word addressable he size of the cache is 8 blocks; each block is 4 words (32 words cache).

More information

Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs. Luca Canali CERN, Geneva (CH)

Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs. Luca Canali CERN, Geneva (CH) Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs Luca Canali CERN, Geneva (CH) Speaker Intro Database engineer and team lead at CERN IT Hadoop and Spark service Database services

More information

Hidden Linux Metrics with ebpf_exporter. Ivan Babrou

Hidden Linux Metrics with ebpf_exporter. Ivan Babrou Hidden Linux Metrics with ebpf_exporter Ivan Babrou @ibobrik Performance team @Cloudflare What does Cloudflare do CDN Moving content physically closer to visitors with our CDN. Intelligent caching Unlimited

More information

Ftrace Kernel Hooks: More than just tracing. Presenter: Steven Rostedt Red Hat

Ftrace Kernel Hooks: More than just tracing. Presenter: Steven Rostedt Red Hat Ftrace Kernel Hooks: More than just tracing Presenter: Steven Rostedt rostedt@goodmis.org Red Hat Ftrace Function Hooks Function Tracer Function Graph Tracer Function Profiler Stack Tracer Kprobes Uprobes

More information

Android Debugging and Performance Analysis

Android Debugging and Performance Analysis Hands On Exercises for Android Debugging and Performance Analysis v. 2018.10 WARNING: The order of the exercises does not always follow the same order of the explanations in the slides. When carrying out

More information

Solving Difficult Memory Performance Problems

Solving Difficult Memory Performance Problems Solving Difficult Memory Performance Problems Jiri Olsa Joe Mario January 27, 2017 Red Hat Engineering Red Hat Performance Engineering Agenda Overview: Where does my program get its memory from? Types

More information

WHAT YOU WILL NEED FOR THIS GUIDE:

WHAT YOU WILL NEED FOR THIS GUIDE: WHAT YOU WILL NEED FOR THIS GUIDE: 1. Local computer with Windows or Linux. 2. Remote server VPS [This guide uses digitaloceans.com but any provider will work] 3. PuTTY to configure and setup the VPS 4.

More information

Measuring the impacts of the Preempt-RT patch

Measuring the impacts of the Preempt-RT patch Measuring the impacts of the Preempt-RT patch maxime.chevallier@smile.fr October 25, 2017 RT Linux projects Simulation platform : bi-xeon, lots ot RAM 200µs wakeup latency, networking Test bench : Intel

More information

Lab1 tutorial CS https://tc.gtisc.gatech.edu/cs3210/2016/lab/lab1.html

Lab1 tutorial CS https://tc.gtisc.gatech.edu/cs3210/2016/lab/lab1.html Lab1 tutorial CS 3210 https://tc.gtisc.gatech.edu/cs3210/2016/lab/lab1.html Lab session general structure Session A - overview presentation (30 min) - About concept, tutorial and demo Session B - group

More information

Evaluation of Real-time Performance in Embedded Linux. Hiraku Toyooka, Hitachi. LinuxCon Europe Hitachi, Ltd All rights reserved.

Evaluation of Real-time Performance in Embedded Linux. Hiraku Toyooka, Hitachi. LinuxCon Europe Hitachi, Ltd All rights reserved. Evaluation of Real-time Performance in Embedded Linux LinuxCon Europe 2014 Hiraku Toyooka, Hitachi 1 whoami Hiraku Toyooka Software engineer at Hitachi " Working on operating systems Linux (mainly) for

More information