PERF performance-counter for Odroid XU3/XU4
perf measures performance on Linux using hardware counters, tracepoints, software performance counters, and dynamic probes. It is one of the two most commonly used performance-counter profiling tools on Linux, and it is mainly used to analyze bottlenecks from inside the CPU core right up to the driver level. Linux supports many profiling tools, such as perf, trace-cmd, blktrace, strace, and oprofile.

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots. perf provides rich, generalized abstractions over hardware-specific capabilities. Among other things, it provides per-task, per-CPU, and per-workload counters, sampling on top of these, and source-code event annotation. With perf we can also monitor the performance of a device driver.

Build the perf tool

To build perf, install the following packages:

    sudo apt-get install flex bison libdw-dev libnewt-dev binutils-dev libaudit-dev libgtk2.0-dev libssl-dev python-dev systemtap-sdt-dev libiberty-dev libperl-dev liblzma-dev libpython-dev libunwind-* asciidoc xmlto

Check out the kernel source code and build the perf executable:

    $ git clone --depth 1 -b odroidxu y
    $ cd linux/tools/perf
    $ make
    $ sudo cp perf /usr/bin/perf

Note: PMU registration for perf is already integrated in the kernel, so you only need to build the perf binary to test it.

Check whether the kernel supports the perf feature (kernel 4.14 or higher is required):

    root@odroid:~# dmesg | grep PMU
    [ ] EXYNOS5420 PMU initialized
    [ ] hw perfevents: enabled with armv7_cortex_a7 PMU driver, 5 counters available
    [ ] hw perfevents: enabled with armv7_cortex_a15 PMU driver, 7 counters available
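The kernel-support check above can also be scripted. The following is a minimal sketch, assuming the kernel log uses the same "hw perfevents: enabled with ... PMU driver, N counters available" wording as the dmesg output shown above. The function name pmu_summary is hypothetical and is not part of perf.

```shell
#!/bin/sh
# pmu_summary (hypothetical helper): read kernel log lines on stdin and
# print "<pmu-driver> <counter-count>" for each "hw perfevents" line.
# Lines that do not match the assumed wording are ignored.
pmu_summary() {
    sed -n 's/.*hw perfevents: enabled with \([^ ]*\) PMU driver, \([0-9]*\) counters available.*/\1 \2/p'
}

# Usage on the board (reading the kernel log may require root):
#   dmesg | pmu_summary
```

If the function prints nothing, the kernel log contains no PMU driver line in that form, which suggests the running kernel does not expose hardware perf events.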
Check the list of perf events that can be monitored:

    root@odroid:~# perf list
    List of pre-defined events (to be used in -e):
      branch-instructions OR branches
      branch-misses
      bus-cycles
      cache-misses
      cache-references
      cpu-cycles OR cycles
      instructions
      alignment-faults
      bpf-output
      context-switches OR cs
      cpu-clock
      cpu-migrations OR migrations
      dummy
      emulation-faults
      major-faults
      minor-faults
      page-faults OR faults
      task-clock
      L1-dcache-load-misses
      L1-dcache-loads
      L1-dcache-store-misses
      L1-dcache-stores
      L1-icache-load-misses
      L1-icache-loads
      LLC-load-misses
      LLC-loads
      LLC-store-misses
      LLC-stores
      branch-load-misses
      branch-loads
      dtlb-load-misses
      dtlb-store-misses
      itlb-load-misses
      armv7_cortex_a15/br_immed_retired/
      armv7_cortex_a15/br_mis_pred/
      armv7_cortex_a15/br_pred/
      armv7_cortex_a15/br_return_retired/
      armv7_cortex_a15/bus_access/
      armv7_cortex_a15/bus_cycles/
      armv7_cortex_a15/cid_write_retired/
      armv7_cortex_a15/cpu_cycles/
      armv7_cortex_a15/exc_return/
      armv7_cortex_a15/exc_taken/
      armv7_cortex_a15/inst_retired/
      armv7_cortex_a15/inst_spec/
      armv7_cortex_a15/l1d_cache/
      armv7_cortex_a15/l1d_cache_refill/
      armv7_cortex_a15/l1d_cache_wb/
      armv7_cortex_a15/l1d_tlb_refill/
      armv7_cortex_a15/l1i_cache/
      armv7_cortex_a15/l1i_cache_refill/

Perf Examples

    perf stat -B dd if=/dev/zero of=/dev/null count=
     records in
     records out
     bytes (512 MB, 488 MiB) copied, s, 609 MB/s

    Performance counter stats for 'dd if=/dev/zero of=/dev/null count= ':

         task-clock (msec)     # CPUs utilized
       1 context-switches      # K/sec
         cpu-migrations        # K/sec
      42 page-faults           # K/sec
         cycles                # GHz
         instructions          # 0.85 insn per cycle
         branches              # M/sec
         branch-misses         # 3.82% of all branches

         seconds time elapsed

    root@odroid:~/perf-examples#

Note: The Exynos5422 is a big.LITTLE architecture, so the counters are obtained for each CPU separately.

    root@odroid:~/perf-examples# perf stat -B taskset -c 0 dd if=/dev/zero of=/dev/null count=
     records in
     records out
     bytes (512 MB, 488 MiB) copied, s, 310 MB/s

    Performance counter stats for 'taskset -c 0 dd if=/dev/zero of=/dev/null count= ':

         task-clock (msec)     # CPUs utilized
       7 context-switches      # K/sec
       1 cpu-migrations        # K/sec
      77 page-faults           # K/sec
         cycles                # GHz
         instructions          # 0.25 insn per cycle
         branches              # M/sec
    9169 branch-misses         # 9.83% of all branches

         seconds time elapsed

    root@odroid:~/perf-examples# perf stat -B taskset -c 4 dd if=/dev/zero of=/dev/null count=
     records in
     records out
     bytes (512 MB, 488 MiB) copied, s, 633 MB/s

    Performance counter stats for 'taskset -c 4 dd if=/dev/zero of=/dev/null count= ':

         task-clock (msec)     # CPUs utilized
       6 context-switches      # K/sec
       1 cpu-migrations        # K/sec
      77 page-faults           # K/sec
         cycles                # GHz
         instructions          # 0.88 insn per cycle
         branches              # M/sec
         branch-misses         # 2.79% of all branches

         seconds time elapsed

    root@odroid:~/perf-examples#

perf record/report

perf record: By default, perf record uses the cycles event as the sampling event. This is a generic hardware event that the kernel maps to a hardware-specific PMU event.

perf report: Samples collected by perf record are saved into a binary file called, by default, perf.data. The perf report command reads this file and generates a concise execution profile. By default, samples are sorted by function, with the most samples first. The sorting order can be customized to view the data differently.

    root@odroid:~/perf-examples# perf record -a sleep 5
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote MB perf.data (289 samples) ]
    root@odroid:~/perf-examples#
    root@odroid:~/perf-examples# perf report
    Samples: 289 of event 'cycles:ppp', Event count (approx.):
    Overhead  Command          Shared Object      Symbol
      40.33%  swapper          [kernel.vmlinux]   [k] arch_cpu_idle
       7.23%  swapper          [kernel.vmlinux]   [k] tick_nohz_idle_exit
       5.40%  swapper          [kernel.vmlinux]   [k] tick_nohz_idle_enter
       3.83%  swapper          [kernel.vmlinux]   [k] _raw_spin_unlock_irq
       3.27%  sleep            [kernel.vmlinux]   [k] filemap_map_pages
       3.25%  perf             [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
       2.14%  sleep            [kernel.vmlinux]   [k] page_remove_rmap
       1.82%  perf             [kernel.vmlinux]   [k] perf_event_ctx_lock_nested
       1.78%  swapper          [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
       1.70%  ksoftirqd/4      [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
       1.68%  sleep            libc-2.23.so       [.] 0x
           %  perf             [kernel.vmlinux]   [k] page_remove_rmap
       1.61%  perf             [kernel.vmlinux]   [k] remove_vma
       1.51%  kworker/u16:     [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
       1.48%  perf             [kernel.vmlinux]   [k] ext4_da_write_begin
       1.44%  kworker/u16:     [kernel.vmlinux]   [k] _find_opp_table_unlocked
       1.35%  swapper          [kernel.vmlinux]   [k] exception_text_end
       1.33%  perf             [kernel.vmlinux]   [k] alloc_set_pte
       1.23%  kworker/:1       [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
       1.22%  perf             [kernel.vmlinux]   [k] _test_and_set_bit
       1.06%  perf             [kernel.vmlinux]   [k] _raw_spin_lock
       1.03%  kworker/u16:     [kernel.vmlinux]   [k] update_devfreq_passive
       0.83%  kworker/u16:     [kernel.vmlinux]   [k] _raw_spin_unlock_irq
       0.80%  kworker/:1       [kernel.vmlinux]   [k] memchr_inv
       0.79%  rs:main Q:Reg    [kernel.vmlinux]   [k] balance_dirty_pages_ratelimited
       0.79%  rs:main Q:Reg    rsyslogd           [.] 0x0002c8ae
       0.70%  sleep            [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
       0.68%  systemd-journal  systemd-journald   [.] 0x00015f1c
       0.61%  rs:main Q:Reg    [kernel.vmlinux]   [k] kmap_atomic
       0.54%  systemd-journal  systemd-journald   [.] 0x0002aeac

External Links

You can find more at the following links.

From: - ODROID Wiki
Wiki page: odroid-xu4:application_note:software:perf_perfomace_counter_for_odroid_xu3_xu
Permanent link:
Last update: 2017/11/21 08:28
Printed on 2017/12/07 21:49
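The per-CPU perf stat runs in the Perf Examples section can be wrapped in a small helper so that the same workload is measured once on each cluster. This is a minimal sketch, not part of the original wiki page. It assumes CPU 0 belongs to the Cortex-A7 cluster and CPU 4 to the Cortex-A15 cluster, matching the taskset -c 0 and taskset -c 4 runs above. The names compare_clusters, stat_on_cpu, PERF, LITTLE_CPU, and BIG_CPU are hypothetical.

```shell
#!/bin/sh
# Assumptions (hypothetical defaults, adjust for your board):
#   LITTLE_CPU=0 is a Cortex-A7 core, BIG_CPU=4 is a Cortex-A15 core.
LITTLE_CPU=${LITTLE_CPU:-0}
BIG_CPU=${BIG_CPU:-4}
# PERF can be overridden (for example PERF=echo) to preview the
# command lines without running perf.
PERF=${PERF:-perf}

# stat_on_cpu CPU CMD...: run CMD pinned to one CPU under perf stat,
# using the same "perf stat -B taskset -c" form as the examples above.
stat_on_cpu() {
    cpu=$1
    shift
    "$PERF" stat -B taskset -c "$cpu" "$@"
}

# compare_clusters CMD...: measure CMD once on each cluster.
compare_clusters() {
    for cpu in "$LITTLE_CPU" "$BIG_CPU"; do
        echo "== CPU $cpu =="
        stat_on_cpu "$cpu" "$@"
    done
}
```

To use it, pass the workload as arguments, for example the same dd command line as in the runs above. Comparing the "insn per cycle" lines of the two reports then shows the difference between the clusters, as with the 0.25 and 0.88 values reported above.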
More informationHidden Linux Metrics with ebpf_exporter. Ivan Babrou
Hidden Linux Metrics with ebpf_exporter Ivan Babrou @ibobrik Performance team @Cloudflare What does Cloudflare do CDN Moving content physically closer to visitors with our CDN. Intelligent caching Unlimited
More informationFtrace Kernel Hooks: More than just tracing. Presenter: Steven Rostedt Red Hat
Ftrace Kernel Hooks: More than just tracing Presenter: Steven Rostedt rostedt@goodmis.org Red Hat Ftrace Function Hooks Function Tracer Function Graph Tracer Function Profiler Stack Tracer Kprobes Uprobes
More informationAndroid Debugging and Performance Analysis
Hands On Exercises for Android Debugging and Performance Analysis v. 2018.10 WARNING: The order of the exercises does not always follow the same order of the explanations in the slides. When carrying out
More informationSolving Difficult Memory Performance Problems
Solving Difficult Memory Performance Problems Jiri Olsa Joe Mario January 27, 2017 Red Hat Engineering Red Hat Performance Engineering Agenda Overview: Where does my program get its memory from? Types
More informationWHAT YOU WILL NEED FOR THIS GUIDE:
WHAT YOU WILL NEED FOR THIS GUIDE: 1. Local computer with Windows or Linux. 2. Remote server VPS [This guide uses digitaloceans.com but any provider will work] 3. PuTTY to configure and setup the VPS 4.
More informationMeasuring the impacts of the Preempt-RT patch
Measuring the impacts of the Preempt-RT patch maxime.chevallier@smile.fr October 25, 2017 RT Linux projects Simulation platform : bi-xeon, lots ot RAM 200µs wakeup latency, networking Test bench : Intel
More informationLab1 tutorial CS https://tc.gtisc.gatech.edu/cs3210/2016/lab/lab1.html
Lab1 tutorial CS 3210 https://tc.gtisc.gatech.edu/cs3210/2016/lab/lab1.html Lab session general structure Session A - overview presentation (30 min) - About concept, tutorial and demo Session B - group
More informationEvaluation of Real-time Performance in Embedded Linux. Hiraku Toyooka, Hitachi. LinuxCon Europe Hitachi, Ltd All rights reserved.
Evaluation of Real-time Performance in Embedded Linux LinuxCon Europe 2014 Hiraku Toyooka, Hitachi 1 whoami Hiraku Toyooka Software engineer at Hitachi " Working on operating systems Linux (mainly) for
More information