

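[...] Linux ftrace [...]

Dominic Hillenbrand [...]
Waseda University

Abstract: [...] Linux ftrace. The programs equake, art, and mpeg2enc are traced on an Intel Xeon X7560 and on ARMv7 [...] OS [...]. The overhead of one annotation call is 1.07 [us] on the Intel Xeon and 4.44 [us] on ARM.

Keywords: Linux, ftrace [...]

1. Introduction

[...] [1]. [...]

Android Systrace is a tracing tool for Android [2][3]. It is built on Linux ftrace, the tracing framework of the Linux kernel [4][5]. Other tools include Intel VTune [6] and perf timechart [7]. VTune also covers OpenCL and GPU [...], but it is an Intel tool for Intel [...]. perf timechart is a Linux tool [...]. [...] Linux ftrace [...]

(c) 2014 Information Processing Society of Japan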

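[...] ftrace. [...] ftrace [...] ID [...] HTML [...]. [...] OSCAR [...].

The rest of this paper is organized as follows. Section 2 describes Linux ftrace and Android Systrace, Section 3 describes Annotatable Systrace, and Section 4 describes the OSCAR compiler. Section 5 describes the evaluation environment and Section 6 the results.

2. Linux ftrace and Android Systrace

This section describes Linux ftrace and Android Systrace, which is based on ftrace.

2.1 Linux ftrace

Linux ftrace is a tracer built into the Linux kernel. Its function tracer is available when the kernel is configured with CONFIG_FUNCTION_TRACER. To use ftrace, mount debugfs:

# mount -t debugfs nodev /sys/kernel/debug/

Besides the function tracer, ftrace can trace kernel events; the available events are listed under /sys/kernel/debug/tracing/events/. For example, the scheduler events sched_wakeup and sched_switch are enabled as follows:

# cd /sys/kernel/debug/tracing
# echo 1 > events/sched/sched_wakeup/enable
# echo 1 > events/sched/sched_switch/enable

2.2 Android Systrace

Android Systrace is a tool included in the Android SDK for tracing Android [...]. [...] ID [...] Android [...]. The result is presented as HTML.

3. Annotatable Systrace

Annotatable Systrace extends Android Systrace. [...] ID [...] HTML. [...] HTML [...] ID [...].

3.1 Implementation of Annotatable Systrace

Annotatable Systrace is implemented by extending Linux ftrace. [...] the Linux task_struct [...] oscar_mt_str [...]. [...] systrace [...]. An annotation is passed from user space to the kernel [8]: [...] a device file under /dev/ [...] write [...]. trace_sched_switch [...] task_struct [...] ftrace [...]. [...] trace_sched_switch [...]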
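The write-based annotation path of Section 3.1 can be illustrated with a small user-space sketch. It is written in Python for brevity and is not the authors' code. The file naming scheme (cdev_char followed by a CPU number, loosely following the /dev/cdev_char 0 to n labels of Fig. 1), the helper names, and the annotation text are assumptions made for illustration. In a C program the CPU ID would typically come from sched_getcpu.

```python
import os


def device_path(dev_dir: str, cpu_id: int) -> str:
    """Build the per-CPU device file path (the naming scheme is an assumption)."""
    if cpu_id < 0:
        raise ValueError("cpu_id must be non-negative")
    return os.path.join(dev_dir, f"cdev_char{cpu_id}")


def annotate(dev_dir: str, cpu_id: int, text: str) -> int:
    """Send one annotation with a single write() call; return bytes written."""
    fd = os.open(device_path(dev_dir, cpu_id), os.O_WRONLY)
    try:
        return os.write(fd, text.encode("ascii"))
    finally:
        os.close(fd)
```

On a kernel that provides such device files, a call might look like annotate("/dev", 0, "loop48:begin"). Passing the directory as a parameter keeps the sketch testable against ordinary files.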

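[...] The CPU-ID is obtained with sched_getcpu, and [...] ID [...] CPU. [...] trace_sched_switch [...] smp_processor_id. Fig. 1 shows [...]. [...] write [...]. [...] trace_sched_switch [...] ftrace [...].

Fig. 1: [...] Labels: userspace: application, write (thread 0), write (thread 1), ..., write (thread m); device file manager; device files /dev/cdev_char 0, /dev/cdev_char 1, ..., /dev/cdev_char n; kernelspace: kernel buffer 0, buffer 1, ..., buffer n; trace_sched_switch; smp_processor_id.

4. OSCAR Compiler

This section describes the OSCAR compiler [9][10]. The OSCAR compiler parallelizes C and Fortran programs [...]. [...] basic block (BB) [...] BB [...].

[...] processor elements (PE) [...] processor groups (PG) [...]. A program is divided into basic blocks (BB), repetition blocks (RB), and subroutine blocks (SB), which are called macro tasks (MT). [...] RB, SB [...]. Once the MTs (BB, RB, SB) are generated, [...] among MTs [...] a macro flow graph (MFG). From the MFG [...] MTs [...] a macro task graph (MTG). [...] MTG [...] the MTs on the MTG are assigned to PGs [...]. [...] MTG [...] MT [...] MT [...] PG. When an SB or RB contains an MTG, the PEs in the PG are [...] PG. When the MT assigned to a PG is a Doall RB, the RB is divided [...]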

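[...] among the PEs in the PG. [...] OSCAR [...].

Table 1: RS440
  Server     Hitachi HA8000/RS440
  OS         Ubuntu [...] LTS (64bit, Linux [...])
  CPU        Intel Xeon X7560 (2.27 GHz), 32 cores
  L2 Cache   2048KB
  L3 Cache   24576KB
  Compiler   GCC [...]
  RAM        32GB

Table 2: Nexus 7
  Server     Nexus 7
  OS         Android 4.3 (64bit, Linux 3.4.0)
  CPU        Qualcomm Snapdragon S4 Pro (1.7 GHz), 4 cores
  L2 Cache   2048KB
  Compiler   arm-linux-gnueabihf-gcc [...]
  RAM        2GB

5. Evaluation Environment

[...] OSCAR [...]. The evaluation uses an Intel Xeon server, HA8000/RS440, and an ARM device, Nexus 7, whose processors are the Intel Xeon X7560 and the Qualcomm Snapdragon S4 Pro. The evaluated programs are equake, art, and mpeg2enc.

MPEG2 Encoder: mpeg2enc is an MPEG2 encoder from MediaBench.

EQUAKE: equake is a program from SPEC (The Standard Performance Evaluation Corporation) [...] 2000.

ART: art is, like EQUAKE, a SPEC [...] program [...] Adaptive Resonance Theory 2 (ART2) [...].

6. Evaluation Results

[...] OSCAR [...] up to 32 PEs. [...] write [...].

6.1 equake with 4 PEs on Nexus7

Fig. 2 shows [...]. [...] loop48 [...] CPU0, 1, 2, and 3 [...] loop48. [...] PE1 [...] PE3 [...] PE0 [...]. [...] loop48 [...] while [...] loop48 [...].

6.2 equake and art with 4 PEs on Nexus 7

Fig. 3 shows [...]. [...] barrier [...] art [...] PE [...] equake. [...] CPU0 [...] art, CPU2 [...] equake [...]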

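Fig. 3: [...] 4PE [...] EQUAKE [...] ART [...]
Fig. 4: [...] MPEG2ENC [...]
Fig. 5: [...] 32PE [...] EQUAKE [...]

[...] CPU1 [...] CPU3 [...] art [...] equake. [...] art [...] equake [...] CPU [...]. [...] CPU0, 2 [...].

6.3 mpeg2enc with 2 PEs on RS440

Fig. 4 shows [...]. [...] CPU10 [...] CPU8. [...] CPU8 [...] CPU9 [...] PE [...].

equake with 32 PEs on RS440: Fig. 5 shows [...]. [...] loop48 [...]. Fig. 6 shows loop48 [...]. [...] loop48 [...] PE [...].

Fig. 6: [...] EQUAKE [...] loop [...]

Overhead: [...] RS440 and Nexus7 [...] equake, art, and mpeg2enc [...] write [...]. Tables 3 and 4 [...] OSCAR [...] 32 PEs on RS440 and 4 PEs on Nexus7. [...] systrace, nowrite, original [...] write [...]. Table 5 shows the overhead of one write call in us/call. In Tables 3 and 4, [...] original [...] equake [...] art 1.59 [...], mpeg2enc 3.95 [...].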
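A per-call figure in us/call, like the one reported in Table 5, can be obtained by comparing a run that issues write calls with one that does not and dividing the difference by the number of calls. The text does not preserve the exact measurement method, so the sketch below is an assumption rather than the authors' procedure. It is in Python, the function names are hypothetical, and it times writes against an ordinary file rather than the annotation device.

```python
import os
import time


def per_call_overhead_us(total_with_s: float, total_without_s: float, calls: int) -> float:
    """Overhead per call in microseconds, from two total run times in seconds."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return (total_with_s - total_without_s) / calls * 1e6


def time_writes(path: str, calls: int, payload: bytes = b"x") -> float:
    """Issue `calls` write() calls on `path` and return the elapsed seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(calls):
            os.write(fd, payload)
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

For example, a run that takes 1.5 s with 500000 write calls and 1.0 s without them corresponds to about 1.0 us per call.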

Table 3: RS440 [...]. Columns: systrace, nowrite, original. Rows: equake, art, mpeg2enc. [...]
Table 4: Nexus7 [...]. Columns: systrace, nowrite, original. Rows: equake, art, mpeg2enc. [...]
Table 5: Overhead of write [us/call]. Columns: RS440, Nexus7. [...]

[...] write [...]. From Table 5, the overhead of one write call is 1.07 [us] on RS440 and 4.44 [us] on Nexus7.

7. Conclusion

[...] ftrace. Programs parallelized by the OSCAR compiler were traced on RS440 and Nexus7 [...]. Android 4.3 added the Trace class to the Android API, [...] Android Systrace. [...] Trace [...].

References

[1] Eileen Kramer, John T. Stasko: The Visualization of Parallel Systems: An Overview, Journal of Parallel and Distributed Computing (1993).
[2] Google: Android Systrace, android.com/tools/help/systrace.html (2014).
[3] [...]: Android 2D [...] SKIA [...] OSCAR [...], [...] 199 ARC, 142 HPC [...] (2013).
[4] Jake Edge: A look at ftrace, Articles/322666/ (2014).
[5] Steven Rostedt: Debugging the kernel using Ftrace - part 1, [...] (2014).
[6] Intel Corporation: Intel VTune Amplifier XE 2013, intel-vtune-amplifier-xe.
[7] Stephane Eranian, Eric Gouriou, Tipp Moseley, Willem de Bruijn: perf [...], index.php/tutorial (2014).
[8] Ariane Keller: Kernel Space - User Space Interfaces, kernel_user_space_howto.html (2014).
[9] [...]: [...] Fortran [...] (1990).
[10] [...] (2003).

Annotatable Systrace: An Extended Linux ftrace for Tracing a Parallelized Program

Daichi Fukui, Mamoru Shimaoka, Hiroki Mikami, Dominic Hillenbrand, Hideo Yamamoto, Keiji Kimura, Hironori Kasahara
Waseda University


More information

General Purpose GPU Computing in Partial Wave Analysis

General Purpose GPU Computing in Partial Wave Analysis JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data

More information

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2 CSE 820 Graduate Computer Architecture Richard Enbody Dr. Enbody 1 st Day 2 1 Why Computer Architecture? Improve coding. Knowledge to make architectural choices. Ability to understand articles about architecture.

More information

ISA-L Performance Report Release Test Date: Sept 29 th 2017

ISA-L Performance Report Release Test Date: Sept 29 th 2017 Test Date: Sept 29 th 2017 Revision History Date Revision Comment Sept 29 th, 2017 1.0 Initial document for release 2 Contents Audience and Purpose... 4 Test setup:... 4 Intel Xeon Platinum 8180 Processor

More information

Accelerating Multicore Architecture Simulation Using Application Profile

Accelerating Multicore Architecture Simulation Using Application Profile 2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip Accelerating Multicore Architecture Simulation Using Application Profile Keiji Kimura, Gakuho Taguchi, Hironori Kasahara

More information

Make technology more simple, Make life more intelligent. Embedded Computer EC-A3288C. Specifications V1.0

Make technology more simple, Make life more intelligent. Embedded Computer EC-A3288C. Specifications V1.0 Embedded Computer EC-A3288C Specifications V1.0 Version Date Updated content V1.0 2018-10-17 Original version - 1 - Directory 1. Product Overview... 4 1.1 Overview... 4 2. Interface description... 5 3.

More information

Lab2 - Bootloader. Conventions. Department of Computer Science and Information Engineering National Taiwan University

Lab2 - Bootloader. Conventions. Department of Computer Science and Information Engineering National Taiwan University Lab2 - Bootloader 1 / 20 Cross-compile U-Boot. Connect to Raspberry Pi via an USB-TTL cable. Boot Raspberry Pi via U-Boot. 2 / 20 Host Machine OS: Windows Target Machine Raspberry Pi (2 or 3) Build Machine

More information

Implemen'ng IPv6 Segment Rou'ng in the Linux Kernel

Implemen'ng IPv6 Segment Rou'ng in the Linux Kernel Implemen'ng IPv6 Segment Rou'ng in the Linux Kernel David Lebrun, Olivier Bonaventure ICTEAM, UCLouvain Work supported by ARC grant 12/18-054 (ARC-SDN) and a Cisco grant Agenda IPv6 Segment Rou'ng Implementa'on

More information

Basic Specification of Oakforest-PACS

Basic Specification of Oakforest-PACS Basic Specification of Oakforest-PACS Joint Center for Advanced HPC (JCAHPC) by Information Technology Center, the University of Tokyo and Center for Computational Sciences, University of Tsukuba Oakforest-PACS

More information

Evaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores

Evaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores Evaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores Tomohiro Hirano 1, Hideo Yamamoto 1, Shuhei Iizuka 1, Kohei Muto 1, Takashi Goto 1, Tamami Wake

More information

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. Munara Tolubaeva Technical Consulting Engineer 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries. notices and disclaimers Intel technologies features and benefits depend

More information

OpenPOWER Performance

OpenPOWER Performance OpenPOWER Performance Alex Mericas Chief Engineer, OpenPOWER Performance IBM Delivering the Linux ecosystem for Power SOLUTIONS OpenPOWER IBM SOFTWARE LINUX ECOSYSTEM OPEN SOURCE Solutions with full stack

More information

ARM Virtualization: Performance and Architectural Implications. Christoffer Dall, Shih-Wei Li, Jin Tack Lim, Jason Nieh, and Georgios Koloventzos

ARM Virtualization: Performance and Architectural Implications. Christoffer Dall, Shih-Wei Li, Jin Tack Lim, Jason Nieh, and Georgios Koloventzos ARM Virtualization: Performance and Architectural Implications Christoffer Dall, Shih-Wei Li, Jin Tack Lim, Jason Nieh, and Georgios Koloventzos ARM Servers ARM Network Equipment Virtualization Virtualization

More information

N720 OpenLinux Software User Guide Version 1.2

N720 OpenLinux Software User Guide Version 1.2 N720 Hardware User Guide () N720 OpenLinux Software User Guide Version 1.2 Copyright Copyright 2017 Neoway Technology Co., Ltd. All rights reserved. No part of this document may be reproduced or transmitted

More information

Blazer Pro V2.1 Client Requirements & Hardware Performance

Blazer Pro V2.1 Client Requirements & Hardware Performance Blazer Pro V2.1 Client Requirements & Hardware Performance Table of Contents Chapter 1 Client Requirements... 2 Chapter 2 Control Client Performance... 3 2.1 Local Control Client on Blazer Pro Server...

More information

Accelerate block service built on Ceph via SPDK Ziye Yang Intel

Accelerate block service built on Ceph via SPDK Ziye Yang Intel Accelerate block service built on Ceph via SPDK Ziye Yang Intel 1 Agenda SPDK Introduction Accelerate block service built on Ceph SPDK support in Ceph bluestore Summary 2 Agenda SPDK Introduction Accelerate

More information

ART JIT in Android N. Xueliang ZHONG Linaro ART Team

ART JIT in Android N. Xueliang ZHONG Linaro ART Team ART JIT in Android N Xueliang ZHONG Linaro ART Team linaro-art@linaro.org 1 Outline Android Runtime (ART) and the new challenges ART Implementation in Android N Tooling Performance Data & Findings Q &

More information