Tracing embedded heterogeneous systems: Progress Report Meeting, May 2016

1 Tracing embedded heterogeneous systems
Progress report meeting, May 2016
Thomas Bertauld, directed by Michel Dagenais
May 5th 2016

2 Presentation plan
1. Introduction
2. The Keystone 2 architecture
3. BareCTF
4. Tracing embedded heterogeneous systems
5. The synchronization process
6. Use-case
7. Conclusion

3 Introduction - Why trace heterogeneous embedded systems?
Have you ever wondered how images get processed inside these cameras?

4 Introduction - Why trace heterogeneous embedded systems?
- Systems designed for specific needs and tasks
- Often used for real-time applications such as signal processing
- Can be used anywhere
- Power-efficient
- Used inside much more complex systems

5 Introduction - Challenges
- Different kinds of processors
  - Some may be «unconventional»
  - Some may be «bare-metal»
- Complex and specialized hardware
- Limited resources
  - No internal storage
  - Little RAM
- Lack of traditional tools

6 The Keystone 2 - Specifications
- TI 66AK2H SoC
- 4 ARM Cortex-A15 cores running Linux (1.4 GHz)
- 8 TI C66x CorePac DSPs (1.2 GHz)
- 2 GB DDR3
- 6 MB Multicore Shared Memory

7 The Keystone 2 - Benefits and drawbacks
Benefits:
- Broadly used TI DSPs
- Powerful SoC
- 8 processors with built-in signal processing abilities
- TI's SYS/BIOS modules
- Full C support on the DSPs
Drawbacks:
- No way of tracing the DSPs
- Complex to use

8 BareCTF - Tracing bare-metal systems
- Python tool created by Philippe Proulx (EfficiOS)
- Targets bare-metal systems
- Generates CTF (Common Trace Format) traces
- Easy to use (configured through YAML files)
- Lightweight

9 Tracing embedded heterogeneous systems - Facts and goal
Facts:
- LTTng can be used to trace the ARM side of any board
- BareCTF can be used to trace every other type of core
To what end?
- Trace the application's whole processing chain
- Detect anomalies, bottlenecks, and latencies
- Get a global view of a process distributed across different types of cores

10 Tracing embedded heterogeneous systems - Challenges
- BareCTF must be ported to each new platform
- The traces obtained from the different processors must be synchronized
  - Matching events must be generated in each trace
  - Interrupt-based mechanism

11 The synchronization process - Description
[Diagram: exchange between the ARM and the generic cores. Labels: ARM, Generic core(s), SYNC]

12 The synchronization process - Description
[Diagram: exchange between the ARM and the generic cores. Labels: ARM, Generic core(s), SYNC, ACK]
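
The SYNC/ACK exchange on slides 11 and 12 produces matching events in both traces. The slides do not show how the timestamps are then combined, so the following is only a minimal sketch of one classic approach (a two-way exchange estimate, as used by NTP-like protocols), not the method used in this work or by Trace Compass. It assumes that the ARM sends SYNC and the generic core answers with ACK, that both message delays are equal, and that all four timestamps are in the same unit. The names `SyncExchange`, `estimate_offset` and `to_arm_time` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SyncExchange:
    """Timestamps of one SYNC/ACK round trip (hypothetical field names)."""
    arm_sync_sent: int       # ARM clock, when SYNC leaves the ARM
    core_sync_received: int  # core clock, when SYNC reaches the core
    core_ack_sent: int       # core clock, when ACK leaves the core
    arm_ack_received: int    # ARM clock, when ACK reaches the ARM


def estimate_offset(x: SyncExchange) -> tuple[float, int]:
    """Return (offset, round_trip), where offset = core clock - ARM clock.

    Assumes the SYNC and ACK messages take the same time to travel.
    """
    forward = x.core_sync_received - x.arm_sync_sent
    backward = x.core_ack_sent - x.arm_ack_received
    offset = (forward + backward) / 2
    # Total time spent on the wire, excluding the core's handling time.
    round_trip = (x.arm_ack_received - x.arm_sync_sent) - (
        x.core_ack_sent - x.core_sync_received
    )
    return offset, round_trip


def to_arm_time(core_timestamp: float, offset: float) -> float:
    """Map a core timestamp onto the ARM time base."""
    return core_timestamp - offset
```

A single exchange only gives an offset at one instant; it says nothing about how the two clocks diverge over time.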

13 Use-case - Description
- Instrumentation of an image processing algorithm
- Edge detection
- Sobel filter
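
For readers unfamiliar with the Sobel filter, here is a minimal pure-Python sketch of the operation. It is an illustration only: the use-case itself relies on TI's ImgLib on the DSPs, and this code does not reproduce that library's API. The function name `sobel_magnitude`, the `|gx| + |gy|` approximation of the gradient magnitude, and the clamping to 255 are choices made for this example.

```python
def sobel_magnitude(image):
    """Apply a 3x3 Sobel filter to a grayscale image.

    `image` is a list of rows of integers (0..255). The result has the
    same size; border pixels are left at 0 because the 3x3 window does
    not fit there.
    """
    height = len(image)
    width = len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            # Cheap approximation of sqrt(gx^2 + gy^2), clamped to 8 bits.
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out
```

Each output pixel depends only on its 3x3 neighbourhood, which is what makes the filter easy to distribute across several cores.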

14 Use-case - Setup
- 3000 x 3000 BMP image
- 1 ARM process acting as master
  - Gives commands
  - Sends the input image and receives the result
- 8 DSPs acting as slaves
  - Wait for commands
  - Use TI's ImgLib for image processing
  - In charge of memory management
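
The slides do not say how the image is divided among the 8 DSPs. As a sketch under that caveat, one plausible scheme is to give each worker a horizontal band of rows plus one extra "halo" row on each side, since a 3x3 filter needs the neighbouring rows. The function `split_rows` and its parameters are hypothetical.

```python
def split_rows(height, workers, halo=1):
    """Split `height` image rows into `workers` contiguous bands.

    Each entry holds the rows a worker must produce (`out_rows`) and the
    rows it must read (`in_rows`), both as half-open (start, stop) ranges.
    `in_rows` is widened by `halo` rows, clipped to the image.
    """
    base, extra = divmod(height, workers)
    bands = []
    start = 0
    for worker in range(workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if worker < extra else 0)
        bands.append({
            "worker": worker,
            "out_rows": (start, stop),
            "in_rows": (max(0, start - halo), min(height, stop + halo)),
        })
        start = stop
    return bands
```

With 3000 rows and 8 workers every band is exactly 375 rows, so an unbalanced trace in this setup would point at something other than the partitioning itself.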

15 Use-case - Results

16 Use-case - Results
- Low impact
  - Processing time goes from ~95 ms to ~96 ms (about 1% overhead)
- Effective
  - Can show when the work is not well balanced
  - Makes it possible to keep track of the overall process
  - Uses Trace Compass's built-in trace synchronization mechanism

17 Conclusion - Limitations
- The barectf platform can be improved
  - Heavy API
  - High latency
  - Wasted memory
- The synchronization does not take clock drift into account
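
To make the drift limitation concrete: a single offset assumes both clocks tick at the same rate. One common way to account for drift, shown here only as a hedged sketch and not as the approach planned by the author, is to collect several pairs of matching timestamps and fit a line `core = rate * arm + offset` by least squares. The function names `fit_clock` and `core_to_arm` are hypothetical.

```python
def fit_clock(arm_ts, core_ts):
    """Least-squares fit of core = rate * arm + offset.

    `arm_ts` and `core_ts` are equally long sequences of timestamps of
    matching events, taken on the ARM clock and on the core clock.
    Returns (rate, offset).
    """
    n = len(arm_ts)
    if n < 2 or n != len(core_ts):
        raise ValueError("need at least two matching timestamp pairs")
    mean_arm = sum(arm_ts) / n
    mean_core = sum(core_ts) / n
    variance = sum((a - mean_arm) ** 2 for a in arm_ts)
    if variance == 0:
        raise ValueError("ARM timestamps must not all be identical")
    covariance = sum((a - mean_arm) * (c - mean_core)
                     for a, c in zip(arm_ts, core_ts))
    rate = covariance / variance
    offset = mean_core - rate * mean_arm
    return rate, offset


def core_to_arm(core_timestamp, rate, offset):
    """Map a core timestamp onto the ARM time base using the fitted line."""
    return (core_timestamp - offset) / rate
```

A `rate` different from 1 means the clocks drift apart; how often matching events must be generated to keep the fit accurate is the "optimal synchronization rate" question listed under future work.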

18 Conclusion - Future work
- Switch to more efficient message-passing methods
- Determine an optimal synchronization rate
- Reduce the overall overhead of the barectf platform
- Test on more complex systems

19 Thank you for your attention!
Contact: thomas.bertauld@gmail.com


Next Generation Enterprise Solutions from ARM Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the

More information

Porting VME-Based Optical-Link Remote I/O Module to a PLC Platform - an Approach to Maximize Cross-Platform Portability Using SoC

Porting VME-Based Optical-Link Remote I/O Module to a PLC Platform - an Approach to Maximize Cross-Platform Portability Using SoC Porting VME-Based Optical-Link Remote I/O Module to a PLC Platform - an Approach to Maximize Cross-Platform Portability Using SoC T. Masuda, A. Kiyomichi Japan Synchrotron Radiation Research Institute

More information

Recovering Disk Storage Metrics from low level Trace events

Recovering Disk Storage Metrics from low level Trace events Recovering Disk Storage Metrics from low level Trace events Progress Report Meeting May 05, 2016 Houssem Daoud Michel Dagenais École Polytechnique de Montréal Laboratoire DORSAL Agenda Introduction and

More information

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub.

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub. es > 100 MB/sec Pentium 4 Processor L1 and L2 caches Some slides adapted from lecture by David Culler 3.2 GB/sec Display Memory Controller Hub RDRAM RDRAM Dual Ultra ATA/100 24 Mbit/sec Disks LAN I/O Controller

More information

Heterogeneous Software Architecture with OpenAMP

Heterogeneous Software Architecture with OpenAMP Heterogeneous Software Architecture with OpenAMP Shaun Purvis, Xilinx Agenda Heterogeneous SoCs Linux and OpenAMP OpenAMP for HSA Heterogeneous SoCs A System-on-Chip that integrates multiple processor

More information

OpenMP Accelerator Model for TI s Keystone DSP+ARM Devices. SC13, Denver, CO Eric Stotzer Ajay Jayaraj

OpenMP Accelerator Model for TI s Keystone DSP+ARM Devices. SC13, Denver, CO Eric Stotzer Ajay Jayaraj OpenMP Accelerator Model for TI s Keystone DSP+ Devices SC13, Denver, CO Eric Stotzer Ajay Jayaraj 1 High Performance Embedded Computing 2 C Core Architecture 8-way VLIW processor 8 functional units in

More information

Xytech MediaPulse Equipment Guidelines (Version 8 and Sky)

Xytech MediaPulse Equipment Guidelines (Version 8 and Sky) Xytech MediaPulse Equipment Guidelines (Version 8 and Sky) MediaPulse Architecture Xytech Systems MediaPulse solution utilizes a multitier architecture, requiring at minimum three server roles: a database

More information

EMBEDDED SYSTEMS WITH ROBOTICS AND SENSORS USING ERLANG

EMBEDDED SYSTEMS WITH ROBOTICS AND SENSORS USING ERLANG EMBEDDED SYSTEMS WITH ROBOTICS AND SENSORS USING ERLANG Adam Lindberg github.com/eproxus HARDWARE COMPONENTS SOFTWARE FUTURE Boot, Serial console, Erlang shell DEMO THE GRISP BOARD SPECS Hardware & specifications

More information

Software Driven Verification at SoC Level. Perspec System Verifier Overview

Software Driven Verification at SoC Level. Perspec System Verifier Overview Software Driven Verification at SoC Level Perspec System Verifier Overview June 2015 IP to SoC hardware/software integration and verification flows Cadence methodology and focus Applications (Basic to

More information

NanoMind Z7000. Datasheet On-board CPU and FPGA for space applications

NanoMind Z7000. Datasheet On-board CPU and FPGA for space applications NanoMind Z7000 Datasheet On-board CPU and FPGA for space applications 1 Table of Contents 1 TABLE OF CONTENTS... 2 2 OVERVIEW... 3 2.1 HIGHLIGHTED FEATURES... 3 2.2 BLOCK DIAGRAM... 4 2.3 FUNCTIONAL DESCRIPTION...

More information

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Shanghai: William Orme, Strategic Marketing Manager of SSG Beijing & Shenzhen: Mayank Sharma, Product Manager of SSG ARM Tech

More information

VICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices

VICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices Signal Processing Library Further extending the performance and ease of use for enabled devices Why is library effective for customer application? Get to market faster with ready-to-use signal processing

More information

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA

More information

«Real Time Embedded systems» Multi Masters Systems

«Real Time Embedded systems» Multi Masters Systems «Real Time Embedded systems» Multi Masters Systems rene.beuchat@epfl.ch LAP/ISIM/IC/EPFL Chargé de cours rene.beuchat@hesge.ch LSN/hepia Prof. HES 1 Multi Master on Chip On a System On Chip, Master can

More information

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki An Ultra High Performance Scalable DSP Family for Multimedia Hot Chips 17 August 2005 Stanford, CA Erik Machnicki Media Processing Challenges Increasing performance requirements Need for flexibility &

More information

Altera SDK for OpenCL

Altera SDK for OpenCL Altera SDK for OpenCL A novel SDK that opens up the world of FPGAs to today s developers Altera Technology Roadshow 2013 Today s News Altera today announces its SDK for OpenCL Altera Joins Khronos Group

More information