Project Proposals. Advanced Operating Systems / Embedded Systems (2016/2017)


1 Project Proposals: Advanced Operating Systems / Embedded Systems (2016/2017)
Giuseppe Massari (giuseppe.massari@polimi.it), Federico Terraneo (federico.terraneo@polimi.it)

2 Project Rules: General rules
Two types of project:
- Code development
- Bibliography research
Up to 2 students per project.
A meeting or report e-mail every 15 days: brief discussion, reporting, or demo of the project progress.
Source code:
- Uploaded to a remote Git repository
- Version control with Git
- Use services such as Bitbucket, GitHub, ...
Report:
- Format: LaTeX + PDF
- Language: English
- Download the template from the professor's website

3 Project Rules: Evaluation
Development project: [0-10] extra points (added to the written exam grade)
- Quality of the developed code (5)
- Quality and completeness of the report (2)
- Quality of the Git repository (2)
- Fulfillment of the timing constraints (1)
Monographic research: [0-5] extra points
- Coverage and relevance of the selected papers (2)
- Writing quality (1)
- Presentation (PowerPoint/LibreOffice/Beamer/...) to be held at the tutor's office in 20 minutes, plus questions (1)
- Fulfillment of the timing constraints (1)

4 BOSP proposals
The Barbeque Run-Time Resource Management Open-Source Project (BOSP)
Website:
Mailing lists:
- User / News:
- Developers:

5 BarbequeRTRM Open Source Project (BOSP): Overview
- The core of the project is a Run-Time Resource Manager for multi/many-core systems (the BarbequeRTRM)
- Scheduling, resource allocation, power management
- Open source, written in C++
- Includes libraries, benchmarks, sample applications and tools
- Version control with Git

6 Barbeque Open Source Project (BOSP): EU-funded projects involving BOSP
- 2PARMA: PARallel PAradigms and Run-time MAnagement techniques for Many-core Architectures
- CONTREX: Design of embedded mixed-criticality CONTRol systems under consideration of EXtra-functional properties
- HARPA: Harnessing Performance Variability
- MANGO: exploring Manycore Architectures for Next-GeneratiOn HPC systems
- ANTAREX: AutoTuning and Adaptivity approach for Energy efficient exascale HPC systems

7 The BarbequeRTRM: Layer view
- C/C++/OpenCL applications are supported
- The RTLib provides the API and manages the communication between the application and the RTRM
- The resource manager daemon includes core components and plug-in modules
- Platform support exploits Linux frameworks or custom drivers and libraries

8 The BarbequeRTRM: Currently supported hardware systems
- Intel/AMD x86 single multi-core processor systems, plus multiple GPUs (AMD) through the OpenCL runtime
- Intel/AMD x86 multi-processor NUMA systems
- ARM Cortex-A9 multi-core CPU based SoCs: PandaBoard, Freescale i.MX6 Quad SABRE
- ARM big.LITTLE 8-core (Cortex-A7 + Cortex-A15) SoCs (Samsung Exynos 54xx): Insignal Arndale OctaCore, ODROID XU-E, ODROID XU-3

9 The BarbequeRTRM: Application Execution Model
- Resource-aware execution of the application
- Performance-aware resource management
Website:

10 The BarbequeRTRM: HARPA project use case
Landslide detection and prediction system by HENESIS S.r.l.
Goal: the system (solar panel / battery powered) must remain on.

11 BOSP: Google Protocol Buffers based communication
Diagram: BarbequeRTRM, RTLib, Application
Refactor the communication infrastructure between the BarbequeRTRM and the RTLib, basing it on Google Protocol Buffers.
Skills: C/C++
Students: 1-2
CFU: 5/10
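As background for this proposal: Protocol Buffers serializes integer fields as base-128 "varints", which keeps small values (typical for message types and identifiers) to one or two bytes. The C++ sketch below shows only that encoding, in isolation. It does not use the protobuf library, the function names are hypothetical, and nothing here reflects the actual RTLib message layout; a real refactoring would rely on classes generated by the protobuf compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode an unsigned integer as a base-128 varint: 7 payload bits per byte,
// least significant group first, high bit set on every byte except the last.
std::vector<std::uint8_t> encodeVarint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}

// Decode one varint from the start of `bytes`.
// `consumed` receives the number of bytes that belonged to the varint.
std::uint64_t decodeVarint(const std::vector<std::uint8_t>& bytes,
                           std::size_t& consumed) {
    std::uint64_t value = 0;
    int shift = 0;
    consumed = 0;
    for (std::uint8_t b : bytes) {
        assert(shift < 64 && "varint longer than 10 bytes");
        value |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        ++consumed;
        if ((b & 0x80) == 0) {
            return value;  // high bit clear: this was the last byte
        }
        shift += 7;
    }
    assert(false && "truncated varint");
    return value;
}
```

For example, the value 300 is encoded as the two bytes 0xAC 0x02, and decoding those bytes returns 300 again.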

12 BOSP: RTLib Java wrapper
Diagram: BarbequeRTRM, C/C++ RTLib, Application
Export the Abstract Execution Model so that applications implemented in Java are also supported, by wrapping the C++ interfaces in Java classes.
Skills: C/C++, Java, JNI
Students: 1-2
CFU: 5/10

13 BOSP: RTLib Python wrapper
Diagram: BarbequeRTRM, C/C++ RTLib, Application
Export the Abstract Execution Model so that applications implemented in Python are also supported, by wrapping the C++ interfaces in Python classes.
Skills: C/C++, Python
Students: 1-2
CFU: 5/10

14 BOSP: State-of-the-art resource allocation policy implementation
Diagram: BarbequeRTRM, Policy (implements)
Implement a resource allocation/scheduling policy taken from the state of the art. A paper describing the policy will be provided.
Skills: C/C++
Students: 1-2
CFU: 5/10

15 BOSP: Linux Process Listener exploitation
Diagram: BarbequeRTRM, Linux kernel connector, RTLib, Application, Application
Enable the management of generic Linux processes (non-integrated applications) by introducing new internal data structures and a simple policy.
Skills: C/C++
Students: 1
CFU: 5/10

16 BOSP: Network bandwidth control characterization
Diagram: WiFi, Ethernet, Network, Power consumption?
Exploit the recently integrated network bandwidth control to characterize the performance of network-based applications and the system power consumption, by exploring different network interfaces and bandwidth constraints.
Skills: C/C++, Linux, scripting languages (e.g., Python, Bash, ...)
Students: 1
CFU: 5/10
Notes: Possible extension into a thesis and/or a publication

17 BOSP: Adapteva Epiphany-III Parallella board porting
- Dual-core ARM and 16/64-core RISC processors on board
- Programmable in ANSI C/C++ and OpenCL
- Up to 90 GFLOPS processing capability
- HDMI, Ethernet, USB and 48 GPIO ports

18 BOSP: Adapteva Epiphany-III Parallella board porting
Diagram: BarbequeRTRM
Develop the BarbequeRTRM support for the Adapteva Parallella board, in order to enable run-time resource management of the many-core processor.
Skills: C/C++
Students: 1-2
CFU: 5/10
Notes: Possible to extend into a master thesis

19 BOSP: Integration of multi-threaded benchmarks
Diagram: Application, RTLib, Application
Modify a PARSEC or RODINIA (OpenMP) multi-threaded benchmark according to the Abstract Execution Model to make it run-time manageable.
Skills: C/C++
Students: 1
CFU: 5
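To give an idea of what "run-time manageable" can mean in practice, the C++ sketch below restructures a toy kernel into callbacks that a manager invokes between work chunks, so the resource budget can change while the benchmark runs. Everything here is an assumption made for illustration: `ManagedWorkload`, `SumBenchmark`, `runManaged` and the callback names are hypothetical stand-ins and not the RTLib API, and the "resource manager" is just a list of thread budgets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a managed application (NOT the RTLib API).
class ManagedWorkload {
public:
    virtual ~ManagedWorkload() = default;
    virtual void onSetup() {}
    virtual void onConfigure(int assignedThreads) = 0;  // budget changed
    virtual bool onRun() = 0;     // one work chunk; false when finished
    virtual void onMonitor() {}
    virtual void onRelease() {}
};

// Toy "benchmark": sums a vector in chunks whose size follows the budget.
class SumBenchmark : public ManagedWorkload {
public:
    explicit SumBenchmark(std::vector<int> data) : data_(std::move(data)) {}

    void onConfigure(int assignedThreads) override {
        assert(assignedThreads > 0);
        threads_ = assignedThreads;
    }

    bool onRun() override {
        const std::size_t chunk =
            kChunkPerThread * static_cast<std::size_t>(threads_);
        const std::size_t end = std::min(data_.size(), next_ + chunk);
        for (; next_ < end; ++next_) {
            sum_ += data_[next_];
        }
        return next_ < data_.size();
    }

    long sum() const { return sum_; }

private:
    static constexpr std::size_t kChunkPerThread = 2;  // illustrative value
    std::vector<int> data_;
    std::size_t next_ = 0;
    int threads_ = 0;
    long sum_ = 0;
};

// Stand-in for the manager: budgets[i] is the thread budget for cycle i;
// the last entry is reused once the list is exhausted.
std::size_t runManaged(ManagedWorkload& app, const std::vector<int>& budgets) {
    assert(!budgets.empty());
    app.onSetup();
    std::size_t cycles = 0;
    int current = 0;
    bool more = true;
    while (more) {
        const int granted = budgets[std::min(cycles, budgets.size() - 1)];
        assert(granted > 0);
        if (granted != current) {  // reconfigure only when the budget changes
            app.onConfigure(granted);
            current = granted;
        }
        more = app.onRun();
        app.onMonitor();
        ++cycles;
    }
    app.onRelease();
    return cycles;
}
```

The design point is that the benchmark never loops to completion on its own: control returns to the manager after every chunk, which is what allows resources to be reassigned mid-run.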

20 BOSP: Integration of OpenCV samples
Diagram: Application, RTLib, Application, OpenCV
Integrate samples provided with OpenCV into the AEM, such that the output accuracy/quality depends on the resources assigned by the BarbequeRTRM (e.g., FaceDetection).
Skills: C/C++, OpenCV
Students: 1
CFU: 5

21 BOSP: Integration of Approximate Computing applications
Diagram: Application, RTLib, Application
Modify an Approximate Computing application according to the Abstract Execution Model to make it run-time manageable. The application output accuracy must depend on the amount of resources assigned by the BarbequeRTRM.
Skills: C/C++
Students: 1
CFU: 5
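The requirement that accuracy follows the assigned resources can be illustrated with a deliberately small example: a series approximation of pi whose number of terms is scaled by a CPU quota. This is only a sketch under stated assumptions. The quota-to-terms mapping, the constant 100, and the function names are hypothetical, and a real application would receive its budget from the BarbequeRTRM rather than as a function argument.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr double kReferencePi = 3.14159265358979323846;

// Hypothetical mapping: more CPU quota buys more series terms per run.
// The factor 100 is an arbitrary illustrative choice.
std::size_t termsForQuota(int cpuQuotaPercent) {
    assert(cpuQuotaPercent > 0);
    return static_cast<std::size_t>(cpuQuotaPercent) * 100;
}

// Leibniz series: pi = 4 * sum_{k>=0} (-1)^k / (2k + 1).
// Fewer terms means less work and a larger error.
double approximatePi(std::size_t terms) {
    double sum = 0.0;
    double sign = 1.0;
    for (std::size_t k = 0; k < terms; ++k) {
        sum += sign / (2.0 * static_cast<double>(k) + 1.0);
        sign = -sign;
    }
    return 4.0 * sum;
}

// Absolute error of the result obtained under a given quota.
double errorForQuota(int cpuQuotaPercent) {
    return std::fabs(approximatePi(termsForQuota(cpuQuotaPercent)) -
                     kReferencePi);
}
```

With this mapping, a run granted a 100% quota produces a smaller error than a run granted 25%, which is the accuracy/resource trade-off the proposal asks for.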

22 BOSP on Android: Recover and update RTLib integration
Diagram: BarbequeRTRM, C/C++ RTLib, Application
Re-enable the possibility of managing Android applications integrated in the Abstract Execution Model.
Skills: Java, C++, JNI, Android
Students: 1
Notes: Part of the code is already available

23 BOSP on Android: Android process/VM migration evaluation
Evaluate the possibility of migrating Android processes (VM) from one device to another by exploiting the Checkpoint/Restore in Userspace (CRIU) framework.
Skills: Android, Linux
Students: 1
Notes: Possible to extend into a thesis

24 BOSP on Heterogeneous Systems
Who assigns devices to applications?

25 BOSP on Heterogeneous Systems: New heterogeneous resource allocation policy
Diagram: BarbequeRTRM, Policy (allocates), CPU0, CPU1, ..., GPU0, GPU1
Run experiments on CPU+GPU systems to implement a new resource allocation policy for OpenCL applications on heterogeneous systems.
Skills: C/C++, OpenCL
Students: 1
CFU: 10
Notes: Possible to extend into a master thesis

26 BOSP on Heterogeneous Systems: Integration of RODINIA OpenCL benchmarks
Diagram: Application, RTLib, Application
Modify a Rodinia OpenCL benchmark according to the Abstract Execution Model to make it run-time manageable.
Skills: C/C++, OpenCL
Students: 1
CFU: 5

27 BOSP on Heterogeneous Systems: OpenCL on ARM CPU
Compile an OpenCL runtime (pocl) on an ARM board. Run some benchmarks. Provide a comparison in terms of execution time and power/energy consumption.
Skills: C/C++, OpenCL, scripting languages
Students: 1
CFU: 5/10

28 Open MPI: The MIG framework
Migration of MPI application processes over multiple nodes of a distributed system.
Diagram: MIG; Process 1 to Process 6; orted, orted; system0, system1, ..., systemn

29 MIG Open MPI framework: Rebase on top of Open MPI version 2.0
Diagram: Open MPI 2.0, MIG
The MIG framework for Open MPI must be rebased from version 1.7 to version 2.0.
Skills: C/C++
Students: 1-2 (suggested 2)
CFU: 5/10

30 MIG Open MPI framework: Exploiting MIG by implementing a new BarbequeRTRM policy
Diagram: BarbequeRTRM, MIG
Implement a centralized resource allocation policy to manage the migration/placement of MPI application processes.
Skills: C/C++
Students: 1-2
CFU: 5/10
Notes: Possible to extend into a master thesis

31 MIG Open MPI framework: Improve the framework performance with advanced techniques
Diagram: MIG
Reduce the framework overhead by introducing suitable techniques.
Skills: C/C++
Students: 1-2
CFU: 5/10
Notes: Possible to extend into a master thesis / publication

32 MIG Open MPI framework: [Thesis] Open MPI run-time application profiling support
Diagram: MIG
Design and implement a run-time profiling framework for MPI applications, to widen the possibilities for dynamic resource management and process migration.
Skills: C/C++
Students: 1-2
CFU: 5/10

33 MIG Open MPI framework: [Thesis] Development of the InfiniBand support
Diagram: MIG
The MIG framework lacks support for migrating application processes over nodes interconnected through InfiniBand.
Skills: C/C++
Students: 1
CFU: 5/10

34 STM32F4 development board + Miosix real-time operating system

35 STM32F4 + Miosix: Electronic pendulum
Goal: Form a pendulum with the stm32f4discovery board and the USB cable. Using the accelerometer, measure the pendulum period and estimate the cable length.
Skills: C/C++
Students: 1-2
CFU: 5/10
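The estimate rests on the ideal simple-pendulum relation T = 2*pi*sqrt(L/g), which inverts to L = g*T^2 / (4*pi^2). The C++ sketch below is one possible way to get from samples to a length, under several assumptions that are not in the slide: the samples come from one accelerometer axis whose signal oscillates at the pendulum frequency, swings are small, the board behaves like a point mass, and noise is low enough that zero crossings are not double-counted. Function names are hypothetical and the Miosix accelerometer driver is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

constexpr double kGravity = 9.80665;  // standard gravity, m/s^2
constexpr double kPi = 3.14159265358979323846;

// Estimate the oscillation period (seconds) from evenly spaced samples by
// timing rising zero crossings of the mean-removed signal.
// Returns 0.0 when fewer than two crossings are found.
double estimatePeriod(const std::vector<double>& samples, double sampleRateHz) {
    assert(sampleRateHz > 0.0);
    if (samples.size() < 2) {
        return 0.0;
    }
    const double mean =
        std::accumulate(samples.begin(), samples.end(), 0.0) /
        static_cast<double>(samples.size());
    double first = 0.0;
    double last = 0.0;
    std::size_t crossings = 0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const double prev = samples[i - 1] - mean;
        const double cur = samples[i] - mean;
        if (prev < 0.0 && cur >= 0.0) {
            // Linear interpolation of the crossing instant between samples.
            const double t =
                (static_cast<double>(i - 1) + (-prev) / (cur - prev)) /
                sampleRateHz;
            if (crossings == 0) {
                first = t;
            }
            last = t;
            ++crossings;
        }
    }
    if (crossings < 2) {
        return 0.0;
    }
    return (last - first) / static_cast<double>(crossings - 1);
}

// L = g * T^2 / (4 * pi^2), ideal small-angle pendulum.
double cableLengthFromPeriod(double periodSeconds) {
    assert(periodSeconds > 0.0);
    return kGravity * periodSeconds * periodSeconds / (4.0 * kPi * kPi);
}

// Inverse relation, T = 2 * pi * sqrt(L / g), useful as a sanity check.
double periodFromCableLength(double lengthMeters) {
    assert(lengthMeters > 0.0);
    return 2.0 * kPi * std::sqrt(lengthMeters / kGravity);
}
```

As a sanity check, a measured period of 2 s corresponds to a length of roughly 0.99 m. On real hardware the signal would likely need low-pass filtering before the crossing detection.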

36 STM32F4 + Miosix: Tone generator
Goal: Generate a sine-wave tone of a given frequency on the headphone output.
Skills: C/C++
Students:
CFU:
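The sample-generation part of this task can be sketched independently of the hardware. The C++ function below fills a buffer with signed 16-bit samples of a sine wave at a requested frequency. The function name, the 16-bit sample format, and the parameters are assumptions for illustration; how the buffer reaches the headphone output (audio codec, DMA, Miosix driver) is not covered.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Generate `count` signed 16-bit samples of a sine tone.
// `amplitude` is a fraction of full scale in [0, 1].
std::vector<std::int16_t> generateSine(double frequencyHz,
                                       double sampleRateHz,
                                       std::size_t count,
                                       double amplitude = 1.0) {
    assert(sampleRateHz > 0.0);
    // The tone must stay below the Nyquist frequency to avoid aliasing.
    assert(frequencyHz >= 0.0 && frequencyHz < sampleRateHz / 2.0);
    assert(amplitude >= 0.0 && amplitude <= 1.0);

    const double phaseStep = 2.0 * kPi * frequencyHz / sampleRateHz;
    std::vector<std::int16_t> samples(count);
    for (std::size_t n = 0; n < count; ++n) {
        const double value =
            amplitude * std::sin(phaseStep * static_cast<double>(n));
        samples[n] = static_cast<std::int16_t>(std::lround(value * 32767.0));
    }
    return samples;
}
```

On a microcontroller, a precomputed lookup table covering one period is a common alternative to calling `std::sin` per sample; computing the phase as `phaseStep * n` is adequate for short buffers but loses precision for very long runs.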

37 STM32F4 + Miosix: Audio compression
Goal: Develop firmware that compresses the audio coming from the on-board microphone and sends it to a serial port.
Skills: C/C++
Students: 1-2
CFU: 5
Notes: The Miosix microphone driver is already provided
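The slide does not name a compression scheme. As one simple option, chosen here purely for illustration, mu-law companding maps each 16-bit sample to 8 bits and so halves the data rate sent over the serial port. The C++ sketch below uses the continuous mu-law formula with mu = 255; it is not the bit-exact G.711 encoding, the function names are hypothetical, and the microphone and serial-port code are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr double kMu = 255.0;

// Compress one 16-bit PCM sample to a signed 8-bit code:
// y = sign(x) * ln(1 + mu*|x|) / ln(1 + mu), with x normalized to [-1, 1].
std::int8_t compressSample(std::int16_t pcm) {
    const double x = static_cast<double>(pcm) / 32768.0;
    const double magnitude = std::log1p(kMu * std::fabs(x)) / std::log1p(kMu);
    const double companded = (x < 0.0) ? -magnitude : magnitude;
    return static_cast<std::int8_t>(std::lround(companded * 127.0));
}

// Expand an 8-bit code back to an approximate 16-bit PCM sample:
// x = sign(y) * ((1 + mu)^|y| - 1) / mu.
std::int16_t expandSample(std::int8_t code) {
    const double y = static_cast<double>(code) / 127.0;
    double magnitude = (std::pow(1.0 + kMu, std::fabs(y)) - 1.0) / kMu;
    magnitude = std::min(magnitude, 1.0);  // guards the unused code -128
    const double x = (y < 0.0) ? -magnitude : magnitude;
    return static_cast<std::int16_t>(std::lround(x * 32767.0));
}
```

Companding is lossy: quiet samples are reproduced with finer resolution than loud ones, which suits speech. Schemes with higher compression ratios (for example ADPCM) would be a natural next step, at the cost of more state on the device.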

38 STM32F4 + Miosix: Port Miosix to a new STM32 board
Goal: Make the Miosix kernel run on a new STM32 development board.
Skills: C/C++
Students: 1-2
CFU: 5/10
Notes: A special board will be provided

39 STM32F4 + Miosix: Port Mxgui to a new development board
Goal: Mxgui is the Miosix library to handle displays. Add support for the display of the new STM32 development board.
Skills: C/C++
Students: 1-2
CFU: 5/10
Notes: A special board will be provided. Requires the previous project to be completed

40 Monographic Research: Topics
- Linux Cgroups version 2.0
- Real-time scheduling algorithms
- Fault-detection frameworks / techniques
- Task scheduling on heterogeneous systems
- Memory management on heterogeneous systems
- ...

HARPA. HARPA-OS Engine, Final Release. Giuseppe Massari, Simone Libutti, William Fornaciari. HARPA Harnessing Performance Variability

HARPA. HARPA-OS Engine, Final Release. Giuseppe Massari, Simone Libutti, William Fornaciari. HARPA Harnessing Performance Variability HARPA Harnessing Performance Variability HARPA Harnessing Performance Variability Project ref. Call ref. Activity FP7-612069 FP7-ICT-2013-10 ICT-10-3.4 HARPA-OS Engine, Final Release Giuseppe Massari,

More information

F28HS Hardware-Software Interface: Systems Programming

F28HS Hardware-Software Interface: Systems Programming F28HS Hardware-Software Interface: Systems Programming Hans-Wolfgang Loidl School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh Semester 2 2017/18 0 No proprietary software has

More information

Ten (or so) Small Computers

Ten (or so) Small Computers Ten (or so) Small Computers by Jon "maddog" Hall Executive Director Linux International and President, Project Cauã 1 of 50 Who Am I? Half Electrical Engineer, Half Business, Half Computer Software In

More information

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013 GPGPU on ARM Tom Gall, Gil Pitney, 30 th Oct 2013 Session Description This session will discuss the current state of the art of GPGPU technologies on ARM SoC systems. What standards are there? Where are

More information

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM Integrating CPU and GPU, The ARM Methodology Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM The ARM Business Model Global leader in the development of

More information

GPUs and Emerging Architectures

GPUs and Emerging Architectures GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs

More information

Position Paper: OpenMP scheduling on ARM big.little architecture

Position Paper: OpenMP scheduling on ARM big.little architecture Position Paper: OpenMP scheduling on ARM big.little architecture Anastasiia Butko, Louisa Bessad, David Novo, Florent Bruguier, Abdoulaye Gamatié, Gilles Sassatelli, Lionel Torres, and Michel Robert LIRMM

More information

Embedded Systems: Projects

Embedded Systems: Projects December 2015 Embedded Systems: Projects Davide Zoni PhD email: davide.zoni@polimi.it webpage: home.dei.polimi.it/zoni Research Activities Interconnect: bus, NoC Simulation (component design, evaluation)

More information

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE Haibo Xie, Ph.D. Chief HSA Evangelist AMD China OUTLINE: The Challenges with Computing Today Introducing Heterogeneous System Architecture (HSA)

More information

Elaborazione dati real-time su architetture embedded many-core e FPGA

Elaborazione dati real-time su architetture embedded many-core e FPGA Elaborazione dati real-time su architetture embedded many-core e FPGA DAVIDE ROSSI A L E S S A N D R O C A P O T O N D I G I U S E P P E T A G L I A V I N I A N D R E A M A R O N G I U C I R I - I C T

More information

Open Compute Stack (OpenCS) Overview. D.D. Nikolić Updated: 20 August 2018 DAE Tools Project,

Open Compute Stack (OpenCS) Overview. D.D. Nikolić Updated: 20 August 2018 DAE Tools Project, Open Compute Stack (OpenCS) Overview D.D. Nikolić Updated: 20 August 2018 DAE Tools Project, http://www.daetools.com/opencs What is OpenCS? A framework for: Platform-independent model specification 1.

More information

Building supercomputers from embedded technologies

Building supercomputers from embedded technologies http://www.montblanc-project.eu Building supercomputers from embedded technologies Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results

More information

UTILIZING A BIG.LITTLE TM SOLUTION IN AUTOMOTIVE

UTILIZING A BIG.LITTLE TM SOLUTION IN AUTOMOTIVE UTILIZING A BIG.LITTLE TM SOLUTION IN AUTOMOTIVE JUN. 20, 2018 YOSHIYUKI ITO AUTOMOTIVE INFORMATION SOLUTION BUSINESS DIVISION RENESAS ELECTRONICS CORPORATION Today s Topics & Goal Requirement for big.little

More information

Pedraforca: a First ARM + GPU Cluster for HPC

Pedraforca: a First ARM + GPU Cluster for HPC www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu

More information

Energy Efficiency Tuning: READEX. Madhura Kumaraswamy Technische Universität München

Energy Efficiency Tuning: READEX. Madhura Kumaraswamy Technische Universität München Energy Efficiency Tuning: READEX Madhura Kumaraswamy Technische Universität München Project Overview READEX Starting date: 1. September 2015 Duration: 3 years Runtime Exploitation of Application Dynamism

More information

Portable Power/Performance Benchmarking and Analysis with WattProf

Portable Power/Performance Benchmarking and Analysis with WattProf Portable Power/Performance Benchmarking and Analysis with WattProf Amir Farzad, Boyana Norris University of Oregon Mohammad Rashti RNET Technologies, Inc. Motivation Energy efficiency is becoming increasingly

More information

The Benefits of GPU Compute on ARM Mali GPUs

The Benefits of GPU Compute on ARM Mali GPUs The Benefits of GPU Compute on ARM Mali GPUs Tim Hartley 1 SEMICON Europa 2014 ARM Introduction World leading semiconductor IP Founded in 1990 1060 processor licenses sold to more than 350 companies >

More information

Carlos Reaño, Javier Prades and Federico Silla Technical University of Valencia (Spain)

Carlos Reaño, Javier Prades and Federico Silla Technical University of Valencia (Spain) Carlos Reaño, Javier Prades and Federico Silla Technical University of Valencia (Spain) 4th IEEE International Workshop of High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB

More information

Presented By: Gregory M. Kurtzer HPC Systems Architect Lawrence Berkeley National Laboratory CONTAINERS IN HPC WITH SINGULARITY

Presented By: Gregory M. Kurtzer HPC Systems Architect Lawrence Berkeley National Laboratory CONTAINERS IN HPC WITH SINGULARITY Presented By: Gregory M. Kurtzer HPC Systems Architect Lawrence Berkeley National Laboratory gmkurtzer@lbl.gov CONTAINERS IN HPC WITH SINGULARITY A QUICK REVIEW OF THE LANDSCAPE Many types of virtualization

More information

Matrix. Get Started Guide

Matrix. Get Started Guide Matrix Get Started Guide Overview Matrix is a single board mini computer based on ARM with a wide range of interface, equipped with a powerful i.mx6 Freescale processor, it can run Android, Linux and other

More information

Computing on Low Power SoC Architecture

Computing on Low Power SoC Architecture + Computing on Low Power SoC Architecture Andrea Ferraro INFN-CNAF Lucia Morganti INFN-CNAF + Outline 2 Modern Low Power Systems on Chip Computing on System on Chip ARM CPU SoC GPU Low Power from Intel

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware

More information

Android System Development Training 4-day session

Android System Development Training 4-day session Android System Development Training 4-day session Title Android System Development Training Overview Understanding the Android Internals Understanding the Android Build System Customizing Android for a

More information

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency MediaTek CorePilot 2.0 Heterogeneous Computing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on a chip

More information

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2 CSE 820 Graduate Computer Architecture Richard Enbody Dr. Enbody 1 st Day 2 1 Why Computer Architecture? Improve coding. Knowledge to make architectural choices. Ability to understand articles about architecture.

More information

Next Generation Visual Computing

Next Generation Visual Computing Next Generation Visual Computing (Making GPU Computing a Reality with Mali ) Taipei, 18 June 2013 Roberto Mijat ARM Addressing Computational Challenges Trends Growing display sizes and resolutions Increasing

More information

MediaTek CorePilot. Heterogeneous Multi-Processing Technology. Delivering extreme compute performance with maximum power efficiency

MediaTek CorePilot. Heterogeneous Multi-Processing Technology. Delivering extreme compute performance with maximum power efficiency MediaTek CorePilot Heterogeneous Multi-Processing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on

More information

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017!

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017! Advanced Topics on Heterogeneous System Architectures HSA Foundation! Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2

More information

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014 Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline

More information

OP2 FOR MANY-CORE ARCHITECTURES

OP2 FOR MANY-CORE ARCHITECTURES OP2 FOR MANY-CORE ARCHITECTURES G.R. Mudalige, M.B. Giles, Oxford e-research Centre, University of Oxford gihan.mudalige@oerc.ox.ac.uk 27 th Jan 2012 1 AGENDA OP2 Current Progress Future work for OP2 EPSRC

More information

There s STILL plenty of room at the bottom! Andreas Olofsson

There s STILL plenty of room at the bottom! Andreas Olofsson There s STILL plenty of room at the bottom! Andreas Olofsson 1 Richard Feynman s Lecture (1959) There's Plenty of Room at the Bottom An Invitation to Enter a New Field of Physics Why cannot we write the

More information

Barcelona Supercomputing Center

Barcelona Supercomputing Center www.bsc.es Barcelona Supercomputing Center Centro Nacional de Supercomputación EMIT 2016. Barcelona June 2 nd, 2016 Barcelona Supercomputing Center Centro Nacional de Supercomputación BSC-CNS objectives:

More information

Raspberry Pi Introduction

Raspberry Pi Introduction ECE 1160/2160 Embedded Systems Design Raspberry Pi Introduction Wei Gao ECE 1160/2160 Embedded Systems Design 1 Raspberry Pi Classic embedded computer Single board computer Size of a credit card ECE 1160/2160

More information

IBM High Performance Computing Toolkit

IBM High Performance Computing Toolkit IBM High Performance Computing Toolkit Pidad D'Souza (pidsouza@in.ibm.com) IBM, India Software Labs Top 500 : Application areas (November 2011) Systems Performance Source : http://www.top500.org/charts/list/34/apparea

More information

HSA foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room A. Alario! 23 November, 2015!

HSA foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room A. Alario! 23 November, 2015! Advanced Topics on Heterogeneous System Architectures HSA foundation! Politecnico di Milano! Seminar Room A. Alario! 23 November, 2015! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2

More information

Matrix. Get Started Guide V2.0

Matrix. Get Started Guide V2.0 Matrix Get Started Guide V2.0 Overview Matrix is a single board mini computer based on ARM with a wide range of interface, equipped with a powerful i.mx6 Freescale processor, it can run Android, Linux,

More information

CUDA GPGPU Workshop 2012

CUDA GPGPU Workshop 2012 CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline

More information

DRM(Direct Rendering Manager) of Tizen Kernel Joonyoung Shim

DRM(Direct Rendering Manager) of Tizen Kernel Joonyoung Shim DRM(Direct Rendering Manager) of Tizen Kernel Joonyoung Shim jy0922.shim@samsung.com Contents What is DRM Why DRM What can we do How to implement Tizen kernel DRM Exynos DRM driver Future work 2 What is

More information

R goes Mobile: Efficient Scheduling for Parallel R Programs on Heterogeneous Embedded Systems

R goes Mobile: Efficient Scheduling for Parallel R Programs on Heterogeneous Embedded Systems R goes Mobile: Efficient Scheduling for Parallel R Programs on Heterogeneous Embedded Systems, Andreas Lang Olaf Neugebauer, Peter Marwedel 03/07/2017 SFB 876 Parallel Machine Learning Algorithms Challenge:

More information

7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT

7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT 7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT Draft Printed for SECO Murex S.A.S 2012 all rights reserved Murex Analytics Only global vendor of trading, risk management and processing systems focusing also

More information

HETEROGENEOUS MEMORY MANAGEMENT. Linux Plumbers Conference Jérôme Glisse

HETEROGENEOUS MEMORY MANAGEMENT. Linux Plumbers Conference Jérôme Glisse HETEROGENEOUS MEMORY MANAGEMENT Linux Plumbers Conference 2018 Jérôme Glisse EVERYTHING IS A POINTER All data structures rely on pointers, explicitly or implicitly: Explicit in languages like C, C++,...

More information

Design Space Exploration and Application Autotuning for Runtime Adaptivity in Multicore Architectures

Design Space Exploration and Application Autotuning for Runtime Adaptivity in Multicore Architectures Design Space Exploration and Application Autotuning for Runtime Adaptivity in Multicore Architectures Cristina Silvano Politecnico di Milano cristina.silvano@polimi.it Outline Research challenges in multicore

More information

European energy efficient supercomputer project

European energy efficient supercomputer project http://www.montblanc-project.eu European energy efficient supercomputer project Simon McIntosh-Smith University of Bristol (Based on slides from Alex Ramirez, BSC) Disclaimer: Speaking for myself... All

More information

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA

More information

Building supercomputers from commodity embedded chips

Building supercomputers from commodity embedded chips http://www.montblanc-project.eu Building supercomputers from commodity embedded chips Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results

More information

An Introduction to the SPEC High Performance Group and their Benchmark Suites

An Introduction to the SPEC High Performance Group and their Benchmark Suites An Introduction to the SPEC High Performance Group and their Benchmark Suites Robert Henschel Manager, Scientific Applications and Performance Tuning Secretary, SPEC High Performance Group Research Technologies

More information

Butterfly effect of porting scientific applications to ARM-based platforms

Butterfly effect of porting scientific applications to ARM-based platforms montblanc-project.eu @MontBlanc_EU Butterfly effect of porting scientific applications to ARM-based platforms Filippo Mantovani September 12 th, 2017 This project has received funding from the European

More information

Scalasca support for Intel Xeon Phi. Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany

Scalasca support for Intel Xeon Phi. Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany Scalasca support for Intel Xeon Phi Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany Overview Scalasca performance analysis toolset support for MPI & OpenMP

More information

The Mont-Blanc Project

The Mont-Blanc Project http://www.montblanc-project.eu The Mont-Blanc Project Daniele Tafani Leibniz Supercomputing Centre 1 Ter@tec Forum 26 th June 2013 This project and the research leading to these results has received funding

More information

TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT

TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT Eric Kelmelis 28 March 2018 OVERVIEW BACKGROUND Evolution of processing hardware CROSS-PLATFORM KERNEL DEVELOPMENT Write once, target multiple hardware

More information

Accelerating sequential computer vision algorithms using commodity parallel hardware

Accelerating sequential computer vision algorithms using commodity parallel hardware Accelerating sequential computer vision algorithms using commodity parallel hardware Platform Parallel Netherlands GPGPU-day, 28 June 2012 Jaap van de Loosdrecht NHL Centre of Expertise in Computer Vision

More information

Heterogeneous Architecture. Luca Benini

Heterogeneous Architecture. Luca Benini Heterogeneous Architecture Luca Benini lbenini@iis.ee.ethz.ch Intel s Broadwell 03.05.2016 2 Qualcomm s Snapdragon 810 03.05.2016 3 AMD Bristol Ridge Departement Informationstechnologie und Elektrotechnik

More information

Software Driven Verification at SoC Level. Perspec System Verifier Overview

Software Driven Verification at SoC Level. Perspec System Verifier Overview Software Driven Verification at SoC Level Perspec System Verifier Overview June 2015 IP to SoC hardware/software integration and verification flows Cadence methodology and focus Applications (Basic to

More information

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases Steve Steele, ARM 1 Today s Computational Challenges Trends Growing display sizes and resolutions, richer

More information
