Efficient Programming for Multicore Processor Heterogeneity: OpenMP Versus OmpSs
1 Efficient Programming for Multicore Processor Heterogeneity: OpenMP Versus OmpSs. Anastasiia Butko (Lawrence Berkeley National Laboratory); F. Bruguier, A. Gamatié, G. Sassatelli (LIRMM/CNRS/UM)
2 Heterogeneity: a gateway to lower power and higher energy efficiency. Spectrum from homogeneous to heterogeneous architecture: homogeneous multicore; single-ISA heterogeneous multi-core [Kumar MICRO 03]; multiple-ISA heterogeneous multi-core + accelerators (FPGA, GPUs, etc.). Binary compatibility at one end, a different binary for each architecture at the other. Trade-off along the heterogeneity axis: higher energy efficiency versus easier programmability.
4 Single-ISA heterogeneous architecture: big.LITTLE technology. big cluster: four A15 cores sharing an L2 cache. LITTLE cluster: four A7 cores sharing an L2 cache. Both clusters connect to memory through an interconnect.
6 Programming models: OpenMP versus OmpSs. OpenMP: STATIC, DYNAMIC and GUIDED loop scheduling, each drawn as a timeline of work on a big core and a LITTLE core against execution time. OmpSs: a task dependency graph whose ready tasks go to two priority-ordered queues, a critical task queue and a non-critical task queue.
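The contrast between STATIC and DYNAMIC loop scheduling on cores of unequal speed can be illustrated with a small simulation. This is a minimal sketch, not taken from the slides: the function names, the 2:1 big-to-LITTLE speed ratio, and the uniform iteration cost are all assumptions chosen for illustration, and real OpenMP runtimes add overheads that are ignored here.

```python
import heapq


def static_schedule(n_iters, speeds, cost=1.0):
    """Makespan when iterations are split into equal contiguous blocks
    up front (in the spirit of OpenMP schedule(static))."""
    base, extra = divmod(n_iters, len(speeds))
    finish = []
    for i, speed in enumerate(speeds):
        iters = base + (1 if i < extra else 0)
        finish.append(iters * cost / speed)
    return max(finish)


def dynamic_schedule(n_iters, speeds, chunk=1, cost=1.0):
    """Makespan when each idle core grabs the next chunk of iterations
    (in the spirit of OpenMP schedule(dynamic, chunk))."""
    # Heap of (time at which the core becomes free, core index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    remaining = n_iters
    makespan = 0.0
    while remaining > 0:
        t, i = heapq.heappop(free_at)
        take = min(chunk, remaining)
        remaining -= take
        done = t + take * cost / speeds[i]
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, i))
    return makespan


# Hypothetical platform: two big cores twice as fast as two LITTLE cores.
SPEEDS = [2.0, 2.0, 1.0, 1.0]

print(static_schedule(120, SPEEDS))             # 30.0: LITTLE cores finish last
print(dynamic_schedule(120, SPEEDS, chunk=1))   # 20.0: work follows core speed
print(dynamic_schedule(120, SPEEDS, chunk=40))  # 40.0: one oversized chunk lands on a LITTLE core
```

Under these assumptions the fine-grained dynamic schedule balances load across the asymmetric cores, while a chunk that is too large can be slower than the static split, which is the kind of granularity effect the deck's chunk-size question targets.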
7 Motivation Example [figure; labels not recoverable]
8 Focus areas. Q1: OpenMP vs OmpSs? Q2: Chunk size and block size? Q3: Asymmetry in heterogeneous architectures?
9 Approach. Odroid XU3 board: robust results, fast, not flexible. gem5 + McPAT simulation: acceptable accuracy (validated big.LITTLE model), flexible, low simulation speed.
13 Experimental scenarios: real board. Each workload is run with OpenMP and with OmpSs. OpenMP chunk size = 1, 2, ..., n; OmpSs block size = 1, 2, ..., n. Every setting is evaluated under the configurations l/l, l/h, h/l, h/h and m/m.
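The board experiment matrix described on this slide can be enumerated mechanically. The sketch below is illustrative only: the dictionary layout, the function name, and the example sizes (1, 2, 4, 8) are assumptions, since the slide gives the size range only as 1, 2, ..., n.

```python
from itertools import product

MODELS = ("openmp", "ompss")
# Configuration labels exactly as they appear on the slide.
FREQ_CONFIGS = ("l/l", "l/h", "h/l", "h/h", "m/m")


def scenarios(sizes):
    """Return one record per (programming model, size, configuration)."""
    return [
        {
            "model": model,
            # OpenMP is tuned by chunk size, OmpSs by block size.
            "size_kind": "chunk" if model == "openmp" else "block",
            "size": size,
            "freq": freq,
        }
        for model, size, freq in product(MODELS, sizes, FREQ_CONFIGS)
    ]


# Hypothetical size sweep; the actual values of n are not given in the deck.
runs = scenarios((1, 2, 4, 8))
print(len(runs))  # 40 = 2 models x 4 sizes x 5 configurations
print(runs[0])
```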
14 Board Results. Performance and energy trade-offs across OmpSs/OpenMP, frequency asymmetry, and chunk/block size.
15 Board Results. OmpSs outperforms OpenMP in many cases: OmpSs provides better performance, OpenMP provides better energy.
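When one option wins on performance and the other on energy, a combined metric helps rank them. Energy-to-Solution and Energy Delay Product, both named in the deck's Fig. 1 caption, are commonly defined as below. This is a minimal sketch under that standard definition; the function names and the power and time figures are made up for illustration and are not measurements from the slides.

```python
def energy_to_solution(avg_power_w, exec_time_s):
    """Energy (joules) consumed to complete the workload."""
    return avg_power_w * exec_time_s


def energy_delay_product(avg_power_w, exec_time_s):
    """Energy multiplied by execution time (joule-seconds); lower is better."""
    return energy_to_solution(avg_power_w, exec_time_s) * exec_time_s


# Hypothetical runs: A is faster but draws more power, B is slower but frugal.
run_a = (4.0, 10.0)   # (average power in W, execution time in s)
run_b = (2.5, 14.0)

print(energy_to_solution(*run_a), energy_delay_product(*run_a))  # 40.0 400.0
print(energy_to_solution(*run_b), energy_delay_product(*run_b))  # 35.0 490.0
```

With these invented numbers, run B uses less energy while run A has the lower Energy Delay Product, showing how the two metrics can disagree about which configuration is preferable.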
17 Board Results. OmpSs is more sensitive to granularity than OpenMP. The best configuration stays the same across different scenarios (per workload). Size-dependent behavior differs across workloads. [Figure label: block size 1]
18 Board Results. The best scenarios are those with balanced asymmetry. Extreme frequency asymmetry is inefficient.
19 Board Results [figure: per-workload panels on the Exynos 5 Octa board: (a) blackscholes, (c) cholesky, (e) fluidanimate, (g) freqmine, (i) lud]
20 Experimental scenarios: gem5/McPAT. Each workload is run with OpenMP (optimal chunk size) and with OmpSs (optimal block size) on three platform configurations: symmetric HMP, asymmetric HMP, and SMP.
21 gem5/McPAT Results. Performance and energy trade-offs across OmpSs/OpenMP and cluster asymmetry.
22-23 gem5/McPAT Results. Outliers: asymmetric HMP. Need better heterogeneity management. [Figure caption: Fig. 1: Execution time, Energy-to-Solution and Energy Delay Product projection.]

Text from the accompanying paper, visible behind these slides and cropped at the margins:

The OmpSs environment is built on top of the Mercurium compiler, which translates the OmpSs annotation clauses to source code, and the Nanos++ runtime system, which manages task execution. Nanos++ supports several task scheduling policies, which play a crucial role in efficient program execution. A scheduling policy defines the execution order and resource allocation for ready-to-execute tasks, i.e. tasks whose dependencies have been satisfied. The Criticality-Aware Task Scheduler (CATS) [35] dynamically detects the longest path of the task dependency graph using bottom-level longest-path priorities. The tasks that belong to the longest path are treated as critical. There are two queues for ready tasks: (i) a critical task queue intended for big cores and (ii) a non-critical task queue intended for LITTLE cores.

Study of these programming alternatives is out of the scope of this work. Here we use common OpenMP dynamic scheduling. Our work advances the state of the art by exploring heterogeneous multicore architectures based on validated performance and power models of ARM big.LITTLE architectures. The models are shown to have sufficient accuracy compared to an actual SoC and are made freely available.

... the kernel configuration and are related to the Adaptive Supply Voltage (ASV) technique used in Samsung SoCs. The operating temperature strongly depends on the cluster architecture and application nature. For the Cortex-A7 cluster the temperature always remains below 323 K and the board fan stays off. For the Cortex-A15 cluster the temperature rises above 323 K and the board fan is quickly triggered to ensure proper cooling.
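The idea behind criticality-aware scheduling, ranking tasks by bottom level and sending longest-path tasks to a critical queue, can be sketched offline on a small graph. This is an illustrative sketch only and not the Nanos++ implementation: the real scheduler works dynamically as tasks are created and only enqueues ready tasks, whereas this version classifies a complete, known graph. The function names, the example graph, and the task costs are all hypothetical.

```python
def bottom_levels(succ, cost):
    """Map each task to the length of the longest path from that task
    to any exit task, counting the task's own cost."""
    memo = {}

    def level(task):
        if task not in memo:
            children = succ.get(task, ())
            memo[task] = cost[task] + max((level(c) for c in children), default=0)
        return memo[task]

    for task in cost:
        level(task)
    return memo


def critical_path(succ, cost):
    """Follow the highest bottom level from the entry of the longest path."""
    levels = bottom_levels(succ, cost)
    task = max(levels, key=levels.get)
    path = [task]
    while succ.get(task):
        task = max(succ[task], key=levels.get)
        path.append(task)
    return path


def classify_tasks(succ, cost):
    """Split tasks into (critical, non_critical), each ordered by
    decreasing bottom level; critical tasks would target big cores."""
    levels = bottom_levels(succ, cost)
    on_path = set(critical_path(succ, cost))
    ordered = sorted(cost, key=lambda t: -levels[t])
    critical = [t for t in ordered if t in on_path]
    non_critical = [t for t in ordered if t not in on_path]
    return critical, non_critical


# Hypothetical diamond-shaped task graph: A -> {B, C} -> D.
SUCC = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
COST = {"A": 1, "B": 3, "C": 1, "D": 1}

print(bottom_levels(SUCC, COST))   # A: 5, B: 4, C: 2, D: 1
print(classify_tasks(SUCC, COST))  # (['A', 'B', 'D'], ['C'])
```

In this toy graph the longest path runs through A, B and D, so those tasks land in the critical queue and C in the non-critical one.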
24 gem5/McPAT Results [figure: per-workload panels from gem5/McPAT simulation: (b) blackscholes, (d) cholesky, (f) fluidanimate, (h) freqmine, (j) lud]
25 Open Source architecture model and workloads: big.LITTLE model (ex5) available in the gem5 source tree; OmpSs/OpenMP precompiled executables and runtime environment.
An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware Tao Chen, Shreesha Srinath Christopher Batten, G. Edward Suh Computer Systems Laboratory School of Electrical
More informationSwinger: Processor Relocation on Dynamically Reconfigurable FPGAs
Swinger: Processor Relocation on Dynamically Reconfigurable FPGAs Henrique Miguel Santos da Silva Mendes INESC-ID, Instituto Superior Técnico, Universidade de Lisboa Rua Alves Redol, 9, 1000-029 Lisboa
More informationHeterogeneous SoCs. May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1
COSCOⅣ Heterogeneous SoCs M5171111 HASEGAWA TORU M5171112 IDONUMA TOSHIICHI May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1 Contents Background Heterogeneous technology May 28, 2014 COMPUTER SYSTEM COLLOQUIUM
More informationPerformance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference
The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee
More informationEnergy-Efficiency Prediction of Multithreaded Workloads on Heterogeneous Composite Cores Architectures using Machine Learning Techniques
Energy-Efficiency Prediction of Multithreaded Workloads on Heterogeneous Composite Cores Architectures using Machine Learning Techniques Hossein Sayadi Department of Electrical and Computer Engineering
More informationComputing on Low Power SoC Architecture
+ Computing on Low Power SoC Architecture Andrea Ferraro INFN-CNAF Lucia Morganti INFN-CNAF + Outline 2 Modern Low Power Systems on Chip Computing on System on Chip ARM CPU SoC GPU Low Power from Intel
More informationBoosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors
Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors Shoaib Akram, Jennifer B. Sartor, Kenzo Van Craeynest, Wim Heirman, Lieven Eeckhout Ghent University, Belgium
More informationMTAPI: Parallel Programming for Embedded Multicore Systems
MTAPI: Parallel Programming for Embedded Multicore Systems Urs Gleim Siemens AG, Corporate Technology http://www.ct.siemens.com/ urs.gleim@siemens.com Markus Levy The Multicore Association http://www.multicore-association.org/
More informationInstruction Encoding Synthesis For Architecture Exploration
Instruction Encoding Synthesis For Architecture Exploration "Compiler Optimizations for Code Density of Variable Length Instructions", "Heuristics for Greedy Transport Triggered Architecture Interconnect
More informationEmbedded Systems: Projects
December 2015 Embedded Systems: Projects Davide Zoni PhD email: davide.zoni@polimi.it webpage: home.dei.polimi.it/zoni Research Activities Interconnect: bus, NoC Simulation (component design, evaluation)
More informationThread Affinity Experiments
Thread Affinity Experiments Power implications on Exynos Introduction The LPGPU2 Profiling Tool and API provide support for CPU thread affinity locking and logging, and although this functionality is not
More informationTizen Power Management Service with PASS (Power-Aware System Service)
Tizen Power Management Service with PASS (Power-Aware System Service) 1 Chanwoo Choi cw00.choi@samsung.com S/W R&D Center, Samsung Electronics Copyright 2017 Samsung. All Rights Reserved. Contents Power-Management
More informationProfiling and Debugging OpenCL Applications with ARM Development Tools. October 2014
Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline
More information