Cross-Platform Software Development for Heterogeneous Multicore Systems
1 Cross-Platform Software Development for Heterogeneous Multicore Systems. Dr.-Ing. Timo Stripf, Managing Director Technology
2 Outline: Multicore Motivation; Automatic Parallelization; Interactive Parallelization; Model-Based Development Workflow; Hardware Accelerators
3 Motivation
4 Increasing consumer demands accelerate the use of multicore processors. Demands: high processing power, fast response time, low energy consumption. Domains: consumer electronics, automation, automotive, telecommunication
5 Processor Evolution: GHz Era; Multicore Era; Manycore? Heterogeneous? Examples: Pentium 4 (Single Core), Athlon X2 (Dual Core), Embedded GPU, ARM Cortex-A35 (Quad Core), ZYNQ FPGA
6 Parallel hardware needs parallel programming. [Figure: performance over time across the GHz era, the multicore era, and the manycore/heterogeneous era.] Based on Hans Pabst, 2011: Workshop on Programming of Heterogeneous Systems in Physics
7 Challenges with embedded multicore software development: difficult to predict performance; high test and verification effort; required expertise on diverse target architectures; poor code reusability. Reported impact: 25% more time, 3x more software developers, 4.5x more expensive. Source: VDC Research, Next Generation Embedded Hardware Architectures: Driving Onset of Project Delays, Costs Overruns, and Software Development Changes
8 Software Parallelization
9 Automatic Parallelization as a Black Box: sequential code in, parallel code out
10 Automatic Parallelization as a Black Box: we want a one-button solution, like C compilers
11 Automatic Parallelization Levels: Algorithmic Level; Code Transformation Level; Task Level; Communication Level (diagram axis: Decision Impact)
12 Parallelization on Algorithmic Level: Fast Fourier Transform (FFT). [Diagram: an N-point FFT decomposed into two N/2-point FFTs.]
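The decomposition on this slide can be sketched in C as a recursive radix-2 FFT. This is a minimal illustration, not the tool's generated code: the `cplx` type, the function names, and the caller-supplied twiddle table `tw` (with `tw[k] = exp(-2*pi*i*k/N)` for `k < N/2`) are assumptions made for the example. The point is that the two half-size calls touch disjoint data, so they are candidates for running on separate cores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type for this sketch. */
typedef struct { double re, im; } cplx;

static cplx c_add(cplx a, cplx b) { return (cplx){a.re + b.re, a.im + b.im}; }
static cplx c_sub(cplx a, cplx b) { return (cplx){a.re - b.re, a.im - b.im}; }
static cplx c_mul(cplx a, cplx b)
{
    return (cplx){a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

/* Radix-2 decimation-in-time step.
 * in:     n samples, read with the given stride
 * out:    n output bins
 * stride: N / n, where N is the size of the outermost transform
 * tw:     precomputed twiddles for the outermost size N */
static void fft_rec(const cplx *in, cplx *out, size_t n, size_t stride,
                    const cplx *tw)
{
    if (n == 1) {
        out[0] = in[0];
        return;
    }
    size_t half = n / 2;

    /* Two independent N/2-point FFTs: they read disjoint samples
     * (even vs. odd) and write disjoint halves of out. */
    fft_rec(in, out, half, 2 * stride, tw);                 /* even */
    fft_rec(in + stride, out + half, half, 2 * stride, tw); /* odd  */

    /* Butterfly stage that combines the two half-size results. */
    for (size_t k = 0; k < half; k++) {
        cplx e = out[k];
        cplx o = c_mul(tw[k * stride], out[k + half]);
        out[k] = c_add(e, o);
        out[k + half] = c_sub(e, o);
    }
}

/* Entry point; n must be a power of two. */
void fft(const cplx *in, cplx *out, size_t n, const cplx *tw)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    fft_rec(in, out, n, 1, tw);
}
```

For a 4-point transform the twiddle table is just `{1, -i}`, so `fft` can be called with `tw = {{1.0, 0.0}, {0.0, -1.0}}`. Only the butterfly stage needs both halves, which is where a parallel version would have to synchronize.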
13 Loop Transformation: Matrix Multiplication Example

Original loop nest:

    double c[10][10];
    for (i4 = 0; i4 < 10; i4++) {
        for (i3 = 0; i3 < 10; i3++) {
            sum1 = 0.0;
            for (i5 = 0; i5 < 10; i5++)
                sum1 += a[i5][i3] * b[i4][i5];
            c[i4][i3] = sum1;
        }
    }

After variable splitting, loop splitting, and loop fission:

    /* c is split into two halves that can be owned by different cores */
    double c_0[5][10];
    double c_1[5][10];
    /* rows 0..4 */
    for (i9 = 0; i9 < 5; i9++) {
        for (i8 = 0; i8 < 10; i8++) {
            sum2 = 0.0;
            for (i10 = 0; i10 < 10; i10++)
                sum2 += a[i10][i8] * b[i9][i10];
            c_0[i9][i8] = sum2;
        }
    }
    /* rows 5..9, stored at offset i4 - 5 */
    for (i4 = 5; i4 < 10; i4++) {
        for (i3 = 0; i3 < 10; i3++) {
            sum1 = 0.0;
            for (i5 = 0; i5 < 10; i5++)
                sum1 += a[i5][i3] * b[i4][i5];
            c_1[i4 - 5][i3] = sum1;
        }
    }
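The slide names loop fission alongside variable splitting and loop splitting. As a generic illustration of fission on its own (not taken from the talk, with hypothetical function names), a loop whose body holds two independent statements can be distributed into two loops that each do one job. That leaves two separate units of work to schedule.

```c
#include <assert.h>

/* Before fission: one loop body computes two independent results. */
void scale_and_offset_fused(const double *restrict a, double *restrict b,
                            double *restrict c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        b[i] = 2.0 * a[i];
        c[i] = a[i] + 1.0;
    }
}

/* After fission: the body is distributed over two loops. This is legal
 * here because neither statement reads what the other one writes. */
void scale_and_offset_fissioned(const double *restrict a, double *restrict b,
                                double *restrict c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
    for (int i = 0; i < n; i++)
        c[i] = a[i] + 1.0;
}
```

Both versions produce the same `b` and `c`. The fissioned form is only valid when there is no dependency between the statements being separated.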
14 Task Level: data flow / dependency analysis; identify independent code parts; perform mapping and scheduling
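The mapping and scheduling step can be illustrated with a simple greedy list scheduler. This is a sketch under stated assumptions, not the algorithm used by the tool: tasks are given in topological order, costs are abstract time units, communication cost is ignored, and all type and function names are hypothetical.

```c
#include <assert.h>

#define MAX_TASKS 16
#define MAX_CORES 8

typedef struct {
    int cost;             /* execution time in abstract time units */
    int num_deps;         /* number of predecessor tasks */
    int deps[MAX_TASKS];  /* predecessor indices, each < own index */
} task_t;

typedef struct {
    int core[MAX_TASKS];   /* core each task is mapped to */
    int start[MAX_TASKS];  /* start time of each task */
    int finish[MAX_TASKS]; /* finish time of each task */
    int makespan;          /* finish time of the last task */
} schedule_t;

/* Greedy list scheduling: visit tasks in topological order and place
 * each one on the core where it can start earliest. */
schedule_t list_schedule(const task_t *tasks, int num_tasks, int num_cores)
{
    assert(num_tasks > 0 && num_tasks <= MAX_TASKS);
    assert(num_cores > 0 && num_cores <= MAX_CORES);

    schedule_t s = {0};
    int core_free[MAX_CORES] = {0}; /* time at which each core is idle */

    for (int t = 0; t < num_tasks; t++) {
        /* Dependency analysis result: a task is ready once all of its
         * predecessors have finished. */
        int ready = 0;
        for (int d = 0; d < tasks[t].num_deps; d++) {
            int p = tasks[t].deps[d];
            assert(p >= 0 && p < t);
            if (s.finish[p] > ready)
                ready = s.finish[p];
        }

        /* Mapping: earliest possible start wins, ties go to the
         * lowest core index. */
        int best = 0;
        int best_start = -1;
        for (int c = 0; c < num_cores; c++) {
            int start = core_free[c] > ready ? core_free[c] : ready;
            if (best_start < 0 || start < best_start) {
                best = c;
                best_start = start;
            }
        }

        s.core[t] = best;
        s.start[t] = best_start;
        s.finish[t] = best_start + tasks[t].cost;
        core_free[best] = s.finish[t];
        if (s.finish[t] > s.makespan)
            s.makespan = s.finish[t];
    }
    return s;
}
```

For a diamond-shaped graph (one producer, two independent consumers, one join), the two consumers land on different cores and the makespan drops below the sequential sum of costs.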
15 Task Level: Pipelining, Loop
16 Communication Placement & Data Management: decide when to communicate; influences memory allocation per core?
17 Performance Estimation: deep learning neural network (20 layers), scheduled without performance information vs. with performance information
18 Interactive Parallelization: Feedback, Control
19 Software development with emmtrix: overview. Sequential to parallel code; targets: Multicore, FPGA, GPU
20 Interactive Parallelization starting from MATLAB. Flow: Development with MATLAB/Scilab, Code Generator, Sequential C Code, Parallelization, Parallel C Code. Algorithmic Level: different algorithm versions in MATLAB. Code Transformation Level: transformation selection in GUI. Task Level: automatic user-constrained algorithm. Communication Level: automatic algorithm
21 Example: Deep Learning Application. Dominating kernel: 2D convolution
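For orientation, a 2D convolution kernel of the kind named on this slide can be written as a plain C loop nest. This is a generic sketch, not the application's actual kernel: the function name, the row-major layout, and the "valid" border handling are assumptions. As is common in deep learning code, the kernel is applied without flipping (strictly a cross-correlation).

```c
#include <assert.h>

/* "Valid" 2D convolution over a single-channel, row-major image.
 * out must hold (in_h - k_h + 1) * (in_w - k_w + 1) elements. */
void conv2d_valid(const float *in, int in_h, int in_w,
                  const float *kernel, int k_h, int k_w,
                  float *out)
{
    assert(k_h > 0 && k_w > 0 && in_h >= k_h && in_w >= k_w);

    int out_h = in_h - k_h + 1;
    int out_w = in_w - k_w + 1;

    /* Each output pixel depends only on the input and the kernel, so
     * the iterations of the two outer loops are independent. */
    for (int y = 0; y < out_h; y++) {
        for (int x = 0; x < out_w; x++) {
            float acc = 0.0f;
            for (int ky = 0; ky < k_h; ky++)
                for (int kx = 0; kx < k_w; kx++)
                    acc += in[(y + ky) * in_w + (x + kx)]
                         * kernel[ky * k_w + kx];
            out[y * out_w + x] = acc;
        }
    }
}
```

Because the outer iterations are independent, the output rows can be divided among cores in the same way as the rows of `c` in the matrix multiplication example.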
22 Loop Transformations: apply variable splitting, loop splitting, loop fission
23 Parallel Schedule (8 Cores ARM Cortex-A53)
24 Model-Based Development Workflow
25 ARGO Project Overview. Three-year project, 01/ /2018. Motivation: programming real-time applications for embedded heterogeneous multi-core systems is complex and expensive. Project goal: automate real-time software parallelization and code generation starting from high-level descriptions. Project partners: [logos]. Funded by EU: 3.9 million euros. Coordinator: Juergen Becker (KIT). ARGO has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No ARGO.
26 Enhanced Ground Proximity Warning System (EGPWS): a flight system (a supervisory controller) that creates visual and aural warnings in order to avoid Controlled Flight Into Terrain (CFIT). Since 1974, the FAA has required all large turbine and turbojet airplanes to install GPWS equipment. Products: EGPWS from Honeywell, LANDMARK from L3, T2CAS from ACSS, TAWS from Universal Avionics
27 EGPWS Model
28 Xcos Model of Mode 1: Excessive Rate of Descent. Models are described using graphs: the limit altitudes (the reference being the radio altitude) are described as functions of other parameters such as airspeed or rate of descent
29 Model with Scilab Scripting: Terrain Awareness. Shuttle Radar Topography Mission (SRTM) 3 arc-second (approx. 90 m) data as digital elevation model. Two-phase collision processing: broad phase with uniform grids for spatial partitioning; narrow phase with vertical ray casting for collision detection
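The two phases can be sketched in C as follows. This is an illustrative simplification, not the Scilab model from the talk. It assumes a local Cartesian frame in metres and a square 90 m grid. The `margin_m` threshold and all names are hypothetical, and a single vertical ray at the current position stands in for whatever look-ahead geometry the real system uses.

```c
#include <assert.h>
#include <stdbool.h>

#define CELL_SIZE_M 90.0 /* SRTM 3 arc-second spacing, roughly 90 m */

typedef struct {
    int rows, cols;         /* number of elevation samples per axis */
    const double *height_m; /* row-major elevation samples */
} dem_t;

/* Broad phase: uniform-grid lookup of the cell containing (x, y).
 * Returns false if the position lies outside the elevation model. */
bool dem_cell(const dem_t *dem, double x_m, double y_m, int *col, int *row)
{
    double max_x = (dem->cols - 1) * CELL_SIZE_M;
    double max_y = (dem->rows - 1) * CELL_SIZE_M;
    if (!(x_m >= 0.0 && x_m < max_x) || !(y_m >= 0.0 && y_m < max_y))
        return false;
    *col = (int)(x_m / CELL_SIZE_M);
    *row = (int)(y_m / CELL_SIZE_M);
    return true;
}

/* Narrow phase: cast a vertical ray from (x, y, alt) down onto the
 * bilinearly interpolated terrain surface of the selected cell.
 * Returns true if the clearance is below margin_m. Outside the model
 * it returns false and leaves *clearance_m untouched. */
bool terrain_conflict(const dem_t *dem, double x_m, double y_m,
                      double alt_m, double margin_m, double *clearance_m)
{
    int c, r;
    if (!dem_cell(dem, x_m, y_m, &c, &r))
        return false;

    double fx = x_m / CELL_SIZE_M - c; /* position inside cell, 0..1 */
    double fy = y_m / CELL_SIZE_M - r;

    double h00 = dem->height_m[r * dem->cols + c];
    double h01 = dem->height_m[r * dem->cols + c + 1];
    double h10 = dem->height_m[(r + 1) * dem->cols + c];
    double h11 = dem->height_m[(r + 1) * dem->cols + c + 1];

    double top = h00 + fx * (h01 - h00);
    double bot = h10 + fx * (h11 - h10);
    double ground = top + fy * (bot - top);

    *clearance_m = alt_m - ground;
    return *clearance_m < margin_m;
}
```

The grid lookup is cheap and decides which small part of the elevation data to inspect, so the interpolation runs only on that cell.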
30 Classical Model-based Workflow. Callout: Parallelization for real-time applications! Steps shown: Plant Modeling, Code Generation, Controller Modeling, Software-in-the-Loop Testing, Model-in-the-Loop Testing, Hardware-in-the-Loop Testing, Unit Testing
31 Hardware-in-the-Loop Testing: Multicore Recore Architecture
32 Parallelization of EGPWS application
33 Addressing Heterogeneous Architectures
34 Challenges for Addressing Heterogeneous Architectures (targets compared: Multicore, FPGA, GPU)
Programming Language: C, C++ (Multicore); VHDL, SystemVerilog (FPGA); OpenCL, CUDA (GPU)
Data Types: Standard Integer, Standard Float (Multicore); Fixed Point (FPGA); Standard Integer, Standard Float (GPU)
Parallelization: Coarse Grained, Fine Grained, Random, Loop
Data Locality: Caches, Local memories, Streaming, Register, L2 Cache
35 Programming Language. Flow: Development with MATLAB/Scilab, Code Generator, Sequential C Code, Parallel Studio, Parallel C Code. Further outputs shown: HLS C Code, VHDL Code, CUDA
36 Supporting Hardware Accelerators. Algorithmic Level: use FPGA/GPU library; fixed-point algorithms. Code Transformation Level: HLS pragmas; HLS/GPU transformations. Task Level: hardware accelerator as special processor. Communication Level: heterogeneous communication
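To show what a fixed-point algorithm looks like in C, here is a small Q15 multiply and dot product. This is a generic sketch. The Q15 format, the names, and the rounding and saturation choices are assumptions, not details from the talk. It relies on arithmetic right shift of negative values, which is implementation-defined in C but is what common compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: signed 16-bit value v represents v / 32768, range [-1, 1). */
typedef int16_t q15_t;

/* Clamp a wide intermediate result into the Q15 range. */
static q15_t q15_sat(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (q15_t)v;
}

/* Multiply two Q15 values with rounding and saturation. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b; /* Q30 product */
    return q15_sat((p + (1 << 14)) >> 15);
}

/* Dot product with a wide accumulator, rounded and saturated once at
 * the end. In an HLS flow, a loop like this is where a tool-specific
 * pipelining pragma would typically be placed. */
q15_t q15_dot(const q15_t *x, const q15_t *y, int n)
{
    assert(n >= 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * (int32_t)y[i];
    return q15_sat((acc + (1 << 14)) >> 15);
}
```

With 0.5 encoded as 16384, `q15_mul(16384, 16384)` yields 8192, which is 0.25. Accumulating in a wider type and saturating once keeps intermediate sums from overflowing.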
37 FPGA Example
38 FPGA Example (2)
39 Benefits of Interactive Parallelization: code quality (reduce errors and test effort); portability (single source); transparency and control; productivity. Develop sequential, get parallel
40 Summary: Multicore Motivation; Automatic Parallelization; Interactive Parallelization; Model-Based Development Workflow; Hardware Accelerators
41 Your emmtrix: Dr.-Ing. Timo Stripf, emmtrix Technologies GmbH, Engesserstraße, Karlsruhe, Germany. E-mail: timo.stripf@emmtrix.com