Tools zur Optimierung eingebetteter Multicore-Systeme (Tools for Optimizing Embedded Multicore Systems). Bernhard Bauer

Agenda: Motivation, Software Engineering & Multicore, Think Parallel Models Added Value Tooling, Quo Vadis?

The Multicore Era
Moore's Law: The number of transistors on integrated circuits doubles approximately every two years.
However ...

The Multicore Era
Sutter, 2005: The free lunch is over & more performance is in demand!
Parallelization:
  Partitioning
  Synchronization
Today's software?
Risks:
  decreasing quality (complexity)
  high synchronization overhead
  side effects (emergence)
Think Parallel

Granularity & Partitioning
How to find an appropriate granularity together with a partitioning strategy that splits the system up into parts that are as independent as possible?
[Figure: fork / join]
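The granularity question can be made concrete with a small fork/join sketch. It is only an illustration: the workload (a sum of squares), the function names, and the chunk sizes are assumptions, not taken from the talk. The chunk size is the granularity knob; each chunk is independent of the others, and the only synchronization is the final join.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Partition `items` into independent, contiguous chunks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def sum_of_squares(chunk):
    """Work done on one chunk; it touches no data outside the chunk."""
    return sum(x * x for x in chunk)


def fork_join_sum_of_squares(items, chunk_size, workers=4):
    """Fork one job per chunk, then join by combining the partial results."""
    chunks = split_into_chunks(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fork: every chunk is submitted as a separate job.
        partial_sums = list(pool.map(sum_of_squares, chunks))
    # Join: a single combination step after all jobs have finished.
    return sum(partial_sums), len(chunks)


data = list(range(1000))
coarse = fork_join_sum_of_squares(data, chunk_size=250)  # 4 large parts
fine = fork_join_sum_of_squares(data, chunk_size=10)     # 100 small parts
```

Both calls compute the same result; they differ only in how many parts are forked. Fewer, larger parts mean less fork/join overhead but also less available parallelism.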

Timing and Scheduling
  Tasks are divided into smaller sub-tasks (with equal execution time)
  Each sub-task gets a pseudo-deadline, an overlapping bit, and a group deadline, depending on the task weight
  Sub-tasks are scheduled according to these properties
Adapted from WEMUCS Tutorial @ ESE 2014
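The slide does not name the scheduling algorithm. The three sub-task parameters match those used by Pfair-style schedulers such as PD², so the following sketch assumes that family and uses its textbook formulas. The function name and the example weight 8/11 are illustrative choices, not from the talk.

```python
from fractions import Fraction
from math import ceil, floor


def subtask_parameters(weight, i):
    """Return (pseudo_release, pseudo_deadline, overlap_bit, group_deadline)
    for the i-th sub-task (i >= 1) of a task with 0 < weight <= 1.

    Assumes Pfair/PD2-style definitions; `weight` is execution cost / period.
    """
    w = Fraction(weight)
    pseudo_release = floor((i - 1) / w)
    pseudo_deadline = ceil(i / w)
    # 1 if this sub-task's window overlaps the window of the next sub-task.
    overlap_bit = ceil(i / w) - floor(i / w)
    # Group deadlines are only defined for heavy tasks (1/2 <= weight < 1);
    # all other tasks use 0 by convention.
    if Fraction(1, 2) <= w < 1:
        group_deadline = ceil(ceil(pseudo_deadline * (1 - w)) / (1 - w))
    else:
        group_deadline = 0
    return pseudo_release, pseudo_deadline, overlap_bit, group_deadline


# Example: a task of weight 8/11 (8 time units of work every 11 units).
windows = [subtask_parameters(Fraction(8, 11), i) for i in range(1, 9)]
```

A scheduler of this kind would pick the sub-task with the earliest pseudo-deadline and use the overlapping bit and the group deadline only to break ties.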

Synchronization
How to handle the necessary synchronization, including reducing the exchanged data and detecting and resolving conflicts?
Aspects:
  dependency types
  how & when to include?
  new problems:
    » fine-grained synchronization is expensive
    » side effects: data races, deadlocks, priority inversion
    » automation impossible
    » avoidance
[Figure: fork / join]
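Two of the side effects listed above, data races and deadlocks, can be shown in a few lines. The sketch below is a generic illustration, not the approach of the talk: the `Account` class and `transfer` function are hypothetical. Each shared object has its own lock (fine-grained synchronization), and locks are always taken in a fixed global order so that opposite transfers cannot wait on each other in a cycle.

```python
import threading


class Account:
    """Hypothetical shared object protected by its own lock."""

    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Deadlock avoidance: acquire both locks in ascending id order,
    # regardless of the direction of the transfer.
    first, second = sorted((src, dst), key=lambda acc: acc.ident)
    with first.lock, second.lock:
        # Race avoidance: the read-modify-write happens under both locks.
        src.balance -= amount
        dst.balance += amount


def worker(src, dst, repetitions):
    for _ in range(repetitions):
        transfer(src, dst, 1)


a, b = Account(1, 1000), Account(2, 1000)
threads = [
    threading.Thread(target=worker, args=(s, d, 500))
    for s, d in ((a, b), (b, a), (a, b), (b, a))
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The price of this safety is two lock acquisitions per transfer, which is the synchronization overhead the slide warns about.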

Agenda: Motivation, Software Engineering & Multicore, Think Parallel Models Added Value Tooling, Quo Vadis?

SW-Migration
Software methodologies for distributed systems
[Figure: Sequential Program → Decomposition → REs → Assignment → Tasks → Orchestration → Tasks → Mapping → Cores]
Decomposition
  Identify concurrency and decide at what level to exploit it
  Break up the computation into REs to be divided among processes
  REs may become available dynamically
  The number of REs may vary with time
  Enough REs to keep processors busy
  The number of REs available at a time is an upper bound on the achievable speedup
Assignment (Granularity)
  Specify a mechanism to divide work among cores
    » Balance work and reduce communication
  Structured approaches usually work well
    » Code inspection or understanding of the application
  As programmers, we worry about partitioning first
    » Independent of architecture or programming model
    » But complexity often affects decisions!
Orchestration and Mapping (Locality)
  Computation and communication concurrency
  Preserve locality of data
  Schedule REs to satisfy dependences early
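The assignment step ("balance work") and the speedup bound from the decomposition step can be sketched as follows. This is an assumption-laden illustration: the talk does not prescribe an algorithm, so a common greedy heuristic (longest processing time first) is swapped in, and the RE names and costs are invented. Communication cost is ignored here.

```python
import heapq


def assign_lpt(costs, num_cores):
    """Greedy longest-processing-time-first assignment of REs to cores.

    costs: {re_name: estimated execution cost} (hypothetical values).
    Returns (assignment, loads), both keyed by core index.
    """
    heap = [(0, core) for core in range(num_cores)]  # (current load, core)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    # Place the most expensive REs first, always on the least-loaded core.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))
    loads = {c: sum(costs[n] for n in names) for c, names in assignment.items()}
    return assignment, loads


costs = {"RE1": 7, "RE2": 5, "RE3": 4, "RE4": 3, "RE5": 3, "RE6": 2}
assignment, loads = assign_lpt(costs, num_cores=3)

# Speedup relative to running everything on one core.
speedup = sum(costs.values()) / max(loads.values())
# It can never exceed the number of REs or the number of cores.
bound = min(len(costs), 3)
```

With these numbers the cores end up with loads 9, 8, and 7, so the speedup is 24/9, below the bound of 3.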

Design Examples: Decomposition
Goal: parallelism at a high level of abstraction
Could it be derived from existing SW?
[Figure: «algorithm» Compute Speed Adjustment, containing func1() {.... }, func2() {.... }, func3() {.... }]

Design Examples: Decomposition and Assignment
Task and data partitioning
Grouping of tasks with high communication, etc.
[Figure: «algorithm» Compute Speed Adjustment (calculate) and «entity» DesiredSpeed & CurrentSpeed; Partition 1: Tasks 1-4; Partition 2: Tasks 5-8; Partition 3: Tasks 9-10; outputthrottlevalue]
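Grouping tasks with high communication can be sketched as a greedy merge. This is a minimal sketch under stated assumptions, not the method used in the talk: the task names, traffic volumes, and the `max_group_size` cap are hypothetical, and real tools would also weigh load balance and timing.

```python
def group_by_communication(tasks, traffic, max_group_size):
    """Greedily merge the task pairs that communicate most into one partition.

    traffic: {(task_a, task_b): communication volume} (hypothetical values).
    Returns the partitions as a sorted list of sorted task lists.
    """
    group_of = {t: {t} for t in tasks}  # every task starts in its own group
    # Visit the heaviest communication links first.
    for (a, b), _volume in sorted(traffic.items(), key=lambda kv: (-kv[1], kv[0])):
        ga, gb = group_of[a], group_of[b]
        if ga is not gb and len(ga) + len(gb) <= max_group_size:
            ga |= gb                  # merge gb into ga
            for t in gb:
                group_of[t] = ga      # re-point the merged tasks
    unique = {id(g): g for g in group_of.values()}
    return sorted(sorted(g) for g in unique.values())


tasks = ["T1", "T2", "T3", "T4", "T5", "T6"]
traffic = {
    ("T1", "T2"): 90,
    ("T3", "T4"): 80,
    ("T5", "T6"): 70,
    ("T2", "T3"): 10,
    ("T4", "T5"): 5,
}
small = group_by_communication(tasks, traffic, max_group_size=3)
large = group_by_communication(tasks, traffic, max_group_size=4)
```

The size cap is the granularity trade-off again: a larger cap removes more cross-partition traffic but leaves fewer partitions to run in parallel.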

Design Examples: Orchestration
Data locality; legend: read-only, read and write
[Figure: «algorithm» Compute Speed Adjustment (calculate) and «entity» DesiredSpeed and CurrentSpeed; Tasks 1-10 in Partitions 1-3, grouped into task groups A, B, and C; outputthrottlevalue]
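The distinction between read-only and read-and-write accesses decides which tasks need synchronization. The sketch below shows the usual rule: two tasks conflict on a datum only if both access it and at least one of them writes it. The access sets are hypothetical; only the names DesiredSpeed and CurrentSpeed come from the slide.

```python
from itertools import combinations


def find_conflicts(accesses):
    """accesses: {task: {"reads": set, "writes": set}}.

    Returns (task_a, task_b, datum) for every datum that both tasks access
    and at least one of them writes. Read-only sharing is not a conflict.
    """
    conflicts = []
    for t1, t2 in combinations(sorted(accesses), 2):
        a1, a2 = accesses[t1], accesses[t2]
        shared = (a1["writes"] & (a2["reads"] | a2["writes"])) | (
            a2["writes"] & a1["reads"]
        )
        for datum in sorted(shared):
            conflicts.append((t1, t2, datum))
    return conflicts


# Hypothetical access sets; "speedDelta" and "throttle" are invented names.
accesses = {
    "Task 1": {"reads": {"DesiredSpeed", "CurrentSpeed"}, "writes": set()},
    "Task 2": {"reads": {"CurrentSpeed"}, "writes": {"speedDelta"}},
    "Task 3": {"reads": {"speedDelta"}, "writes": {"throttle"}},
}
conflicts = find_conflicts(accesses)
```

Task 1 and Task 2 both read CurrentSpeed without conflict, so they could run on different cores without synchronization. Task 2 and Task 3 share a written datum and must either be ordered or placed together.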

Agenda: Motivation, Software Engineering & Multicore, Think Parallel Models Added Value Tooling, Quo Vadis?

Use Case AUTOSAR (image from http://www.autosar.org/about/technical-overview/)

Tool Chain
[Figure labels: AUTOSAR model, Tracing, trace information, OT 1, pre-analysis, AUTOSAR, partitioning, DDA, deployment, tasks & scheduling, TA Tool Suite]

DDA-Tool (January 15, 2015)

DDA-Tool: Partitioning

DDA-Tool: Filter and Metrics

DDA-Tool: Conflict resolution

DDA-Tool: Real World Case Study

HW/SW Co-Simulation
[Figure: TA Tool Suite HW/SW co-simulation; stimulation / sampling; application SW, operating system, middleware, processor; event trace; evaluation]

Deployment Approach - Execution
  Runnable-to-task mapping
  Task-to-core mapping
  OS configuration
  Synchronization placement
  Execution sequence improvement
[Figure: runnables R1 to R13 grouped into tasks with priorities P(1) to P(10); Core 1, Core 2, Core 3 with LM 1, LM 2, LM 3; bus / crossbar; SM; flash]

Overview of timing analysis techniques (25.11.15)
  Pure model-based techniques
  Simulation-based techniques
  Observation of the real world

Agenda: Motivation, Software Engineering & Multicore, Think Parallel Models Added Value Tooling, Quo Vadis?

Continuous Development & Optimization
[Figure: Analysis, Design; adapted from AMALTHEA]

Thank you for your