Tools for Optimizing Embedded Multicore Systems. Bernhard Bauer

1 Tools for Optimizing Embedded Multicore Systems Bernhard Bauer

2 Agenda Motivation Software Engineering & Multicore Think Parallel Models Added Value Tooling Quo Vadis?

3 The Multicore Era Moore's Law: The number of transistors on integrated circuits doubles approximately every two years.

4 The Multicore Era Moore's Law: The number of transistors on integrated circuits doubles approximately every two years. However

5 The Multicore Era Sutter, 2005: The free lunch is over & more performance is in demand! Parallelization: Partitioning, Synchronization. Today's Software? Risks: decreasing quality (complexity), much synchronization overhead, side effects (emergence). Think Parallel

6 Granularity & Partitioning How to find an appropriate granularity together with a partitioning strategy that splits the system up into parts that are as independent as possible? (Diagram: fork / join)
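The fork/join pattern and the granularity question on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the names `partition`, `fork_join_sum` and `chunk_size` are hypothetical, and the workload (summing numbers) is chosen only because its chunks are fully independent.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, chunk_size):
    """Split data into independent chunks; chunk_size sets the granularity."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def fork_join_sum(data, chunk_size, workers=4):
    """Sum data with a fork/join structure (illustrative helper)."""
    chunks = partition(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fork: one independent sub-task per chunk.
        partial_sums = list(pool.map(sum, chunks))
    # Join: combine the partial results sequentially.
    return sum(partial_sums)


if __name__ == "__main__":
    print(fork_join_sum(list(range(1000)), chunk_size=100))
```

A small `chunk_size` yields many tiny sub-tasks and more fork/join overhead; a large one yields few sub-tasks and less parallelism. In standard CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so the sketch shows the structure only.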

7 Timing and Scheduling Division of tasks into smaller sub-tasks (with equal execution time). Sub-tasks get a pseudo-deadline, overlapping bit and group deadline, depending on the task weight. Sub-tasks are scheduled by these properties. Adapted from WEMUCS ESE 2014
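The slide does not name the scheduling algorithm. The three sub-task properties it lists match the terminology of Pfair-style scheduling (PD²), so the sketch below assumes those definitions from the Pfair literature: for a task of weight w (execution time divided by period) and sub-task index i starting at 1, the pseudo-release is floor((i-1)/w), the pseudo-deadline is ceil(i/w), the overlapping bit is ceil(i/w) - floor(i/w), and the group deadline is defined only for heavy tasks (w >= 1/2). The function name `subtask_params` is hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def subtask_params(weight, i):
    """Return (pseudo_release, pseudo_deadline, overlap_bit, group_deadline)
    for the i-th sub-task (i >= 1) of a task with 0 < weight < 1.

    Assumption: Pfair/PD2 definitions; the slide itself gives no formulas.
    """
    w = Fraction(weight)
    if not 0 < w < 1 or i < 1:
        raise ValueError("need 0 < weight < 1 and i >= 1")
    release = floor((i - 1) / w)
    deadline = ceil(i / w)
    # 1 if this sub-task's window overlaps the next one, else 0.
    overlap_bit = ceil(i / w) - floor(i / w)
    if w < Fraction(1, 2):
        group_deadline = 0  # light tasks: group deadline defined as 0
    else:
        group_deadline = ceil(ceil(deadline * (1 - w)) / (1 - w))
    return release, deadline, overlap_bit, group_deadline


if __name__ == "__main__":
    for i in range(1, 9):
        print(i, subtask_params(Fraction(8, 11), i))
```

Exact rational arithmetic (`Fraction`) is used so that the floor and ceiling operations are not affected by floating-point rounding.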

8 Synchronization How to handle the necessary synchronization, including the reduction of exchanged data as well as the detection and resolution of conflicts? Aspects: dependency types; how & when to include? New problems: fine-grained synchronization is expensive; side effects: data races, deadlocks, priority inversion; automation impossible; avoidance. (Diagram: fork / join)
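The trade-off between fine-grained synchronization and avoidance can be illustrated with a shared counter. This is a sketch under simple assumptions (plain threads, one shared integer); the function names are hypothetical and do not come from the talk.

```python
import threading


def count_fine_grained(n_threads=4, increments=10_000):
    """Every increment takes the lock: correct, but synchronizes very often."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # protects the read-modify-write against data races
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def count_batched(n_threads=4, increments=10_000):
    """Each thread works on private data and synchronizes once at the end."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        local = 0
        for _ in range(increments):
            local += 1  # no shared state touched here
        with lock:  # a single synchronization point per thread
            counter += local

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_fine_grained(), count_batched())
```

Both variants return the same result; the second reduces the number of lock acquisitions from one per increment to one per thread, which is one way to read the slide's "avoidance" bullet.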

9 Agenda Motivation Software Engineering & Multicore Think Parallel Models Added Value Tooling Quo Vadis?

10 SW-Migration Software Methodologies for distributed systems
(Diagram: Sequential Program -> Decomposition into REs -> Assignment to Tasks -> Orchestration of Tasks -> Mapping to Cores)
Decomposition: Identify concurrency and decide at what level to exploit it. Break up computation into REs to be divided among processes. REs may become available dynamically. The number of REs may vary with time. Provide enough REs to keep processors busy. The number of REs available at a time is an upper bound on the achievable speedup.
Assignment (Granularity): Specify a mechanism to divide work among cores: balance work and reduce communication. Structured approaches usually work well: code inspection or understanding of the application. As programmers, we worry about partitioning first: independent of architecture or programming model, but complexity often affects decisions!
Orchestration and Mapping (Locality): Computation and communication concurrency. Preserve locality of data. Schedule REs to satisfy dependences early.
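The statement that the number of REs available at a time bounds the achievable speedup can be checked with a small calculation. The model below is an assumption made for illustration: every RE costs one time unit, and execution proceeds in steps, where `ready_per_step[k]` REs are available in step k and the next step starts only when all of them have finished. The function name `speedup` is hypothetical.

```python
from math import ceil


def speedup(ready_per_step, cores):
    """Speedup over sequential execution under the unit-cost, step-wise model."""
    sequential_time = sum(ready_per_step)
    # With n ready REs and p cores, a step takes ceil(n / p) time units.
    parallel_time = sum(ceil(n / cores) for n in ready_per_step)
    return sequential_time / parallel_time


if __name__ == "__main__":
    profile = [1, 4, 4, 1]  # at most 4 REs are ever available at once
    for cores in (1, 2, 4, 8):
        print(cores, speedup(profile, cores))
```

For this profile the speedup stops growing at 4 cores (2.5 with both 4 and 8 cores) and never exceeds 4, the largest number of REs available in any step.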

11 Design Examples Decomposition Goal: Parallelism on a high level of abstraction. Could be derived from existing SW? (Diagram: «algorithm» Compute Speed Adjustment with func1() {...}, func2() {...}, func3() {...})

12 Design Examples Decomposition and Assignment Task and data partitioning; grouping of tasks with high communication etc. (Diagram, shown twice: «algorithm» Compute Speed Adjustment with operation calculate; «entity» DesiredSpeed & CurrentSpeed; Partition 1: Tasks 1 to 4; Partition 2: Tasks 5 to 8; Partition 3: Tasks 9 and 10; output: outputthrottlevalue)

13 Design Examples Orchestration (Diagram legend: read-only; read and write; data locality. Diagram: «algorithm» Compute Speed Adjustment with operation calculate; «entity» DesiredSpeed and CurrentSpeed; Tasks 1 to 10 in Partitions 1 to 3, arranged in task groups A, B and C; output: outputthrottle Value)

14 Agenda Motivation Software Engineering & Multicore Think Parallel Models Added Value Tooling Quo Vadis?

15 Use Case AUTOSAR (image from

16 Tool Chain (Diagram: AUTOSAR Model, Tracing, Trace Information, OT 1, Pre-analysis, AUTOSAR Partitioning, DDA, Deployment, Tasks & Scheduling, TA Tool Suite)

17 DDA Tool 15 January

18 DDA Tool

19 DDA Tool

20 DDA Tool: Partitioning

21 DDA Tool: Filter and Metrics

22 DDA Tool: Conflict resolution

23 DDA Tool: Real World Case Study

24 DDA Tool: Real World Case Study

25 HW/SW Co-Simulation TA Tool Suite (Diagram: Stimulation / Sampling; HW/SW Co-Simulation; Application SW, Operating System, Middleware, Processor; Event-Trace Evaluation)

26 Deployment Approach - Execution Steps: Runnable-to-Task Mapping, Task-to-Core Mapping, OS Configuration, Synchronization Placement, Execution Sequence Improvement. (Diagram: runnables R1 to R13 grouped into tasks with priorities P(1) to P(10); Core 1 to Core 3 with local memories LM 1 to LM 3; Bus / Crossbar; SM; Flash)
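The slides do not describe how the TA Tool Suite computes the task-to-core mapping. As a generic illustration of that step only, the sketch below uses a common load-balancing heuristic (largest load first, always onto the least-loaded core). The task names, the load values and the function `map_tasks_to_cores` are hypothetical.

```python
import heapq


def map_tasks_to_cores(task_load, n_cores):
    """Greedy mapping: heaviest task first, onto the currently least-loaded core.

    task_load maps a task name to its estimated load (for example, utilization).
    Returns a dict mapping each task name to a core index.
    """
    # Min-heap of (accumulated load, core index).
    heap = [(0, core) for core in range(n_cores)]
    heapq.heapify(heap)
    mapping = {}
    # Sort by descending load; ties are broken by task name for determinism.
    for task, load in sorted(task_load.items(), key=lambda kv: (-kv[1], kv[0])):
        core_load, core = heapq.heappop(heap)
        mapping[task] = core
        heapq.heappush(heap, (core_load + load, core))
    return mapping


if __name__ == "__main__":
    loads = {"T1": 5, "T2": 4, "T3": 3, "T4": 3, "T5": 1}
    print(map_tasks_to_cores(loads, n_cores=2))
```

This heuristic balances load only. A real deployment would also have to respect the other steps named on the slide, such as synchronization placement and data locality in the local memories.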

27 Overview of timing analysis techniques

28 Overview of timing analysis techniques Pure model-based techniques, simulation-based techniques, observation of the real world

29 Agenda Motivation Software Engineering & Multicore Think Parallel Models Added Value Tooling Quo Vadis?

30 Continuous Development & Optimization Analysis Design Adapted from AMALTHEA

31 Thank you for your

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

UPCRC. Illiac. Gigascale System Research Center. Petascale computing. Cloud Computing Testbed (CCT) 2

UPCRC. Illiac. Gigascale System Research Center. Petascale computing. Cloud Computing Testbed (CCT) 2 Illiac UPCRC Petascale computing Gigascale System Research Center Cloud Computing Testbed (CCT) 2 www.parallel.illinois.edu Mul2 Core: All Computers Are Now Parallel We con'nue to have more transistors

More information

Handling Challenges of Multi-Core Technology in Automotive Software Engineering

Handling Challenges of Multi-Core Technology in Automotive Software Engineering Model Based Development Tools for Embedded Multi-Core Systems Handling Challenges of Multi-Core Technology in Automotive Software Engineering VECTOR INDIA CONFERENCE 2017 Timing-Architects Embedded Systems

More information

Chunking: An Empirical Evalua3on of So7ware Architecture (?)

Chunking: An Empirical Evalua3on of So7ware Architecture (?) Chunking: An Empirical Evalua3on of So7ware Architecture (?) Rachana Koneru David M. Weiss Iowa State University weiss@iastate.edu rachana.koneru@gmail.com With participation by Audris Mockus, Jeff St.

More information

Achieving Predictable Multicore Execution of Automotive Applications Using the LET Paradigm

Achieving Predictable Multicore Execution of Automotive Applications Using the LET Paradigm Achieving Predictable Multicore Execution of Automotive Applications Using the LET Paradigm Alessandro Biondi and Marco Di Natale Scuola Superiore Sant Anna, Pisa, Italy Introduction The introduction of

More information

Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * Technion Yahoo Research

Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * Technion Yahoo Research Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * * Technion Yahoo Research 1 Mul'-Threading is Everywhere 2 Agenda Mo@va@on Concurrent Data Structure Libraries (CDSLs)

More information

Threads. COMP 401 Fall 2017 Lecture 22

Threads. COMP 401 Fall 2017 Lecture 22 Threads COMP 401 Fall 2017 Lecture 22 Threads As a generic term Abstrac>on for program execu>on Current point of execu>on. Call stack. Contents of memory. The fundamental unit of processing that can be

More information

There is a tempta7on to say it is really used, it must be good

There is a tempta7on to say it is really used, it must be good Notes from reviews Dynamo Evalua7on doesn t cover all design goals (e.g. incremental scalability, heterogeneity) Is it research? Complexity? How general? Dynamo Mo7va7on Normal database not the right fit

More information

CS 31: Intro to Systems Threading & Parallel Applications. Kevin Webb Swarthmore College November 27, 2018

CS 31: Intro to Systems Threading & Parallel Applications. Kevin Webb Swarthmore College November 27, 2018 CS 31: Intro to Systems Threading & Parallel Applications Kevin Webb Swarthmore College November 27, 2018 Reading Quiz Making Programs Run Faster We all like how fast computers are In the old days (1980

More information

ECSE 425 Lecture 1: Course Introduc5on Bre9 H. Meyer

ECSE 425 Lecture 1: Course Introduc5on Bre9 H. Meyer ECSE 425 Lecture 1: Course Introduc5on 2011 Bre9 H. Meyer Staff Instructor: Bre9 H. Meyer, Professor of ECE Email: bre9 dot meyer at mcgill.ca Phone: 514-398- 4210 Office: McConnell 525 OHs: M 14h00-15h00;

More information

Sec$on 4: Parallel Algorithms. Michelle Ku8el

Sec$on 4: Parallel Algorithms. Michelle Ku8el Sec$on 4: Parallel Algorithms Michelle Ku8el mku8el@cs.uct.ac.za The DAG, or cost graph A program execu$on using fork and join can be seen as a DAG (directed acyclic graph) Nodes: Pieces of work Edges:

More information

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 14 Parallelism in Software I

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 14 Parallelism in Software I EE382 (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 14 Parallelism in Software I Mattan Erez The University of Texas at Austin EE382: Parallelilsm and Locality, Spring 2015

More information

Execu&on Templates: Caching Control Plane Decisions for Strong Scaling of Data Analy&cs

Execu&on Templates: Caching Control Plane Decisions for Strong Scaling of Data Analy&cs Execu&on Templates: Caching Control Plane Decisions for Strong Scaling of Data Analy&cs Omid Mashayekhi Hang Qu Chinmayee Shah Philip Levis July 13, 2017 2 Cloud Frameworks SQL Streaming Machine Learning

More information

Amol Deshpande, University of Maryland Lisa Hellerstein, Polytechnic University, Brooklyn

Amol Deshpande, University of Maryland Lisa Hellerstein, Polytechnic University, Brooklyn Amol Deshpande, University of Maryland Lisa Hellerstein, Polytechnic University, Brooklyn Mo>va>on: Parallel Query Processing Increasing parallelism in compu>ng Shared nothing clusters, mul> core technology,

More information

Why do we care about parallel?

Why do we care about parallel? Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The

More information

Workloads Programmierung Paralleler und Verteilter Systeme (PPV)

Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Workloads 2 Hardware / software execution environment

More information

Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra<on

Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra<on Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra

More information

Hypergraph Sparsifica/on and Its Applica/on to Par//oning

Hypergraph Sparsifica/on and Its Applica/on to Par//oning Hypergraph Sparsifica/on and Its Applica/on to Par//oning Mehmet Deveci 1,3, Kamer Kaya 1, Ümit V. Çatalyürek 1,2 1 Dept. of Biomedical Informa/cs, The Ohio State University 2 Dept. of Electrical & Computer

More information

Lecture 2: Processes. CSE 120: Principles of Opera9ng Systems. UC San Diego: Summer Session I, 2009 Frank Uyeda

Lecture 2: Processes. CSE 120: Principles of Opera9ng Systems. UC San Diego: Summer Session I, 2009 Frank Uyeda Lecture 2: Processes CSE 120: Principles of Opera9ng Systems UC San Diego: Summer Session I, 2009 Frank Uyeda Announcements PeerWise accounts are now live. First PeerWise ques9ons/reviews due tomorrow

More information

Network Coding: Theory and Applica7ons

Network Coding: Theory and Applica7ons Network Coding: Theory and Applica7ons PhD Course Part IV Tuesday 9.15-12.15 18.6.213 Muriel Médard (MIT), Frank H. P. Fitzek (AAU), Daniel E. Lucani (AAU), Morten V. Pedersen (AAU) Plan Hello World! Intra

More information

Understanding the Interleaving Space Overlap across Inputs and So7ware Versions

Understanding the Interleaving Space Overlap across Inputs and So7ware Versions Understanding the Interleaving Space Overlap across Inputs and So7ware Versions Dongdong Deng, Wei Zhang, Borui Wang, Peisen Zhao, Shan Lu University of Wisconsin, Madison 1 Concurrency bug detec3on is

More information

Habanero-Java Library: a Java 8 Framework for Multicore Programming

Habanero-Java Library: a Java 8 Framework for Multicore Programming Habanero-Java Library: a Java 8 Framework for Multicore Programming PPPJ 2014 September 25, 2014 Shams Imam, Vivek Sarkar shams@rice.edu, vsarkar@rice.edu Rice University https://wiki.rice.edu/confluence/display/parprog/hj+library

More information

The Evaluation of Parallel Compilers and Trapezoidal Self- Scheduling

The Evaluation of Parallel Compilers and Trapezoidal Self- Scheduling The Evaluation of Parallel Compilers and Trapezoidal Self- Scheduling Will Smith and Elizabeth Fehrmann May 23, 2006 Multiple Processor Systems Dr. Muhammad Shaaban Overview Serial Compilers Parallel Compilers

More information

EE382N (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 11 Parallelism in Software II

EE382N (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 11 Parallelism in Software II EE382 (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 11 Parallelism in Software II Mattan Erez The University of Texas at Austin EE382: Parallelilsm and Locality, Fall 2011 --

More information

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable) CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable) Past & Present Have looked at two constraints: Mutual exclusion constraint between two events is a requirement that

More information

Module 20: Multi-core Computing Multi-processor Scheduling Lecture 39: Multi-processor Scheduling. The Lecture Contains: User Control.

Module 20: Multi-core Computing Multi-processor Scheduling Lecture 39: Multi-processor Scheduling. The Lecture Contains: User Control. The Lecture Contains: User Control Reliability Requirements of RT Multi-processor Scheduling Introduction Issues With Multi-processor Computations Granularity Fine Grain Parallelism Design Issues A Possible

More information

MPI & OpenMP Mixed Hybrid Programming

MPI & OpenMP Mixed Hybrid Programming MPI & OpenMP Mixed Hybrid Programming Berk ONAT İTÜ Bilişim Enstitüsü 22 Haziran 2012 Outline Introduc/on Share & Distributed Memory Programming MPI & OpenMP Advantages/Disadvantages MPI vs. OpenMP Why

More information

Towards a Real Time Communica3on Framework for Wireless Sensor Networks

Towards a Real Time Communica3on Framework for Wireless Sensor Networks Towards a Real Time Communica3on Framework for Wireless Sensor Networks Chenyang Lu Department of Computer Science and Engineering Applica3on challenges High data rate Low latency Priori;za;on Predictability

More information

Scalability in a Real-Time Decision Platform

Scalability in a Real-Time Decision Platform Scalability in a Real-Time Decision Platform Kenny Shi Manager Software Development ebay Inc. A Typical Fraudulent Lis3ng fraud detec3on architecture sync vs. async applica3on publish messaging bus request

More information

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004 A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into

More information

Principles of Parallel Algorithm Design: Concurrency and Decomposition

Principles of Parallel Algorithm Design: Concurrency and Decomposition Principles of Parallel Algorithm Design: Concurrency and Decomposition John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 2 12 January 2017 Parallel

More information

Parallel Programming Concepts. Parallel Algorithms. Peter Tröger

Parallel Programming Concepts. Parallel Algorithms. Peter Tröger Parallel Programming Concepts Parallel Algorithms Peter Tröger Sources: Ian Foster. Designing and Building Parallel Programs. Addison-Wesley. 1995. Mattson, Timothy G.; S, Beverly A.; ers,; Massingill,

More information

Introduc4on to OpenMP and Threaded Libraries Ivan Giro*o

Introduc4on to OpenMP and Threaded Libraries Ivan Giro*o Introduc4on to OpenMP and Threaded Libraries Ivan Giro*o igiro*o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) OUTLINE Shared Memory Architectures

More information

I-1 Introduction. I-0 Introduction. Objectives:

I-1 Introduction. I-0 Introduction. Objectives: I-0 Introduction Objectives: Explain necessity of parallel/multithreaded algorithms. Describe different forms of parallel processing. Present commonly used architectures. Introduce a few basic terms. Comments:

More information

Architectures, and Protocol Design Issues for Mobile Social Networks: A Survey

Architectures, and Protocol Design Issues for Mobile Social Networks: A Survey Applica@ons, Architectures, and Protocol Design Issues for Mobile Social Networks: A Survey N. Kayastha,D. Niyato, P. Wang and E. Hossain, Proceedings of the IEEEVol. 99, No. 12, Dec. 2011. Sabita Maharjan

More information

AMDC 2017 Liviona Multi-Core in Automotive Powertrain and Next Steps Towards Parallelization

AMDC 2017 Liviona Multi-Core in Automotive Powertrain and Next Steps Towards Parallelization Bitte decken Sie die schraffierte Fläche mit einem Bild ab. Please cover the shaded area with a picture. (24,4 x 11,0 cm) AMDC 2017 Liviona Multi-Core in Automotive Powertrain and Ralph Mader, 25. April

More information

A Model-based Approach for Conditioning Software to Multi-Core using AUTOSAR

A Model-based Approach for Conditioning Software to Multi-Core using AUTOSAR Model Based Development Tools for Embedded Multi-Core Systems A Model-based Approach for Conditioning Software to Multi-Core using AUTOSAR 9 th AUTOSAR Open Conference in Gothenburg Timing-Architects Embedded

More information

SEDA An architecture for Well Condi6oned, scalable Internet Services

SEDA An architecture for Well Condi6oned, scalable Internet Services SEDA An architecture for Well Condi6oned, scalable Internet Services Ma= Welsh, David Culler, and Eric Brewer University of California, Berkeley Symposium on Operating Systems Principles (SOSP), October

More information

RaceMob: Crowdsourced Data Race Detec,on

RaceMob: Crowdsourced Data Race Detec,on RaceMob: Crowdsourced Data Race Detec,on Baris Kasikci, Cris,an Zamfir, and George Candea School of Computer & Communica3on Sciences Data Races to shared memory loca,on By mul3ple threads At least one

More information

CrowdCode: A Platform for Crowd Development

CrowdCode: A Platform for Crowd Development CrowdCode: A Platform for Crowd Development Thomas D. LaToza 1, Eric Chiquillo 1, 2, W. Ben Towne 3, Christian M. Adriano 1, André van der Hoek 1 1 University of California, Irvine 2 Zynga 3 Carnegie Mellon

More information

Parallelism Marco Serafini

Parallelism Marco Serafini Parallelism Marco Serafini COMPSCI 590S Lecture 3 Announcements Reviews First paper posted on website Review due by this Wednesday 11 PM (hard deadline) Data Science Career Mixer (save the date!) November

More information

CSC630/COS781: Parallel & Distributed Computing

CSC630/COS781: Parallel & Distributed Computing CSC630/COS781: Parallel & Distributed Computing Algorithm Design Chapter 3 (3.1-3.3) 1 Contents Preliminaries of parallel algorithm design Decomposition Task dependency Task dependency graph Granularity

More information

Virtual Synchrony. Jared Cantwell

Virtual Synchrony. Jared Cantwell Virtual Synchrony Jared Cantwell Review Mul7cast Causal and total ordering Consistent Cuts Synchronized clocks Impossibility of consensus Distributed file systems Goal Distributed programming is hard What

More information

Instructor: Randy H. Katz hbp://inst.eecs.berkeley.edu/~cs61c/fa13. Fall Lecture #16. Warehouse Scale Computer

Instructor: Randy H. Katz hbp://inst.eecs.berkeley.edu/~cs61c/fa13. Fall Lecture #16. Warehouse Scale Computer CS 61C: Great Ideas in Computer Architecture OpenMP Instructor: Randy H. Katz hbp://inst.eecs.berkeley.edu/~cs61c/fa13 10/23/13 Fall 2013 - - Lecture #16 1 New- School Machine Structures (It s a bit more

More information

Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons

Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons Assefaw Gebremedhin Purdue University (Star/ng August 2014, Washington State University School of Electrical Engineering

More information

Reusability of So/ware- Defined Networking Applica=ons: A Run=me, Mul=- Controller Approach

Reusability of So/ware- Defined Networking Applica=ons: A Run=me, Mul=- Controller Approach Reusability of So/ware- Defined Networking Applica=ons: A Run=me, Mul=- Controller Approach Roberto Doriguzzi Corin (CREATE- NET), Pedro A. Aranda Gu=érrez (Telefonica), Elisa Rojas (Telcaria), Holger

More information

Origin- des*na*on Flow Measurement in High- Speed Networks

Origin- des*na*on Flow Measurement in High- Speed Networks IEEE INFOCOM, 2012 Origin- des*na*on Flow Measurement in High- Speed Networks Tao Li Shigang Chen Yan Qiao Introduc*on (Defini*ons) Origin- des+na+on flow between two routers is the set of packets that

More information

Outline. In Situ Data Triage and Visualiza8on

Outline. In Situ Data Triage and Visualiza8on In Situ Data Triage and Visualiza8on Kwan- Liu Ma University of California at Davis Outline In situ data triage and visualiza8on: Issues and strategies Case study: An earthquake simula8on Case study: A

More information

Huge market -- essentially all high performance databases work this way

Huge market -- essentially all high performance databases work this way 11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch

More information

Shared- Memory Programming in OpenMP Advanced Research Computing

Shared- Memory Programming in OpenMP Advanced Research Computing Shared- Memory Programming in OpenMP Advanced Research Computing Outline What is OpenMP? How does OpenMP work? Architecture Fork- join model of parallelism Communica:on OpenMP constructs Direc:ves Run:me

More information

Eureka! Task Teams! Kyle Wheeler SC 12 Chapel Lightning Talk SAND: P

Eureka! Task Teams! Kyle Wheeler SC 12 Chapel Lightning Talk SAND: P GO 08012011 Eureka! Task Teams! Kyle Wheeler SC 12 Chapel Lightning Talk Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary

More information

Confinement (Running Untrusted Programs)

Confinement (Running Untrusted Programs) Confinement (Running Untrusted Programs) Chester Rebeiro Indian Institute of Technology Madras Untrusted Programs How to run untrusted programs and not harm your system? Answer: Confinement (some:mes called

More information

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2 Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,

More information

HPCSoC Modeling and Simulation Implications

HPCSoC Modeling and Simulation Implications Department Name (View Master > Edit Slide 1) HPCSoC Modeling and Simulation Implications (Sharing three concerns from an academic research user perspective using free, open tools. Solutions left to the

More information

CSE Opera,ng System Principles

CSE Opera,ng System Principles CSE 30341 Opera,ng System Principles Lecture 5 Processes / Threads Recap Processes What is a process? What is in a process control bloc? Contrast stac, heap, data, text. What are process states? Which

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen

More information

COSC 310: So*ware Engineering. Dr. Bowen Hui University of Bri>sh Columbia Okanagan

COSC 310: So*ware Engineering. Dr. Bowen Hui University of Bri>sh Columbia Okanagan COSC 310: So*ware Engineering Dr. Bowen Hui University of Bri>sh Columbia Okanagan 1 Admin A2 is up Don t forget to keep doing peer evalua>ons Deadline can be extended but shortens A3 >meframe Labs This

More information

Chapter 4: Multithreaded Programming

Chapter 4: Multithreaded Programming Chapter 4: Multithreaded Programming Silberschatz, Galvin and Gagne 2013 Chapter 4: Multithreaded Programming Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading

More information

Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD

Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD Riyaz Haque and David F. Richards This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore

More information

Enhancing Feature Interfaces for Suppor8ng So9ware Product Line Maintenance

Enhancing Feature Interfaces for Suppor8ng So9ware Product Line Maintenance Enhancing Feature Interfaces for Suppor8ng So9ware Product Line Maintenance Bruno B. P. Cafeo LES DI PUC- Rio - Brazil OPUS Group Introduc

More information

Map- Reduce. Everything Data CompSci Spring 2014

Map- Reduce. Everything Data CompSci Spring 2014 Map- Reduce Everything Data CompSci 290.01 Spring 2014 2 Announcements (Thu. Feb 27) Homework #8 will be posted by noon tomorrow. Project deadlines: 2/25: Project team formation 3/4: Project Proposal is

More information

The LOCUS Distributed Operating System

The LOCUS Distributed Operating System The LOCUS Distributed Operating System Bruce Walker, Gerald Popek, Robert English, Charles Kline and Greg Thiel University of California at Los Angeles 1983 Presented By Quan(Cary) Zhang LOCUS is not just

More information

Heterogeneous Resources Management In Modern Data Centers with Dynamic Workloads Ningfang Mi

Heterogeneous Resources Management In Modern Data Centers with Dynamic Workloads Ningfang Mi Heterogeneous Resources Management In Modern Data Centers with Dynamic Workloads Ningfang Mi Electrical and Computer Engineering Dept. Northeastern University ningfang@ece.neu.edu 1 Research Focus To investigate

More information

The Migra*on of Safety- Cri*cal RT So6ware to Mul*core. Marco Caccamo University of Illinois at Urbana- Champaign

The Migra*on of Safety- Cri*cal RT So6ware to Mul*core. Marco Caccamo University of Illinois at Urbana- Champaign The Migra*on of Safety- Cri*cal RT So6ware to Mul*core Marco Caccamo University of Illinois at Urbana- Champaign Outline Mo*va*on Memory- centric scheduling theory Background: PRedictable Execu*on Model

More information

A Script- Based Autotuning Compiler System to Generate High- Performance CUDA code

A Script- Based Autotuning Compiler System to Generate High- Performance CUDA code A Script- Based Autotuning Compiler System to Generate High- Performance CUDA code Malik Khan, Protonu Basu, Gabe Rudy, Mary Hall, Chun Chen, Jacqueline Chame Mo:va:on Challenges to programming the GPU

More information

Security does not live on UI level T

Security does not live on UI level T Security does not live on UI level T-1105220 LECTURE 28032013 Jarmo Parkkinen What would google do? Google 2 step sign in surface Normal website user name + password Verifica9on code SMS or voice 6 digits

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (10 th Week) (Advanced) Operating Systems 10. Multiprocessor, Multicore and Real-Time Scheduling 10. Outline Multiprocessor

More information

ProAc&ve Rou&ng In Scalable Data Centers with PARIS

ProAc&ve Rou&ng In Scalable Data Centers with PARIS ProAc&ve Rou&ng In Scalable Data Centers with PARIS Theophilus Benson Duke University Joint work with Dushyant Arora + and Jennifer Rexford* + Arista Networks *Princeton University Data Center Networks

More information

High-Level Synthesis Creating Custom Circuits from High-Level Code

High-Level Synthesis Creating Custom Circuits from High-Level Code High-Level Synthesis Creating Custom Circuits from High-Level Code Hao Zheng Comp Sci & Eng University of South Florida Exis%ng Design Flow Register-transfer (RT) synthesis - Specify RT structure (muxes,

More information

Administrivia. Talks and other opportunities: Expect HW on functions in ASM (printing binary trees) soon

Administrivia. Talks and other opportunities: Expect HW on functions in ASM (printing binary trees) soon Threads 2/9/18 Administrivia Talks and other opportunities: Game designer and developer talk: Wed noon, Alumni Hall Room 302 (extra credit!) Networking, resume, interview: Wed 4pm, Alumni Hall Room 219

More information

CLOUD SERVICES. Cloud Value Assessment.

CLOUD SERVICES. Cloud Value Assessment. CLOUD SERVICES Cloud Value Assessment www.cloudcomrade.com Comrade a companion who shares one's ac8vi8es or is a fellow member of an organiza8on 2 Today s Agenda! Why Companies Should Consider Moving Business

More information

Exploring different level of parallelism Instruction-level parallelism (ILP): how many of the operations/instructions in a computer program can be performed simultaneously 1. e = a + b 2. f = c + d 3.

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #12 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Last class Outline

More information

Trends and Challenges in Multicore Programming

Trends and Challenges in Multicore Programming Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores

More information

CS 61C: Great Ideas in Computer Architecture. Synchronization, OpenMP. Senior Lecturer SOE Dan Garcia

Review of Last Lecture: Multiprocessor systems use shared memory (single address space). Cache coherence

Computer Architecture Crash course

Frédéric Haziza Department of Computer Systems Uppsala University Summer 2008 Conclusions The multicore era is already here cost of parallelism is dropping

Urb-IoT Developing a RESTful Communication Protocol and an Energy Optimization Algorithm for a Connected Sustainable Home

Urb-IoT 2014 Sotirios D. Kotsopoulos, Federico Casalegno, Wesley Graybill, Adrià Recasens

G22.2110-001 Programming Languages Spring 2010 Lecture 13. Robert Grimm, New York University

Review Last week Exceptions Outline Concurrency Discussion of Final Sources for today's lecture: PLP, 12

Claude TADONKI. MINES ParisTech PSL Research University Centre de Recherche Informatique

Got 2 seconds Sequential 84 seconds Expected 84/84 = 1 second!?! Got 25 seconds MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Séminaire MATHEMATIQUES

https://bit.ly/citustutorial

Before We Start: Set up a Citus Cloud account for the exercises: https://bit.ly/citustutorial Designing a Mul

CS 475. Process = Address space + one thread of control Concurrent program = multiple threads of control

Processes & Threads Concurrent Programs Process = Address space + one thread of control Concurrent program = multiple threads of control Multiple single-threaded processes Multi-threaded process Concurrent

Introduction to Parallel Performance Engineering

Markus Geimer, Brian Wylie Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance:

Profiling & Tuning Applications. CUDA Course July István Reguly

CUDA Course July 21-25 István Reguly Introduction Why is my application running slow? Work it out on paper Instrument code Profile it NVIDIA Visual Profiler Works with CUDA,

MapReduce, Apache Hadoop

Czech Technical University in Prague, Faculty of Information Technology MIE-PDB: Advanced Database Systems http://www.ksi.mff.cuni.cz/~svoboda/courses/2016-2-mie-pdb/ Lecture 12 MapReduce, Apache Hadoop Martin

Lecture #8: Performance or How I Learned to Stop Worrying and Love the Parallelism

Paul Hartke Phartke@stanford.edu Stanford EE183 April 29, 2002 Lab Stuff Lab #1 writeup due TODAY at midnight Keep considering

Many-cores: Supercomputer-on-chip How many? And how? (how not to?)

Ran Ginosar Technion Feb 2009 Disclosure and Ack I am co-inventor / co-founder of Plurality Based on 30 years of (on/off) research Presentation

Timers 1 / 46. Jiffies. Potent and Evil Magic

Jiffies: Each timer tick, a variable called jiffies is incremented. It is thus (roughly) the number of HZ since system boot. A 32-bit counter incremented at 1000 Hz wraps around in about 50

Hardware-Software Codesign. 1. Introduction

Lothar Thiele Contents What is an Embedded System? Levels of Abstraction in Electronic System Design Typical Design Flow of Hardware-Software Systems

MapReduce, Apache Hadoop

NDBI040: Big Data Management and NoSQL Databases http://www.ksi.mff.cuni.cz/~svoboda/courses/2016-1-ndbi040/ Lecture 2 MapReduce, Apache Hadoop Martin Svoboda svoboda@ksi.mff.cuni.cz 11. 10. 2016 Charles University

DD2451 Parallel and Distributed Computing --- FDD3008 Distributed Algorithms

Lecture 8 Leader Election Mads Dam Autumn/Winter 2011 Previously... Consensus for message passing concurrency Crash failures,

MPI Performance Analysis Trace Analyzer and Collector

Berk ONAT İTÜ Bilişim Enstitüsü 19 June 2012 Outline MPI Performance Analyzing Definitions: Profiling Definitions: Tracing Intel Trace Analyzer Lab:

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis

Elif Dede, Madhusudhan Govindaraju Lavanya Ramakrishnan, Dan Gunter, Shane Canon Department of Computer Science, Binghamton

Concurrency & Parallelism, 10 mi

The Beauty and Joy of Computing Lecture #7 Concurrency Instructor: Sean Morris Quest (first exam) in 5 days!! In this room! Concurrency & Parallelism, 10 mi up Intra-computer Today's lecture Multiple

Von Irrwegen und Zukunftstrends bei Mikrocontrollern Marcus Gößler

Quo vadis, Multicore? Von Irrwegen und Zukunftstrends bei Mikrocontrollern Marcus Gößler Crispy Quotes the "not parallel" era will appear to be a very primitive time in the history of computers when people

Developing AUTOSAR Compliant Embedded Software Senior Application Engineer Sang-Ho Yoon

2015 The MathWorks, Inc. Agenda AUTOSAR Compliant Code Generation AUTOSAR Workflows Starting from Software Component

WaveScalar. Winter 2006 CSE WaveScalar 1

WaveScalar: Dataflow machine good at exploiting ILP; dataflow parallelism; traditional coarser-grain parallelism; cheap thread management; memory ordering enforced through wave-ordered memory

ON THE REUSE OF RTL ASSERTIONS IN SYSTEMC TLM VERIFICATION

Nicola Bombieri 1,2 Franco Fummi 1,2, Graziano Pravadelli 1,2, Valerio Garnieri 1, Francesco Stefanni 1, Tara Ghasempouri 2, Michele Lora 2, Giovanni

Optimizing MapReduce for Highly-Distributed Environments

Abhishek Chandra Associate Professor Department of Computer Science and Engineering University of Minnesota http://www.cs.umn.edu/~chandra Big

How to sleep tight and keep your applications running on IPv6 transition. The importance of IPv6 Application Testing

About this presentation: It presents a generic methodology to test the IPv6 functionality of

Replicate and Migrate Objects in the Runtime not Cache Lines or Pages in Hardware

Manolis Katevenis FORTH ICS and Univ. of Crete, Greece BMW October 2010 FORTH Acknowledgements Alex Ramirez Dimitris Nikolopoulos