Tools zur Optimierung eingebetteter Multicore-Systeme (Tools for Optimizing Embedded Multicore Systems). Bernhard Bauer
Transcription
1 Tools zur Optimierung eingebetteter Multicore-Systeme (Tools for Optimizing Embedded Multicore Systems) Bernhard Bauer
2 Agenda Motivation Software Engineering & Multicore Think Parallel Models Added Value Tooling Quo Vadis?
3 The Multicore Era Moore's Law: The number of transistors on integrated circuits doubles approximately every two years.
4 The Multicore Era Moore's Law: The number of transistors on integrated circuits doubles approximately every two years. However
5 The Multicore Era Sutter, 2005: The free lunch is over & more performance is in demand! Parallelization: Partitioning, Synchronization. Today's Software? Risks: decreasing quality (complexity), much synchronization overhead, side effects (emergence). Think Parallel
6 Granularity & Partitioning How to find an appropriate granularity together with a partitioning strategy that splits the system up into parts that are as independent as possible? fork join
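The fork/join pattern named on this slide can be sketched in a few lines of Python. This is only an illustration, not material from the talk: the function names `partition` and `fork_join_sum` and the default of four parts are assumptions, and because of CPython's global interpreter lock the threads show the structure of the pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, rest = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `rest` chunks take one extra element each.
        end = start + size + (1 if i < rest else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def fork_join_sum(data, parts=4):
    """Fork one worker per chunk, then join by combining the partial sums."""
    chunks = partition(data, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(sum, chunks))  # fork: chunks are independent
    return sum(partials)  # join: single combination step
```

The chunks share no data, so the only synchronization point is the join. Choosing `parts` is the granularity question the slide asks: more parts expose more parallelism but add scheduling overhead.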
7 Timing and Scheduling Division of tasks into smaller sub-tasks (with equal execution time). Sub-tasks get a pseudo-deadline, overlapping bit and group deadline, depending on the task weight. Sub-tasks are scheduled by these properties. Adapted from WEMUCS ESE 2014
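The slide does not name the scheduling algorithm. The terms it uses (pseudo-deadline, overlapping bit, group deadline, task weight) match the Pfair family of schedulers such as PD², so the sketch below assumes the standard Pfair window definitions for a task of weight w = e/p: pseudo-release floor((i-1)/w), pseudo-deadline ceil(i/w), and overlap bit ceil(i/w) - floor(i/w). The names `Subtask` and `subtask_windows` are hypothetical, and the group deadline is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    index: int        # 1-based sub-task number
    release: int      # pseudo-release: first slot in which it may run
    deadline: int     # pseudo-deadline: slot boundary by which it must finish
    overlap_bit: int  # 1 if this window overlaps the next sub-task's window


def subtask_windows(exec_time, period, count):
    """Window parameters for a task of weight exec_time / period.

    Assumes the usual Pfair definitions; integer arithmetic avoids
    floating-point rounding in the floor and ceiling terms.
    """
    if not 0 < exec_time <= period:
        raise ValueError("weight must be in (0, 1]")
    windows = []
    for i in range(1, count + 1):
        release = (i - 1) * period // exec_time         # floor((i-1)/w)
        deadline = -(-i * period // exec_time)          # ceil(i/w)
        overlap = deadline - (i * period // exec_time)  # ceil(i/w) - floor(i/w)
        windows.append(Subtask(i, release, deadline, overlap))
    return windows
```

For a task of weight 8/11, the first sub-task gets the window [0, 2) with overlap bit 1, and the eighth gets [9, 11) with overlap bit 0, since its deadline coincides with the period boundary.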
8 Synchronization How to handle the necessary synchronization, including the reduction of exchanged data as well as the detection and resolution of conflicts? Aspects: dependency types; how & when to include? New problems:» fine-grained synchronization - expensive» side effects: data races, deadlocks, priority inversion» automation impossible» avoidance fork join
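A data race, the first side effect listed above, arises when several threads update shared state without coordination. The following minimal sketch guards a shared counter with a lock. The names `SharedCounter` and `run_workers` are made up for illustration and do not come from the talk.

```python
import threading


class SharedCounter:
    """Counter whose read-modify-write is protected by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        # Without the lock, two threads could read the same old value
        # and one of the two updates would be lost.
        with self._lock:
            self._value += amount

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(workers=4, increments=10_000):
    """Fork `workers` threads that each increment the counter, then join."""
    counter = SharedCounter()

    def work():
        for _ in range(increments):
            counter.add(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Taking the lock on every increment is exactly the fine-grained synchronization the slide calls expensive. Letting each worker accumulate locally and add its total once would cut the lock traffic to one acquisition per thread.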
9 Agenda Motivation Software Engineering & Multicore Think Parallel Models Added Value Tooling Quo Vadis?
10 SW-Migration Software Methodologies for distributed systems [Figure: Sequential Program → Decomposition → REs → Assignment → Tasks → Orchestration → Tasks → Mapping → Cores] Decomposition: Identify concurrency and decide at what level to exploit it. Break up computation into REs to be divided among processes. REs may become available dynamically. Number of REs may vary with time. Enough REs to keep processors busy. Number of REs available at a time is upper bound on achievable speedup. Assignment (Granularity): Specify mechanism to divide work among cores» Balance work and reduce communication. Structured approaches usually work well» Code inspection or understanding of application. As programmers, we worry about partitioning first» Independent of architecture or programming model» But complexity often affects decisions! Orchestration and Mapping (Locality): Computation and communication concurrency. Preserve locality of data. Schedule REs to satisfy dependences early.
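To make the assignment step concrete, here is a small load-balancing sketch. It uses a greedy longest-first heuristic, which is a swapped-in textbook technique and not the method from the talk. The cost values, the names `assign_to_cores` and `speedup`, and the decision to ignore communication and dependences are all simplifying assumptions.

```python
import heapq


def assign_to_cores(costs, cores):
    """Greedy longest-first assignment of work units to cores.

    `costs` maps a unit name to its estimated execution cost. The
    heaviest remaining unit always goes to the least-loaded core.
    """
    heap = [(0, core) for core in range(cores)]  # entries are (load, core id)
    heapq.heapify(heap)
    mapping = {core: [] for core in range(cores)}
    # Sort by descending cost; the name breaks ties deterministically.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, core = heapq.heappop(heap)
        mapping[core].append(name)
        heapq.heappush(heap, (load + cost, core))
    loads = {core: sum(costs[n] for n in names) for core, names in mapping.items()}
    return mapping, loads


def speedup(costs, loads):
    """Sequential cost divided by the load of the busiest core."""
    return sum(costs.values()) / max(loads.values())
```

With six units of cost 4, 3, 3, 2, 2, 2 on two cores, both cores end up with a load of 8 and the speedup is 2.0. With more cores than units the speedup stays capped by the number of units, in line with the upper bound stated on the slide.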
11 Design Examples Decomposition Goal: Parallelism on high level of abstraction. Could be derived from existing SW? <<algorithm>> Compute Speed Adjustment func1() {.... } func2() {.... } func3() {.... }
12 Design Examples Decomposition and Assignment Task and Data Partitioning Grouping of Tasks with high communication etc. [Figure: «algorithm» Compute Speed Adjustment with calculate and outputthrottlevalue; «entity» DesiredSpeed & CurrentSpeed; Partition 1: Task 1 to Task 4; Partition 2: Task 5 to Task 8; Partition 3: Task 9, Task 10]
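One simple way to group tasks with high communication is to merge every pair of tasks whose traffic exceeds a threshold. The sketch below does this with a union-find structure. It is a hypothetical illustration only: the traffic figures, the threshold, and the name `group_by_communication` are assumptions, and the talk's tooling may use a different criterion.

```python
def group_by_communication(tasks, traffic, threshold):
    """Merge tasks whose pairwise traffic is at least `threshold`.

    `traffic` maps a (task_a, task_b) pair to a communication volume.
    Returns the resulting groups as a sorted list of task lists.
    """
    parent = {t: t for t in tasks}

    def find(t):
        # Follow parent links to the group representative (path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), volume in traffic.items():
        if volume >= threshold:
            parent[find(a)] = find(b)  # union the two groups

    groups = {}
    for t in tasks:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())
```

Each resulting group is a candidate partition: heavy communication stays inside a group, and only the light edges cross partition boundaries.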
13 Design Examples Orchestration calculate Read-only Read and write Data locality <<algorithm>> Compute Speed Adjustment Task 1 Task group B <<entity>> DesiredSpeed and CurrentSpeed Partition 1 Task group A Task 2 Task 3 Task 4 Partition 2 Task 5 Task 6 Task 7 Task group C Task 8 Partition 3 Task 9 Task 10 outputthrottle Value
14 Agenda Motivation Software Engineering & Multicore Think Parallel Models Added Value Tooling Quo Vadis?
15 Use Case AUTOSAR (image from
16 Tool Chain AUTOSAR Model Tracing Trace Information OT 1 Pre-analysis AUTOSAR Partitioning DDA Deployment Tasks & Scheduling TA Tool Suite
17 DDA-Tool 15 January
18 DDA-Tool
19 DDA-Tool
20 DDA-Tool - Partitioning
21 DDA-Tool: Filter and Metrics
22 DDA-Tool: Conflict resolution
23 DDA-Tool: Real World Case Study
24 DDA-Tool: Real World Case Study
25 HW/SW Co-Simulation TA Tool Suite HW/SW Co-Simulation Stimulation / Sampling HW/SW Co-Simulation Application SW Operating System Middleware Processor Event-Trace Evaluation
26 Deployment Approach - Execution Runnable Task Mapping, Task Core Mapping, OS Configuration, Synchronization Placement, Execution Sequence Improvement [Figure: runnables R1 to R13 grouped under P(10), P(8), P(5), P(4), P(3), P(1); Core 1, Core 2, Core 3 with LM 1, LM 2, LM 3; Bus / Crossbar; SM; Flash]
27 Overview of timing analysis techniques
28 Overview of timing analysis techniques Pure model based techniques Simulation based techniques Observation of the real world
29 Agenda Motivation Software Engineering & Multicore Think Parallel Models Added Value Tooling Quo Vadis?
30 Continuous Development & Optimization Analysis Design Adapted from AMALTHEA
31 Thank you for your