Islands RTS: a Hybrid Haskell Runtime System for NUMA Machines

Size: px
Start display at page:

Download "Islands RTS: a Hybrid Haskell Runtime System for NUMA Machines"

Transcription

1 Islands RTS: a Hybrid Haskell Runtime System for NUMA Machines Marcin Orczyk University of Glasgow May 13, 2011 Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

2 Paralellism CPU clock rates have plateaued the key to increased performance is scalable parallelism from multicore CPUs, through NUMA and massively parallel hardware (GPUs, FPGAs), to clusters and cloud computing Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

3 NUMA 16-core machine composed of 4 quad-core chips Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

4 NUMA cache coherence is an overhead how high will it be at 20 cores? 100 cores? 500 cores? 16-core machine composed of 4 quad-core chips Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

5 NUMA cache coherence is an overhead how high will it be at 20 cores? 100 cores? 500 cores? 16-core machine composed of 4 quad-core chips non-coherent or partially cache coherent architectures Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

6 Islands RTS Haskell runtime system for partially cache coherent machines take advantage of shared memory/coherent cache when available Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

7 Islands RTS Haskell runtime system for partially cache coherent machines take advantage of shared memory/coherent cache when available Islands RTS is a blend of two existing parallel Haskell implementations: GHC: shared memory runtime system with support for parallel evaluation GUM: distributed runtime system based on message passing Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

8 Island island: a set of processing cores sharing a cache coherent memory system is composed of a number of islands Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

9 Island island: a set of processing cores sharing a cache coherent memory system is composed of a number of islands system with 4 cache-coherent islands, each with 4 processing elements Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

10 Island island: a set of processing cores sharing a cache coherent memory system is composed of a number of islands system with 4 cache-coherent islands, each with 4 processing elements one real heap per island virtual shared heap between islands Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

11 Virtual Shared Heap following GUM, Islands RTS provides virtual shared heap closures can transparently reside on different islands Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

12 Virtual Shared Heap following GUM, Islands RTS provides virtual shared heap closures can transparently reside on different islands consider evaluation of the expression (case x of...), where x exists on a different island Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

13 Virtual Shared Heap following GUM, Islands RTS provides virtual shared heap closures can transparently reside on different islands consider evaluation of the expression (case x of...), where x exists on a different island case x of... x Heap of Island A fetch Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

14 Virtual Shared Heap following GUM, Islands RTS provides virtual shared heap closures can transparently reside on different islands consider evaluation of the expression (case x of...), where x exists on a different island case x of... x Heap of Island A fetch Heap of Island B packing/unpacking closures global addresses message passing layer and protocol stub closures Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

15 Global Addresses triples (,, ) Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

16 Global Addresses triples (island,, ) Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

17 Global Addresses triples (island, slot, ) Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

18 Global Addresses triples (island, slot, weight) weighted reference counting Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

19 Global Addresses triples (island, slot, weight) weighted reference counting prevent garbage collection slots are reused similar to stable pointers Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

20 Global Addresses triples (island, slot, weight) weighted reference counting prevent garbage collection slots are reused similar to stable pointers Islands RTS calls them hard links they are quite heavyweight and the guarantees they provide are often unnecessary e.g. recognising duplicates, speculative evaluation Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

21 Global Addresses Soft Links analogous to weak pointers triples (island, slot, slot2) do not prevent garbage collection slots never reused Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

22 Global Addresses Soft Links analogous to weak pointers triples (island, slot, slot2) do not prevent garbage collection slots never reused GUM used only hard links reasons to distinguish between hard and soft links: clarifies implementation potentially improves performance Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

23 Message Passing Meassages virtual shared heap FETCH(ga-from, ga-to) UPDATE(ga, data) FREE(ga) work distribution FISH SPARK(data) startup, shutdown messages Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

24 Within Island based on GHC 7 multiple islands in the same process changes: eliminating global data structures scheduler loop: hooks for message handling garbage collector: hooks for global addresses new closures Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

25 Static Thunks compiler allocates certain closures statically some of them are thunks Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

26 Static Thunks compiler allocates certain closures statically some of them are thunks closure on island B refers to a static thunk STATIC static thunk closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

27 Static Thunks compiler allocates certain closures statically some of them are thunks closure on island B refers to a static thunk island A evaluates the static thunk STATIC static ind thunk closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

28 Static Thunks compiler allocates certain closures statically some of them are thunks closure on island B refers to a static thunk island A evaluates the static thunk island B evaluates thunk in A s heap STATIC static ind ind closure closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

29 Static Thunks - Solution a layer of indirection Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

30 Static Thunks - Solution a layer of indirection closure on island B refers to a static thunk STATIC static thunk closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

31 Static Thunks - Solution a layer of indirection closure on island B refers to a static thunk island A evaluates the static thunk STATIC IndIsland A B NULL static ind thunk closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

32 Static Thunks - Solution a layer of indirection closure on island B refers to a static thunk island A evaluates the static thunk island B accesses the static thunk STATIC IndIsland A B static ind closure thunk Heap of Island A FetchMv Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

33 Static Thunks - Solution overhead on evaluation and access memory overhead proportional to the number of local islands additional, nontrivial complexity suggestions for solving this problem are welcomed Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

34 Summary parallel hardware becomes hierarchical Islands RTS matches it with the hierarchical architecture of the runtime itself within a cache coherent island - shared memory graph reduction between the islands - virtual heap based on message passing enables exploiting most appropriate mechanisms at each level future directions non-coherent shared memory port to Barrelfish remote islands heterogenous islands Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

35 Questions? Questions? Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

36 Uniqueness of Soft Links we can have tons! Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to

More information

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture II Benny Thörnberg Associate Professor in Electronics Parallelism Microscopic vs Macroscopic Microscopic parallelism hardware solutions inside system components providing

More information

StackVsHeap SPL/2010 SPL/20

StackVsHeap SPL/2010 SPL/20 StackVsHeap Objectives Memory management central shared resource in multiprocessing RTE memory models that are used in Java and C++ services for Java/C++ programmer from RTE (JVM / OS). Perspectives of

More information

Heterogeneous SoCs. May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1

Heterogeneous SoCs. May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1 COSCOⅣ Heterogeneous SoCs M5171111 HASEGAWA TORU M5171112 IDONUMA TOSHIICHI May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1 Contents Background Heterogeneous technology May 28, 2014 COMPUTER SYSTEM COLLOQUIUM

More information

Lecture 23 Database System Architectures

Lecture 23 Database System Architectures CMSC 461, Database Management Systems Spring 2018 Lecture 23 Database System Architectures These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used

More information

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley Managed runtimes & garbage collection CSE 6341 Some slides by Kathryn McKinley 1 Managed runtimes Advantages? Disadvantages? 2 Managed runtimes Advantages? Reliability Security Portability Performance?

More information

Chapter 3: Naming Page 38. Clients in most cases find the Jini lookup services in their scope by IP

Chapter 3: Naming Page 38. Clients in most cases find the Jini lookup services in their scope by IP Discovery Services - Jini Discovery services require more than search facilities: Discovery Clients in most cases find the Jini lookup services in their scope by IP multicast/broadcast Multicast UDP for

More information

Run-time Environments -Part 1

Run-time Environments -Part 1 Run-time Environments -Part 1 Y.N. Srikant Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Compiler Design Outline of the Lecture Part 1 What is run-time support?

More information

LANGUAGE RUNTIME NON-VOLATILE RAM AWARE SWAPPING

LANGUAGE RUNTIME NON-VOLATILE RAM AWARE SWAPPING Technical Disclosure Commons Defensive Publications Series July 03, 2017 LANGUAGE RUNTIME NON-VOLATILE AWARE SWAPPING Follow this and additional works at: http://www.tdcommons.org/dpubs_series Recommended

More information

The Memory System. Components of the Memory System. Problems with the Memory System. A Solution

The Memory System. Components of the Memory System. Problems with the Memory System. A Solution Datorarkitektur Fö 2-1 Datorarkitektur Fö 2-2 Components of the Memory System The Memory System 1. Components of the Memory System Main : fast, random access, expensive, located close (but not inside)

More information

Managed runtimes & garbage collection

Managed runtimes & garbage collection Managed runtimes Advantages? Managed runtimes & garbage collection CSE 631 Some slides by Kathryn McKinley Disadvantages? 1 2 Managed runtimes Portability (& performance) Advantages? Reliability Security

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Naming Jia Rao http://ranger.uta.edu/~jrao/ 1 Naming Names play a critical role in all computer systems To access resources, uniquely identify entities, or refer to locations

More information

6.1 Multiprocessor Computing Environment

6.1 Multiprocessor Computing Environment 6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,

More information

! Readings! ! Room-level, on-chip! vs.!

! Readings! ! Room-level, on-chip! vs.! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads

More information

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu

More information

CSE 5306 Distributed Systems. Naming

CSE 5306 Distributed Systems. Naming CSE 5306 Distributed Systems Naming 1 Naming Names play a critical role in all computer systems To access resources, uniquely identify entities, or refer to locations To access an entity, you have resolve

More information

Chapter 20: Database System Architectures

Chapter 20: Database System Architectures Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types

More information

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance Chapter 7 Multicores, Multiprocessors, and Clusters Introduction Goal: connecting multiple computers to get higher performance Multiprocessors Scalability, availability, power efficiency Job-level (process-level)

More information

The Bifrost GPU architecture and the ARM Mali-G71 GPU

The Bifrost GPU architecture and the ARM Mali-G71 GPU The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our

More information

Computer-System Organization (cont.)

Computer-System Organization (cont.) Computer-System Organization (cont.) Interrupt time line for a single process doing output. Interrupts are an important part of a computer architecture. Each computer design has its own interrupt mechanism,

More information

Field Analysis. Last time Exploit encapsulation to improve memory system performance

Field Analysis. Last time Exploit encapsulation to improve memory system performance Field Analysis Last time Exploit encapsulation to improve memory system performance This time Exploit encapsulation to simplify analysis Two uses of field analysis Escape analysis Object inlining April

More information

SCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH

SCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH Faculty of Computer Science Institute of Systems Architecture, Operating Systems Group SCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH LAYER CAKE Application Runtime OS Kernel ISA Physical RAM 2 COMMODITY

More information

CHAPTER 3 RESOURCE MANAGEMENT

CHAPTER 3 RESOURCE MANAGEMENT CHAPTER 3 RESOURCE MANAGEMENT SUBTOPIC Understand Memory Management Understand Processor Management INTRODUCTION Memory management is the act of managing computer memory. This involves providing ways to

More information

Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1

Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1 Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip

More information

A program execution is memory safe so long as memory access errors never occur:

A program execution is memory safe so long as memory access errors never occur: A program execution is memory safe so long as memory access errors never occur: Buffer overflows, null pointer dereference, use after free, use of uninitialized memory, illegal free Memory safety categories

More information

Parallelism with Haskell

Parallelism with Haskell Parallelism with Haskell T-106.6200 Special course High-Performance Embedded Computing (HPEC) Autumn 2013 (I-II), Lecture 5 Vesa Hirvisalo & Kevin Hammond (Univ. of St. Andrews) 2013-10-08 V. Hirvisalo

More information

Exploration of Cache Coherent CPU- FPGA Heterogeneous System

Exploration of Cache Coherent CPU- FPGA Heterogeneous System Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based

More information

AN APPROACH TO IMPROVING THE SCALABILITY OF PARALLEL HASKELL PROGRAMS

AN APPROACH TO IMPROVING THE SCALABILITY OF PARALLEL HASKELL PROGRAMS AN APPROACH TO IMPROVING THE SCALABILITY OF PARALLEL HASKELL PROGRAMS 1 HWAMOK KIM, 1 JOOWON BYUN, 2 SUGWOO BYUN, 3 GYUN WOO 1,3 Dept. of Electrical and Computer Engineering, Pusan National University,

More information

The 7 deadly sins of cloud computing [2] Cloud-scale resource management [1]

The 7 deadly sins of cloud computing [2] Cloud-scale resource management [1] The 7 deadly sins of [2] Cloud-scale resource management [1] University of California, Santa Cruz May 20, 2013 1 / 14 Deadly sins of of sin (n.) - common simplification or shortcut employed by ers; may

More information

Overview of research activities Toward portability of performance

Overview of research activities Toward portability of performance Overview of research activities Toward portability of performance Do dynamically what can t be done statically Understand evolution of architectures Enable new programming models Put intelligence into

More information

Dynamic analysis in the Reduceron. Matthew Naylor and Colin Runciman University of York

Dynamic analysis in the Reduceron. Matthew Naylor and Colin Runciman University of York Dynamic analysis in the Reduceron Matthew Naylor and Colin Runciman University of York A question I wonder how popular Haskell needs to become for Intel to optimize their processors for my runtime, rather

More information

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and

More information

Hera-JVM: Abstracting Processor Heterogeneity Behind a Virtual Machine

Hera-JVM: Abstracting Processor Heterogeneity Behind a Virtual Machine Hera-JVM: Abstracting Processor Heterogeneity Behind a Virtual Machine Ross McIlroy and Joe Sventek Department of Computing Science University of Glasgow {ross,joe}@dcs.gla.ac.uk Abstract Heterogeneous

More information

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Pavel Petroshenko, Sun Microsystems, Inc. Ashmi Bhanushali, NVIDIA Corporation Jerry Evans, Sun Microsystems, Inc. Nandini

More information

Last week, David Terei lectured about the compilation pipeline which is responsible for producing the executable binaries of the Haskell code you

Last week, David Terei lectured about the compilation pipeline which is responsible for producing the executable binaries of the Haskell code you Last week, David Terei lectured about the compilation pipeline which is responsible for producing the executable binaries of the Haskell code you actually want to run. Today, we are going to look at an

More information

GPU Fundamentals Jeff Larkin November 14, 2016

GPU Fundamentals Jeff Larkin November 14, 2016 GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate

More information

Lecture 9: MIMD Architecture

Lecture 9: MIMD Architecture Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is

More information

Runtime Support for Algol-Like Languages Comp 412

Runtime Support for Algol-Like Languages Comp 412 COMP 412 FALL 2018 Runtime Support for Algol-Like Languages Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students

More information

Lecture Notes on Garbage Collection

Lecture Notes on Garbage Collection Lecture Notes on Garbage Collection 15-411: Compiler Design André Platzer Lecture 20 1 Introduction In the previous lectures we have considered a programming language C0 with pointers and memory and array

More information

Multiple Choice Type Questions

Multiple Choice Type Questions Techno India Batanagar Computer Science and Engineering Model Questions Subject Name: Computer Architecture Subject Code: CS 403 Multiple Choice Type Questions 1. SIMD represents an organization that.

More information

Operating Systems Principles. Segmentation & Shared Memory

Operating Systems Principles. Segmentation & Shared Memory 2 Operating Systems Principles Segmentation & Memory Steve Goddard goddard@cse.unl.edu http://www.cse.unl.edu/~goddard/courses/csce451 1 Memory Management Name spaces Schemes to date have considered only

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III/VI Section : CSE-1 & CSE-2 Subject Code : CS2354 Subject Name : Advanced Computer Architecture Degree & Branch : B.E C.S.E. UNIT-1 1.

More information

Shenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Shenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat Shenandoah: Theory and Practice Christine Flood Roman Kennke Principal Software Engineers Red Hat 1 Shenandoah Christine Flood Roman Kennke Principal Software Engineers Red Hat 2 Shenandoah Why do we need

More information

Distributed Systems. The main method of distributed object communication is with remote method invocation

Distributed Systems. The main method of distributed object communication is with remote method invocation Distributed Systems Unit III Syllabus:Distributed Objects and Remote Invocation: Introduction, Communication between Distributed Objects- Object Model, Distributed Object Modal, Design Issues for RMI,

More information

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 3 Processes

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 3 Processes DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 3 Processes Context Switching Processor context: The minimal collection of values stored in the

More information

Designing Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services. Presented by: Jitong Chen

Designing Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services. Presented by: Jitong Chen Designing Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services Presented by: Jitong Chen Outline Architecture of Web-based Data Center Three-Stage framework to benefit

More information

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous

More information

«Real Time Embedded systems» Multi Masters Systems

«Real Time Embedded systems» Multi Masters Systems «Real Time Embedded systems» Multi Masters Systems rene.beuchat@epfl.ch LAP/ISIM/IC/EPFL Chargé de cours rene.beuchat@hesge.ch LSN/hepia Prof. HES 1 Multi Master on Chip On a System On Chip, Master can

More information

Coping with Immutable Data in a JVM for Embedded Real-Time Systems. Christoph Erhardt, Simon Kuhnle, Isabella Stilkerich, Wolfgang Schröder-Preikschat

Coping with Immutable Data in a JVM for Embedded Real-Time Systems. Christoph Erhardt, Simon Kuhnle, Isabella Stilkerich, Wolfgang Schröder-Preikschat The final Frontier Coping with Immutable Data in a JVM for Embedded Real-Time Systems Christoph Erhardt, Simon Kuhnle, Isabella Stilkerich, Wolfgang Schröder-Preikschat https://www4.cs.fau.de/research/keso/

More information

Garbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35

Garbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35 Garbage Collection Akim Demaille, Etienne Renault, Roland Levillain May 15, 2017 CCMP2 Garbage Collection May 15, 2017 1 / 35 Table of contents 1 Motivations and Definitions 2 Reference Counting Garbage

More information

Distributed Dense Linear Algebra on Heterogeneous Architectures. George Bosilca

Distributed Dense Linear Algebra on Heterogeneous Architectures. George Bosilca Distributed Dense Linear Algebra on Heterogeneous Architectures George Bosilca bosilca@eecs.utk.edu Centraro, Italy June 2010 Factors that Necessitate to Redesign of Our Software» Steepness of the ascent

More information

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors

More information

A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks. Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo

A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks. Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo 1 June 4, 2011 2 Outline Introduction System Architecture A Multi-Chipped

More information

Parallel Architectures

Parallel Architectures Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36

More information

TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT

TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT Eric Kelmelis 28 March 2018 OVERVIEW BACKGROUND Evolution of processing hardware CROSS-PLATFORM KERNEL DEVELOPMENT Write once, target multiple hardware

More information

FPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges

FPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges Bob Blainey IBM Software Group 27 Feb 2011 FPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges Workshop on The Role of FPGAs in a Converged Future with Heterogeneous Programmable

More information

Towards Parallel, Scalable VM Services

Towards Parallel, Scalable VM Services Towards Parallel, Scalable VM Services Kathryn S McKinley The University of Texas at Austin Kathryn McKinley Towards Parallel, Scalable VM Services 1 20 th Century Simplistic Hardware View Faster Processors

More information

Towards a codelet-based runtime for exascale computing. Chris Lauderdale ET International, Inc.

Towards a codelet-based runtime for exascale computing. Chris Lauderdale ET International, Inc. Towards a codelet-based runtime for exascale computing Chris Lauderdale ET International, Inc. What will be covered Slide 2 of 24 Problems & motivation Codelet runtime overview Codelets & complexes Dealing

More information

The Use of Cloud Computing Resources in an HPC Environment

The Use of Cloud Computing Resources in an HPC Environment The Use of Cloud Computing Resources in an HPC Environment Bill, Labate, UCLA Office of Information Technology Prakashan Korambath, UCLA Institute for Digital Research & Education Cloud computing becomes

More information

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering

More information

CUDA GPGPU Workshop 2012

CUDA GPGPU Workshop 2012 CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline

More information

Speculations about Computer Architecture in Next Three Years. Jan. 20, 2018

Speculations about Computer Architecture in Next Three Years. Jan. 20, 2018 Speculations about Computer Architecture in Next Three Years shuchang.zhou@gmail.com Jan. 20, 2018 About me https://zsc.github.io/ Source-to-source transformation Cache simulation Compiler Optimization

More information

High Performance Computing (HPC) Introduction

High Performance Computing (HPC) Introduction High Performance Computing (HPC) Introduction Ontario Summer School on High Performance Computing Scott Northrup SciNet HPC Consortium Compute Canada June 25th, 2012 Outline 1 HPC Overview 2 Parallel Computing

More information

The Z Garbage Collector An Introduction

The Z Garbage Collector An Introduction The Z Garbage Collector An Introduction Per Lidén & Stefan Karlsson HotSpot Garbage Collection Team FOSDEM 2018 Safe Harbor Statement The following is intended to outline our general product direction.

More information

Lecture 4 Naming. Prof. Wilson Rivera. University of Puerto Rico at Mayaguez Electrical and Computer Engineering Department

Lecture 4 Naming. Prof. Wilson Rivera. University of Puerto Rico at Mayaguez Electrical and Computer Engineering Department Lecture 4 Naming Prof. Wilson Rivera University of Puerto Rico at Mayaguez Electrical and Computer Engineering Department Outline Names, identifiers, addresses Flat naming Structured naming Attribute based

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

JOP: A Java Optimized Processor for Embedded Real-Time Systems. Martin Schoeberl

JOP: A Java Optimized Processor for Embedded Real-Time Systems. Martin Schoeberl JOP: A Java Optimized Processor for Embedded Real-Time Systems Martin Schoeberl JOP Research Targets Java processor Time-predictable architecture Small design Working solution (FPGA) JOP Overview 2 Overview

More information

Pointer Analysis in the Presence of Dynamic Class Loading. Hind Presented by Brian Russell

Pointer Analysis in the Presence of Dynamic Class Loading. Hind Presented by Brian Russell Pointer Analysis in the Presence of Dynamic Class Loading Martin Hirzel, Amer Diwan and Michael Hind Presented by Brian Russell Claim: First nontrivial pointer analysis dealing with all Java language features

More information

Barrelfish Project ETH Zurich. Message Notifications

Barrelfish Project ETH Zurich. Message Notifications Barrelfish Project ETH Zurich Message Notifications Barrelfish Technical Note 9 Barrelfish project 16.06.2010 Systems Group Department of Computer Science ETH Zurich CAB F.79, Universitätstrasse 6, Zurich

More information

Parallel & Concurrent Haskell 1: Basic Pure Parallelism. Simon Marlow (Microsoft Research, Cambridge, UK)

Parallel & Concurrent Haskell 1: Basic Pure Parallelism. Simon Marlow (Microsoft Research, Cambridge, UK) Parallel & Concurrent Haskell 1: Basic Pure Parallelism Simon Marlow (Microsoft Research, Cambridge, UK) Background The parallelism landscape is getting crowded Every language has their own solution(s)

More information

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB.

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB. Agenda The topics that will be addressed are: Scheduling tasks on Reconfigurable FPGA architectures Mauro Marinoni ReTiS Lab, TeCIP Institute Scuola superiore Sant Anna - Pisa Overview on basic characteristics

More information

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2

Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Lecture 3: Processes Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Process in General 3.3 Process Concept Process is an active program in execution; process

More information

The Z Garbage Collector Low Latency GC for OpenJDK

The Z Garbage Collector Low Latency GC for OpenJDK The Z Garbage Collector Low Latency GC for OpenJDK Per Lidén & Stefan Karlsson HotSpot Garbage Collection Team Jfokus VM Tech Summit 2018 Safe Harbor Statement The following is intended to outline our

More information

Architecture-Conscious Database Systems

Architecture-Conscious Database Systems Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI) Sources Thank You! l l l l Database Architectures for New Hardware VLDB 2004 tutorial, Anastassia Ailamaki Query

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

殷亚凤. Processes. Distributed Systems [3]

殷亚凤. Processes. Distributed Systems [3] Processes Distributed Systems [3] 殷亚凤 Email: yafeng@nju.edu.cn Homepage: http://cs.nju.edu.cn/yafeng/ Room 301, Building of Computer Science and Technology Review Architectural Styles: Layered style, Object-based,

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Tesla GPU Computing A Revolution in High Performance Computing

Tesla GPU Computing A Revolution in High Performance Computing Tesla GPU Computing A Revolution in High Performance Computing Mark Harris, NVIDIA Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction to Tesla CUDA Architecture Programming & Memory

More information

the gamedesigninitiative at cornell university Lecture 9 Memory Management

the gamedesigninitiative at cornell university Lecture 9 Memory Management Lecture 9 Gaming Memory Constraints Redux Wii-U Playstation 4 2GB of RAM 1GB dedicated to OS Shared with GPGPU 8GB of RAM Shared GPU, 8-core CPU OS footprint unknown 2 Two Main Concerns with Memory Getting

More information

Network-on-Chip Architecture

Network-on-Chip Architecture Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)

More information

Virtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1])

Virtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1]) EE392C: Advanced Topics in Computer Architecture Lecture #10 Polymorphic Processors Stanford University Thursday, 8 May 2003 Virtual Machines Lecture #10: Thursday, 1 May 2003 Lecturer: Jayanth Gummaraju,

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Programming Models for Supercomputing in the Era of Multicore

Programming Models for Supercomputing in the Era of Multicore Programming Models for Supercomputing in the Era of Multicore Marc Snir MULTI-CORE CHALLENGES 1 Moore s Law Reinterpreted Number of cores per chip doubles every two years, while clock speed decreases Need

More information

The benefits and costs of writing a POSIX kernel in a high-level language

The benefits and costs of writing a POSIX kernel in a high-level language 1 / 38 The benefits and costs of writing a POSIX kernel in a high-level language Cody Cutler, M. Frans Kaashoek, Robert T. Morris MIT CSAIL Should we use high-level languages to build OS kernels? 2 / 38

More information

Urmas Tamm Tartu 2013

Urmas Tamm Tartu 2013 The implementation and optimization of functional languages Urmas Tamm Tartu 2013 Intro The implementation of functional programming languages, Simon L. Peyton Jones, 1987 Miranda fully lazy strict a predecessor

More information

Pinpointing Data Locality Problems Using Data-centric Analysis

Pinpointing Data Locality Problems Using Data-centric Analysis Center for Scalable Application Development Software Pinpointing Data Locality Problems Using Data-centric Analysis Xu Liu XL10@rice.edu Department of Computer Science Rice University Outline Introduction

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

Computer Science 146. Computer Architecture

Computer Science 146. Computer Architecture Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 18: Virtual Memory Lecture Outline Review of Main Memory Virtual Memory Simple Interleaving Cycle

More information

Superscalar Machines. Characteristics of superscalar processors

Superscalar Machines. Characteristics of superscalar processors Superscalar Machines Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any performance

More information

COMP 322: Principles of Parallel Programming. Lecture 17: Understanding Parallel Computers (Chapter 2) Fall 2009

COMP 322: Principles of Parallel Programming. Lecture 17: Understanding Parallel Computers (Chapter 2) Fall 2009 COMP 322: Principles of Parallel Programming Lecture 17: Understanding Parallel Computers (Chapter 2) Fall 2009 http://www.cs.rice.edu/~vsarkar/comp322 Vivek Sarkar Department of Computer Science Rice

More information

Compiling Parallel Algorithms to Memory Systems

Compiling Parallel Algorithms to Memory Systems Compiling Parallel Algorithms to Memory Systems Stephen A. Edwards Columbia University Presented at Jane Street, April 16, 2012 (λx.?)f = FPGA Parallelism is the Big Question Massive On-Chip Parallelism

More information

The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith

The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith Review Introduction Optimizing the OS based on hardware Processor changes Shared Memory vs

More information

PRACE Autumn School Basic Programming Models

PRACE Autumn School Basic Programming Models PRACE Autumn School 2010 Basic Programming Models Basic Programming Models - Outline Introduction Key concepts Architectures Programming models Programming languages Compilers Operating system & libraries

More information

Parallel and High Performance Computing CSE 745

Parallel and High Performance Computing CSE 745 Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel

More information

High Performance Managed Languages. Martin Thompson

High Performance Managed Languages. Martin Thompson High Performance Managed Languages Martin Thompson - @mjpt777 Really, what is your preferred platform for building HFT applications? Why do you build low-latency applications on a GC ed platform? Agenda

More information

G Virtual Memory. Robert Grimm New York University

G Virtual Memory. Robert Grimm New York University G22.3250-001 Virtual Memory Robert Grimm New York University Altogether Now: The Three Questions! What is the problem?! What is new or different?! What are the contributions and limitations? VAX-11 Memory

More information

CITS3211 FUNCTIONAL PROGRAMMING. 14. Graph reduction

CITS3211 FUNCTIONAL PROGRAMMING. 14. Graph reduction CITS3211 FUNCTIONAL PROGRAMMING 14. Graph reduction Summary: This lecture discusses graph reduction, which is the basis of the most common compilation technique for lazy functional languages. CITS3211

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 15

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 15 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2015 Lecture 15 LAST TIME! Discussed concepts of locality and stride Spatial locality: programs tend to access values near values they have already accessed

More information

Towards Lean 4: Sebastian Ullrich 1, Leonardo de Moura 2.

Towards Lean 4: Sebastian Ullrich 1, Leonardo de Moura 2. Towards Lean 4: Sebastian Ullrich 1, Leonardo de Moura 2 1 Karlsruhe Institute of Technology, Germany 2 Microsoft Research, USA 1 2018/12/12 Ullrich, de Moura - Towards Lean 4: KIT The Research An University

More information