Islands RTS: a Hybrid Haskell Runtime System for NUMA Machines
|
|
- Felix Elliott
- 5 years ago
- Views:
Transcription
1 Islands RTS: a Hybrid Haskell Runtime System for NUMA Machines Marcin Orczyk University of Glasgow May 13, 2011 Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
2 Paralellism CPU clock rates have plateaued the key to increased performance is scalable parallelism from multicore CPUs, through NUMA and massively parallel hardware (GPUs, FPGAs), to clusters and cloud computing Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
3 NUMA 16-core machine composed of 4 quad-core chips Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
4 NUMA cache coherence is an overhead how high will it be at 20 cores? 100 cores? 500 cores? 16-core machine composed of 4 quad-core chips Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
5 NUMA cache coherence is an overhead how high will it be at 20 cores? 100 cores? 500 cores? 16-core machine composed of 4 quad-core chips non-coherent or partially cache coherent architectures Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
6 Islands RTS Haskell runtime system for partially cache coherent machines take advantage of shared memory/coherent cache when available Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
7 Islands RTS Haskell runtime system for partially cache coherent machines take advantage of shared memory/coherent cache when available Islands RTS is a blend of two existing parallel Haskell implementations: GHC: shared memory runtime system with support for parallel evaluation GUM: distributed runtime system based on message passing Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
8 Island island: a set of processing cores sharing a cache coherent memory system is composed of a number of islands Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
9 Island island: a set of processing cores sharing a cache coherent memory system is composed of a number of islands system with 4 cache-coherent islands, each with 4 processing elements Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
10 Island island: a set of processing cores sharing a cache coherent memory system is composed of a number of islands system with 4 cache-coherent islands, each with 4 processing elements one real heap per island virtual shared heap between islands Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
11 Virtual Shared Heap following GUM, Islands RTS provides virtual shared heap closures can transparently reside on different islands Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
12 Virtual Shared Heap following GUM, Islands RTS provides virtual shared heap closures can transparently reside on different islands consider evaluation of the expression (case x of...), where x exists on a different island Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
13 Virtual Shared Heap following GUM, Islands RTS provides virtual shared heap closures can transparently reside on different islands consider evaluation of the expression (case x of...), where x exists on a different island case x of... x Heap of Island A fetch Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
14 Virtual Shared Heap following GUM, Islands RTS provides virtual shared heap closures can transparently reside on different islands consider evaluation of the expression (case x of...), where x exists on a different island case x of... x Heap of Island A fetch Heap of Island B packing/unpacking closures global addresses message passing layer and protocol stub closures Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
15 Global Addresses triples (,, ) Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
16 Global Addresses triples (island,, ) Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
17 Global Addresses triples (island, slot, ) Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
18 Global Addresses triples (island, slot, weight) weighted reference counting Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
19 Global Addresses triples (island, slot, weight) weighted reference counting prevent garbage collection slots are reused similar to stable pointers Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
20 Global Addresses triples (island, slot, weight) weighted reference counting prevent garbage collection slots are reused similar to stable pointers Islands RTS calls them hard links they are quite heavyweight and the guarantees they provide are often unnecessary e.g. recognising duplicates, speculative evaluation Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
21 Global Addresses Soft Links analogous to weak pointers triples (island, slot, slot2) do not prevent garbage collection slots never reused Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
22 Global Addresses Soft Links analogous to weak pointers triples (island, slot, slot2) do not prevent garbage collection slots never reused GUM used only hard links reasons to distinguish between hard and soft links: clarifies implementation potentially improves performance Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
23 Message Passing Meassages virtual shared heap FETCH(ga-from, ga-to) UPDATE(ga, data) FREE(ga) work distribution FISH SPARK(data) startup, shutdown messages Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
24 Within Island based on GHC 7 multiple islands in the same process changes: eliminating global data structures scheduler loop: hooks for message handling garbage collector: hooks for global addresses new closures Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
25 Static Thunks compiler allocates certain closures statically some of them are thunks Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
26 Static Thunks compiler allocates certain closures statically some of them are thunks closure on island B refers to a static thunk STATIC static thunk closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
27 Static Thunks compiler allocates certain closures statically some of them are thunks closure on island B refers to a static thunk island A evaluates the static thunk STATIC static ind thunk closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
28 Static Thunks compiler allocates certain closures statically some of them are thunks closure on island B refers to a static thunk island A evaluates the static thunk island B evaluates thunk in A s heap STATIC static ind ind closure closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
29 Static Thunks - Solution a layer of indirection Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
30 Static Thunks - Solution a layer of indirection closure on island B refers to a static thunk STATIC static thunk closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
31 Static Thunks - Solution a layer of indirection closure on island B refers to a static thunk island A evaluates the static thunk STATIC IndIsland A B NULL static ind thunk closure Heap of Island A Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
32 Static Thunks - Solution a layer of indirection closure on island B refers to a static thunk island A evaluates the static thunk island B accesses the static thunk STATIC IndIsland A B static ind closure thunk Heap of Island A FetchMv Heap of Island B Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
33 Static Thunks - Solution overhead on evaluation and access memory overhead proportional to the number of local islands additional, nontrivial complexity suggestions for solving this problem are welcomed Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
34 Summary parallel hardware becomes hierarchical Islands RTS matches it with the hierarchical architecture of the runtime itself within a cache coherent island - shared memory graph reduction between the islands - virtual heap based on message passing enables exploiting most appropriate mechanisms at each level future directions non-coherent shared memory port to Barrelfish remote islands heterogenous islands Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
35 Questions? Questions? Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
36 Uniqueness of Soft Links we can have tons! Marcin Orczyk (University of Glasgow) Islands RTS May 13, / 16
Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering
Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to
More informationComputer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics
Computer and Hardware Architecture II Benny Thörnberg Associate Professor in Electronics Parallelism Microscopic vs Macroscopic Microscopic parallelism hardware solutions inside system components providing
More informationStackVsHeap SPL/2010 SPL/20
StackVsHeap Objectives Memory management central shared resource in multiprocessing RTE memory models that are used in Java and C++ services for Java/C++ programmer from RTE (JVM / OS). Perspectives of
More informationHeterogeneous SoCs. May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1
COSCOⅣ Heterogeneous SoCs M5171111 HASEGAWA TORU M5171112 IDONUMA TOSHIICHI May 28, 2014 COMPUTER SYSTEM COLLOQUIUM 1 Contents Background Heterogeneous technology May 28, 2014 COMPUTER SYSTEM COLLOQUIUM
More informationLecture 23 Database System Architectures
CMSC 461, Database Management Systems Spring 2018 Lecture 23 Database System Architectures These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used
More informationManaged runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley
Managed runtimes & garbage collection CSE 6341 Some slides by Kathryn McKinley 1 Managed runtimes Advantages? Disadvantages? 2 Managed runtimes Advantages? Reliability Security Portability Performance?
More informationChapter 3: Naming Page 38. Clients in most cases find the Jini lookup services in their scope by IP
Discovery Services - Jini Discovery services require more than search facilities: Discovery Clients in most cases find the Jini lookup services in their scope by IP multicast/broadcast Multicast UDP for
More informationRun-time Environments -Part 1
Run-time Environments -Part 1 Y.N. Srikant Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Compiler Design Outline of the Lecture Part 1 What is run-time support?
More informationLANGUAGE RUNTIME NON-VOLATILE RAM AWARE SWAPPING
Technical Disclosure Commons Defensive Publications Series July 03, 2017 LANGUAGE RUNTIME NON-VOLATILE AWARE SWAPPING Follow this and additional works at: http://www.tdcommons.org/dpubs_series Recommended
More informationThe Memory System. Components of the Memory System. Problems with the Memory System. A Solution
Datorarkitektur Fö 2-1 Datorarkitektur Fö 2-2 Components of the Memory System The Memory System 1. Components of the Memory System Main : fast, random access, expensive, located close (but not inside)
More informationManaged runtimes & garbage collection
Managed runtimes Advantages? Managed runtimes & garbage collection CSE 631 Some slides by Kathryn McKinley Disadvantages? 1 2 Managed runtimes Portability (& performance) Advantages? Reliability Security
More informationCSE 5306 Distributed Systems
CSE 5306 Distributed Systems Naming Jia Rao http://ranger.uta.edu/~jrao/ 1 Naming Names play a critical role in all computer systems To access resources, uniquely identify entities, or refer to locations
More information6.1 Multiprocessor Computing Environment
6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,
More information! Readings! ! Room-level, on-chip! vs.!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads
More informationProfiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency
Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu
More informationCSE 5306 Distributed Systems. Naming
CSE 5306 Distributed Systems Naming 1 Naming Names play a critical role in all computer systems To access resources, uniquely identify entities, or refer to locations To access an entity, you have resolve
More informationChapter 20: Database System Architectures
Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types
More informationChapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance
Chapter 7 Multicores, Multiprocessors, and Clusters Introduction Goal: connecting multiple computers to get higher performance Multiprocessors Scalability, availability, power efficiency Job-level (process-level)
More informationThe Bifrost GPU architecture and the ARM Mali-G71 GPU
The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our
More informationComputer-System Organization (cont.)
Computer-System Organization (cont.) Interrupt time line for a single process doing output. Interrupts are an important part of a computer architecture. Each computer design has its own interrupt mechanism,
More informationField Analysis. Last time Exploit encapsulation to improve memory system performance
Field Analysis Last time Exploit encapsulation to improve memory system performance This time Exploit encapsulation to simplify analysis Two uses of field analysis Escape analysis Object inlining April
More informationSCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH
Faculty of Computer Science Institute of Systems Architecture, Operating Systems Group SCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH LAYER CAKE Application Runtime OS Kernel ISA Physical RAM 2 COMMODITY
More informationCHAPTER 3 RESOURCE MANAGEMENT
CHAPTER 3 RESOURCE MANAGEMENT SUBTOPIC Understand Memory Management Understand Processor Management INTRODUCTION Memory management is the act of managing computer memory. This involves providing ways to
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationA program execution is memory safe so long as memory access errors never occur:
A program execution is memory safe so long as memory access errors never occur: Buffer overflows, null pointer dereference, use after free, use of uninitialized memory, illegal free Memory safety categories
More informationParallelism with Haskell
Parallelism with Haskell T-106.6200 Special course High-Performance Embedded Computing (HPEC) Autumn 2013 (I-II), Lecture 5 Vesa Hirvisalo & Kevin Hammond (Univ. of St. Andrews) 2013-10-08 V. Hirvisalo
More informationExploration of Cache Coherent CPU- FPGA Heterogeneous System
Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based
More informationAN APPROACH TO IMPROVING THE SCALABILITY OF PARALLEL HASKELL PROGRAMS
AN APPROACH TO IMPROVING THE SCALABILITY OF PARALLEL HASKELL PROGRAMS 1 HWAMOK KIM, 1 JOOWON BYUN, 2 SUGWOO BYUN, 3 GYUN WOO 1,3 Dept. of Electrical and Computer Engineering, Pusan National University,
More informationThe 7 deadly sins of cloud computing [2] Cloud-scale resource management [1]
The 7 deadly sins of [2] Cloud-scale resource management [1] University of California, Santa Cruz May 20, 2013 1 / 14 Deadly sins of of sin (n.) - common simplification or shortcut employed by ers; may
More informationOverview of research activities Toward portability of performance
Overview of research activities Toward portability of performance Do dynamically what can t be done statically Understand evolution of architectures Enable new programming models Put intelligence into
More informationDynamic analysis in the Reduceron. Matthew Naylor and Colin Runciman University of York
Dynamic analysis in the Reduceron Matthew Naylor and Colin Runciman University of York A question I wonder how popular Haskell needs to become for Intel to optimize their processors for my runtime, rather
More informationChapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationHera-JVM: Abstracting Processor Heterogeneity Behind a Virtual Machine
Hera-JVM: Abstracting Processor Heterogeneity Behind a Virtual Machine Ross McIlroy and Joe Sventek Department of Computing Science University of Glasgow {ross,joe}@dcs.gla.ac.uk Abstract Heterogeneous
More informationWhiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)
Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Pavel Petroshenko, Sun Microsystems, Inc. Ashmi Bhanushali, NVIDIA Corporation Jerry Evans, Sun Microsystems, Inc. Nandini
More informationLast week, David Terei lectured about the compilation pipeline which is responsible for producing the executable binaries of the Haskell code you
Last week, David Terei lectured about the compilation pipeline which is responsible for producing the executable binaries of the Haskell code you actually want to run. Today, we are going to look at an
More informationGPU Fundamentals Jeff Larkin November 14, 2016
GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate
More informationLecture 9: MIMD Architecture
Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is
More informationRuntime Support for Algol-Like Languages Comp 412
COMP 412 FALL 2018 Runtime Support for Algol-Like Languages Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students
More informationLecture Notes on Garbage Collection
Lecture Notes on Garbage Collection 15-411: Compiler Design André Platzer Lecture 20 1 Introduction In the previous lectures we have considered a programming language C0 with pointers and memory and array
More informationMultiple Choice Type Questions
Techno India Batanagar Computer Science and Engineering Model Questions Subject Name: Computer Architecture Subject Code: CS 403 Multiple Choice Type Questions 1. SIMD represents an organization that.
More informationOperating Systems Principles. Segmentation & Shared Memory
2 Operating Systems Principles Segmentation & Memory Steve Goddard goddard@cse.unl.edu http://www.cse.unl.edu/~goddard/courses/csce451 1 Memory Management Name spaces Schemes to date have considered only
More informationDEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III/VI Section : CSE-1 & CSE-2 Subject Code : CS2354 Subject Name : Advanced Computer Architecture Degree & Branch : B.E C.S.E. UNIT-1 1.
More informationShenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat
Shenandoah: Theory and Practice Christine Flood Roman Kennke Principal Software Engineers Red Hat 1 Shenandoah Christine Flood Roman Kennke Principal Software Engineers Red Hat 2 Shenandoah Why do we need
More informationDistributed Systems. The main method of distributed object communication is with remote method invocation
Distributed Systems Unit III Syllabus:Distributed Objects and Remote Invocation: Introduction, Communication between Distributed Objects- Object Model, Distributed Object Modal, Design Issues for RMI,
More informationDISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 3 Processes
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 3 Processes Context Switching Processor context: The minimal collection of values stored in the
More informationDesigning Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services. Presented by: Jitong Chen
Designing Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services Presented by: Jitong Chen Outline Architecture of Web-based Data Center Three-Stage framework to benefit
More informationMulticore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.
CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous
More information«Real Time Embedded systems» Multi Masters Systems
«Real Time Embedded systems» Multi Masters Systems rene.beuchat@epfl.ch LAP/ISIM/IC/EPFL Chargé de cours rene.beuchat@hesge.ch LSN/hepia Prof. HES 1 Multi Master on Chip On a System On Chip, Master can
More informationCoping with Immutable Data in a JVM for Embedded Real-Time Systems. Christoph Erhardt, Simon Kuhnle, Isabella Stilkerich, Wolfgang Schröder-Preikschat
The final Frontier Coping with Immutable Data in a JVM for Embedded Real-Time Systems Christoph Erhardt, Simon Kuhnle, Isabella Stilkerich, Wolfgang Schröder-Preikschat https://www4.cs.fau.de/research/keso/
More informationGarbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35
Garbage Collection Akim Demaille, Etienne Renault, Roland Levillain May 15, 2017 CCMP2 Garbage Collection May 15, 2017 1 / 35 Table of contents 1 Motivations and Definitions 2 Reference Counting Garbage
More informationDistributed Dense Linear Algebra on Heterogeneous Architectures. George Bosilca
Distributed Dense Linear Algebra on Heterogeneous Architectures George Bosilca bosilca@eecs.utk.edu Centraro, Italy June 2010 Factors that Necessitate to Redesign of Our Software» Steepness of the ascent
More informationCSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore
CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors
More informationA Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks. Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo
A Caching-Oriented FTL Design for Multi-Chipped Solid-State Disks Yuan-Hao Chang, Wei-Lun Lu, Po-Chun Huang, Lue-Jane Lee, and Tei-Wei Kuo 1 June 4, 2011 2 Outline Introduction System Architecture A Multi-Chipped
More informationParallel Architectures
Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36
More informationTOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT
TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT Eric Kelmelis 28 March 2018 OVERVIEW BACKGROUND Evolution of processing hardware CROSS-PLATFORM KERNEL DEVELOPMENT Write once, target multiple hardware
More informationFPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges
Bob Blainey IBM Software Group 27 Feb 2011 FPGA & Hybrid Systems in the Enterprise Drivers, Exemplars and Challenges Workshop on The Role of FPGAs in a Converged Future with Heterogeneous Programmable
More informationTowards Parallel, Scalable VM Services
Towards Parallel, Scalable VM Services Kathryn S McKinley The University of Texas at Austin Kathryn McKinley Towards Parallel, Scalable VM Services 1 20 th Century Simplistic Hardware View Faster Processors
More informationTowards a codelet-based runtime for exascale computing. Chris Lauderdale ET International, Inc.
Towards a codelet-based runtime for exascale computing Chris Lauderdale ET International, Inc. What will be covered Slide 2 of 24 Problems & motivation Codelet runtime overview Codelets & complexes Dealing
More informationThe Use of Cloud Computing Resources in an HPC Environment
The Use of Cloud Computing Resources in an HPC Environment Bill, Labate, UCLA Office of Information Technology Prakashan Korambath, UCLA Institute for Digital Research & Education Cloud computing becomes
More informationEN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)
EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationSpeculations about Computer Architecture in Next Three Years. Jan. 20, 2018
Speculations about Computer Architecture in Next Three Years shuchang.zhou@gmail.com Jan. 20, 2018 About me https://zsc.github.io/ Source-to-source transformation Cache simulation Compiler Optimization
More informationHigh Performance Computing (HPC) Introduction
High Performance Computing (HPC) Introduction Ontario Summer School on High Performance Computing Scott Northrup SciNet HPC Consortium Compute Canada June 25th, 2012 Outline 1 HPC Overview 2 Parallel Computing
More informationThe Z Garbage Collector An Introduction
The Z Garbage Collector An Introduction Per Lidén & Stefan Karlsson HotSpot Garbage Collection Team FOSDEM 2018 Safe Harbor Statement The following is intended to outline our general product direction.
More informationLecture 4 Naming. Prof. Wilson Rivera. University of Puerto Rico at Mayaguez Electrical and Computer Engineering Department
Lecture 4 Naming Prof. Wilson Rivera University of Puerto Rico at Mayaguez Electrical and Computer Engineering Department Outline Names, identifiers, addresses Flat naming Structured naming Attribute based
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationJOP: A Java Optimized Processor for Embedded Real-Time Systems. Martin Schoeberl
JOP: A Java Optimized Processor for Embedded Real-Time Systems Martin Schoeberl JOP Research Targets Java processor Time-predictable architecture Small design Working solution (FPGA) JOP Overview 2 Overview
More informationPointer Analysis in the Presence of Dynamic Class Loading. Hind Presented by Brian Russell
Pointer Analysis in the Presence of Dynamic Class Loading Martin Hirzel, Amer Diwan and Michael Hind Presented by Brian Russell Claim: First nontrivial pointer analysis dealing with all Java language features
More informationBarrelfish Project ETH Zurich. Message Notifications
Barrelfish Project ETH Zurich Message Notifications Barrelfish Technical Note 9 Barrelfish project 16.06.2010 Systems Group Department of Computer Science ETH Zurich CAB F.79, Universitätstrasse 6, Zurich
More informationParallel & Concurrent Haskell 1: Basic Pure Parallelism. Simon Marlow (Microsoft Research, Cambridge, UK)
Parallel & Concurrent Haskell 1: Basic Pure Parallelism Simon Marlow (Microsoft Research, Cambridge, UK) Background The parallelism landscape is getting crowded Every language has their own solution(s)
More informationFPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB.
Agenda The topics that will be addressed are: Scheduling tasks on Reconfigurable FPGA architectures Mauro Marinoni ReTiS Lab, TeCIP Institute Scuola superiore Sant Anna - Pisa Overview on basic characteristics
More informationAgenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2
Lecture 3: Processes Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Process in General 3.3 Process Concept Process is an active program in execution; process
More informationThe Z Garbage Collector Low Latency GC for OpenJDK
The Z Garbage Collector Low Latency GC for OpenJDK Per Lidén & Stefan Karlsson HotSpot Garbage Collection Team Jfokus VM Tech Summit 2018 Safe Harbor Statement The following is intended to outline our
More informationArchitecture-Conscious Database Systems
Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI) Sources Thank You! l l l l Database Architectures for New Hardware VLDB 2004 tutorial, Anastassia Ailamaki Query
More informationIntroduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed
More information殷亚凤. Processes. Distributed Systems [3]
Processes Distributed Systems [3] 殷亚凤 Email: yafeng@nju.edu.cn Homepage: http://cs.nju.edu.cn/yafeng/ Room 301, Building of Computer Science and Technology Review Architectural Styles: Layered style, Object-based,
More informationWilliam Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers
William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationTesla GPU Computing A Revolution in High Performance Computing
Tesla GPU Computing A Revolution in High Performance Computing Mark Harris, NVIDIA Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction to Tesla CUDA Architecture Programming & Memory
More informationthe gamedesigninitiative at cornell university Lecture 9 Memory Management
Lecture 9 Gaming Memory Constraints Redux Wii-U Playstation 4 2GB of RAM 1GB dedicated to OS Shared with GPGPU 8GB of RAM Shared GPU, 8-core CPU OS footprint unknown 2 Two Main Concerns with Memory Getting
More informationNetwork-on-Chip Architecture
Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)
More informationVirtual Machines. 2 Disco: Running Commodity Operating Systems on Scalable Multiprocessors([1])
EE392C: Advanced Topics in Computer Architecture Lecture #10 Polymorphic Processors Stanford University Thursday, 8 May 2003 Virtual Machines Lecture #10: Thursday, 1 May 2003 Lecturer: Jayanth Gummaraju,
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationProgramming Models for Supercomputing in the Era of Multicore
Programming Models for Supercomputing in the Era of Multicore Marc Snir MULTI-CORE CHALLENGES 1 Moore s Law Reinterpreted Number of cores per chip doubles every two years, while clock speed decreases Need
More informationThe benefits and costs of writing a POSIX kernel in a high-level language
1 / 38 The benefits and costs of writing a POSIX kernel in a high-level language Cody Cutler, M. Frans Kaashoek, Robert T. Morris MIT CSAIL Should we use high-level languages to build OS kernels? 2 / 38
More informationUrmas Tamm Tartu 2013
The implementation and optimization of functional languages Urmas Tamm Tartu 2013 Intro The implementation of functional programming languages, Simon L. Peyton Jones, 1987 Miranda fully lazy strict a predecessor
More informationPinpointing Data Locality Problems Using Data-centric Analysis
Center for Scalable Application Development Software Pinpointing Data Locality Problems Using Data-centric Analysis Xu Liu XL10@rice.edu Department of Computer Science Rice University Outline Introduction
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationComputer Science 146. Computer Architecture
Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 18: Virtual Memory Lecture Outline Review of Main Memory Virtual Memory Simple Interleaving Cycle
More informationSuperscalar Machines. Characteristics of superscalar processors
Superscalar Machines Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any performance
More informationCOMP 322: Principles of Parallel Programming. Lecture 17: Understanding Parallel Computers (Chapter 2) Fall 2009
COMP 322: Principles of Parallel Programming Lecture 17: Understanding Parallel Computers (Chapter 2) Fall 2009 http://www.cs.rice.edu/~vsarkar/comp322 Vivek Sarkar Department of Computer Science Rice
More informationCompiling Parallel Algorithms to Memory Systems
Compiling Parallel Algorithms to Memory Systems Stephen A. Edwards Columbia University Presented at Jane Street, April 16, 2012 (λx.?)f = FPGA Parallelism is the Big Question Massive On-Chip Parallelism
More informationThe Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith
The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith Review Introduction Optimizing the OS based on hardware Processor changes Shared Memory vs
More informationPRACE Autumn School Basic Programming Models
PRACE Autumn School 2010 Basic Programming Models Basic Programming Models - Outline Introduction Key concepts Architectures Programming models Programming languages Compilers Operating system & libraries
More informationParallel and High Performance Computing CSE 745
Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel
More informationHigh Performance Managed Languages. Martin Thompson
High Performance Managed Languages Martin Thompson - @mjpt777 Really, what is your preferred platform for building HFT applications? Why do you build low-latency applications on a GC ed platform? Agenda
More informationG Virtual Memory. Robert Grimm New York University
G22.3250-001 Virtual Memory Robert Grimm New York University Altogether Now: The Three Questions! What is the problem?! What is new or different?! What are the contributions and limitations? VAX-11 Memory
More informationCITS3211 FUNCTIONAL PROGRAMMING. 14. Graph reduction
CITS3211 FUNCTIONAL PROGRAMMING 14. Graph reduction Summary: This lecture discusses graph reduction, which is the basis of the most common compilation technique for lazy functional languages. CITS3211
More informationCS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 15
CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2015 Lecture 15 LAST TIME! Discussed concepts of locality and stride Spatial locality: programs tend to access values near values they have already accessed
More informationTowards Lean 4: Sebastian Ullrich 1, Leonardo de Moura 2.
Towards Lean 4: Sebastian Ullrich 1, Leonardo de Moura 2 1 Karlsruhe Institute of Technology, Germany 2 Microsoft Research, USA 1 2018/12/12 Ullrich, de Moura - Towards Lean 4: KIT The Research An University
More information