Multiprocessor OS


COMP9242 Advanced Operating Systems, Ihor Kuz, S2/2015, Week 12

Overview
- Multiprocessor OS: how does it work? Scalability (review)
- Multiprocessor hardware: contemporary systems (Intel, AMD, ARM, Oracle/Sun); experimental and future systems (Intel, MS, Polaris)
- OS design for multiprocessors: guidelines and design approaches:
  - Divide and conquer (Disco, Tessellation)
  - Reduce sharing (K42, Corey, Linux, FlexSC, scalable commutativity)
  - No sharing (Barrelfish, fos)

Uniprocessor OS vs Multiprocessor OS
[Diagrams: the OS image in memory holds the run queue data, process control blocks, FS structs and application data; on a uniprocessor a single CPU running App1, App2, ... accesses these structures, while on a multiprocessor several CPUs running App1..App4 access the same shared structures.]

Multiprocessor OS
[Diagrams: on a multiprocessor, all CPUs (running App1..App4) access one shared set of OS structures in memory: run queue data, process control blocks, FS structs, application data.]
Key design challenges:
- Correctness of (shared) data structures
- Scalability

Correctness of Shared Data
- Concurrency control: locks, semaphores, transactions, lock-free data structures (a small counter sketch follows below)
- We know how to do this: in the application, in the OS

Scalability
- Speedup as more processors are added
- Ideal speedup: S(N) = T(1) / T(N), plotted against the number of processors N
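To make the concurrency-control options above concrete, here is a minimal sketch of a shared counter done two ways, assuming POSIX threads and C11 atomics are available (all names are illustrative): a mutex-protected version and a lock-free version using an atomic fetch-and-add. Both are correct; they differ in how much they serialise concurrent updaters.

    #include <pthread.h>
    #include <stdatomic.h>

    /* Lock-based shared counter: correct, but every update serialises on
     * the mutex, and contended lock acquisition generates coherence traffic. */
    struct locked_counter {
        pthread_mutex_t lock;
        long value;
    };

    static void locked_inc(struct locked_counter *c)
    {
        pthread_mutex_lock(&c->lock);
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }

    /* Lock-free shared counter: a single atomic read-modify-write, so there
     * is no lock to wait on, but the counter's cache line still bounces
     * between cores that update it. */
    static atomic_long free_counter;

    static void lockfree_inc(void)
    {
        atomic_fetch_add_explicit(&free_counter, 1, memory_order_relaxed);
    }

Neither version scales perfectly: the cache-related serialisation discussed next explains why even the lock-free counter becomes a bottleneck on many cores.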

Scalability
- Speedup as more processors are added: in reality, the measured speedup S(N) = T(1)/T(N) falls short of the ideal.

Scalability and Serialisation
- Remember Amdahl's law: the serial (non-parallel) portion is the time when the application is not running on all cores, and this serialisation prevents scalability.
- With parallel fraction P (and T(1) normalised to 1): T(1) = (1 - P) + P = 1 and T(N) = (1 - P) + P/N, so S(N) = T(1)/T(N) = 1 / ((1 - P) + P/N), and S(infinity) -> 1 / (1 - P). For example, with P = 0.95, S(64) is only about 15, and no number of cores gets past 20.

OS Serialisation
- Where does serialisation show up? In the application (e.g. access to shared app data) and in the OS (e.g. performing a syscall for the app). How much time is spent in the OS?
- Sources of serialisation:
  - Locking (explicit serialisation): waiting for a lock stalls the waiter, and the lock implementation hurts too: atomic operations lock the bus and stall everyone, and cache-coherence traffic loads the bus and slows down others.
  - Memory access (implicit): relatively high latency to memory stalls the accessing core.
  - Cache (implicit): the processor stalls while a cache line is fetched or invalidated; this is affected by the latency of the interconnect, and performance depends on data size (cache lines) and contention (number of cores).

More Cache-related Serialisation
- False sharing: unrelated data structures share the same cache line and are accessed from different processors, causing cache-coherence traffic and delay (see the padding sketch below).
- Cache line bouncing: a line shared read/write by many processors, e.g. bouncing due to locks: each processor spinning on a lock brings the lock's line into its own cache, causing cache-coherence traffic and delay.
- Cache misses: potentially a direct memory access, which stalls the core. When does a cache miss occur? When the application accesses data for the first time, when the application runs on a new core, or when cached memory has been evicted (cache footprint too big, another app ran, the OS ran).
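A minimal sketch of the false-sharing problem and the usual fix, assuming a 64-byte cache line and a GCC-style alignment attribute (both assumptions, not from the slides):

    #define CACHE_LINE 64   /* assumed cache-line size */

    /* False sharing: the two fields are updated by different CPUs, but they
     * sit on the same cache line, so every write invalidates the other
     * CPU's copy and the line bounces back and forth. */
    struct stats_bad {
        long rx_packets;    /* written only by the receive CPU */
        long tx_packets;    /* written only by the transmit CPU */
    };

    /* The fix: pad (and align) each independently written field out to its
     * own cache line. The structure is larger, but the writers no longer
     * disturb each other. */
    struct stats_good {
        long rx_packets;
        char pad0[CACHE_LINE - sizeof(long)];
        long tx_packets;
        char pad1[CACHE_LINE - sizeof(long)];
    } __attribute__((aligned(CACHE_LINE)));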

Multi-What?
- Multiprocessor, SMP: more than one separate processor, connected by an off-chip bus
- Multicore: more than one processing core in a single processor, connected by an on-chip bus
- Multithread, SMT: more than one hardware thread in a single core
- Multicore + multiprocessor: more than one multicore processor; more than one multicore die in a package (multi-chip module)

Interesting Properties of Multiprocessors
- Scale and structure: how many cores and processors are there, what kinds are they, and how are they organised?
- Interconnect: how are the cores and processors connected?
- Memory locality and caches: where is the memory, and what is the cache architecture?
- Interprocessor communication: how do cores and processors send messages to each other?

Contemporary Multiprocessor Hardware
- Intel: Nehalem, Westmere: 10 cores, QPI. Sandy Bridge, Ivy Bridge: ring bus, integrated GPU, L3, IO. Haswell (Broadwell): 18 cores, ring bus, transactional memory, slices (EP).
- AMD: K10 (Opteron: Barcelona, Magny Cours): 12 cores, HyperTransport. Bulldozer, Piledriver, Steamroller (Opteron, FX): 16 cores, Clustered Multithread: a module with 2 integer cores.
- Oracle (Sun): UltraSPARC T1, T2, T3, T4, T5 (Niagara): 16 cores, 8 threads/core (2 simultaneous), crossbar, 8 sockets.
- ARM: Cortex A9, A15 MPCore, big.LITTLE: 4-8 cores; big.LITTLE pairs A7 + A15 cores.

Scale and Structure: ARM Cortex A9 MPCore [figure]
Scale and Structure: ARM big.LITTLE [figure]
Scale and Structure: Intel Nehalem [figure]
Interconnect: AMD Barcelona [figure]

Memory Locality and Caches / Interprocessor Communication: Oracle SPARC T2
[Block diagram: 8 cores (C0-C7, each with an FPU and SPU) connected by a full crossbar to 8 L2 cache banks and 4 memory controllers (MCUs) driving FB-DIMM memory; NIU (Ethernet+), system interface, buffer switch core and PCIe on chip; 2x 10 Gigabit Ethernet, power < 95 W, 2.0 GHz. From Sun/Oracle.]
Interprocessor Communication [figure]
Interprocessor Communication / Structure / Memory [figure]

Experimental/Future Multiprocessor Hardware
- Microsoft Beehive: ring bus, no cache coherence
- Tilera Tile64, Tile-Gx: 100 cores, mesh network
- Intel Polaris: 80 cores, mesh network
- Intel SCC: 48 cores, mesh network, no cache coherency
- Intel MIC (Many Integrated Core) (Knights Corner - Xeon Phi): 60+ cores, ring bus

Scale and Structure: Tilera Tile64 (newest: EZchip TILE-Gx), Intel Polaris
[Block diagram: a grid of tiles, each with a processor (register file, L1 I/D caches, I-TLB/D-TLB), L2 cache, DMA engine and a switch onto the mesh networks (MDN, TDN, UDN, IDN, STN); DDR2 controllers, PCIe, GbE/XAUI MAC/PHY, UART, HPI, I2C, JTAG, SPI and flexible I/O around the edge.]
Cache and Memory: Intel SCC [figure; from techresearch.intel.com/spaw2/uploads/files/scc_platform_overview.pdf]
Interprocessor Communication: Beehive [figure; from projects.csail.mit.edu/beehive/beehivev5.pdf]

Interprocessor Communication: Intel MIC (Many Integrated Core) (Knights Corner/Landing - Xeon Phi) [figure]

Summary: Multiprocessor Hardware
- Scalability: 100+ cores, where Amdahl's law really kicks in
- Heterogeneity: heterogeneous cores, memory, etc.; properties of similar systems may vary wildly (e.g. interconnect topology and latencies between different AMD platforms)
- NUMA: also variable latencies due to topology and cache coherence
- Cache coherence may not be possible: can't use it for locking, and shared data structures require explicit work
- The computer is a distributed system: message passing, consistency and synchronisation, fault tolerance

OS Design for Multiprocessors

Optimisation for Scalability
- Reduce the amount of code in critical sections: increases concurrency
- Fine-grained locking: lock data, not code (see the hash-table sketch below). Tradeoff: more concurrency, but more locking, and locking causes serialisation.
- Lock-free data structures
- Avoid expensive memory access: avoid uncached memory; access cheap (close) memory
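The hash-table sketch referred to above illustrates "lock data, not code": instead of one lock around the whole table (locking the code path), each bucket carries its own lock, so operations on different buckets run in parallel. This is only an illustrative sketch using pthreads; the names and sizes are made up.

    #include <pthread.h>
    #include <stdlib.h>

    #define NBUCKETS 256

    struct entry {
        struct entry *next;
        unsigned long key;
        void *val;
    };

    /* One lock per bucket instead of one lock for the whole table. */
    struct bucket {
        pthread_mutex_t lock;
        struct entry *head;
    };

    static struct bucket table[NBUCKETS];

    static void table_init(void)
    {
        for (int i = 0; i < NBUCKETS; i++)
            pthread_mutex_init(&table[i].lock, NULL);
    }

    static void table_insert(unsigned long key, void *val)
    {
        struct bucket *b = &table[key % NBUCKETS];
        struct entry *e = malloc(sizeof(*e));

        if (!e)
            return;
        e->key = key;
        e->val = val;
        /* The critical section covers only this bucket's list, so inserts
         * and lookups that hash to different buckets never contend. */
        pthread_mutex_lock(&b->lock);
        e->next = b->head;
        b->head = e;
        pthread_mutex_unlock(&b->lock);
    }

The tradeoff from the slide is visible here: more concurrency, but more lock operations, and each acquire/release is itself a source of coherence traffic.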

Optimisation for Scalability
- Reduce false sharing: pad data structures to cache lines
- Reduce cache line bouncing: reduce sharing, e.g. MCS locks make each waiter spin on its own local data (see the MCS sketch below)
- Reduce cache misses: affinity scheduling (run a process on the core where it last ran); avoid cache pollution

Design Guidelines for Modern (and future) Multiprocessors
- Avoid shared data: performance issues arise less from lock contention than from data locality
- Explicit communication: regain control over communication costs (and predictability); sometimes it's the only option
- Tradeoff: parallelism vs synchronisation. Synchronisation introduces serialisation, so make concurrent threads independent: reduce critical sections and cache misses.
- Allocate for locality: e.g. provide memory local to a core
- Schedule for locality: with cached data, with local memory
- Tradeoff: uniprocessor performance vs scalability

OS Design Approaches
- Divide and conquer: divide the multiprocessor into smaller bits and use them as normal, using virtualisation or an exokernel
- Reduced sharing:
  - Brute force and heroic effort: find problems in an existing OS and fix them, e.g. Linux re-architecting: BKL -> fine-grained locking
  - By design: avoid shared data as much as possible
- No sharing: the computer is a distributed system; do extra work to share!

Divide and Conquer: Disco
- Scalability is too hard! Context: ca. 1995, large ccNUMA multiprocessors appearing; scaling OSes requires extensive modifications.
- Idea: implement a scalable VMM and run multiple OS instances on it.
- The VMM has most of the features of a scalable OS (NUMA-aware allocator, page replication, remapping, etc.) yet is substantially simpler and cheaper to implement.
- Modern incarnations of this: virtual servers (Amazon, etc.), research (Cerberus)
- Running commodity OSes on scalable multiprocessors [Bugnion et al., 1997]
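The MCS lock mentioned above is the classic example of reducing cache-line bouncing: each waiter spins on a flag in its own queue node rather than on the shared lock word. The sketch below is a simplified MCS-style queue lock using C11 atomics (sequentially consistent operations for brevity); it is illustrative, not a production implementation.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct mcs_node {
        struct mcs_node *_Atomic next;
        atomic_bool locked;
    };

    /* The lock is just the tail of the queue of waiters (NULL = free). */
    typedef struct mcs_node *_Atomic mcs_lock_t;

    static void mcs_acquire(mcs_lock_t *lock, struct mcs_node *me)
    {
        atomic_store(&me->next, NULL);
        atomic_store(&me->locked, true);

        /* Atomically append ourselves; the previous tail is our predecessor. */
        struct mcs_node *prev = atomic_exchange(lock, me);
        if (prev == NULL)
            return;                          /* lock was free: we hold it */

        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))
            ;                                /* spin on OUR node only */
    }

    static void mcs_release(mcs_lock_t *lock, struct mcs_node *me)
    {
        struct mcs_node *next = atomic_load(&me->next);

        if (next == NULL) {
            /* No known successor: try to mark the lock free. */
            struct mcs_node *expected = me;
            if (atomic_compare_exchange_strong(lock, &expected, NULL))
                return;
            /* A new waiter is linking itself in: wait until it appears. */
            while ((next = atomic_load(&me->next)) == NULL)
                ;
        }
        atomic_store(&next->locked, false);  /* hand the lock to the successor */
    }

Because each node lives in its waiter's own (per-core) memory, the spinning generates no cross-core coherence traffic until the lock is actually handed over.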

Disco Architecture [figure]
Disco Performance [figure]

Space-Time Partitioning: Tessellation
- Tessellation: space-time partitioning, 2-level scheduling
- Context: highly parallel multicore systems; Berkeley Par Lab
- Tessellation: Space-Time Partitioning in a Manycore Client OS [Liu et al., 2010]

Reduce Sharing: K42
- Context: OS for ccNUMA systems; IBM and U Toronto (Tornado, Hurricane)
- Goals: high locality, scalability
- Object oriented: fine-grained objects, clustered (distributed) objects -> data locality
- Deferred deletion (RCU): avoid locking
- NUMA-aware memory allocator: memory locality
- K42: fine-grained objects [figure]
- Clustered Objects, Ph.D. thesis [Appavoo, 2005]

K42: Clustered Objects
- A globally valid object reference resolves to a processor-local representative
- The sharing and locking strategy is local to each object
- Transparency eases complexity and allows controlled introduction of locality
- Shared counter example: inc and dec are local accesses; val requires communication (see the sketch below)
- Fast path: access mostly local structures

K42 Performance [figure]
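A rough sketch of the shared-counter clustered object described above (hypothetical types, not K42 code): clients hold a single object reference, but each access is resolved to the calling processor's local representative, so inc/dec touch only local memory and only val() must visit every representative.

    #define MAX_CPUS   64
    #define CACHE_LINE 64

    /* Per-processor representative: holds only this processor's share of
     * the count, padded so representatives do not share cache lines.
     * Assumes each CPU updates only its own representative. */
    struct counter_rep {
        long local;
        char pad[CACHE_LINE - sizeof(long)];
    };

    /* The clustered object: one globally valid reference used by all
     * clients; accesses are directed to the local representative. */
    struct counter_obj {
        struct counter_rep rep[MAX_CPUS];
    };

    static struct counter_rep *resolve(struct counter_obj *o, int cpu)
    {
        return &o->rep[cpu];        /* in K42 this indirection is transparent */
    }

    static void counter_inc(struct counter_obj *o, int cpu)
    {
        resolve(o, cpu)->local++;   /* fast path: purely local access */
    }

    static long counter_val(struct counter_obj *o)
    {
        long sum = 0;               /* slow path: requires communication */
        for (int cpu = 0; cpu < MAX_CPUS; cpu++)
            sum += o->rep[cpu].local;
        return sum;
    }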

Corey
- Context: 2008, high-end multicore servers, MIT
- Goals: application control of sharing; exokernel-like, with higher-level services as libraries
- By default only single-core access to data structures; calls to control how data structures are shared:
  - Address ranges: control private per-core and shared address spaces
  - Kernel cores: dedicate cores to run specific kernel functions
  - Shares: lookup tables for kernel objects allow control over which object identifiers are visible to other cores
- Corey: An Operating System for Many Cores [Boyd-Wickizer et al., 2008]

Linux Brute Force Scalability
- Context: 2010, high-end multicore servers, MIT
- Goal: scaling a commodity OS: Linux scalability (in 2010, scale Linux to 48 cores)
- An Analysis of Linux Scalability to Many Cores [Boyd-Wickizer et al., 2010]
- Apply lessons from parallel computing and past OS research: sloppy counters (sketched below), per-core data structures, fine-grained locks, lock-free structures, cache-line awareness
- 3002 lines of code changed
- Conclusion: "no scalability reason to give up on traditional operating system organizations just yet"

Scalability of the API
- Context: 2013, previous multicore OS projects at MIT
- Goal: how do we know if a system is really scalable?
- Workload-based evaluation: run a workload, plot scalability, fix problems. Did we miss any non-scalable workload? Did we find all bottlenecks? Is there something fundamental that makes an OS non-scalable?
- The interface might be a fundamental bottleneck.
- The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors [Clements et al., 2013]
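Returning briefly to the brute-force Linux work: the "sloppy counters" mentioned above can be sketched roughly as follows. This is a simplification of the idea in the Linux scalability paper, with made-up names and a per-thread variable standing in for per-core kernel data: each core keeps a small stash of spare references and only touches the shared central counter when the stash runs dry or overflows.

    #include <stdatomic.h>

    #define BATCH 16

    static atomic_long central_refs;        /* shared, rarely touched */
    static __thread long spare_refs;        /* stand-in for per-core data */

    /* Acquire a reference: usually served from the local stash; only when
     * the stash is empty do we take a whole batch from the central counter. */
    static void ref_get(void)
    {
        if (spare_refs == 0) {
            atomic_fetch_add(&central_refs, BATCH);
            spare_refs = BATCH;
        }
        spare_refs--;
    }

    /* Release a reference: keep it locally as a spare; return a batch to
     * the central counter only when too many spares pile up. */
    static void ref_put(void)
    {
        if (++spare_refs >= 2 * BATCH) {
            atomic_fetch_sub(&central_refs, BATCH);
            spare_refs -= BATCH;
        }
    }

    /* The true reference count is central_refs minus all cores' spares; it
     * only needs to be computed exactly when the object might be freed. */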

Scalable Commutativity Rule
- The rule: whenever interface operations commute, they can be implemented in a way that scales.
- Commutative operations: the order of the operations cannot be distinguished from their results. Example: creat() requires that the lowest available FD be returned; two concurrent calls are not commutative, because the results tell you which one ran first.
- Why are commutative operations scalable? Results are independent of order, so communication is unnecessary, and without communication there are no conflicts.
- Informs the software design process. Design: a guideline for scalable interfaces. Implementation: a clear target. Test: workload-independent testing.
- Commuter: an automated scalability testing tool (sv6)

FlexSC
- Context: 2010, commodity multicores, U Toronto
- Goal: reduce the context-switch overhead of system calls. A syscall context switch has the usual mode-switch overhead, but also cache and TLB pollution!
- FlexSC: asynchronous system calls; batch system calls; run them on dedicated cores (sketched below)
- FlexSC-Threads: M on N, with M >> N
- FlexSC: Flexible System Call Scheduling with Exception-Less System Calls [Soares and Stumm, 2010]
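Schematically, exception-less system calls work as sketched below. This is only an illustration of the idea, not FlexSC's real ABI or data layout (the entry format, states and function names are all invented): the application posts requests into a shared syscall page, and a dedicated syscall core polls the page, executes the calls, and posts results back.

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    enum entry_state { FREE, SUBMITTED, BUSY, DONE };

    /* One entry of a hypothetical shared "syscall page". */
    struct syscall_entry {
        _Atomic int state;              /* holds an enum entry_state value */
        int      sysnum;
        uint64_t args[6];
        int64_t  ret;
    };

    #define NENTRIES 64
    static struct syscall_entry page[NENTRIES];

    /* Application side: post a system call without trapping into the kernel. */
    static struct syscall_entry *syscall_submit(int sysnum, uint64_t a0, uint64_t a1)
    {
        for (int i = 0; i < NENTRIES; i++) {
            int expected = FREE;
            if (atomic_compare_exchange_strong(&page[i].state, &expected, BUSY)) {
                page[i].sysnum = sysnum;
                page[i].args[0] = a0;
                page[i].args[1] = a1;
                atomic_store(&page[i].state, SUBMITTED);
                return &page[i];        /* caller later checks for DONE */
            }
        }
        return NULL;    /* page full: fall back to a normal synchronous call */
    }

    /* Kernel-side syscall core: poll for submitted entries and execute them,
     * batching the work and keeping kernel code and data hot in its caches. */
    static void syscall_core_poll(int64_t (*do_syscall)(int, const uint64_t *))
    {
        for (int i = 0; i < NENTRIES; i++) {
            int expected = SUBMITTED;
            if (atomic_compare_exchange_strong(&page[i].state, &expected, BUSY)) {
                page[i].ret = do_syscall(page[i].sysnum, page[i].args);
                atomic_store(&page[i].state, DONE);
            }
        }
    }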

FlexSC Results
[Figure: Apache throughput with FlexSC batching and syscall-core redirection versus the baseline.]

No Sharing
- Multikernel: Barrelfish
- fos: factored operating system
- The Multikernel: A new OS architecture for scalable multicore systems [Baumann et al., 2009]

Barrelfish
- Context: 2007, large multicore machines appearing, 100s of cores on the horizon, NUMA (cc and non-cc); ETH Zurich and Microsoft
- Goals: scale to many cores; support and manage heterogeneous hardware
- Approach: structure the OS as a distributed system
- Design principles: interprocessor communication is explicit; the OS structure is hardware-neutral; state is replicated
- Microkernel, similar to seL4: capabilities
- Barrelfish [architecture figure]
- The Multikernel: A new OS architecture for scalable multicore systems [Baumann et al., 2009]

Barrelfish: Replication
- Kernel + monitor: the only shared memory is for message channels
- Monitors collectively coordinate system-wide state
- System-wide state: memory allocation tables, address space mappings, capability lists
- What state is replicated in Barrelfish: capability lists
- Consistency and coordination:
  - Retype: two-phase commit to execute the operation globally, in order
  - Page (re/un)mapping: one-phase commit to synchronise TLBs

Barrelfish: Communication
- Different mechanisms: intra-core: kernel endpoints; inter-core: URPC
- URPC uses cache coherence plus polling on a shared buffer: the sender writes a cache line, the receiver polls on the line (on its last word, so it never sees a partial message)
- Polling? The cache line only changes when the sender writes, so polling is cheap; switch to blocking with an IPI if the wait gets too long (see the sketch below)

Barrelfish: Results
[Figures: message passing vs shared memory (latency in cycles x1000 for SHM1/2/4/8, MSG1/8 and a server thread, against the number of cores), and broadcast vs unicast vs multicast vs NUMA-aware multicast latency against the number of cores.]
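The URPC channel just described can be sketched roughly like this (a simplification, not Barrelfish's actual code; a 64-byte cache line and a GCC-style alignment attribute are assumed): sender and receiver share a ring of cache-line-sized slots, the sender writes a whole line finishing with the sequence word the receiver polls on, so the receiver never observes a partially written message.

    #include <stdatomic.h>
    #include <stdint.h>

    #define CACHE_LINE 64
    #define SLOTS      16

    /* One message per 64-byte cache line: 7 payload words plus a sequence
     * word that the sender writes last and the receiver polls on. */
    struct urpc_msg {
        uint64_t payload[7];
        _Atomic uint64_t seq;
    } __attribute__((aligned(CACHE_LINE)));

    /* One direction of a channel; send_seq/recv_seq are private to the
     * sender and receiver cores respectively. (No flow control shown.) */
    struct urpc_chan {
        struct urpc_msg slot[SLOTS];
        uint64_t send_seq;
        uint64_t recv_seq;
    };

    static void urpc_send(struct urpc_chan *c, const uint64_t payload[7])
    {
        struct urpc_msg *m = &c->slot[c->send_seq % SLOTS];
        for (int i = 0; i < 7; i++)
            m->payload[i] = payload[i];
        /* Release store: the payload becomes visible before the sequence
         * number, so seeing seq implies seeing the whole message. */
        atomic_store_explicit(&m->seq, ++c->send_seq, memory_order_release);
    }

    /* Polling is cheap: the line stays in the receiver's cache until the
     * sender's write invalidates it. (Barrelfish falls back to blocking
     * plus an IPI if the wait gets too long.) */
    static int urpc_try_recv(struct urpc_chan *c, uint64_t payload[7])
    {
        struct urpc_msg *m = &c->slot[c->recv_seq % SLOTS];
        if (atomic_load_explicit(&m->seq, memory_order_acquire) != c->recv_seq + 1)
            return 0;
        for (int i = 0; i < 7; i++)
            payload[i] = m->payload[i];
        c->recv_seq++;
        return 1;
    }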

Barrelfish: Results
[Figure: TLB shootdown latency (cycles x1000) against the number of cores for Windows, Linux and Barrelfish.]

Summary
- Trends in multicore: scale (100+ cores), NUMA, no cache coherence, the machine as a distributed system, heterogeneity
- OS design guidelines: avoid shared data, explicit communication, locality
- Approaches to multicore OS: partition the machine (Disco, Tessellation); reduce sharing (K42, Corey, Linux, FlexSC, scalable commutativity); no sharing (Barrelfish, fos)
