SCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH

1 Faculty of Computer Science, Institute of Systems Architecture, Operating Systems Group
SCALABILITY AND HETEROGENEITY
MICHAEL ROITZSCH

2 LAYER CAKE
[Diagram labels: Application, Runtime, OS Kernel, ISA, Physical RAM]

3 COMMODITY HARDWARE

4 LAYER CAKE
[Diagram labels: Application, Runtime, OS Kernel, ISA, RAM, Coherency Interconnect, RAM]

5 NUMA
- Non-Uniform Memory Access
- core-to-RAM distance differs
- various interconnect topologies: bus, star, ring, hypercube
- more general: different access latencies to data
- consider cache latency, shared resource contention
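To make the effect of topology on core-to-RAM distance concrete, here is a minimal sketch that counts hops on a ring and on a hypercube and turns them into a latency estimate. The function names and both latency constants are assumptions made up for illustration; they are not measurements from any machine.

```python
# Illustrative constants only (assumed, not measured).
LOCAL_NS = 80      # assumed latency of an access to node-local RAM
PER_HOP_NS = 40    # assumed extra latency per interconnect hop


def ring_hops(src: int, dst: int, nodes: int) -> int:
    """Hops between two nodes on a bidirectional ring."""
    d = abs(src - dst) % nodes
    return min(d, nodes - d)


def hypercube_hops(src: int, dst: int) -> int:
    """Hops on a hypercube: number of bits in which the node ids differ."""
    return bin(src ^ dst).count("1")


def access_latency_ns(hops: int) -> int:
    """Simple linear model: local cost plus a fixed cost per hop."""
    return LOCAL_NS + hops * PER_HOP_NS


if __name__ == "__main__":
    nodes = 8
    for dst in range(nodes):
        ring = access_latency_ns(ring_hops(0, dst, nodes))
        cube = access_latency_ns(hypercube_hops(0, dst))
        print(f"node 0 -> node {dst}: ring {ring} ns, hypercube {cube} ns")
```

The linear model ignores cache latency and contention on shared resources, which the slide lists as further sources of non-uniformity.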

6 NUMA Daniel Müller: Memory and Thread Management on NUMA Systems, Diploma Thesis,

7 NUMA MECHANISM Daniel Müller: Memory and Thread Management on NUMA Systems, Diploma Thesis,

8 NUMA POLICIES
- fundamental options: migrate thread vs. migrate data
- use performance counters to monitor
- dynamic management shows >10% performance benefit compared to best static placement
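One way to picture the choice between the two fundamental options is a small heuristic driven by counter samples. This is a sketch under assumptions: the `Sample` record, both thresholds, and the rule "small working set moves data, large working set moves the thread" are hypothetical and are not taken from the thesis cited on the previous slides.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical per-thread counter snapshot for one monitoring interval."""
    thread_node: int          # node the thread currently runs on
    accesses_per_node: dict   # node id -> memory accesses observed there
    resident_pages: int       # pages the thread actively touches


# Assumed tuning knobs, chosen only for illustration.
REMOTE_RATIO_THRESHOLD = 0.5
MAX_PAGES_TO_MIGRATE = 1024


def decide(sample: Sample):
    """Return (action, target_node) for one thread."""
    total = sum(sample.accesses_per_node.values())
    local = sample.accesses_per_node.get(sample.thread_node, 0)
    if total == 0 or (total - local) / total < REMOTE_RATIO_THRESHOLD:
        return ("stay", sample.thread_node)
    if sample.resident_pages <= MAX_PAGES_TO_MIGRATE:
        # Small working set: pull the pages to the thread's node.
        return ("migrate data", sample.thread_node)
    # Large working set: move the thread to the node it accesses most.
    hottest = max(sample.accesses_per_node, key=sample.accesses_per_node.get)
    return ("migrate thread", hottest)
```

A dynamic manager would call `decide` once per interval for every monitored thread; a static placement corresponds to always returning "stay".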

9 LAYER CAKE
[Diagram labels: Application, Runtime, OS Kernel, ISA, RAM, Interconnect, RAM]

10 SHARE-NOTHING KERNELS

11 DISTRIBUTED
[Diagram labels: App, RT, MPI, RT, OS, OS, ISA, RAM, Network, RAM]

12 BARRELFISH
Figure 1: The multikernel model. [Figure labels: Apps; agreement algorithms; OS nodes, each with a state replica; async messages; arch-specific code; heterogeneous cores (x86, x64, ARM, GPU); interconnect]
Source: Andrew Baumann et al.: The Multikernel: A new OS architecture for scalable multicore systems, SOSP

13 BARRELFISH
- concept: multikernel, implementation: Barrelfish
- treat the machine as cores with a network
- no inter-core sharing at the lower levels
- CPU driver plus exokernel-ish structure
- traditional OS functionality as a tailored distributed system with state replication
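The idea of OS state that is replicated per core and updated only through messages can be sketched as a toy simulation. This is not Barrelfish code: the `OSNode` class, the message tuples, and the simple two-phase update are assumptions chosen to illustrate the principle, and the "channel" is just a local queue.

```python
from collections import deque


class OSNode:
    """One per-core OS node with a private replica; nothing is shared."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.replica = {}      # private copy of the replicated OS state
        self.pending = {}      # updates prepared but not yet committed
        self.inbox = deque()   # stands in for an inter-core message channel

    def handle(self, msg):
        """Process one message and return the reply, if any."""
        kind, txn, payload = msg
        if kind == "prepare":
            self.pending[txn] = payload
            return ("ack", txn, self.core_id)
        if kind == "commit":
            key, value = self.pending.pop(txn)
            self.replica[key] = value
        return None


def replicate(nodes, txn, key, value):
    """Two-phase update: prepare on every node, commit once all have acked."""
    for node in nodes:
        node.inbox.append(("prepare", txn, (key, value)))
    acks = [node.handle(node.inbox.popleft()) for node in nodes]
    if not all(ack is not None and ack[0] == "ack" for ack in acks):
        return False
    for node in nodes:
        node.inbox.append(("commit", txn, None))
        node.handle(node.inbox.popleft())
    return True


if __name__ == "__main__":
    nodes = [OSNode(i) for i in range(4)]
    replicate(nodes, txn=1, key="cap:42", value="revoked")
    print([node.replica for node in nodes])
```

Every node reads only its own `replica`, so lookups need no inter-core communication; the cost moves to updates, which need a round of messages.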

14 CONCEPT
[Diagram labels: Application, RT, OS, Agreement, Messages, RT, OS, ISA, ISA, RAM, Interconnect, RAM]

15 REALITY SO FAR
[Diagram labels: App, App, RT, RT, OS, Messages, OS, ISA, RAM, Interconnect, RAM]

16 BARRELFISH
- driven by scalability issues of shared kernel designs and cache coherence
- this may not be a pressing issue today
[Plot: Barrelfish vs. Linux, y-axis: cycles, panel (d) SPLASH-2 Barnes-Hut]
Source: Andrew Baumann et al.: The Multikernel: A new OS architecture for scalable multicore systems, SOSP

17 BARRELFISH

18 FOS David Wentzlaff, Anant Agarwal: Factored Operating Systems (fos): The Case for a Scalable Operating System for Multicores, SIGOPS OSR

19 FOS David Wentzlaff, Anant Agarwal: Factored Operating Systems (fos): The Case for a Scalable Operating System for Multicores, SIGOPS OSR

20 VIRTUALIZATION KERNELS

21 A BIT BETTER
[Diagram labels: App, RT, RT, OS, Virtualization, OS, ISA, Coherence, RAM, RAM]

22 DISCO Edouard Bugnion, Scott Devine, Kinshuk Govil, Mendel Rosenblum: Disco: Running Commodity Operating Systems on Scalable Multiprocessors, ACM Transactions on Computer Systems,

23 DISCO Source: [BDGR97] Edouard Bugnion, Scott Devine, Kinshuk Govil, Mendel Rosenblum: Disco: Running Commodity Operating Systems on Scalable Multiprocessors, ACM Transactions on Computer Systems,

24 DISCO Edouard Bugnion, Scott Devine, Kinshuk Govil, Mendel Rosenblum: Disco: Running Commodity Operating Systems on Scalable Multiprocessors, ACM Transactions on Computer Systems,

25 DISCO
- motivated by increasing core counts but badly scaling operating systems
- increase system utilization by co-locating multiple badly scaling systems
- allow for future better scaling systems
- employ the opportunities of cache-coherent shared memory instead of dogmatizing the share-nothing kernel

26 THE HOLY GRAIL
[Diagram labels: App, RT, OS?, OS, ISA, RAM, Interconnect, RAM]

27 THE HOLY GRAIL
- what we want: some unifying runtime / OS service on top of non-cache-coherent hardware
- OS service: distributed shared memory?
- runtimes from HPC (Charm++, X10) or cloud computing (MapReduce, Dryad)?
- distributed-systems-on-chip?
- still ongoing research

28 HETEROGENEITY

29 THE HOLY GRAIL
[Diagram labels: App, RT, OS, ISA??, OS, ISA, RAM, Interconnect, RAM]

30 GPUS TODAY
[Diagram labels: Application, Runtime, OS Kernel, HLSL, OpenCL, GPU Driver, Compute Kernel, ISA, ISA, Physical RAM]

31 GPUS TODAY
[Diagram labels: Application, Runtime, OS Kernel, C++ AMP, GPU Driver, Comp. Kernel, ISA, ISA, Physical RAM]

32 HELIOS
- idea: heterogeneous ISA systems need some kind of compiler support
- ISA-specific kernels: satellite kernels
- provide uniform OS abstractions: memory management, scheduling
- bootstrap: first kernel becomes coordinator, boots other cores

33 HELIOS
- share-nothing, even on ccNUMA
- processes cannot span across kernels
- implementation based on Singularity
- applications compiled to intermediate code
- 2nd stage compilation to native code of all available ISAs at install time
- placement based on affinity hints
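Placement from affinity hints can be sketched as a scoring function over kernels. This is a simplification for illustration, not the algorithm from the Helios paper: the function `place`, its arguments, the kernel names in the test data, and the tie-breaking rule are all assumptions. A positive hint means "run near this service", a negative hint means "run away from it".

```python
def place(process_affinities, kernels, service_location):
    """Pick a kernel for a new process from its affinity hints.

    process_affinities: service name -> integer hint
                        (positive: run near it, negative: run away from it)
    kernels:            list of kernel names, coordinator first (assumed)
    service_location:   service name -> kernel the service runs on
    """
    scores = {kernel: 0 for kernel in kernels}
    for service, hint in process_affinities.items():
        kernel = service_location.get(service)
        if kernel in scores:
            scores[kernel] += hint
    # max() returns the first best entry, so ties go to the first kernel.
    return max(kernels, key=lambda kernel: scores[kernel])


if __name__ == "__main__":
    kernels = ["x86-coordinator", "arm-nic"]
    location = {"netstack": "arm-nic", "fs": "x86-coordinator"}
    print(place({"netstack": 1}, kernels, location))
    print(place({"netstack": -1}, kernels, location))
```

Because every application is compiled to native code for all available ISAs at install time, the placement decision is free to pick any kernel; the hints only express a preference.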

34 HELIOS
[Figure: placement example; reproduced from IOS - Multiprocessors, slide 51]
Source: [NHM 09] Edmund B. Nightingale et al.: Helios: Heterogeneous Multiprocessing with Satellite Kernels, SOSP

35 HELIOS
[Diagram labels: Application, RT, RT, OS, ISA, Channels, Compiler, OS, ISA, RAM, Interconnect, RAM]

36 ISA-MIGRATION
- general solution
- transform memory state
- transition control flow between versions
- modify compiler to keep memory state architecture-independent
- runtime stack transformation
- binary translation up to next function call
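Runtime stack transformation can be pictured as relocating each live variable from its location on the source ISA to its location on the target ISA, using metadata the compiler emits for a call site. The sketch below is a toy model: the `LAYOUT` table, the ISA names, and the register and stack-slot names are invented for illustration and do not come from the cited paper.

```python
# Hypothetical compiler metadata for one call site: where each live variable
# lives (register or stack slot) in the frame of each ISA.
LAYOUT = {
    "isa_a": {"i": "r4", "sum": "r5", "buf": "sp+8"},
    "isa_b": {"i": "sp+0", "sum": "x19", "buf": "sp+16"},
}


def transform_frame(frame, src_isa, dst_isa):
    """Rewrite one frame from the source ISA's layout to the target's."""
    src = LAYOUT[src_isa]
    dst = LAYOUT[dst_isa]
    # Step 1: recover the architecture-independent view (variable -> value).
    values = {var: frame[loc] for var, loc in src.items()}
    # Step 2: place every value where the target ISA's code expects it.
    return {dst[var]: value for var, value in values.items()}


if __name__ == "__main__":
    frame_a = {"r4": 3, "r5": 10, "sp+8": 0x1000}
    print(transform_frame(frame_a, "isa_a", "isa_b"))
```

The pointer value in `buf` is copied unchanged, which only works because the compiler keeps the memory state architecture-independent. Transformation happens at function calls, which is why execution is bridged by binary translation up to the next call.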

37 ISA-MIGRATION
[Plots: performance of compilation for migratability (w/o migrations); performance with migrations, with dummy calls in outermost loops and in 2nd innermost loops; x-axis: migration frequency (milliseconds)]
Source: Matthew DeVuyst, Ashish Venkat, Dean M. Tullsen: Execution Migration in a Heterogeneous-ISA Chip Multiprocessor, ASPLOS

38 SUMMARY
- scalability approaches tend to move the problem upwards into runtimes and apps
- various microkernel-like approaches
- solutions from distributed systems
- today's challenge: heterogeneity
- compilation as an OS primitive
- future challenge: reconfigurable hardware
