Leveraging Hybrid Hardware in New Ways: The GPU Paging Cache

Size: px

Start display at page:

Download "Leveraging Hybrid Hardware in New Ways: The GPU Paging Cache"

Thomas Thompson
5 years ago
Views:

1 Leveraging Hybrid Hardware in New Ways: The GPU Paging Cache Frank Feinbube, Peter Tröger, Johannes Henning, Andreas Polze Hasso Plattner Institute Operating Systems and Middleware Prof. Dr. Andreas Polze

2 Everyone should benefit from all available Compute Devices

3 Call for Action Operating Systems must support GPU abstractions Rossbach, Currey, Witchel [HotOS11]

4 Operating System Perspective Operating Systems must support GPU abstractions Rossbach, Currey, Witchel [HotOS11] Accelerators are still a black box No appropriate abstractions ioctl based interface: Unknown architecture, Unknown capabilities No system-wide guarantees Coarse grained scheduling No isolation guarantees, No protection guarantees

5 Related Work: GPU Computing to speed up Operating System Services Operating Systems must support GPU abstractions Rossbach, Currey, Witchel [HotOS11] PTask API by Rossbach et al. (based on CUDA) Workload as acyclic graphs of ptasks executed on GPU Proof-of-concept: encryption / decryption for Linux EncFS GPUStore by Sun et al. (based on CUDA) Specially tailored for storage systems GPUs for encryption and RAID functionality Gdev framework by Kato et al. (based on Direct Rendering Int.) Reverse engineering of graphics card driver internals The GPU Paging Cache ICPADS 2013 Frank Feinbube

8 What about the memory? The GPU Paging Cache ICPADS 2013 Frank Feinbube

6,71% more than 1024 MB; 8,93% 256 MB; 10,68% 512 MB; 19,44% 2 GB; 16,77% Other;

9 Hardware Survey of Memory Sizes System RAM Distribution (Steam HW Survey May 2012) VRAM Distribution (Steam HW Survey May 2012) 5 or more GB; 30,76% less than 2 GB; 6,71% more than 1024 MB; 8,93% 256 MB; 10,68% 512 MB; 19,44% 2 GB; 16,77% Other; 19,68% 4 GB; 23,64% 3 GB; 22,65% 1024 MB; 41,27% 70% less than 5 GB 50% at least 1 GB

10 NVIDIA Quadro K GB of Memory!!

11 The Idea The GPU Paging Cache ICPADS 2013 Frank Feinbube

12 GPU Paging Cache The vision: GPU memory is used as a fast cache for paging information written to secondary storage.

13 Is Paging still a problem? The GPU Paging Cache ICPADS 2013 Frank Feinbube

14 Page Faults still occur Software is a gas. Nathan Myhrvold, the former CTO of Microsoft Software always expands to fit whatever container it is stored in.

15 First Approach: Windows Research Kernel

16 Windows Research Kernel (WRK) Stripped down Windows Server 2003 sources Only kernel itself, no drivers, GUI, user-mode components Missing components: HAL, power management, plug-and-play Released in 2006 Freely available to academic institutions Encouraged by license: Modification Publication (of excerpts)

17 Second Approach: File System Driver

18 Application Application Application Application User Mode Pagefile IO Manager GPU Paging Cache Driver? Kernel Mode Disk RAM GPU Memory Hardware

19 /IO Request initialize driver wait for IO request IO Request/ GPU Paging Cache Driver read request analyze IO Request write request yes GPU active? no GPU active? yes no in GPU cache? no fetch from disk write to GPU cache write to disk yes fetch from GPU cache async write to disk copy result to IO Request The GPU Paging Cache ICPADS 2013 Frank Feinbube

20 Application Application Application Application Upcall Usermode Component OpenCL User Mode Pagefile IO Manager GPU Paging Cache Driver GPU Driver Kernel Mode Disk RAM GPU Memory Hardware

21 initialize events (InvokerEvent, DoneEvent) User mode component initialize shared memory (exchangearea) propagate events and shared memory to driver /DoneEvent wait for request InvokerEvent/ read request read request from exchangearea write request enqueue OpenCL read request enqueue OpenCL write request write result to exchangearea The GPU Paging Cache ICPADS 2013 Frank Feinbube

22 Accessing GPU Memory with OpenCL The GPU Paging Cache ICPADS 2013 Frank Feinbube

23 Application Application Application Application One time Initialization Usermode Component DirectDraw User Mode Pagefile IO Manager GPU Paging Cache Driver GPU Driver Kernel Mode Disk RAM GPU Memory Hardware

24 Accessing GPU Memory with DirectDraw Surface Pointers The GPU Paging Cache ICPADS 2013 Frank Feinbube

25 What are the right Performance Metrics?

26 Distribution of page read block sizes The GPU Paging Cache ICPADS 2013 Frank Feinbube

Occurances The GPU Paging Cache ICPADS 2013 Frank Feinbube 18.12.

27 Occurances The GPU Paging Cache ICPADS 2013 Frank Feinbube Paging Request Write Other 5% Write requests to pagefile 1 MB 95% Size in bytes

28 Distribution of page write block sizes The GPU Paging Cache ICPADS 2013 Frank Feinbube

29 Performance Metric The GPU Paging Cache ICPADS 2013 Frank Feinbube

30 Evaluation: What s in it for me?

31 Tested Hardware ID CPU GPU Memory Disk Operating System 2600k 460gtx E gtx 2620m 540mgt Intel Core i7-2600k 3,7 Ghz Intel Core 2 Duo E8500 3,17 Ghz Intel Core i7-2620m 2,7 Ghz Nvidia GeForce GTX MB GDDR5 Nvidia GeForce GTX MB GDDR3 Nvidia GeForce GT 540M 3 GB GDDR3 8 GB DDR3 8 GB DDR2 8 GB DDR3 Intel Rapid Storage 64 GB SSD 1 TB HDD 240 GB SSD Windows 7 Ultimate SP1 64 bit Windows 7 Ultimate SP1 64 bit Windows 7 Ultimate SP1 64 bit

weighted performance Evaluation 10000,00 MB/s 1000,00 MB/s 100,00 MB/s 10,00 MB/s 1,00 MB/s Disk User Mode GPU-based Paging Cache Kernel Mode GPUbased Paging Cache System Memory 2600k 460gtx 18,33

32 weighted performance Evaluation 10000,00 MB/s 1000,00 MB/s 100,00 MB/s 10,00 MB/s 1,00 MB/s Disk User Mode GPU-based Paging Cache Kernel Mode GPUbased Paging Cache System Memory 2600k 460gtx 18,33 MB/s 112,78 MB/s 106,58 MB/s 1061,18 MB/s 2620m 540mgt 24,96 MB/s 22,01 MB/s 105,97 MB/s 986,77 MB/s E gtx 5,37 MB/s 57,69 MB/s 207,14 MB/s 629,28 MB/s The GPU Paging Cache ICPADS 2013 Frank Feinbube

33 Overview of the measurements for the first test system The GPU Paging Cache ICPADS 2013 Frank Feinbube

34 Disk throughput of the first test system The GPU Paging Cache ICPADS 2013 Frank Feinbube

throughput 10000,00 MB/s The GPU Paging Cache ICPADS 2013

2013 Performance Evaluation 1000,00 MB/s 100,00 MB/s SSD

Kernel Mode GPUbased Paging Cache System Memory E5520

275gtx 9,61 MB/s 62,98 MB/s 233,30 MB/s 726,51 MB/s 2620m

35 throughput 10000,00 MB/s The GPU Paging Cache ICPADS 2013 Frank Feinbube Performance Evaluation 1000,00 MB/s 100,00 MB/s SSD 10,00 MB/s 1,00 MB/s Disk User Mode GPU-based Paging Cache Kernel Mode GPUbased Paging Cache System Memory E gtx 1,87 MB/s 51,29 MB/s 92,05 MB/s 777,76 MB/s E gtx 9,61 MB/s 62,98 MB/s 233,30 MB/s 726,51 MB/s 2620m 540mgt 28,79 MB/s 24,18 MB/s 126,94 MB/s 1108,99 MB/s 2600k 460gtx 34,70 MB/s 118,23 MB/s 130,05 MB/s 1241,82 MB/s

36 Challenges / Open Research Questions We need stable standardized kernel driver interfaces to the GPU Compute Devices and their memory! Optimal page transfer sizes / Optimal placement Bandwidth limitations and communication overhead When to swap pages to disk, or to GPU? Which user types benefit the most? Hardware architecture Integrated GPU vs. dedicated GPU vs. Hybrid Multi-GPU Dynamic GPU properties

37 Conclusion Accelerators New system architectures Operating systems lack appropriate abstractions New concepts to manage/handle accelerators Various approaches (novel OS design vs. legacy integration) GPU Page Cache Leverage GPU Global Memory for caching non-resident pages Reduce paging operations latencies

38 The GPU Paging Cache Operating Systems must support GPU abstractions GPU as yet another processor Resource abstraction, scheduling, security Transparent data transfer and execution on all types of accelerators GPU Memory as a Paging Cache. Everyone should benefit from GPU Computing!

Chris Rossbach, Jon Currey, Microsoft Research Mark Silberstein, Technion Baishakhi Ray, Emmett Witchel, UT Austin SOSP October 25, 2011

Chris Rossbach, Jon Currey, Microsoft Research Mark Silberstein, Technion Baishakhi Ray, Emmett Witchel, UT Austin SOSP October 25, 2011 There are lots of GPUs 3 of top 5 supercomputers use GPUs In all