LoGA: Low-Overhead GPU Accounting Using Events

Size: px

Start display at page:

Download "LoGA: Low-Overhead GPU Accounting Using Events"

Abigayle McCormick
6 years ago
Views:

1 10th ACM International Systems and Storage Conference, (KIT) KIT The Research University in the Helmholtz Association

to the cloud Sharing increases cost-efficiency

2 GPU Sharing GPUs increasingly popular in computing Not every application saturates a GPU time Move GPUs to the cloud Sharing increases cost-efficiency Problem: Need fairness Software scheduling is inefficient 2

3 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON 3

4 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON 4

5 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON 5

6 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON 6

7 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON Threshold 7

8 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON STOP 8

9 NEON: Accounting Problem NEON s accounting disables GPU access GPU idle! STOP Freerun Phase Sampling Phase High accounting overhead if application does not saturate the GPU 9

10 Idea GPUs have lots of status registers for the device driver Leak information about GPU s internal state Reading has no effect on GPU driver or running application Idea: Poll status registers Infer GPU-internal context switches 10

11 Design App STOP Submit Query Poll 11 Scheduler Accounting thread

12 Accounting Thread Accounting thread Poll Status reg 12

13 Accounting Thread Accounting thread Poll Status reg 13

14 Accounting Thread Accounting thread Poll Status reg Poll frequency must be faster than kernel length High CPU load Poll periodically 14

15 Scheduling Scheduling thread queries accounting thread Updates fair queuing counters Stops applications if necessary Different metric than NEON NEON: Average kernel length LoGA: Total GPU time consumed 15

16 Benchmark Scenario Accounting overhead Accuracy of accounting ( scheduling quality) Competing workload: Throttle Creates well-defined GPU load run 16 sleep

17 Accounting Overhead (10% load) 1,6 10 instances of throttle, 1% load each Normalized runtime 1,5 Normalized against execution time w/o accounting 1,4 1,3 Scheduling disabled 1,2 1,1 NEON LoGA 1 0,9 0,8 17 nw cl ef pa ilte th r fin d sr er ad _v 1 sr st ad re am _v2 cl us G eo ter m ea n pa rti nn lu m d er g m pu yo cy te m um cf d dw ga t2d us s he ian ar tw a ho ll ts ho po t ts po t3 hu D ffm hy a br n id so k m rt ea n la s va le MD uk oc yt e bf s b+ tre e ba ck pr op 0,7

18 Accounting Overhead (90% load) 1,6 10 instances of throttle, 9% load each Normalized runtime 1,5 Normalized against execution time w/o accounting 1,4 1,3 Scheduling disabled 1,2 1,1 NEON LoGA 1 0,9 0,8 18 nw cl ef pa ilte th r fin d sr er ad _v 1 sr st ad re am _v2 cl us G eo ter m ea n pa rti nn lu m d er g m pu yo cy te m um cf d dw ga t2d us s he ian ar tw a ho ll ts ho po t ts po t3 hu D ffm hy a br n id so k m rt ea n la s va le MD uk oc yt e bf s b+ tre e ba ck pr op 0,7

19 Fairness Goal: LoGA should not reduce fairness Problem: Which application runtime is fair? Measure total GPU time for each application Calculate optimal scheduling Next slide: Speedup over fair schedule 19

20 Fairness: Results (4x throttle, 20% load each) 5 Speedup over fair schedule 4,5 4 3,5 3 2,5 No scheduling NEON LoGA LoGA-CL 2 1,5 1 0,5 20 km ea ns so rt hy br id an hu ffm ho ts po t ho ts po t3 D he ar tw al l ia n ga us s dw t2 d cf d b+ tre e bf s ba ck pr op 0

21 Conclusion Sharing beneficial if applications do not saturate the GPU Scheduler interference reduces sharing LoGA accounts GPU usage without overhead Poll GPU status registers, detect context switches Time between context switches as input for fair queuing 21

22 Finding Registers Envytools project has documented some registers Unfortunately, far from complete Trial and error: Run workload with known behavior Dump register values See which registers correlate with workload behavior Two registers identified: ID of currently running GPU context Activity status of entire GPU 22

GPrioSwap: Towards a Swapping Policy for GPUs

GPrioSwap: Towards a Swapping Policy for GPUs Jens Kehne, Jonathan Metter, Martin Merkel, Marius Hillenbrand, Marc Rittinghaus, Frank Bellosa 0th ACM International Systems and Storage Conference, (KIT) KIT The Research University in the Helmholtz