Datacenter application interference

Ashlee McLaughlin
5 years ago
1 Datacenter application interference
Chip multiprocessors (CMPs), popular in datacenters, offer increased throughput and reduced power consumption. They also increase resource sharing between applications, which can result in negative interference.

2 Resource contention is well studied, at least on single machines. Three main methods:
(1) Gladiator-style match-ups
(2) Static analysis to predict application resource usage
(3) Measure benchmark resource usage; apply to live applications

3 A new methodology for understanding datacenter interference is needed, one that can handle the complexities of a datacenter: (tens of) thousands of applications; real user inputs; production hardware; financially feasible; low overhead.
Hardware counter measurements of live applications.

4 Our contributions
1. Identify complexities in datacenters
2. New measurement methodology
3. First large-scale study of measured interference on live datacenter applications

5 Complexities of understanding application interference in a datacenter

6 Large chips and high core utilizations
Profiling 1000 12-core, 24-hyperthread Google servers running production workloads revealed that the average machine had more than 14 of its 24 HW threads in use.

7 Heterogeneous application mixes
Applications often have more than one co-runner on a machine.
[Chart legend: 0-1 co-runners, 2-3 co-runners, 4+ co-runners]
Observed max of 19 unique co-runner threads (out of 24 HW threads).

8 Application complexities
Fuzzy definitions; varying and sometimes unpredictable inputs; unknown optimal performance

9 Hardware & economic complexities
Varying micro-architectural platforms; necessity for low overhead = limited measurement capabilities; corporate policies

10 Measurement methodology

11 Measurement methodology
The goal: a generic methodology to collect application interference data on live production datacenter servers.

12 Measurement methodology
[Figure: App. A and App. B running over time]

13 Measurement methodology
1. Use sample-based monitoring to collect per-machine, per-core event (HW counter) sample data.

14 Measurement methodology
[Figure: timelines of App. A and App. B split into consecutive samples, each labeled "2 M instrs"]
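To make the sampling step concrete, here is a minimal sketch of what one per-core hardware-counter sample might look like and how IPC follows from it. The record layout and field names (`machine`, `cpu`, `app`, `start`, `end`, `instructions`, `cycles`) are assumptions for illustration, not the format used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hardware-counter sample (hypothetical record layout)."""
    machine: str       # machine identifier
    cpu: int           # hardware thread the sample was taken on
    app: str           # application that was running
    start: float       # sample start time, in seconds
    end: float         # sample end time, in seconds
    instructions: int  # instructions retired during the sample
    cycles: int        # CPU cycles elapsed during the sample

    @property
    def ipc(self) -> float:
        # Instructions per cycle: the performance metric used for comparisons.
        return self.instructions / self.cycles


# Illustrative values only: 2 M instructions retired in 1 M cycles.
sample = Sample("m1", 0, "A", 0.0, 1.0, 2_000_000, 1_000_000)
print(sample.ipc)
```

Keeping the machine, CPU, and time window on every sample is what later allows samples from different applications to be matched up as co-runners.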

15 Measurement methodology
2. Identify sample-sized co-runner relationships.

16 Measurement methodology
Samples A:1-A:6 are co-runners with App. B. Samples B:1-B:4 are co-runners with App. A.
[Figure: sample timelines of App. A and App. B]

17 Measurement methodology
Say that a new App. C starts running on CPU 1. B:4 no longer has a co-runner.
[Figure: sample timelines of App. A, App. C, and App. B]
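One way to picture sample-sized co-runner identification is as an interval-overlap check: a sample's co-runners are the applications whose samples on other hardware threads overlap it in time. The sketch below assumes that interpretation; the `Sample` tuple, the function names, and the timestamps are all hypothetical.

```python
from collections import namedtuple

# Hypothetical sample record: application, hardware thread, [start, end) in seconds.
Sample = namedtuple("Sample", "app cpu start end")


def overlaps(a, b):
    """True if the half-open time intervals of the two samples intersect."""
    return a.start < b.end and b.start < a.end


def co_runners(sample, all_samples):
    """Apps with a sample on a different hardware thread that overlaps in time."""
    return sorted({
        other.app
        for other in all_samples
        if other.cpu != sample.cpu and overlaps(sample, other)
    })


# Illustrative timeline: App. A on CPU 0; App. B, then App. C, on CPU 1.
a1 = Sample("A", 0, 0, 10)
a2 = Sample("A", 0, 10, 20)
b1 = Sample("B", 1, 5, 15)
c1 = Sample("C", 1, 15, 25)
b2 = Sample("B", 1, 30, 40)
timeline = [a1, a2, b1, c1, b2]

print(co_runners(a1, timeline))  # only B overlaps a1
print(co_runners(a2, timeline))  # B and C both overlap a2
print(co_runners(b2, timeline))  # nothing overlaps b2
```

Because relationships are computed per sample, a change in what runs next door is picked up at the next sample boundary rather than being averaged over the whole run.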

18 Measurement methodology
3. Filter relationships by architecture-independent interference classes.

19 Measurement methodology
Be on opposite sockets.

20 Measurement methodology
Share only I/O
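The interference classes can be sketched as a function of where two hardware threads sit on the chip. The example below assumes a two-socket machine with 6 cores per socket and 2 hyperthreads per core (24 hardware threads in total) and a made-up CPU numbering in which adjacent IDs share a core; real operating systems often enumerate CPUs differently, so `location` is purely illustrative.

```python
CORES_PER_SOCKET = 6   # assumed topology, not taken from the slides
THREADS_PER_CORE = 2
THREADS_PER_SOCKET = CORES_PER_SOCKET * THREADS_PER_CORE


def location(cpu):
    """Map a hardware-thread ID to (socket, core) under the assumed numbering."""
    socket = cpu // THREADS_PER_SOCKET
    core = (cpu % THREADS_PER_SOCKET) // THREADS_PER_CORE
    return socket, core


def interference_class(cpu_a, cpu_b):
    """Classify a co-runner pair by the closest hardware level they share."""
    if cpu_a == cpu_b:
        raise ValueError("co-runners must be on different hardware threads")
    socket_a, core_a = location(cpu_a)
    socket_b, core_b = location(cpu_b)
    if socket_a != socket_b:
        return "opposite socket"
    if core_a == core_b:
        return "shared core"
    return "shared socket"


print(interference_class(0, 1))   # two hyperthreads of the same core
print(interference_class(0, 2))   # different cores, same socket
print(interference_class(0, 12))  # different sockets
```

Classifying by shared core, shared socket, and opposite socket rather than by specific cache or bus names keeps the classes independent of any one micro-architecture.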

21 Measurement methodology
4. Aggregate equivalent co-schedules.

22 Measurement methodology
For example:
Aggregate all samples of App. A that have App. B as a shared-core co-runner.
Aggregate all samples of App. A that have App. B as a shared-core co-runner and App. C as a shared-socket co-runner.
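Aggregating equivalent co-schedules amounts to grouping samples under a key that ignores the order in which co-runners are listed. A minimal sketch, assuming each sample has already been reduced to a hypothetical `(app, co_schedule, ipc)` triple where `co_schedule` is a list of `(co-runner app, interference class)` pairs:

```python
from collections import defaultdict


def aggregate(samples):
    """Group IPC values by (app, set of (co-runner, interference class))."""
    groups = defaultdict(list)
    for app, co_schedule, ipc in samples:
        # frozenset makes the key order-independent and hashable.
        key = (app, frozenset(co_schedule))
        groups[key].append(ipc)
    return dict(groups)


# Illustrative IPC values only.
samples = [
    ("A", [("B", "shared core")], 1.4),
    ("A", [("B", "shared core")], 1.6),
    ("A", [("B", "shared core"), ("C", "shared socket")], 1.2),
    ("A", [("C", "shared socket"), ("B", "shared core")], 1.0),
]

groups = aggregate(samples)
for (app, schedule), ipcs in groups.items():
    print(app, sorted(schedule), ipcs)
```

The last two samples list their co-runners in a different order but land in the same group, which is the sense in which the co-schedules are equivalent.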

23 Measurement methodology
Finally, calculate statistical indicators (means, medians) to get a midpoint performance for application interference comparisons.

24 Measurement methodology
[Figure: App. A and App. B, annotated with Avg. IPC = 2.0 and Avg. IPC = 1.5]
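As a rough illustration of the final step, the sketch below computes the mean and median IPC of two aggregated groups and the relative change between them. The IPC lists are invented so that the averages come out to 2.0 and 1.5; which co-schedule each figure on the slide refers to is not recoverable here, so the `baseline` and `with_co_runner` labels are assumptions.

```python
from statistics import mean, median


def summarize(ipcs):
    """Midpoint indicators for one aggregated group of samples."""
    return {"mean": mean(ipcs), "median": median(ipcs)}


def relative_change(reference, observed):
    """Fractional IPC change of `observed` relative to `reference`."""
    return (observed - reference) / reference


# Hypothetical aggregated IPC samples for one application.
baseline = [1.5, 2.0, 2.5]
with_co_runner = [1.25, 1.5, 1.75]

base_stats = summarize(baseline)
co_stats = summarize(with_co_runner)
change = relative_change(base_stats["mean"], co_stats["mean"])

print(base_stats, co_stats, change)
```

A negative change indicates that the application ran slower in that co-schedule; medians are the more robust choice when a group contains a few extreme samples.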

25 Applying the measurement methodology at Google

26 Applying the measurement methodology at Google
Experiment details:
Number of machines*: 1000
Event: Instrs IPC
Sampling period: 2.5 million
* All had Intel Westmere chips (24 hyperthreads, 12 cores) with matching clock speed, RAM, and O/S.
Method:
1. Collect samples

27 Applying the measurement methodology at Google
Collection results:
Unique binary apps: 1102
Co-runner relationships (top 8 apps):
Avg. shared core relationships: 1M (min 2K)
Avg. shared socket: 9.5M (min 12K)
Avg. opposite socket: 11M (min 14K)
Method:
1. Collect samples
2. Identify sample-sized relationships
3. Filter by interference classes

28 Applying the measurement methodology at Google
Method:
4. Aggregate equivalent co-schedules
5. Calculate statistical indicators

29 Analyze interference
streetview's IPC changes with top co-runners
[Figure reference line: overall median IPC across 1102 applications]

30 Beyond noisy interferers (shared core)
[Figure: grid of base applications against co-running applications; legend: less or positive interference, noisy data, negative interference]

31 Beyond noisy interferers (shared core)
[Figure: grid of base applications against co-running applications; legend: less or positive interference, noisy data, negative interference]
* Recall that the minimum pair has 2K samples; medians are across the full grid of 1102 apps.

32 Performance strategies
Restrict negative beyond-noisy interferers (or encourage positive interferers as co-runners).
Isolate sensitive or antagonistic applications.

33 Takeaways
1. New datacenter application interference studies can use our identified complexities as a checklist.
2. Our measurement methodology (verified at Google in the first large-scale measurements of live datacenter interference) is generally applicable and shows promising initial performance opportunities.

34 Questions?
