PageForge: A Near-Memory Content- Aware Page-Merging Architecture

Size: px

Start display at page:

Download "PageForge: A Near-Memory Content- Aware Page-Merging Architecture"

Ross Watson
6 years ago
Views:

1 PageForge: A Near-Memory Content- Aware Page-Merging Architecture Dimitrios Skarlatos, Nam Sung Kim, and Josep Torrellas University of Illinois at Urbana-Champaign Boston

2 Motivation: Server Consolidation in the Cloud 2

3 3 Motivation: Server Consolidation in the Cloud How to consolidate Main Memory?

4 Content-Aware Page Merging or Page Deduplication 4

5 Content-Aware Page Merging or Page Deduplication Hypervisors search the address space and merge identical pages 5

6 Content-Aware Page Merging or Page Deduplication Hypervisors search the address space and merge identical pages VM-0 VM-1 Guest Virtual Page 0 Page 1 Guest Physical Page 0 Page 1 Host Physical Page 0 Page 1 6

7 Content-Aware Page Merging or Page Deduplication Hypervisors search the address space and merge identical pages VM-0 VM-1 VM-0 VM-1 Guest Virtual Page 0 Page 1 Page 0 Page 1 Guest Physical Page 0 Page 1 Page 0 Page 1 Host Physical Page 0 Page 1 Page 0 7

8 8 Problem: Overhead of Software Page Merging Search hundreds of millions of pages Latency sensitive applications get disrupted RedHat s SW page merging (KSM) has execution overhead: Average mean latency overhead: 68% Average tail latency overhead: 136%

9 Contribution: PageForge First solution for hardware-assisted content-aware page merging General, effective, minimal hypervisor involvement & hardware mods Reduced overhead vs state-of-the-art software: Mean latency 68% à 10% Tail latency 136% à 11% Same memory savings as software: 48% à Twice #VMs Novel use of ECC for page content characterization Boston

10 Content Duplication in the Cloud

11 11 Background: Cloud Infrastructure

12 12 Background: Cloud Host Operating System Infrastructure

13 13 Background: Cloud Hypervisor Host Operating System Infrastructure

14 14 Background: Cloud Guest OS Hypervisor Host Operating System Infrastructure

15 15 Background: Cloud Libs Guest OS Hypervisor Host Operating System Infrastructure

16 16 Background: Cloud Data Libs Guest OS Hypervisor Host Operating System Infrastructure

17 17 Background: Cloud Data Libs Data Libs Data Libs Guest OS Guest OS Guest OS Hypervisor Host Operating System Infrastructure

18 18 Background: Content Duplication Data Libs Data Libs Data Libs Guest OS Guest OS Guest OS Hypervisor Host Operating System Infrastructure

19 19 Background: Content Duplication Data Libs Data Libs Data Libs Guest OS Guest OS Guest OS Hypervisor Host Operating System Infrastructure

20 20 Background: Content Duplication Data Libs Data Libs Data Libs Guest OS Guest OS Guest OS Hypervisor Host Operating System Infrastructure

21 Background: Content Duplication Data Data Data Libs Libs Libs Guest OS Guest OS Guest OS Hypervisor Host Operating System Infrastructure 21

22 Background: Content Duplication Data Data Data Libs Libs Libs Guest OS Guest OS Guest OS Hypervisor Host Operating System Infrastructure 22

23 23 Background: Content Duplication Data Data Data Libs Libs Libs Content Guest OS Guest OS Guest OS Duplication Page Merging Hypervisor Host Operating System Infrastructure

24 24 Our Proposal: Content Deduplication in HW Data Data Data Libs Libs Libs Content Guest OS Guest OS Guest OS Duplication Hypervisor Page Merging Host Operating System Infrastructure

25 Software-based Content-Aware Page Merging

26 Software-Based Content-Aware Page Merging Pool of Pages to Scan Pool of Stable Pages 4KB 4KB 4KB 4KB 4KB 4KB

27 Software-Based Content-Aware Page Merging Pool of Pages to Scan Pool of Stable Pages 4KB 4KB 4KB 4KB 4KB 4KB Candidate Page Compare Page

28 Software-Based Content-Aware Page Merging Main Memory Candidate Page 4KB Memory Ctrl L3 $ L2 $ L1 $ Core Compare Page 4KB

29 Software-Based Content-Aware Page Merging Main Memory Candidate 64B Page 4KB Memory L3 $ L2 $ 1B L1 $ Core == 64B Ctrl 1B Compare Page 4KB

30 Software-Based Content-Aware Page Merging Pool of Pages to Scan Pool of Stable Pages 4KB 4KB 4KB 4KB 4KB 4KB Candidate Page Compare Page

31 Software-Based Content-Aware Page Merging Pool of Pages to Scan Pool of Stable Pages 4KB 4KB 4KB 4KB 4KB 4KB Candidate Page Compare Page

32 32 Why Page-Merging is expensive? Search hundreds of millions of pages Many core cycles consumed Caches get polluted Latency sensitive applications get disrupted

33 Software-Based Content-Aware Page Merging Optimization: Identify if the candidate page has recently changed If a page changes too often à Not a good candidate Candidate Page Hash Key 1KB # 4KB JHash

34 PageForge: Hardware Assisted Page-Merging

35 35 PageForge Main Idea Hardware engine in the memory controller Compares pages CPU CPU CPU CPU CPU Generates hashes Advantages: ü No core utilization ü No cache pollution Memory Memory PageForge Mem Controller L2 L2 L2 L2 L3 L3 L3 L3 Interconnect L3 L3 L3 L3 L2 L2 L2 L2 L2 L3 L3 L2 Mem Controller Memory Memory CPU CPU CPU CPU CPU

36 36 PageForge Main Idea Hardware engine in the memory controller Compares pages CPU CPU CPU CPU CPU Generates hashes Advantages: ü No core utilization ü No cache pollution Memory Memory PageForge Mem Controller L2 L2 L2 L2 L3 L3 L3 L3 Interconnect L3 L3 L3 L3 L2 L2 L2 L2 L2 L3 L3 L2 Mem Controller Memory Memory CPU CPU CPU CPU CPU

37 37 The Memory Controller Read Req Buffer Scheduler Command On-Chip Network (FR-FCFS) Write Req Buffer Control Comparator Scan Table PageForge Read Data Buffer ECC Dec Generation Memory Interface Off-Chip Memory Interface ECC Enc Write Data Buffer Memory Controller

38 PageForge Operation 38

39 39 PageForge Operation Operating System Loads in Scan Table: Candidate page PPN PPNs of pages to compare to candidate

40 40 PageForge Operation Operating System Loads in Scan Table: Candidate page PPN PPNs of pages to compare to candidate PageForge Hardware Autonomously performs : Sequence of comparisons Generation of the hash key for candidate page

41 41 PageForge Operation In Hardware

42 42 PageForge Operation Search Pool of Stable Pages Pick Candidate Page In Hardware

43 43 PageForge Operation Search Pool of Stable Pages Match Found? Yes OS Merges Pages In Hardware

44 44 PageForge Operation Search Pool of Match Found? No Old Key = New Key? Stable Pages No Pick Candidate Page In Hardware

45 45 PageForge Operation Search Search Pool of Match Found? Old Key = New Key? Yes Pool of Unprotected Stable Pages Pages OS Merges Pages Match Found? Yes In Hardware

46 46 PageForge Operation Search Search Pool of Match Found? Old Key = New Key? Yes Pool of Unprotected Stable Pages Pages Match Found? No In Hardware Insert Candidate in Unprotected Pool

47 47 PageForge Operation Search Search Pool of Match Found? No Old Key = New Key? Yes Pool of Unprotected Stable Pages Pages Yes No Pick OS Merges Candidate Page Pages Match Found? Yes No In Hardware Insert Candidate in Unprotected Pool

48 Eliminating The Cost of Hash Keys

49 Hash Key for Page Content Characterization 49

50 50 Hash Key for Page Content Characterization If page changed recently, key may be different

51 51 Hash Key for Page Content Characterization If page changed recently, key may be different KSM: JHash à Serial computation + 1KB of data

52 52 Hash Key for Page Content Characterization If page changed recently, key may be different KSM: JHash à Serial computation + 1KB of data PageForge: ECC à Parallel computation + 256B of data

53 53 The Memory Controller Read Req Buffer Scheduler Command On-Chip Network (FR-FCFS) Write Req Buffer Control Comparator Scan Table PageForge Read Data Buffer ECC Dec Generation Memory Interface Off-Chip Memory Interface ECC Enc Write Data Buffer Memory Controller

54 Proposal: ECC for Page Content Characterization 54

55 55 Proposal: ECC for Page Content Characterization Break a 4KB page into 4 segments Data 64B 4K Page

56 56 Proposal: ECC for Page Content Characterization Break a 4KB page into 4 segments Pick a random cache line within each segment Data 64B 4K Page

57 57 Proposal: ECC for Page Content Characterization Break a 4KB page into 4 segments Pick a random cache line within each segment Select the 8-bit ECC of the least significant QWORD ECC 8B Data 64B 4K Page

58 58 Proposal: ECC for Page Content Characterization Break a 4KB page into 4 segments Pick a random cache line within each segment Select the 8-bit ECC of the least significant QWORD Hash Key 4B ECC 8B Data 64B 4K Page

59 59 Proposal: ECC for Page Content Characterization Break a 4KB page into 4 segments Pick a random cache line within each segment Select the 8-bit ECC of the least significant QWORD Hash Key 4B ECC 8B Software JHash = 1KB Data 64B

60 60 Proposal: ECC for Page Content Characterization Break a 4KB page into 4 segments Pick a random cache line within each segment Select the 8-bit ECC of the least significant QWORD Hash Key 4B ECC 8B Software Jhash = 1KB Data 64B

61 61 Proposal: ECC for Page Content Characterization Break a 4KB page into 4 segments Pick a random cache line within each segment Select the 8-bit ECC of the least significant QWORD Hash Key 4B ECC 8B ECC Hash = 256 Bytes Data 64B

62 62 Proposal: ECC for Page Content Characterization Break a 4KB page into 4 segments Pick a random cache line within each segment Select the 8-bit ECC of the least significant QWORD ECC 8B Data 64B 75% Memory Footprint Hash Key 4B Reduction for Hash Key Generation!

63 Evaluation

64 64 Simulation Setup 2GHz, 10-core, 16GB DRAM One VM per core Ubuntu Host, Ubuntu Cloud Guest InHouse simulator: Simics + SST + DRAMSim2 TailBench Suite benchmarks

65 65 Configurations Baseline: Page-merging is disabled KSM: RedHat s implementations shipped with the current Linux PageForge: Hardware-assisted page-merging

66 Memory Savings w/o vs w/ Page Merging w/o Page Merging w/ Page Merging 100 Pages % W/O Page-Merging 48% 20 0 Img-Dnn Masstree Moses Silo Sphinx Average 66

Memory Savings w/o vs w/ Page Merging w/o Page Merging w/ Page Merging 100 Pages % 80 60 40 W/O

67 Memory Savings w/o vs w/ Page Merging w/o Page Merging w/ Page Merging 100 Pages % W/O Page-Merging 48% 20 0 PageForge Achieves Memory Savings of 48% Img-Dnn Masstree Moses Silo Sphinx Average 67

Memory Savings w/o vs w/ Page Merging w/o Page Merging w/ Page Merging 100 Pages % 80 60 40 W/O Page-Merging

68 Memory Savings w/o vs w/ Page Merging w/o Page Merging w/ Page Merging 100 Pages % W/O Page-Merging 48% 20 0 PageForge Achieves Memory Savings of 48% Twice #VMs!! Img-Dnn Masstree Moses Silo Sphinx Average 68

69 Memory Savings w/o vs w/ Page Merging 100 Unmergeable Mergeable Zero Mergeable Non-Zero Pages % W/O Page-Merging W/ Page-Merging 20 0 Img-Dnn Masstree Moses Silo Sphinx Average 69

70 Memory Savings w/o vs w/ Page Merging 100 Unmergeable Mergeable Zero Mergeable Non-Zero Pages % W/O Page-Merging W/ Page-Merging 20 0 Img-Dnn Masstree Moses Silo Sphinx Average 70

71 Memory Savings w/o vs w/ Page Merging 100 Unmergeable Mergeable Zero Mergeable Non-Zero Pages % W/O Page-Merging W/ Page-Merging 48% 20 0 Img-Dnn Masstree Moses Silo Sphinx Average 71

72 Mean Latency of Requests 3 KSM PageForge Normalized Mean Latency Img-Dnn Masstree Moses Silo Sphinx Average 72

73 Mean Latency of Requests 3 Baseline KSM PageForge Normalized Mean Latency Img-Dnn Masstree Moses Silo Sphinx Average 73

74 Mean Latency of Requests 3 Baseline KSM PageForge Normalized Mean Latency Img-Dnn Masstree Moses Silo Sphinx Average 74

75 Mean Latency of Requests 3 Baseline KSM PageForge Normalized Mean Latency % 10% 0 Img-Dnn Masstree Moses Silo Sphinx Average 75

5 PageForge Reduces Mean Latency Overhead From 68%

76 Mean Latency of Requests 3 Baseline KSM PageForge Normalized Mean Latency PageForge Reduces Mean Latency Overhead From 68% Down to 10%! 68% 10% 0 Img-Dnn Masstree Moses Silo Sphinx Average 76

77 Tail Latency of Requests 3 Baseline KSM PageForge th Percentile Latency % 11% 0 Img-Dnn Masstree Moses Silo Sphinx Average 77

78 Tail Latency of Requests 3 Baseline KSM PageForge th Percentile Latency PageForge Reduces Tail Latency Overhead From 136% Down to 11%! 136% 11% 0 Img-Dnn Masstree Moses Silo Sphinx Average 78

79 Also in the Paper Software interface to PageForge Interaction with the cache coherence protocol Alternative designs Bandwidth analysis ECC vs Jhash à ECC keys add negligible collisions Power and area at 22nm is negligible 79

80 Takeaway: PageForge First solution for hardware-assisted content-aware page merging General, effective, minimal hypervisor involvement & hardware mods Reduced overhead vs state-of-the-art software: Mean latency 68% à 10% Tail latency 136% à 11% Same memory savings as software: 48% à Twice #VMs Novel use of ECC for content characterization à 75% less memory footprint Boston

SmartMD: A High Performance Deduplication Engine with Mixed Pages

SmartMD: A High Performance Deduplication Engine with Mixed Pages Fan Guo 1, Yongkun Li 1, Yinlong Xu 1, Song Jiang 2, John C. S. Lui 3 1 University of Science and Technology of China 2 University of Texas,