A Side-channel Attack on HotSpot Heap Management Xiaofeng Wu, Kun Suo, Yong Zhao, Jia Rao The University of Texas at Arlington HotCloud 18 July 9, 2018 1
Side-Channel Attack Attack based on information gained from the implementation of a computer system Shared cache Timing Power consumption Acoustic measurement Steal or infer secrets Infer user activities to launch well-timedattack Attack shared clock in multi-tenant systems to manipulate users time measurement 2
Garbage Collection in HotSpot JVM Parallel Scavenge eden s0 s1 young gen old gen : Minor GC : Major GC Other threads Stop The World VM thread GC threads VM thread Other threads.......... Each individual GC shouldn t take too long large heap Total time spent in GC shouldn t be too much small heap, too frequent GC 3
Adaptive Heap Sizing in PS GC Three objectives Meet pause time target Meet throughput goal Minimize memory footprint JVM automatically determines the heap size in the range of the initial (-Xms) and the maximum (-Xmx) heap sizes Pause time Throughput Footprint Shrink heap Expand heap Shrink heap Time is used as an indirect measure for memory efficiency 4
Minor and Major GC T1 Vulnerability T2 Mem alloc failure Minor GC Fail Check free space in Old Gen Not enough Major GC Fail 1st Alloc in Young Gen Fail Minor GC Major GC Minor GC T3 Major GC T4 step must be performed 2nd Alloc in Young Gen Fail 3rd Alloc in Old Gen Fail Major GC Major GC Minor GC cost = T2 / ( T1+T2 ) Major GC cost = T4 / ( T3+T4 ) Minor time Major time Minor GC time Major GC time step may be performed 5th Alloc in Old Gen Fail Fail 4th Alloc in Young Gen Out of Memory JVM infers heap efficiency based on measured lengths of minor and major GCs, and adjusts heap size accordingly JVM throws an out-of-memory (OOM) error if five GCs fail to resolve the memory allocation failure 5
Shared Clock GC starts JVM is running 1 2 3 JVM is not running JVM is running GC ends 1 + 2 + 3 = wall-clock time 1 + 3 = virtual time Measured GC time gettimeofday() gettimeofday() Time measurement can be inaccurate in the presenceof CPUmultiplexing 6
Three Types of Attacks Cause OOM errors Prevent JVM from expanding the heap in 5 GCs Cause excessive GC Cause bloated heap 7
OOM Attack Attack pausetime target When there is a spike in memory demand and allocation failure, attack major GC measurement Dilated major GC time cause the heap to shrink, missing the opportunity to avoid OOMerrors Pause time Throughput Shrink heap Expand heap Footprint Shrink heap 8
Excessive GC Attack T1 T2 T5 Minor GC Major GC Minor GC Minor GC Major GC T3 T4 T6 T1 T2 T5 Minor GC Major GC Minor GC Minor GC Major GC T3 T4 T6 Similar to OOM attack but more general Targetmajor GC, dilate its time Old generation have a tendency to drop quickly, and the decrement of heap size results in more GCs 9
<latexit sha1_base64="rjym03mwzs1hdtq10uc/ee8jsga=">aaacbnicbvdlsgmxfm3uv62vuzcibisgcgwmclorim5cvpg+ob1kjs10qjpjkgsemszkjb/ixouibv0gd/6natslbt1w4xdovstnbamjsjvot1vawv1b3yhvvra2d3b37p2dthkpxksfbroygybfgowkpalmpjtigukaku4wvp36nqcifrxc05oe+deacrpsjlsrbvaxf0mrjqik1faa9kojcoa5uzlzr54p7kptc2aay8qtsbuuaa7sr/5q4dqmxgogloq5tql9delnmsn5pz8qkia8ripsm5sjmcg/m8xi4alrhjau0gzxckb+vshqrnqkdsxmjhskfr2p+j/xs3v45weum5ce4/ldycqgfndacrxssbbme0mqltt8feiimsq0aa5isnaxiy+tdr3mojx3/qlaucnqkimjcalogasuqqpcgszoaqwewtn4bw/wk/vivvsf89wsvdwcgj+wpn8a/xqyjw==</latexit> <latexit sha1_base64="rjym03mwzs1hdtq10uc/ee8jsga=">aaacbnicbvdlsgmxfm3uv62vuzcibisgcgwmclorim5cvpg+ob1kjs10qjpjkgsemszkjb/ixouibv0gd/6natslbt1w4xdovstnbamjsjvot1vawv1b3yhvvra2d3b37p2dthkpxksfbroygybfgowkpalmpjtigukaku4wvp36nqcifrxc05oe+deacrpsjlsrbvaxf0mrjqik1faa9kojcoa5uzlzr54p7kptc2aay8qtsbuuaa7sr/5q4dqmxgogloq5tql9delnmsn5pz8qkia8ripsm5sjmcg/m8xi4alrhjau0gzxckb+vshqrnqkdsxmjhskfr2p+j/xs3v45weum5ce4/ldycqgfndacrxssbbme0mqltt8feiimsq0aa5isnaxiy+tdr3mojx3/qlaucnqkimjcalogasuqqpcgszoaqwewtn4bw/wk/vivvsf89wsvdwcgj+wpn8a/xqyjw==</latexit> <latexit sha1_base64="rjym03mwzs1hdtq10uc/ee8jsga=">aaacbnicbvdlsgmxfm3uv62vuzcibisgcgwmclorim5cvpg+ob1kjs10qjpjkgsemszkjb/ixouibv0gd/6natslbt1w4xdovstnbamjsjvot1vawv1b3yhvvra2d3b37p2dthkpxksfbroygybfgowkpalmpjtigukaku4wvp36nqcifrxc05oe+deacrpsjlsrbvaxf0mrjqik1faa9kojcoa5uzlzr54p7kptc2aay8qtsbuuaa7sr/5q4dqmxgogloq5tql9delnmsn5pz8qkia8ripsm5sjmcg/m8xi4alrhjau0gzxckb+vshqrnqkdsxmjhskfr2p+j/xs3v45weum5ce4/ldycqgfndacrxssbbme0mqltt8feiimsq0aa5isnaxiy+tdr3mojx3/qlaucnqkimjcalogasuqqpcgszoaqwewtn4bw/wk/vivvsf89wsvdwcgj+wpn8a/xqyjw==</latexit> <latexit sha1_base64="rjym03mwzs1hdtq10uc/ee8jsga=">aaacbnicbvdlsgmxfm3uv62vuzcibisgcgwmclorim5cvpg+ob1kjs10qjpjkgsemszkjb/ixouibv0gd/6natslbt1w4xdovstnbamjsjvot1vawv1b3yhvvra2d3b37p2dthkpxksfbroygybfgowkpalmpjtigukaku4wvp36nqcifrxc05oe+deacrpsjlsrbvaxf0mrjqik1faa9kojcoa5uzlzr54p7kptc2aay8qtsbuuaa7sr/5q4dqmxgogloq5tql9delnmsn5pz8qkia8ripsm5sjmcg/m8xi4alrhjau0gzxckb+vshqrnqkdsxmjhskfr2p+j/xs3v45weum5ce4/ldycqgfndacrxssbbme0mqltt8feiimsq0aa5isnaxiy+tdr3mojx3/qlaucnqkimjcalogasuqqpcgszoaqwewtn4bw/wk/vivvsf89wsvdwcgj+wpn8a/xqyjw==</latexit> <latexit sha1_base64="iz8fzpfeaeztvvcsmzjyvjlopic=">aaacdnicbvdlssnafj3uv62vqes3g6uoccupgm6eohuxfdihtlfmppnm6gqszizccfkcn/6kgxekuhxtzr9x0mahrqcuhm65l5lzvjhrqszr2yitrk6tb5q3k1vbo7t75v5br0ajwksnixajnockyzsttqkkkv4scao9rrre5cb3uw9esbpxr01j4ozozklpmvjagpo1jxbrmg7irn2njxm8ggnfijw6dqbnzgnkajy0q1bdmgeue7sgvvcgnts/bqmijyhhcjmkzd+2yuwmscikgckqg0ssgoejgpo+phyfrlrple4ga1ozqt8sericm/x3rypckaehpzddpak56oxif14/uf6lm1kuwxko5w/5cymqgnk3ceqfwypnnufyup1xiaok61c6wyouwv6mvew6jbpt1e2782rzuqijdi7amtgfnrgatxalwqanmhgez+avvblpxovxbnzmv0tgcxmi/sd4/aem/ztx</latexit> <latexit sha1_base64="iz8fzpfeaeztvvcsmzjyvjlopic=">aaacdnicbvdlssnafj3uv62vqes3g6uoccupgm6eohuxfdihtlfmppnm6gqszizccfkcn/6kgxekuhxtzr9x0mahrqcuhm65l5lzvjhrqszr2yitrk6tb5q3k1vbo7t75v5br0ajwksnixajnockyzsttqkkkv4scao9rrre5cb3uw9esbpxr01j4ozozklpmvjagpo1jxbrmg7irn2njxm8ggnfijw6dqbnzgnkajy0q1bdmgeue7sgvvcgnts/bqmijyhhcjmkzd+2yuwmscikgckqg0ssgoejgpo+phyfrlrple4ga1ozqt8sericm/x3rypckaehpzddpak56oxif14/uf6lm1kuwxko5w/5cymqgnk3ceqfwypnnufyup1xiaok61c6wyouwv6mvew6jbpt1e2782rzuqijdi7amtgfnrgatxalwqanmhgez+avvblpxovxbnzmv0tgcxmi/sd4/aem/ztx</latexit> <latexit sha1_base64="iz8fzpfeaeztvvcsmzjyvjlopic=">aaacdnicbvdlssnafj3uv62vqes3g6uoccupgm6eohuxfdihtlfmppnm6gqszizccfkcn/6kgxekuhxtzr9x0mahrqcuhm65l5lzvjhrqszr2yitrk6tb5q3k1vbo7t75v5br0ajwksnixajnockyzsttqkkkv4scao9rrre5cb3uw9esbpxr01j4ozozklpmvjagpo1jxbrmg7irn2njxm8ggnfijw6dqbnzgnkajy0q1bdmgeue7sgvvcgnts/bqmijyhhcjmkzd+2yuwmscikgckqg0ssgoejgpo+phyfrlrple4ga1ozqt8sericm/x3rypckaehpzddpak56oxif14/uf6lm1kuwxko5w/5cymqgnk3ceqfwypnnufyup1xiaok61c6wyouwv6mvew6jbpt1e2782rzuqijdi7amtgfnrgatxalwqanmhgez+avvblpxovxbnzmv0tgcxmi/sd4/aem/ztx</latexit> <latexit sha1_base64="iz8fzpfeaeztvvcsmzjyvjlopic=">aaacdnicbvdlssnafj3uv62vqes3g6uoccupgm6eohuxfdihtlfmppnm6gqszizccfkcn/6kgxekuhxtzr9x0mahrqcuhm65l5lzvjhrqszr2yitrk6tb5q3k1vbo7t75v5br0ajwksnixajnockyzsttqkkkv4scao9rrre5cb3uw9esbpxr01j4ozozklpmvjagpo1jxbrmg7irn2njxm8ggnfijw6dqbnzgnkajy0q1bdmgeue7sgvvcgnts/bqmijyhhcjmkzd+2yuwmscikgckqg0ssgoejgpo+phyfrlrple4ga1ozqt8sericm/x3rypckaehpzddpak56oxif14/uf6lm1kuwxko5w/5cymqgnk3ceqfwypnnufyup1xiaok61c6wyouwv6mvew6jbpt1e2782rzuqijdi7amtgfnrgatxalwqanmhgez+avvblpxovxbnzmv0tgcxmi/sd4/aem/ztx</latexit> Memory Bloat Attack T1 T2 Minor GC T5 Major GC Minor GC Minor GC Major GC Violatethroughput target T3 T4 T6 Pause time Shrink heap T1 T2 T5 Minor GC Major GC Minor GC Minor GC Major GC Throughput Footprint Expand heap Shrink heap X T3 T4 T6 Throughput = T 1 T 1+T 2 Throughput 0 = T 1 T 1+T 2 0 Attack minor GC to prevent the heap from shrinking evenmemory demand drops 10
Launch Attacks Proof-of-concept attacks Modify JVM source code to manipulate GC time in the adaptive sizing algorithm Realistic attacks Use ebpf to monitor libjvm.so to obtain GC thread ID and slowdown a specific type of GC Use cgroup to limit the CPU usage of GC threads and hence dilate GC time Results Crash a Java-based micro-benchmark with OOM errors Cause ~65% more GC time in DaCapo Inflict up to ~400% memory bloat in SPECjvm2008 11
OOM Attack Attack major GC measurement JAVA_OPTION= -XX:+UseAdaptiveSizePolicy -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=10 -Xms = 32m -Xmx = 2g Both proof-of-concept and realistic attacks crash the micro-benchmark ArrayDeque 50 Bytes 20 MB A micro-benchmark with a sudden spike in memory demand Pause time Throughput Shrink heap Expand heap Footprint Shrink heap 12
Discussion Essence of the problem Heap size should be determined by the characteristics of a Java program But heap efficiency is measured by GC time, an indirect measure ExternalCPU contention can affect internal heap management Many programs designed for dedicated systems are vulnerable to similarattacks in multi-tenant systems CPU multiplexing à wall-clock time or virtual time? VMs, containers, conventional processes Linux jiffies and userspace gettimeofday track wall-clock time Linux CFS usessteal_clock to track virtual time for thread scheduling See our [Suo-SoCC17] paper for another issuecausedby time discontinuity 13
Is this a real problem? No No evidence that many applications suffer from inaccurate time measurement. Even so, the effect is random and universally distributed among applications. Ourattack is sophisticated and needs to target a specific type of GC, not easy. Should we devise a completely isolated virtual time interface for individual programs/vms/containers? Yes In theory, if not measuring absolute latency, time measurement that is only relevant to a particular program or to measure the relative progress of program threads, should usevirtual time This could be the source of erroneous program behavior, unpredictability and inefficiency 14
Thank you! Questions? xiaofeng.wu@mavs.uta.edu 15
Backup Slides 16
A Realistic Attack All experiments were conducted on a 64-core machine using OpenJDK 1.8 and Linux 4.15.0. The JVM was configured with 10 GC threads. Benchmark Dacapo: h2 SPECjvm2008: mpegaudio 17
Pause time-oriented Attack (excessive GC) A realistic attack usingebpf Benchmark: h2 from Dacapo The initial and maximum heap sizes: 16 MB and 900 MB The maximum pause time is set to 100 ms The attack shrinks the heap, causing 88% more GC time 18
Cont d - Pause time-oriented We choose h2 from Dacapo-9.12-MR1-bach as a case study execute a number of transactions set the maximum pause time as 100 ms The overhead induced by the pause time-oriented attack to the micro-benchmark. 19
Cont d - Throughput-oriented (a) Changes of heap size without attack ArrayDeque 50 Bytes -Xms32m -Xmx32g Heap size is 1.61 larger Size (MB) 1600 1400 1200 1000 800 600 400 200 0 Used heap before GC Used heap after GC Heap size 0 2 4 6 8 10 12 14 Time (s) (b) Changes of heap size under attack pause time throughput footprint reduce generation increase generation reduce generation Size (MB) 1600 1400 1200 1000 800 600 400 200 0 0 2 4 6 8 10 12 14 20 Time (s)
Throughput-oriented Attack (memory bloat) w/o attack (a) Changes of heap size without attack Size (MB) 900 800 700 600 500 400 300 200 100 0 Used heap before GC Used heap after GC Heap size 50 100 150 200 250 300 350 400 A realistic attack usingebpf mpegaudio from SPECjvm2008 The initial and maximum heap sizes: 32 MB and 2.5GB Time (s) under attack (b) Changes of heap size under attack Size (MB) 900 800 700 600 500 400 300 200 100 0 50 100 150 200 250 300 350 400 Time (s) The attack prevents the heap fromshrinking when memory demand drops, causing more than 400% waste of memory 21