Cross-Layer Memory Management to Reduce DRAM Power Consumption
Slide 1: Cross-Layer Memory Management to Reduce DRAM Power Consumption
Michael Jantz
Assistant Professor, University of Tennessee, Knoxville
Slide 2: Introduction
- Assistant Professor at UT since August 2014
- Before UT:
  - PhD in Computer Science at KU (July 2014)
  - Intern at Intel Corporation ( )
- Research interests:
  - Compilers (optimization, phase ordering)
  - Operating systems (kernel instrumentation, memory and power management)
  - Runtime systems (dynamic compilation, object management)
- Courses taught: Compilers (COSC 461), Discrete Structures (COSC 311)
Slide 3: Outline
- Compiler Optimization Phase Ordering
- Dynamic Compilation
- Cross-Layer Memory Management
  - Motivation
  - Design
  - Experimental Evaluation
  - Future Directions
- Conclusions
Slide 4: Compiler Optimization Phase Ordering
Slide 5: Phase Ordering
- Compiler optimizations operate in phases
- Phases interact with each other
- Phase ordering: different phase orderings produce code of different quality
- Problem: finding the best ordering for each function or program takes a very long time
- Iterative search is the most common technique
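To illustrate why the order of phases matters, the Python sketch below uses a hypothetical toy IR and three textbook phases (constant folding, copy propagation, dead code elimination). It is not the compiler framework used in the cited work. Running copy propagation first exposes a constant fold, which in turn makes more code dead, so only some orderings reach the smallest function. Even three phases give 3! = 6 orderings; real compilers have many more phases and may apply them repeatedly, which is why exhaustive search takes so long.

```python
from itertools import permutations

# Hypothetical toy IR: a function is a tuple of instructions of the forms
#   ("const", dst, value)  ("copy", dst, src)  ("add", dst, a, b)  ("ret", src)
PROGRAM = (
    ("const", "a", 2),
    ("const", "b", 3),
    ("copy", "c", "a"),
    ("add", "d", "c", "b"),
    ("ret", "d"),
)

def operands(ins):
    """Variables read by an instruction."""
    op = ins[0]
    if op == "add":
        return (ins[2], ins[3])
    if op == "copy":
        return (ins[2],)
    if op == "ret":
        return (ins[1],)
    return ()

def const_fold(code):
    """Replace an add whose operands are both known constants with a constant."""
    consts, out = {}, []
    for ins in code:
        if ins[0] == "add" and ins[2] in consts and ins[3] in consts:
            ins = ("const", ins[1], consts[ins[2]] + consts[ins[3]])
        if ins[0] == "const":
            consts[ins[1]] = ins[2]
        out.append(ins)
    return tuple(out)

def copy_prop(code):
    """Make later instructions read a copy's source instead of the copy."""
    alias, out = {}, []
    for ins in code:
        op = ins[0]
        if op == "add":
            ins = (op, ins[1], alias.get(ins[2], ins[2]), alias.get(ins[3], ins[3]))
        elif op == "copy":
            ins = (op, ins[1], alias.get(ins[2], ins[2]))
            alias[ins[1]] = ins[2]
        elif op == "ret":
            ins = (op, alias.get(ins[1], ins[1]))
        out.append(ins)
    return tuple(out)

def dead_code_elim(code):
    """Backward pass that drops definitions whose results are never read."""
    live, out = set(), []
    for ins in reversed(code):
        if ins[0] == "ret":
            live.add(ins[1])
            out.append(ins)
        elif ins[1] in live:
            live.discard(ins[1])
            live.update(operands(ins))
            out.append(ins)
    return tuple(reversed(out))

PHASES = {"cf": const_fold, "cp": copy_prop, "dce": dead_code_elim}

def run_phases(order, code):
    for name in order:
        code = PHASES[name](code)
    return code

# Exhaustive search: try every ordering of the three phases, keep the smallest code.
sizes = {order: len(run_phases(order, PROGRAM)) for order in permutations(PHASES)}
best = min(sizes, key=sizes.get)
```

Here `best` is `("cp", "cf", "dce")`, which shrinks the function to two instructions, while `("cf", "cp", "dce")` leaves four.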
Slide 6: Exploiting Phase Interactions
- Our approach: identify and exploit phase interactions during search
- Major contributions:
  - Reduce exhaustive phase ordering search time
  - Increase the applicability and effectiveness of individual optimization phases
  - Improve phase ordering heuristics
- Publications: LCTES '10 [1], CASES '10 [2], CASES '13 [3], S:P&E (Jan. '13) [4]
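One general way to shrink an exhaustive phase ordering search, sketched below under simplified assumptions, is to enumerate distinct function instances instead of phase sequences. A phase that leaves the code unchanged (a dormant phase) is not expanded, and an instance already reached through a different sequence is not explored again. The three "phases" here are arbitrary stand-ins on a tuple of integers, chosen only to make the pruning visible; this is not the authors' search implementation.

```python
from collections import deque

# Stand-in "phases" over a tuple of ints (hypothetical, for illustration only).
def sort_phase(code):
    return tuple(sorted(code))

def dedup_phase(code):
    out = []
    for x in code:
        if not out or out[-1] != x:
            out.append(x)
    return tuple(out)

def drop_zeros_phase(code):
    return tuple(x for x in code if x != 0)

PHASES = {"sort": sort_phase, "dedup": dedup_phase, "dropz": drop_zeros_phase}

def search_distinct_instances(start, phases):
    """Breadth-first search over distinct code instances.

    Returns (smallest code, one phase sequence producing it, number of instances).
    """
    seq_of = {start: ()}       # instance -> first phase sequence that produced it
    frontier = deque([start])
    while frontier:
        code = frontier.popleft()
        for name, phase in phases.items():
            new = phase(code)
            if new == code:    # dormant phase: nothing changed, prune
                continue
            if new in seq_of:  # identical instance reached by another sequence, prune
                continue
            seq_of[new] = seq_of[code] + (name,)
            frontier.append(new)
    best = min(seq_of, key=len)
    return best, seq_of[best], len(seq_of)

best, seq, n_instances = search_distinct_instances((3, 0, 1, 3, 0, 1), PHASES)
```

For this start code the search visits 6 distinct instances and finds `(1, 3)` via `("sort", "dedup", "dropz")`, whereas enumerating every sequence of up to three phases would generate 1 + 3 + 9 + 27 = 40 sequences.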
Slide 7: Dynamic Compilation
Slide 8: Tradeoffs in Dynamic Compilation
- Managed language applications (e.g., Java):
  - Distributed as machine-independent code
  - Require compilation at runtime
- Dynamic compilation policies involve tradeoffs:
  - Compiling at runtime can potentially slow down overall performance
  - Must consider several factors when setting a policy:
    - Compilation speed and quality of the compiled code
    - Execution frequency of individual methods
    - Availability of compilation resources
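A toy cost model makes the tradeoff concrete. Assume, hypothetically, that an interpreted call costs 10 time units, a compiled call 1 unit, and compiling the method 500 units, and that the VM compiles a method once its invocation count passes a threshold. HotSpot triggers compilation from counters in a similar spirit, but the formula and numbers here are illustrative only.

```python
# Hypothetical cost model (all costs in arbitrary time units).
def total_cost(n_calls, threshold, interp=10.0, compiled=1.0, compile_cost=500.0):
    """Time to run a method n_calls times when it is compiled after `threshold` calls.

    Assumes compilation blocks execution; a VM with background compiler
    threads would hide part of compile_cost.
    """
    if n_calls <= threshold:
        return n_calls * interp  # the method is never compiled
    return threshold * interp + compile_cost + (n_calls - threshold) * compiled

def best_threshold(n_calls, candidates):
    """Pick the cheapest threshold, which requires knowing n_calls in advance."""
    return min(candidates, key=lambda t: total_cost(n_calls, t))
```

For a method called 20 times, compiling immediately costs 520 units versus 200 for never compiling; for a method called 100,000 times, compiling immediately costs 100,500 units versus 109,500 with a threshold of 1,000. No single threshold is best for both, and a real VM does not know a method's future call count, which is why execution frequency and compilation resources must be weighed when setting the policy.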
Slide 9: Dynamic Compilation Strategies
- Conducted multiple studies on how, when, and whether to compile program methods
- Employ an industrial-grade Java VM (HotSpot)
- Major studies:
  - Performance potential of phase selection in dynamic compilers (VEE '13-A [6])
  - Dynamic compilation strategy on modern machines (TACO, Dec. '13 [5])
Slide 10: Cross-Layer Memory Management
Slide 11: A Collaborative Approach to Memory Management
- Memory has become a significant player in power and performance
- Memory power management is challenging
- We propose a collaborative approach between applications, the operating system, and hardware:
  - Applications communicate memory usage intent to the OS
  - The OS re-architects memory management to interpret application intent and manage memory over hardware units
  - Hardware communicates its memory layout to the OS to guide memory management decisions
Slide 12: A Collaborative Approach to Memory Management
- Implemented the framework by re-architecting a recent Linux kernel
- Experimental evaluation
- Publications: VEE '13-B [7], Linux Symposium '14 [8], manuscript in submission [9]
Slide 13: Why?
- CPU and memory are the most significant players for power and performance
  - In servers, memory accounts for 40% of total power [10]
- Applications can direct CPU usage:
  - Threads may be affinitized to individual cores or migrated between cores
  - Threads can be prioritized for task deadlines (with nice)
  - Individual cores may be turned off when unused
- Surprisingly, much of this flexibility does not exist for controlling memory
Slide 14: Example Scenario
- System running a database workload with 512GB of DRAM
- All memory is in use, but only 2% of pages are accessed frequently
- CPU utilization is low
- How can power consumption be reduced?
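A back-of-the-envelope calculation shows what is at stake. With 2% of 512GB hot, only about 10.24GB is accessed frequently. Assuming, hypothetically, that the DRAM is built from 16GB modules, that hot data would fit on a single module if the OS packed it together, leaving the other 31 modules holding only cold data and therefore candidates for a low-power state. If the hot pages are instead scattered across all modules, nearly every module holds some hot pages and none can be powered down.

```python
import math

def idle_modules_if_packed(total_gb, hot_fraction, module_gb):
    """Modules left with no hot data if hot pages are packed onto as few modules as possible."""
    n_modules = total_gb // module_gb
    hot_gb = total_gb * hot_fraction
    hot_modules = math.ceil(hot_gb / module_gb)
    return n_modules - hot_modules

# 512 GB with 2% hot comes from the scenario; 16 GB modules are a hypothetical choice.
print(idle_modules_if_packed(512, 0.02, 16))  # 31 (of 32 modules)
```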
Slide 15: Challenges in Managing Memory Power
- Memory references have temporal and spatial variation
- At least two levels of virtualization:
  - Virtual memory abstracts away application-level information
  - Physical memory is viewed as a single, contiguous array of storage
- No way for agents to cooperate with the OS and with each other
- Lack of a tuning methodology
Slide 16: A Collaborative Approach
- Our approach: enable applications to guide memory management
- Requires collaboration between the application, OS, and hardware:
  - An interface for communicating application intent to the OS
  - The ability to keep track of which memory modules host which physical pages during memory management
- To achieve this, we propose the following abstractions:
  - Colors
  - Trays
Slide 17: Communicating Application Intent with Colors
[Diagram: Software Intent, Color, Tray, Memory Allocation and Freeing]
- Color = a hint for how pages will be used
  - Colors are applied to sets of virtual pages that are alike
  - Attributes are associated with each color
- Attributes express different types of distinctions:
  - Hot and cold pages (frequency of access)
  - Pages belonging to data structures with different usage patterns
- Colors allow applications to remain agnostic to lower-level details of memory management
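The color abstraction can be modeled in a few lines. The Python class below is a hypothetical sketch, not the framework's actual system-call interface: the application tags ranges of virtual pages with a color, separately binds attributes to each color, and the OS later looks up the attribute when it backs a page. The names `ColorTable`, `set_attr`, `color_range`, and `attr_of`, and the 4 KiB page size, are assumptions.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12  # assumes 4 KiB pages

@dataclass
class ColorTable:
    """Hypothetical model of per-page colors and per-color attributes."""
    attrs: dict = field(default_factory=dict)       # color -> attribute dict
    page_color: dict = field(default_factory=dict)  # virtual page number -> color

    def set_attr(self, color, **attrs):
        """Associate attributes (e.g. temperature="hot") with a color."""
        self.attrs[color] = attrs

    def color_range(self, addr, length, color):
        """Color every virtual page overlapping [addr, addr + length)."""
        first = addr >> PAGE_SHIFT
        last = (addr + length - 1) >> PAGE_SHIFT
        for vpn in range(first, last + 1):
            self.page_color[vpn] = color

    def attr_of(self, addr):
        """What the OS would consult when backing the page at addr."""
        color = self.page_color.get(addr >> PAGE_SHIFT)
        return self.attrs.get(color, {})

table = ColorTable()
table.set_attr(1, temperature="hot")
table.set_attr(2, temperature="cold")
table.color_range(0x10000, 8192, 1)  # two 4 KiB pages marked with color 1
```

Keeping attributes on the color rather than on each page means an application can change how a whole class of pages is treated with a single `set_attr` call.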
Slide 18: Power-Manageable Units Represented as Trays
- Tray = a software structure containing sets of pages that constitute a power-manageable unit
- Requires a mapping from physical addresses to power-manageable units
  - ACPI 5.0 defines the Memory Power State Table (MPST) to expose this mapping
- Re-architect a recent Linux kernel to perform memory management over trays
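The physical-address-to-tray mapping can be sketched as a sorted table of address ranges, loosely modeled on the memory power nodes that MPST describes. The class and field names are hypothetical, and a real kernel would parse the ACPI table rather than build the list by hand.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class PowerNode:
    """Hypothetical stand-in for one power-manageable unit (one tray)."""
    node_id: int
    base: int    # first physical address in the unit
    length: int  # size in bytes

class TrayMap:
    def __init__(self, nodes):
        self.nodes = sorted(nodes, key=lambda n: n.base)
        self.bases = [n.base for n in self.nodes]

    def tray_of(self, paddr):
        """Return the tray (power-manageable unit) holding physical address paddr."""
        i = bisect.bisect_right(self.bases, paddr) - 1
        if i >= 0:
            node = self.nodes[i]
            if paddr < node.base + node.length:
                return node.node_id
        raise ValueError(f"address {paddr:#x} is not in any power-manageable unit")

GIB = 1 << 30
# Hypothetical layout: four contiguous 8 GiB units (32 GiB in total).
trays = TrayMap([PowerNode(i, i * 8 * GIB, 8 * GIB) for i in range(4)])
```

With this table, `trays.tray_of(9 * GIB)` returns 1, so the allocator can tell which unit a page frame would keep active.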
Slide 19: [Figure: framework overview. Application: colors pages to indicate that a range of virtual pages (V1 ... VN) will be hot; other ranges are marked cold or sequential-access. Operating system: looks up the attribute associated with each virtual page's color and performs physical memory allocation and recycling (P1 ... PN) over trays T0-T7. Hardware: the memory topology (modules M0-M7 behind memory controllers, channels CH0 and CH1, NUMA nodes 0 and 1) is represented in the OS using trays.]
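The flow in the figure can be sketched as a toy allocator. In this hypothetical policy (not the re-architected kernel's actual algorithm), the OS maps a color's attribute to a set of trays: hot pages go to trays reserved for hot data, and all other pages are packed first-fit into the remaining trays. With eight trays, as in the figure, most trays end up receiving no allocations.

```python
class TrayAllocator:
    """Hypothetical color-aware page allocator over trays.

    Hot requests are served from a designated set of trays; everything else
    is packed first-fit into the remaining trays, so trays that are never
    touched stay empty and could be placed in a low-power state.
    """

    def __init__(self, n_trays, pages_per_tray, hot_trays):
        self.pages_per_tray = pages_per_tray
        self.free = [pages_per_tray] * n_trays
        self.hot_trays = sorted(hot_trays)

    def alloc(self, attr):
        if attr == "hot":
            candidates = self.hot_trays
        else:
            candidates = [t for t in range(len(self.free)) if t not in self.hot_trays]
        for t in candidates:  # first fit, lowest tray first
            if self.free[t] > 0:
                self.free[t] -= 1
                return t
        # A real allocator would fall back to other trays instead of failing.
        raise MemoryError(f"no free page for a {attr!r} request")

    def active_trays(self):
        """Trays holding at least one allocated page."""
        return [t for t, f in enumerate(self.free) if f < self.pages_per_tray]

allocator = TrayAllocator(n_trays=8, pages_per_tray=100, hot_trays=[0])
for _ in range(50):
    allocator.alloc("hot")
for _ in range(150):
    allocator.alloc("cold")
print(allocator.active_trays())  # [0, 1, 2]
```

Trays T3-T7 hold nothing after these requests, which is the kind of idle capacity the power-manageable units are meant to exploit.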
Slide 20: Experimental Evaluation
- Emulating NUMA APIs
- Memory prioritization for applications
- Reducing DRAM power consumption
- Power-saving potential of containerized memory management
- Localized allocation and recycling
- Exploiting generational garbage collection
Slide 21: Automatic Cross-Layer Memory Management
- Limitations of application guidance:
  - Little understanding of which colors or coloring hints will be most useful for existing workloads
  - All colors and hints must be inserted manually
- Our approach: integrate with profiling and analysis to automatically provide power and bandwidth management
  - Implemented using the HotSpot JVM
  - Instrumentation and analysis to build a memory profile
  - Partition live objects into separately colored regions
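Partitioning based on a memory profile can be sketched as follows. The profile format and the ranking rule (accesses per allocated byte, marking sites hot until they cover a target share of all accesses) are assumptions made for illustration; the slides say only that allocation sites and objects are partitioned into hot and cold sets. The site labels are hypothetical.

```python
def partition_sites(profile, access_coverage=0.9):
    """Split allocation sites into hot and cold sets from an offline profile.

    profile maps a site label to (bytes_allocated, access_count). Sites are
    ranked by access density (accesses per byte) and marked hot until they
    account for `access_coverage` of all profiled accesses.
    """
    total = sum(accesses for _, accesses in profile.values())
    ranked = sorted(profile, key=lambda s: profile[s][1] / max(profile[s][0], 1),
                    reverse=True)
    hot, covered = set(), 0
    for site in ranked:
        if total == 0 or covered >= access_coverage * total:
            break
        hot.add(site)
        covered += profile[site][1]
    return hot, set(profile) - hot

# Hypothetical profile: site -> (bytes allocated, accesses observed).
profile = {
    "Foo.<init>@12": (1_000, 9_000),
    "Bar.load@40": (50_000, 800),
    "Baz.run@7": (200_000, 200),
}
hot, cold = partition_sites(profile, access_coverage=0.85)
```

Ranking by density rather than raw access count keeps the hot set small, so the hot regions occupy as few power-manageable units as possible.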
Slide 22: [Figure: the application heap. The young generation is split into hot eden, cold eden, hot survivors, and cold survivors; the tenured generation into hot tenured and cold tenured. Other components shown: execution engine, JIT compiler, object profiling and analysis, garbage collection.]
- Employ the default HotSpot configuration for server-class applications
- Divide survivor/tenured spaces into spaces for hot/cold objects
Slide 23: [Figure: the same heap layout, with hot and cold eden, survivor, and tenured spaces]
- Color spaces on creation or resize
- Partition allocation sites and objects into hot/cold sets
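A minimal model of this colored heap, assuming each allocation site has already been classified as hot or cold: new objects go to the hot or cold eden according to their site, and a minor collection copies live objects into the survivor space of the same temperature, so hot and cold objects never share a space. The class and method names are hypothetical, and the model ignores what HotSpot actually does during a collection (survivor space flipping, tenuring thresholds, promotion).

```python
from collections import defaultdict

class ColoredHeap:
    """Hypothetical model: each generation space is split into hot and cold halves."""

    def __init__(self, hot_sites):
        self.hot_sites = set(hot_sites)
        self.spaces = defaultdict(list)  # (space, temperature) -> [(obj, site), ...]

    def temperature(self, site):
        return "hot" if site in self.hot_sites else "cold"

    def allocate(self, obj, site):
        self.spaces[("eden", self.temperature(site))].append((obj, site))

    def minor_gc(self, is_live):
        """Copy live eden objects to the survivor space of the same temperature."""
        for temp in ("hot", "cold"):
            eden = self.spaces.pop(("eden", temp), [])
            self.spaces[("survivor", temp)].extend(
                entry for entry in eden if is_live(entry[0]))

heap = ColoredHeap(hot_sites={"A"})
heap.allocate("x", "A")
heap.allocate("y", "B")
heap.allocate("z", "B")
heap.minor_gc(is_live=lambda obj: obj != "z")
```

Because each (space, temperature) pair is a separate region, the OS can color it once when the space is created or resized, matching the first bullet above.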
Slide 24: Potential of JVM Framework
- Our goal: evaluate the power-saving potential when hot/cold objects are known statically
- MemBench: a Java benchmark that uses different object types for hot and cold memory
  - HotObject and ColdObject
  - Both contain memory resources (an array of integers)
  - Each implements different functions for accessing memory
Slide 25: Experimental Platform
- Hardware:
  - Single node of a 2-socket server machine
  - Processor: Intel Xeon E (12 2.1GHz)
  - Memory: 32GB DDR3 memory (four DIMMs, each connected to its own channel)
- Operating system: CentOS 6.5 with Linux
- HotSpot JVM v _24, 64-bit
  - Default configuration for server-class applications
Slide 26: The MemBench Benchmark
- Object allocation:
  - Creates HotObject and ColdObject objects in a large in-memory array
  - There are fewer hot objects than cold ones (hot objects are ~15% of all objects)
  - The object array occupies most (~90%) of system memory
- Multi-threaded object access:
  - The object array is divided into 12 separate parts, each passed to its own thread
  - Threads iterate over the object array, accessing only hot objects
  - Optional delay parameter
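The structure of the benchmark can be sketched in Python. Only the names HotObject and ColdObject come from the slides; the object sizes, the 1-in-7 hot ratio, and the busy-wait delay are hypothetical choices. Because of interpreter overhead and the global interpreter lock, this sketch shows the access pattern only and is not suitable for measuring memory bandwidth or energy.

```python
import threading
import time

class HotObject:
    def __init__(self, n_ints):
        self.data = [0] * n_ints  # memory resource: an array of integers

    def touch(self):
        for i in range(len(self.data)):
            self.data[i] += 1

class ColdObject:
    def __init__(self, n_ints):
        self.data = [0] * n_ints  # allocated, but never accessed by the workers

def build_array(n_objects, hot_every=7, n_ints=16):
    """Every hot_every-th object is hot (about 14% with the default)."""
    return [HotObject(n_ints) if i % hot_every == 0 else ColdObject(n_ints)
            for i in range(n_objects)]

def spin(delay_ns):
    """Busy-wait for roughly delay_ns nanoseconds."""
    end = time.perf_counter_ns() + delay_ns
    while time.perf_counter_ns() < end:
        pass

def worker(part, iterations, delay_ns):
    for _ in range(iterations):
        for obj in part:
            if isinstance(obj, HotObject):  # only hot objects are accessed
                obj.touch()
                if delay_ns:
                    spin(delay_ns)

def run(objects, n_threads=12, iterations=1, delay_ns=0):
    """Split the array into n_threads contiguous parts, one thread per part."""
    chunk = max(1, -(-len(objects) // n_threads))  # ceiling division
    parts = [objects[i:i + chunk] for i in range(0, len(objects), chunk)]
    threads = [threading.Thread(target=worker, args=(p, iterations, delay_ns))
               for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Since the parts are disjoint, the threads never touch the same object, so no locking is needed.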
Slide 27: MemBench Configurations
- Three configurations:
  - Default
  - Tray-based kernel (custom kernel, default HotSpot)
  - Hot/cold organize (custom kernel, custom HotSpot)
- Delay varied from "no delay" to 1000ns
  - With no delay, 85ns between memory accesses
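A rough calculation suggests why the delay matters. Assume (hypothetically) that each of the 12 threads fetches one new 64-byte cache line per access and that the quoted time between accesses is per thread; prefetching and cache hits are ignored. The aggregate demand is then threads × bytes per access ÷ time between accesses, and bytes per nanosecond is numerically equal to GB/s.

```python
def demand_gbps(n_threads, bytes_per_access, ns_between_accesses):
    """Crude aggregate bandwidth demand; bytes/ns is numerically equal to GB/s."""
    return n_threads * bytes_per_access / ns_between_accesses

no_delay = demand_gbps(12, 64, 85)     # about 9.0 GB/s
max_delay = demand_gbps(12, 64, 1000)  # about 0.77 GB/s
```

Under these assumptions, the no-delay case asks for roughly twelve times the bandwidth of the 1000ns case.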
Slide 28: MemBench Performance
[Plot: performance (runtime, P(X)/P(DEF)) and bandwidth (GB/s) vs. time (ns) between memory accesses, for default, tray-based kernel, and hot/cold organize]
- The tray-based kernel has about the same performance as the default
- Hot/cold organize exhibits poor performance with low delay
Slide 29: MemBench Bandwidth
[Plot: performance (P(X)/P(DEF)) and bandwidth (GB/s) vs. time (ns) between memory accesses, for default, tray-based kernel, and hot/cold organize]
- Default and tray-based kernel produce high memory bandwidth when delay is low
- Placement of hot objects across multiple channels enables higher bandwidth
Slide 30: MemBench Bandwidth
[Plot: performance (P(X)/P(DEF)) and bandwidth (GB/s) vs. time (ns) between memory accesses, for default, tray-based kernel, and hot/cold organize]
- Hot/cold organize: hot objects are co-located on a single channel
- Increased delays reduce the bandwidth requirements of the workload
Slide 31: MemBench Energy
[Plot: energy consumed relative to default (J(X)/J(DEF)) vs. time (ns) between memory accesses, for tray-based kernel and hot/cold organize, each measured as DRAM only and as CPU+DRAM]
- Hot/cold organize consumes much less power with low delay
- Even when bandwidth requirements are reduced, hot/cold organize consumes less power than the other configurations
Slide 32: MemBench Energy
[Plot: energy consumed relative to default (J(X)/J(DEF)) vs. time (ns) between memory accesses, for tray-based kernel and hot/cold organize, DRAM only and CPU+DRAM]
- Significant energy savings potential with the custom JVM
- Max. DRAM energy savings of ~39%; max. CPU+DRAM energy savings of ~15%
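The energy ratios plotted here are J(X)/J(DEF). Since energy is average power multiplied by runtime, a configuration that runs slower can still save energy if its power draw drops enough. The sketch below uses made-up numbers, not measurements from this work, to show the arithmetic.

```python
def relative_energy(power_x_w, time_x_s, power_def_w, time_def_s):
    """J(X) / J(DEF), approximating energy as average power x runtime."""
    return (power_x_w * time_x_s) / (power_def_w * time_def_s)

# Hypothetical: configuration X draws 8 W for 110 s; the default draws 10 W for 100 s.
ratio = relative_energy(8.0, 110.0, 10.0, 100.0)
savings = 1.0 - ratio  # fraction of the default's energy saved
```

In this example X runs 10% longer yet uses 12% less energy (ratio 0.88).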
Slide 33: Results Summary
- Object partitioning strategies:
  - Offline approach partitions allocation points
  - Online approach uses sampling to predict object access patterns
- Evaluated with standard benchmark suites: DaCapo, SciMark
- Achieve 10% average DRAM energy savings and a 2.8% CPU+DRAM energy reduction
- Performance overhead: 2.2% for offline, 5% for online
Slide 34: Current and Future Projects in Cross-Layer Memory Management
- Immediate future work: address performance losses of our current approach
  - Improve the online sampling
  - Automatic bandwidth management
- Applications for heterogeneous memory architectures
- Exploit data object placement within each page to improve efficiency
Slide 35: Conclusions
- Research focuses on software systems: compilers, operating systems, and runtime systems
- Cross-layer memory management:
  - Achieving power/performance efficiency in memory requires a cross-layer approach
  - First framework to use the usage patterns of application objects to steer low-level memory management
  - The approach shows promise for reducing DRAM energy
  - Opens several avenues for future research in collaborative memory management
Slide 36: Questions?
Slide 37: References
1. Prasad Kulkarni, Michael Jantz, and David Whalley. Improving Both the Performance Benefits and Speed of Optimization Phase Sequence Searches. In the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES '10), April 2010.
2. Michael Jantz and Prasad Kulkarni. Eliminating False Phase Interactions to Reduce Optimization Phase Order Search Space. In the ACM/IEEE International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES '10), October 24-29, 2010.
3. Michael Jantz and Prasad Kulkarni. Exploiting Phase Inter-Dependencies for Faster Iterative Compiler Optimization Phase Order Searches. In the ACM/IEEE International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES '13), September 29 - October 4, 2013.
4. Michael Jantz and Prasad Kulkarni. Analyzing and Addressing False Phase Interactions During Compiler Optimization Phase Ordering. In Software: Practice and Experience, January 2013.
5. Michael Jantz and Prasad Kulkarni. Exploring Single and Multi-Level JIT Compilation Policy for Modern Machines. In ACM Transactions on Architecture and Code Optimization (TACO), December 2013.
6. Michael Jantz and Prasad Kulkarni. Performance Potential of Optimization Phase Selection During Dynamic JIT Compilation. In the ACM SIGPLAN Conference on Virtual Execution Environments (VEE '13), March 16-17, 2013.
Slide 38: References
7. Michael Jantz, Carl Strickland, Karthik Kumar, Martin Dimitrov, and Kshitij A. Doshi. A Framework for Application Guidance in Virtual Memory Systems. In the ACM SIGPLAN Conference on Virtual Execution Environments (VEE '13), March 16-17, 2013.
8. Michael Jantz, Kshitij Doshi, Prasad Kulkarni, and Heechul Yun. Leveraging MPST in Linux with Application Guidance to Achieve Power-Performance Goals. In Linux Symposium, Ottawa, Canada, May 2014.
9. Michael Jantz, Forrest Robinson, Prasad Kulkarni, and Kshitij Doshi. Cross-Layer Memory Management for Managed Language Applications. In submission, July.
10. C. Lefurgy, K. Rajamani, F. Rawson, W. Felter, M. Kistler, and T. W. Keller. Energy management for commercial servers. Computer, 36(12):39-48, Dec. 2003.
More informationKartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18
Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation
More informationData Center Virtualization: Xen and Xen-blanket
Data Center Virtualization: Xen and Xen-blanket Hakim Weatherspoon Assistant Professor, Dept of Computer Science CS 5413: High Performance Systems and Networking November 17, 2014 Slides from ACM European
More informationGaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems 1 Presented by Hadeel Alabandi Introduction and Motivation 2 A serious issue to the effective utilization
More informationAOT Vs. JIT: Impact of Profile Data on Code Quality
AOT Vs. JIT: Impact of Profile Data on Code Quality April W. Wade University of Kansas t982w485@ku.edu Prasad A. Kulkarni University of Kansas prasadk@ku.edu Michael R. Jantz University of Tennessee mrjantz@utk.edu
More informationI, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3.
5 Solutions Chapter 5 Solutions S-3 5.1 5.1.1 4 5.1.2 I, J 5.1.3 A[I][J] 5.1.4 3596 8 800/4 2 8 8/4 8000/4 5.1.5 I, J 5.1.6 A(J, I) 5.2 5.2.1 Word Address Binary Address Tag Index Hit/Miss 5.2.2 3 0000
More informationA Case Study in Optimizing GNU Radio s ATSC Flowgraph
A Case Study in Optimizing GNU Radio s ATSC Flowgraph Presented by Greg Scallon and Kirby Cartwright GNU Radio Conference 2017 Thursday, September 14 th 10am ATSC FLOWGRAPH LOADING 3% 99% 76% 36% 10% 33%
More informationChangpeng Liu. Cloud Storage Software Engineer. Intel Data Center Group
Changpeng Liu Cloud Storage Software Engineer Intel Data Center Group Notices & Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software
More informationNVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit
NVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit Ben Walker Data Center Group Intel Corporation 2018 Storage Developer Conference. Intel Corporation. All Rights Reserved. 1 Notices
More informationVARIABILITY IN OPERATING SYSTEMS
VARIABILITY IN OPERATING SYSTEMS Brian Kocoloski Assistant Professor in CSE Dept. October 8, 2018 1 CLOUD COMPUTING Current estimate is that 94% of all computation will be performed in the cloud by 2021
More informationThe Z Garbage Collector Low Latency GC for OpenJDK
The Z Garbage Collector Low Latency GC for OpenJDK Per Lidén & Stefan Karlsson HotSpot Garbage Collection Team Jfokus VM Tech Summit 2018 Safe Harbor Statement The following is intended to outline our
More informationOVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI
CMPE 655- MULTIPLE PROCESSOR SYSTEMS OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI What is MULTI PROCESSING?? Multiprocessing is the coordinated processing
More informationExploiting the Behavior of Generational Garbage Collector
Exploiting the Behavior of Generational Garbage Collector I. Introduction Zhe Xu, Jia Zhao Garbage collection is a form of automatic memory management. The garbage collector, attempts to reclaim garbage,
More informationAdaptive Multi-Level Compilation in a Trace-based Java JIT Compiler
Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center October
More informationGeneral Purpose GPU Programming. Advanced Operating Systems Tutorial 9
General Purpose GPU Programming Advanced Operating Systems Tutorial 9 Tutorial Outline Review of lectured material Key points Discussion OpenCL Future directions 2 Review of Lectured Material Heterogeneous
More informationRAIN: Reinvention of RAID for the World of NVMe
RAIN: Reinvention of RAID for the World of NVMe Dmitrii Smirnov Principal Software Developer smirnov.d@raidix.com RAIDIX LLC 1 About the company RAIDIX is an innovative solution provider and developer
More informationTowards Energy-Efficient Reactive Thermal Management in Instrumented Datacenters
Towards Energy-Efficient Reactive Thermal Management in Instrumented Datacenters Ivan Rodero1, Eun Kyung Lee1, Dario Pompili1, Manish Parashar1, Marc Gamell2, Renato J. Figueiredo3 1 NSF Center for Autonomic
More informationSPECjAppServer2002 Statistics. Methodology. Agenda. Tuning Philosophy. More Hardware Tuning. Hardware Tuning.
Scaling Up the JBoss Application Server. Peter Johnson JBoss World 2005 March 1, 2005 Conclusion Configuration. 8-CPU ES7000 (32-bit) SPECjAppServer 2002 JBoss Application Server 3.2.6 Unisys JVM 1.4.1_07
More informationHPC Architectures. Types of resource currently in use
HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationChangpeng Liu. Senior Storage Software Engineer. Intel Data Center Group
Changpeng Liu Senior Storage Software Engineer Intel Data Center Group Legal Notices and Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware,
More informationIdentifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo
Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo June 15, 2012 ISMM 2012 at Beijing, China Motivation
More informationJIT Compilation Policy for Modern Machines
JIT Compilation Policy for Modern Machines Prasad A. Kulkarni Department of Electrical Engineering and Computer Science, University of Kansas prasadk@ku.edu Abstract Dynamic or Just-in-Time (JIT) compilation
More informationBalancing DRAM Locality and Parallelism in Shared Memory CMP Systems
Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems Min Kyu Jeong, Doe Hyun Yoon^, Dam Sunwoo*, Michael Sullivan, Ikhwan Lee, and Mattan Erez The University of Texas at Austin Hewlett-Packard
More informationPage 2 of 6 SUT Model Form Factor CPU CPU Characteristics Number of Systems 1 Nodes Per System 1 Chips Per System 2 Hardware hw_1 Cores Per System 44
Page 1 of 6 SPECjbb2015 Copyright 2015-2016 Standard Performance Evaluation Corporation Cisco Systems Cisco UCS C220 M4 Tested by: Cisco Systems SPEC license #: 9019 94667 SPECjbb2015-Multi max-jops 71951
More informationSFS: Random Write Considered Harmful in Solid State Drives
SFS: Random Write Considered Harmful in Solid State Drives Changwoo Min 1, 2, Kangnyeon Kim 1, Hyunjin Cho 2, Sang-Won Lee 1, Young Ik Eom 1 1 Sungkyunkwan University, Korea 2 Samsung Electronics, Korea
More informationPerformance Tools for Technical Computing
Christian Terboven terboven@rz.rwth-aachen.de Center for Computing and Communication RWTH Aachen University Intel Software Conference 2010 April 13th, Barcelona, Spain Agenda o Motivation and Methodology
More informationEnabling Hybrid Parallel Runtimes Through Kernel and Virtualization Support. Kyle C. Hale and Peter Dinda
Enabling Hybrid Parallel Runtimes Through Kernel and Virtualization Support Kyle C. Hale and Peter Dinda Hybrid Runtimes the runtime IS the kernel runtime not limited to abstractions exposed by syscall
More informationSPECjbb2005. Alan Adamson, IBM Canada David Dagastine, Sun Microsystems Stefan Sarne, BEA Systems
SPECjbb2005 Alan Adamson, IBM Canada David Dagastine, Sun Microsystems Stefan Sarne, BEA Systems Topics Benchmarks SPECjbb2000 Impact Reasons to Update SPECjbb2005 Development Execution Benchmarking Uses
More information